Project


AnaGraM: Adaptive numerical methods for time series analysis of time-dependent dynamical Graphs in the presence of Missing data

Applicant: Horenko Illia
Number: 152979
Funding scheme: Project funding (Div. I-III)
Research institution: Facoltà di scienze informatiche Università della Svizzera italiana
Institution of higher education: Università della Svizzera italiana - USI
Main discipline: Mathematics
Start/End: 01.06.2014 - 31.10.2015
Approved amount: 105'300.00

All disciplines (2)

Discipline
Mathematics
Molecular Biology

Keywords (11)

nonstationary processes, multiscale processes, adaptive numerical methods, gene expression analysis, Markov processes, transfer operator, high performance computing, time series analysis, graph inference, discrete stochastic processes, Gene Ontology

Lay Summary (German)

Lead
In graph theory, graphs are used as models to represent relationships between objects and to study their properties. Google's "PageRank algorithm", navigation systems, and network research (in social, biological, or economic networks) illustrate the great importance of graphs in our everyday lives. Graphs have also become indispensable tools for the analysis of so-called big data (data volumes starting at the terabyte scale), since they can represent relationships between elements and thus bring order and structure to confusing, high-dimensional data sets and time series that are often affected by errors or missing information. However, the analysis of large data volumes demands highly efficient algorithms and analysis strategies, and it pushes currently available tools to their limits.
Lay summary

Content and aim of the research project

The aim of this proposal is to build on the successfully completed predecessor project "AnaGraph" and to further develop the framework created there in the following ways:

  1. The algorithms are to be extended by a component that makes it possible to estimate missing (or obviously wrong) data in the best possible way and thus to complete the data set.
  2. The framework is to be implemented in such a way that it can be used on modern supercomputers.
  3. The previous fields of application (climate/weather/fluid dynamics research and sociology) are to be extended to problems in bioinformatics, in particular the analysis of gene expression data.

Scientific and societal context of the research project

This project develops forward-looking algorithms and analysis methods that can be used across a broad range of applications. Large and complex data sets can be better handled, analyzed, and understood with the help of the framework developed here. All algorithms will be made freely available to the scientific community.

 

Last update: 27.04.2014

Publications

Publication
On inference of causality for discrete state models in a multiscale context
Gerber Susanne, Horenko Illia (2014), On inference of causality for discrete state models in a multiscale context, in Proceedings of the National Academy of Sciences of the United States of America, 111(41), 14651-14656.
Improving clustering by imposing network information
Gerber Susanne, Horenko Illia (2015), Improving clustering by imposing network information, in Science Advances, 1(7), e1500163.

Scientific events

Active participation

Title | Type of contribution | Title of article or contribution | Date | Place | Persons involved
DMV 2015 | Talk given at a conference | Causality or Correlation? Multiscale inference and application to geoscience | 21.09.2015 | Horenko, Germany | Horenko Illia; Gerber Susanne
Selection Symposium for an Associate Professor position "Breeding Informatics" | Talk given at a conference | Multiscale Network-driven Integration of Omics data | 20.07.2015 | Goettingen University, Germany | Gerber Susanne
Workshop "Causality in Turbulence" (organiser P. Koumoutsakos) | Individual talk | Multiscale Causality Network Inference | 08.06.2015 | ETH Zuerich, Switzerland | Horenko Illia
Selection Symposium for an Assistant Professor position "Bioinformatics" | Talk given at a conference | Multiscale causality network inference in Bioinformatics | 30.05.2015 | JGU Mainz, Germany | Gerber Susanne
MATHICSE Seminar at EPFL (organiser A. Abdulle) | Individual talk | "Challenges of the data analysis in the multiscale context: causality inference and unresolved scales" | 29.04.2015 | EPFL Lausanne, Switzerland | Horenko Illia
Seminar of the CRC 1114 | Individual talk | Challenges of data analysis in a multiscale context | 05.02.2015 | FU Berlin, Germany | Gerber Susanne; Horenko Illia
Seminar of the Courant Institute of Mathematical Sciences (organiser A. Majda) | Individual talk | Challenges of multiscale data analysis | 03.06.2014 | NYU, New York, United States of America | Horenko Illia


Awards

Title | Year
Mercator Fellowship of the German Research Foundation (DFG) in the Collaborative Research Center CRC 1114 “Scaling Cascades in Complex Systems” in Berlin. The official letter from the DFG CRC 1114 stated: “Over the past several years, Prof. Horenko has developed ... data analysis techniques which, in our view, belong to the most advanced methodologies in this field.” (http://www.sfb1114.de/research/mercator-fellow) | 2014

Associated projects

Number | Title | Start | Funding scheme
131845 | AnaGraph: Adaptive numerical methods for nonstationary time series analysis of time-dependent graphs in context of dynamical systems | 01.10.2010 | Project funding (Div. I-III)

Abstract

Graph-theoretical approaches are becoming increasingly popular in many areas of applied science. One of the main reasons is the rapid development of graph theory in recent years, which has related it to the theory of dynamical systems and, in particular, has related graphs to stochastic dynamical systems such as Markov processes and random walks. These groundbreaking theoretical developments have led to the emergence of a large family of practical applications, to mention just a few: non-linear dimension reduction methods (e.g., based on diffusion and Laplacian eigenmaps); the transfer operator approach to dynamical systems; and new engines for internet search such as Google's PageRank algorithm. Graph-related approaches are also receiving a lot of attention as tools for analyzing and visualizing the relations between the different components of very large data sets. The current research proposal is a two-year follow-up of the previous three-year SNSF project AnaGraph: Adaptive Numerical methods for time series Analysis of time-dependent graphs in context of dynamical systems. AnaGraph investigated methodological approaches to data-based inference of non-stationary (i.e., changing in time) graphs in the context of dynamical systems. The AnaGraph project exceeded its research program, resulting in 10 scientific papers (7 already published and 3 in press) in international peer-reviewed journals. Computational tools developed in the AnaGraph project have demonstrated optimal parallelizability on modern hardware architectures of the Swiss Supercomputing Centre in Lugano (CSCS) and are applicable to realistic data analysis problems. Based on a given time series of graph-related observables, the developed algorithms make it possible to identify temporal changes in the underlying models, as well as to perform model/data reduction and an information-theoretical comparison of different methods.
The main published applications of the methods developed in the predecessor AnaGraph project were concentrated in two application areas: (i) climate/weather/turbulence research and (ii) computational sociology. The main methodological aim of this follow-up proposal is to extend the Finite Element Methods of time series analysis with Bounded Variation of model parameters (FEM-BV), developed in the previous AnaGraph project, to incomplete or missing observational data. The main challenge will be to create methods that allow a structure-preserving inference of the underlying graph structures and imputation of missing data, where structure-preserving refers to the conservation properties and causality relations of the related dynamical system. The emerging methods should operate beyond the restrictive stationarity and homogeneity assumptions of the standard graph and matrix completion approaches. The main practical aim of the current follow-up proposal is a High-Performance Computing implementation of the developed methods on modern supercomputers and their application to problems in bioinformatics: gene expression data analysis in genomics, and the data-driven inference, analysis, and reduction of probabilistic gene networks. The practical problems will be formulated in the FEM-BV context as graph problems with missing data. They will be addressed with the methods and tools developed in the previous project (i.e., FEM-BV methods for model inference and reduction) and with the new missing data imputation tools to be developed in the theoretical part of the current project. Previously published work and preliminary results indicate that, for feature extraction from realistic gene expression sets, the methods developed in the AnaGraph project have some conceptual and practical/computational advantages over the standard machine learning tools in this area (e.g., Bayesian mixture models, unsupervised hierarchical Bayesian clustering, and K-means methods).
In collaboration with external experts from experimental biology, as well as with local experts in HPC (in the areas of cloud and GPGPU computing), graph-related data analysis and classification methods will be compared and evaluated with respect to their HPC performance and their potential in the rapidly developing area of bioinformatics.
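
To make the missing-data setting of the abstract more concrete, the following is a minimal sketch in Python. It is not the FEM-BV method and does not come from the project: it assumes a single stationary Markov chain on a small discrete state space, which is exactly the restrictive assumption the proposal aims to go beyond. The marker `MISSING` and the function names `estimate_transition_matrix` and `impute_missing` are hypothetical and chosen for illustration only.

```python
import numpy as np

MISSING = -1  # hypothetical marker for an unobserved state (assumption)


def estimate_transition_matrix(series, n_states):
    """Maximum-likelihood estimate of a stationary Markov transition matrix.

    Only transitions whose two endpoints are both observed are counted;
    transitions that touch a missing value are skipped.
    """
    counts = np.zeros((n_states, n_states))
    for a, b in zip(series[:-1], series[1:]):
        if a != MISSING and b != MISSING:
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States without any observed outgoing transition get a uniform row.
    uniform = np.full((n_states, n_states), 1.0 / n_states)
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1), uniform)


def impute_missing(series, P):
    """Fill each gap with the most likely successor of the preceding state.

    A gap is left untouched when no preceding state is available.
    """
    filled = list(series)
    for t in range(1, len(filled)):
        if filled[t] == MISSING and filled[t - 1] != MISSING:
            filled[t] = int(np.argmax(P[filled[t - 1]]))
    return filled


if __name__ == "__main__":
    observed = [0, 1, 0, 1, MISSING, 1, 0, 1, 0, MISSING]
    P = estimate_transition_matrix(observed, n_states=2)
    print(P)
    print(impute_missing(observed, P))
```

The row-stochastic matrix `P` can be read as a weighted directed graph on the states, which is the link between Markov processes and graphs mentioned in the abstract. A non-stationary treatment would let `P` change over time, which this sketch deliberately does not attempt.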