Project

New Challenges for Statistical Methods in Large and Complex Data Settings: Analysis of Dependent Data and Model Selection

Applicant: Guerrier Stéphane
Number: 176843
Funding scheme: SNSF Professorships
Research institution: Institute of Management, Geneva School of Economics and Management, University of Geneva
Institution of higher education: University of Geneva - GE
Main discipline: Mathematics
Start/End: 01.01.2019 - 31.12.2022
Approved amount: 1'649'910.00

All Disciplines (8)

Discipline
Mathematics
Genetics
Pharmacology, Pharmacy
Medical Statistics
Experimental Cancer Research
Economics
Biochemistry
Science of management

Keywords (7)

Data Analysis; Statistics; Classification; Big Data; Spatial Statistics; Time Series; Computational Methods

Lay Summary (translated from French)

Lead
In the current era of "Big Data", uncovering trends and hidden signals in massive amounts of data is becoming one of the major challenges of modern scientific research. The explosion in data size is illustrated, for example, in health, biomedical, and social science research, where the volume of collected data is estimated to double every year. On top of this exponential growth comes a rapid increase in the complexity of the models considered in many research fields. Together, these two factors create a multitude of computational challenges that current classical statistical methods cannot meet. Research, particularly in statistics, is therefore focusing on meeting this growing need for data analysis methods suited to these new computational challenges.
Lay summary
Our main objective is to contribute to the development of new statistical methods that address today's computational challenges. More precisely, these methods should make it possible to handle complex models (for which no satisfactory solution currently exists) and/or to analyze massive amounts of data. These methodological developments will apply in particular (i) to so-called "dependent" data (in time and space) and (ii) to the problem of model selection. We will also consider (iii) the development of new predictive methods that respond directly to decision criteria based on scientific expertise rather than on a purely statistical criterion, which can in some cases lead to contradictions.

This research project aims to contribute to the development of new statistical methods that address some of today's computational challenges, in particular in the analysis of massive amounts of data. The research will be conducted with applications in mind, in an interdisciplinary spirit, in fields with a strong societal impact, such as pharmaceutical sciences or the media. The results of this project will be made available to the scientific community, both academic and private, through software and other digital tools.
Last update: 09.05.2018

Publications

Publication
Generalized additive models: An efficient method for short-term energy prediction in office buildings
Khamma Thulasi Ram, Zhang Yuming, Guerrier Stéphane, Boubekri Mohamed (2020), Generalized additive models: An efficient method for short-term energy prediction in office buildings, in Energy, 213, 118834-118834.
Wavelet-Based Moment-Matching Techniques for Inertial Sensor Calibration
Xu Haotian, Zhang Yuming, Guerrier Stéphane, Jurado Juan, Khaghani Mehran, Bakalli Gaetan, Karemera Mucyo, Molinari Roberto, Orso Samuel, Raquet John, Schubert Christine, Skaloud Jan (2020), Wavelet-Based Moment-Matching Techniques for Inertial Sensor Calibration, in IEEE Transactions on Instrumentation and Measurement, 69(10), 7542-7551.
Targeting hallmarks of cancer with a food system-based approach
Lachance James C, Radhakrishnan Sridhar, Madiwale Gaurav, Guerrier Stéphane, Vanamala Jairam KP (2020), Targeting hallmarks of cancer with a food system-based approach, in Nutrition, 69, 110563-110563.
Worldwide predictions of earthquake casualty rates with seismic intensity measure and socioeconomic data: a fragility-based formulation
Wang Yi “Victor”, Gardoni Paolo, Murphy Colleen, Guerrier Stéphane (2020), Worldwide predictions of earthquake casualty rates with seismic intensity measure and socioeconomic data: a fragility-based formulation, in Natural hazards review, 21(2), 04020001-04020001.
A multisignal wavelet variance-based framework for inertial sensor stochastic error modeling
Radi Ahmed, Bakalli Gaetan, Guerrier Stéphane, El-Sheimy Naser, Sesay Abu B, Molinari Roberto (2019), A multisignal wavelet variance-based framework for inertial sensor stochastic error modeling, in IEEE Transactions on Instrumentation and Measurement, 68(12), 4924-4936.
Multivariate signal modeling with applications to inertial sensor calibration
Xu Haotian, Guerrier Stéphane, Molinari Roberto Carlo, Karemera Mucyo (2019), Multivariate signal modeling with applications to inertial sensor calibration, in IEEE Transactions on Signal Processing, 67(19), 5143-5152.
Predicting fatality rates due to earthquakes accounting for community vulnerability
Wang Yi, Gardoni Paolo, Murphy Colleen, Guerrier Stéphane (2019), Predicting fatality rates due to earthquakes accounting for community vulnerability, in Earthquake spectra, 35(2), 513-536.
Simulation-Based Bias Correction Methods for Complex Models
Guerrier Stéphane, Dupuis-Lozeron Elise, Ma Yanyuan, Victoria-Feser Maria-Pia (2019), Simulation-Based Bias Correction Methods for Complex Models, in Journal of the American Statistical Association (Theory & Methods), 20.
Empirical Predictive Modeling Approach to Quantifying Social Vulnerability to Natural Hazards
Wang Yi, Gardoni Paolo, Murphy Colleen, Guerrier Stéphane, Empirical Predictive Modeling Approach to Quantifying Social Vulnerability to Natural Hazards, in Annals of the American Association of Geographers, 1-32.
Exact Distributions and Performance of some Two-sample Nonparametric Tests for Circular Data
Jammalamadaka Sreenivasa, Guerrier Stéphane, Mangalam Vasudevan, Exact Distributions and Performance of some Two-sample Nonparametric Tests for Circular Data, in Sankhya B, 20.
Granger-Causal Testing for Irregularly Sampled Time Series with Application to Nitrogen Signaling in Arabidopsis
Heerah Sachin, Molinari Roberto, Guerrier Stéphane, Marshall-Colon Amy, Granger-Causal Testing for Irregularly Sampled Time Series with Application to Nitrogen Signaling in Arabidopsis, in Bioinformatics, 25.
Robust Two-Step Wavelet-Based Inference for Time Series Models
Guerrier Stéphane, Molinari Roberto, Victoria-Feser Maria-Pia, Xu Haotian, Robust Two-Step Wavelet-Based Inference for Time Series Models, in Journal of the American Statistical Association (Theory & Methods), 25.

Collaboration

Group / person Country
Types of collaboration
Ecole Polytechnique Federale de Lausanne Switzerland (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
City University of Hong Kong Hong Kong (Asia)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
University of Maine United States of America (North America)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
City University of New York United States of America (North America)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
Pennsylvania State University (Penn State) United States of America (North America)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
Chinese Academy of Sciences China (Asia)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
University of California, Santa Barbara (UCSB) United States of America (North America)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
University of Calgary Canada (North America)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
University of Geneva Switzerland (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
University of Illinois at Urbana-Champaign (UIUC) United States of America (North America)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
Massachusetts Institute of Technology United States of America (North America)
- in-depth/constructive exchanges on approaches, methods or results
- Publication

Scientific events

Active participation

Title | Type of contribution | Title of article or contribution | Date | Place | Persons involved
12th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics) | Talk given at a conference | Simulated switched Z-estimation for accurate finite sample inference | 14.12.2019 | London, Great Britain and Northern Ireland | Orso Samuel
12th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics) | Talk given at a conference | Finite sample unbiased estimation in high dimensional settings | 14.12.2019 | London, Great Britain and Northern Ireland | Karemera Mucyo
Joint Statistical Meetings 2019 | Poster | A simple recipe for making accurate parametric inference in finite sample | 27.07.2019 | Denver, Colorado, United States of America | Karemera Mucyo
Final CRoNoS meeting and Workshop on Multivariate Data Analysis (CRoNoS & MDA 2019) | Individual talk | A simple recipe for making accurate parametric inference in finite sample | 14.04.2019 | Limassol, Cyprus | Orso Samuel


Awards

Title Year
InnoSuisse award for the grant entitled: "Stochastic Modelling of Inertial Sensors for Precise GNSS-based Positioning", Engineering division, with Jan Skaloud (EPFL, principal grantee) & Markus Wenk (Hexagon Technology Center GmbH), amount: total CHF 917,280.00 & CHF 246,355.20 (≈ 27%) for the University of Geneva, period: 2020 - 2022 2019

Use-inspired outputs

Software

Name Year
cape: R package 2021
irg: R package 2020
avar: R package 2019
simts: R package 2019


Abstract

As data collections grow in size and complexity, an important aspect of scientific research lies in finding patterns or signals hidden in massive amounts of data that are relevant to problems that need to be tackled in practice. Given the size of these problems, this must be done in a computationally efficient manner and, more importantly, with sound statistical methods. Indeed, the fast-growing production and gathering of data produces, at least indirectly, problems whose complexity grows at an equivalent speed, so new efficient statistical methods for proper data analysis become necessary. In many cases, the complexity of the models considered and the numerical challenges they entail make the currently available statistical methods unviable, and their possible extensions would be equally unsustainable. In this project, I intend to contribute to the development of new computationally efficient statistical methods for the analysis of dependent data (in time and space) and for model selection. Computational efficiency will be achieved by proposing simplified statistical methods that remain numerically tractable while preserving desirable statistical properties, with very little loss of statistical efficiency. Moreover, I also intend to develop statistical methods that can directly take into account the expertise of scientists within a general theoretical framework, thereby attempting to avoid data analyses performed sequentially, which often lose significant information from one step to the next. The project will be completed by the development of software and computational tools that give users from academia, the public sector, and the private sector direct access to the research output.