Project


Causal analysis with Big Data

Applicant: Lechner Michael
Number: 166999
Funding scheme: NRP 75 Big Data
Research institution: Schweizerisches Institut für Empirische Wirtschaftsforschung
Institution of higher education: University of St.Gallen - SG
Main discipline: Economics
Start/End: 01.04.2017 - 31.03.2021
Approved amount: CHF 481'453.00

Keywords (2)

Causal econometric analysis; Big Data

Lay Summary (German)

Lead
Big Data can improve not only forecasting but also impact analysis. However, while dramatic progress has been made in forecasting in the past, research on impact measurement is only at its beginning. We therefore want to study and further develop Big Data methods for impact measurement and apply them, by way of example, to selected research questions.
Lay summary

In the first part of the project, we will combine causal analysis methods from microeconometrics with the statistical methods of machine learning. We will first examine the properties of the resulting new statistical procedures using simulation methods. We will then test, and optimise, the practical suitability of the methods in three areas of application: 1) impact analysis of an economic-policy labour market programme, 2) price formation on online used-car markets, and 3) detection of possible discrimination against professional football players.

In recent years, microeconometric research has made great progress in developing a set of methods for answering causal questions. These methods have already been used successfully, for example to assess economic policy measures. Unfortunately, this set of methods is largely unsuitable for analysing complex volumes of data. Can the methods be extended so that massive progress in the use of Big Data can also be achieved in impact measurement?

In this project, we want to combine the microeconometric methods of causal analysis (impact measurement) with the statistical forecasting models of machine learning, so that extensive data sets can be used for a substantially improved impact analysis of economic-policy and private-sector decisions.

If the project is completed successfully, considerably more reliable statements about the effect of individual measures and decisions can be made in many areas of the economy. For the public sector, this opens up the possibility of a more efficient, because evidence-based, economic policy. In the private sector, firms also benefit from an improved basis for decision-making.

Last update: 26.07.2017

Lay Summary (French)

Lead
Big Data can improve not only forecasts of economic developments but also economic impact analyses. While enormous progress has been made in forecasting in the past, the use of data for impact evaluation is still in its infancy. We are therefore developing suitable methods in order to apply them to selected research topics. In this project, we want to combine the microeconometric methods of causal analysis (impact evaluation) with the statistical forecasting models of machine learning, in order to use large data sets to significantly improve the analysis of the impact of economic decisions taken by political authorities and private actors.
Lay summary

In the first part of the project, we will combine causal analysis methods from microeconometrics with the statistical methods of machine learning. We will first examine the properties of the new statistical procedures using simulation methods. We will then verify, and optimise, the practical usefulness of the methods in three areas of application: 1) impact evaluation of an economic-policy programme concerning the labour market, 2) price formation on the online used-car market, and 3) detection of possible discrimination against professional football players.

In recent years, microeconometric research has made great progress in developing a methodological toolkit for answering causal questions. These methods are already being applied successfully, for example to evaluate economic policy measures. Unfortunately, this toolkit proves unsuitable for analysing complex volumes of data. Can these methods be extended in order to achieve substantial progress in impact evaluation when Big Data is used?

If the project is completed successfully, it will be possible, in many sectors of the economy, to make considerably more reliable statements about the impact of particular measures and decisions. In the public sector, this will make a more efficient economic policy possible, because it will be based on factual, concrete evidence. In the private sector, companies will also benefit from a better basis for decision-making.


Last update: 26.07.2017

Lay Summary (English)

Lead
Big Data can improve not only the forecasting of economic developments but also the analysis of economic impacts. Whereas dramatic advances in forecasting have been made in the past, the use of such data for measuring impact is still in its infancy. The aim of this project is therefore to study and refine Big Data methods for impact measurement and to apply them to a selection of research questions.
Lay summary

In the first part of the project, we will combine causal analysis methods from microeconometrics with the statistical methods of machine learning. We will first examine the properties of the resulting new statistical procedures using simulation methods. We will then investigate the practical suitability of the methods, and optimise them, in three areas of application: 1) impact analysis of an economic-policy labour market programme, 2) pricing in online used-car markets, and 3) detection of possible discrimination against professional football players.

In recent years, microeconometric research has made great advances in the development of methodological tools for answering causal questions. These methods have already been employed successfully, for example to assess economic policy measures. Unfortunately, these tools are largely unsuitable for analysing complex volumes of data. Can the methods be extended in such a way as to significantly advance the use of Big Data for impact measurement?

The goal of the present project is to combine the microeconometric methods of causal analysis (impact measurement) and the statistical forecasting models of machine learning to be able to use large-volume data sets to substantially improve the impact analysis of decisions taken by economic policymakers and private sector actors.

A successful outcome to the project could lead to much more reliable statements of the impact of individual measures and decisions in numerous economic contexts. While this would facilitate more efficient (since evidence-based) economic policymaking for the public sector, companies in the private sector would also benefit from improved decision-making tools.


Last update: 26.07.2017

Publications

Publication
Knaus Michael C. (2021), A double machine learning approach to estimate the effects of musical practice on student’s skills, in Journal of the Royal Statistical Society: Series A (Statistics in Society), 184(1), 282-300.
Knaus Michael C., Lechner Michael, Strittmatter Anthony (2021), Machine learning estimation of heterogeneous causal effects: Empirical Monte Carlo evidence, in The Econometrics Journal, 24(1), 134-161.
Goller Daniel, Lechner Michael, Moczall Andreas, Wolff Joachim (2020), Does the estimation of the propensity score by machine learning improve matching estimation? The case of Germany's programmes for long term unemployed, in Labour Economics, 65, 101855.
Strittmatter Anthony, Lechner Michael (2020), Sorting in the used-car market after the Volkswagen emission scandal, in Journal of Environmental Economics and Management, 101, 102305.
Knaus Michael C., Lechner Michael, Strittmatter Anthony (2020), Heterogeneous Employment Effects of Job Search Programmes: A Machine Learning Approach, in Journal of Human Resources, 0718-9615R.

Collaboration

Group / person Country
Types of collaboration
Professor Bart Cockx Belgium (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Research Infrastructure
Researchers at University of St. Gallen Switzerland (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Research Infrastructure

Scientific events

Active participation

Title | Type of contribution | Title of article or contribution | Date | Place | Persons involved
Machine Learning in Labor, Education, and Health Economics | Talk given at a conference | Priority to Unemployed Immigrants? A Causal Machine Learning Evaluation of Training in Belgium | 20.11.2020 | Nürnberg, Germany | Lechner Michael
Causality in the Social Sciences II | Talk given at a conference | Priority to Unemployed Immigrants? A Causal Machine Learning Evaluation of Training in Belgium | 09.10.2020 | Hannover (online), Germany | Lechner Michael
Labour Market Institutions (IZA) | Talk given at a conference | Priority to Unemployed Immigrants? A Causal Machine Learning Evaluation of Training in Belgium | 11.09.2020 | Bonn (online), Germany | Lechner Michael
EALE-SOLE-AASLE World Congress | Talk given at a conference | Double Machine Learning based Program Evaluation under Unconfoundedness | 25.06.2020 | Berlin, Germany | Knaus Michael
Causal Machine Learning Workshop St. Gallen | Poster | Double Machine Learning based Program Evaluation under Unconfoundedness | 20.01.2020 | St. Gallen, Switzerland | Knaus Michael
Causal Machine Learning Workshop St. Gallen | Poster | Active labour market policies for long-term unemployed: New evidence from causal machine learning | 20.01.2020 | St. Gallen, Switzerland | Goller Daniel
Causal Machine Learning Workshop St. Gallen | Poster | The Effect of Sport in Online Dating: Evidence from Causal Machine Learning | 20.01.2020 | St. Gallen, Switzerland | Okasa Gabriel
Seminar | Individual talk | Priority to Unemployed Immigrants? A Causal Machine Learning Evaluation of Training in Belgium | 03.12.2019 | Munich, Germany | Lechner Michael
International Biometrics Society | Talk given at a conference | Modified Causal Forest | 09.09.2019 | Lausanne, Switzerland | Lechner Michael
European Meeting of the Econometric Society | Talk given at a conference | Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence | 26.08.2019 | Manchester, United Kingdom | Knaus Michael
6th annual conference of the International Association for Applied Econometrics | Talk given at a conference | Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence | 25.06.2019 | Nicosia, Cyprus | Knaus Michael
International Society of Applied Econometrics | Talk given at a conference | Modified Causal Forest | 25.06.2019 | Nicosia, Cyprus | Lechner Michael
BGSE Summer Forum "Machine Learning for Economics" | Talk given at a conference | Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence | 17.06.2019 | Barcelona, Spain | Knaus Michael
European Causal Inference Meeting | Poster | Machine Learning Estimation of Heterogeneous Causal Effects: Empirical Monte Carlo Evidence | 27.03.2019 | Bremen, Germany | Knaus Michael
Annual meeting of the German Statistical Society | Talk given at a conference | Modified Causal Forest | 22.03.2019 | Nürnberg, Germany | Lechner Michael
Machine Learning in Economics and Econometrics | Poster | Heterogeneous Employment Effects of Job Search Programmes: A Machine Learning Approach | 13.09.2018 | Munich, Germany | Knaus Michael
Retreat of the labour economics section of the University of Aarhus | Talk given at a conference | Causal Machine Learning in Labour Economics | 15.08.2018 | Aarhus, Denmark | Lechner Michael
European Causal Inference Meeting | Poster | Heterogeneous Employment Effects of Job Search Programmes: A Machine Learning Approach | 09.04.2018 | Florence, Italy | Knaus Michael


Self-organised

Title | Date | Place
Causal Machine Learning | 20.01.2020 | St. Gallen, Switzerland

Communication with the public

Communication Title Media Place Year
Other activities Report to the German Federal Ministry of Labour and Social Affairs - Improvements in matching by ML International 2020
New media (web, blogs, podcasts, news feeds etc.) Ökonomische Wirkungsanalyse mit Big Data International 2018
New media (web, blogs, podcasts, news feeds etc.) Personalisierung von «Was wäre wenn?»-Fragen International 2018
Media relations: print media, online media The Allocation of Training Programmes for Unemployed Persons HSG Focus International 2018

Associated projects

Number | Title | Start | Funding scheme
187301 | Chances and risks of data-driven decision making for labour market policy | 01.05.2020 | NRP 77 Digital Transformation

Abstract

In the last two decades, econometrics, and microeconometrics in particular, has seen important methodological developments that make it possible to understand and use the conditions necessary to draw causal inference from non-experimental data. These methods make applied econometric studies substantially more useful for policy and business decisions. Furthermore, recent developments in computer science and the emerging availability of large new data sets, summarised under the heading of ‘Big Data’, enable the implementation of powerful machine learning methods in a broad range of applications. However, most studies implementing these methods focus clearly on prediction, and predictions differ from causal analyses in several important respects. The purpose of this project is to contribute to the ‘marriage’ of these two strands, data-driven empirical methods and conceptual empirical research designs, so that machine learning tools can be successfully employed for causal analysis. This combination of methods is intended to enable large data sets and Big Data to be used more successfully in support of government, business, and private decisions.

To this end, we aim to investigate the properties of existing empirical methods and develop extensions relevant for drawing causal inference. As the practical performance of different methods and algorithms in Big Data environments is of key importance, the investigation of the properties will go beyond a theoretical analysis and will be based on large-scale simulation studies and on three different applications. The latter cover very different topics to serve as examples of the broad range of possible applications of these methods. The three empirical topics investigated come from the fields of evidence-based governmental policies, price formation on the internet, and discrimination.
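The distinction the abstract draws between prediction and causal analysis can be illustrated with a small simulation. The following is a minimal sketch and is not taken from the project: the data-generating process, the effect size of 2.0, and all function names are illustrative assumptions. It compares a naive difference in means, which is what a purely predictive comparison of treated and untreated units would report, with an estimate that adjusts for a single observed binary confounder by stratification.

```python
import random


def simulate(n, true_effect=2.0, seed=0):
    """Simulate (confounder, treatment, outcome) rows; purely illustrative."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0       # binary confounder
        p_treat = 0.8 if x == 1 else 0.2         # treatment probability depends on x
        d = 1 if rng.random() < p_treat else 0   # treatment indicator
        y = true_effect * d + 5.0 * x + rng.gauss(0.0, 1.0)  # outcome also depends on x
        rows.append((x, d, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Difference in mean outcomes between treated and untreated units."""
    treated = [y for _, d, y in rows if d == 1]
    control = [y for _, d, y in rows if d == 0]
    return mean(treated) - mean(control)


def adjusted_effect(rows):
    """Average of within-stratum differences, weighted by stratum size."""
    n = len(rows)
    effect = 0.0
    for level in (0, 1):
        stratum = [(d, y) for x, d, y in rows if x == level]
        treated = [y for d, y in stratum if d == 1]
        control = [y for d, y in stratum if d == 0]
        effect += (len(stratum) / n) * (mean(treated) - mean(control))
    return effect


rows = simulate(20000)
naive = naive_difference(rows)
adjusted = adjusted_effect(rows)
print(f"naive difference: {naive:.2f}")
print(f"adjusted effect:  {adjusted:.2f}")
```

In this assumed setup the naive difference absorbs the influence of the confounder and lands well above the simulated effect, while the stratified estimate stays close to it. With many or continuous confounders, stratification of this kind is no longer feasible, which is the gap that combining causal designs with machine learning is meant to address.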