Machine Learning and Empirical Asset Pricing

Applicant Korsaye Sofonias Alemu
Number 195068
Funding scheme Doc.Mobility
Research institution Booth School of Business, University of Chicago
Institution of higher education Institution abroad - IACH
Main discipline Economics
Start/End 01.01.2022 - 31.08.2022

Keywords (5)

Regularized Empirical Likelihood; Regularized optimal portfolio problem; Machine Learning; Market frictions; Empirical Asset Pricing

Lay Summary (translated from Italian)

The research proposal builds on the current effort in the finance literature, and in Asset Pricing in particular, to address the proliferation of factors and variables considered able to explain the cross-sectional variation of expected returns. I present two methods that can alleviate the so-called "curse of dimensionality" in empirical Asset Pricing problems. The first method makes it possible to estimate a parametric Asset Pricing model when the number of assets is large, while the second estimates conditional models nonparametrically in the presence of a large number of predictors.
Lay summary

In the absence of consensus on which variables are the most representative, the proliferation of return predictors raises concerns about possible spurious inference. The two methods mainly used in the literature to identify return predictors are linear regression and portfolio sorts. Both methods suffer from the so-called "curse of dimensionality", which makes certain methodologies unreliable as the dimension of the variables in question grows. In this research proposal I propose new methodologies, based on Machine Learning and in particular on Random Forests, that allow predictive analysis to be carried out in an economically and econometrically meaningful way, even in the presence of high-dimensional variables. The recent widespread adoption of Machine Learning methods in finance and economics calls for adequate statistical treatment tailored to the context of these methods. In light of this objective, the methodology set out in this research proposal, because it can shed light on the implicit implications of such methods, may have relevance and impact beyond Asset Pricing.

Last update: 27.06.2021


This project builds on the current effort in the Asset Pricing literature to provide methods that address the proliferation of factors considered to have explanatory power for the cross-section of expected returns. In the absence of consensus on which variables are the most representative, this proliferation raises serious concerns about spurious inference and data mining. Cochrane (2011) advocates the need for "different methods", beyond the standard approaches, to address the "zoo of new variables". The two standard methods mainly used to identify return predictors are linear regression of returns on relevant characteristics, which implies a linear relation between returns and factors, and characteristic-based portfolio sorts. Both methods suffer from the "curse of dimensionality": as the number of characteristics increases, the first is prone to poor estimates and invalid inference, while the second implies exponential growth in the number of sorted portfolios. A common approach in the empirical asset pricing literature to alleviate the curse of dimensionality is regularization, often borrowed from the Statistics and Machine Learning literature (e.g. Lasso, Elastic Net), which induces variable selection (sparsity) and hence dimension reduction. This allows both model selection and cross-sectional return prediction. Nevertheless, the procedure is subject to two main issues. First, consistency in model selection is often obtained only under strong assumptions, e.g. beta-min and irrepresentable conditions (see Bühlmann and van de Geer (2011)), which are difficult to motivate in some contexts. Second, unless some correction mechanism is applied, inference based on these estimation methods is often spurious; see Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, Newey, and Robins (2018) and references therein.
With these issues in mind, the literature stands to benefit from a thorough econometric analysis of Machine Learning techniques applied to asset pricing problems. With this objective, I lay out two methods in this proposal that can alleviate the curse of dimensionality in Empirical Asset Pricing settings. The first method makes it possible to estimate a parametric Asset Pricing model in the presence of a large cross-section of assets. The second instead estimates a conditional Asset Pricing model nonparametrically in the presence of a large number of return predictors. My confidence that economically meaningful and econometrically reliable Machine Learning procedures based on these two methods can be developed in a reasonable amount of time derives from related theory developed in my previous papers and in the recent Statistical Learning literature. I believe that the success of the research plan can have an impact beyond the Asset Pricing and Finance literature, as the generality of the proposed methods allows them to be applied to broader research areas and applications.
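
As a rough illustration of two points made in the abstract (the exponential growth of sorted portfolios, and the sparsity induced by Lasso-type regularization), the following minimal sketch may help. It is not the method proposed in this project. The quintile sort, the cross-section of 5000 stocks, the OLS slopes, and the penalty value are all hypothetical, and the soft-thresholding formula is the textbook Lasso solution only under the simplifying assumption of an orthonormal design.

```python
def n_sorted_portfolios(n_characteristics: int, n_buckets: int = 5) -> int:
    """Portfolios produced by a full multi-way sort.

    Sorting independently on each characteristic into n_buckets groups
    yields n_buckets ** n_characteristics intersected portfolios.
    """
    return n_buckets ** n_characteristics


def soft_threshold(beta_ols: float, lam: float) -> float:
    """Lasso coefficient under an orthonormal design (simplifying assumption).

    Shrinks the OLS slope toward zero by lam and sets it exactly to zero
    when its magnitude does not exceed lam.
    """
    if beta_ols > lam:
        return beta_ols - lam
    if beta_ols < -lam:
        return beta_ols + lam
    return 0.0


n_stocks = 5000  # hypothetical cross-section size
for k in (1, 2, 3, 5, 10):
    n_port = n_sorted_portfolios(k)
    # Average number of stocks per portfolio collapses as k grows.
    print(f"{k:>2} characteristics: {n_port:>8} portfolios, "
          f"{n_stocks / n_port:.4f} stocks each on average")

ols_slopes = [0.9, -0.4, 0.05, -0.02]  # hypothetical OLS estimates
lam = 0.1                              # hypothetical penalty level
lasso_slopes = [soft_threshold(b, lam) for b in ols_slopes]
print(lasso_slopes)  # the two small slopes are set exactly to zero
```

With quintile sorts, ten characteristics already imply far more portfolios than there are stocks, which is the sense in which sorting breaks down. The thresholding step shows how a penalty can drop weak predictors entirely, which is also why inference after such selection needs the care discussed in the abstract.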