Financial Machine Learning ought to be named its own discipline because of stark contrasts to traditional applications
The most exciting application of machine learning (ML) is in finance. A production model is easy to evaluate: you see its performance the moment you execute a strategy. It is also the most challenging application of ML I know of.
The large majority of popular ML articles, blogs, YouTube videos, and white papers focus on what I call traditional applications. In this article, I group under traditional ML those applications where researchers assume normality, where observations are independent, and where the target does not structurally change over time.
The purpose of calling out a subsection of ML is to focus the attention of researchers and practitioners on it: to improve testing and documentation, and to solidify best practices.
I am not the first practitioner of financial ML to propose a demarcation from traditional applications: see Marcos Lopez de Prado’s recent book, Advances in Financial Machine Learning (2018).
Understanding traditional ML
The most crucial distinction between traditional ML and financial ML is the classical statistical assumption that observations are independent and identically distributed (IID). This assumption was etched into my brain during my first statistics course. Although important in traditional applications, it is an unrealistic assumption to uphold in finance.
Under this assumption, observations or participants are taken to be independent of one another and drawn from the same distribution, which in practice is often treated as roughly Gaussian. Neither can be assumed in finance: observations (e.g., days in a series) are not independent (i.e., today’s level depends on yesterday’s level) and, because of trends and regime shifts, the data are neither identically nor normally distributed.
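To make the dependence point concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the data are simulated rather than real prices, the helper `lag1_autocorr` is a hypothetical name, and the returns are deliberately drawn as independent Gaussians, which is the friendliest possible case. Even then, the price levels built from those returns are strongly dependent on their own past.

```python
import random
import statistics


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    mean = statistics.fmean(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs[:-1], xs[1:]))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


random.seed(0)  # fixed seed so the sketch is reproducible

# Assumption: 2,000 simulated daily returns, independent and Gaussian.
returns = [random.gauss(0.0, 0.01) for _ in range(2000)]

# Compound the returns into a price level series starting at 100.
levels = []
level = 100.0
for r in returns:
    level *= 1.0 + r
    levels.append(level)

print(f"lag-1 autocorrelation of levels:  {lag1_autocorr(levels):.3f}")
print(f"lag-1 autocorrelation of returns: {lag1_autocorr(returns):.3f}")
```

The levels should show an autocorrelation close to 1, while the simulated returns sit near 0. Real returns are less well behaved than this simulation, so the sketch understates the problem rather than overstating it.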
Structural breaks are abnormal, and sometimes random, shifts or changes in the structure of a time series.
Imagine that your machine learning target shifts in behavior, jumps to never-before-seen levels, or changes dramatically because of some macro- or micro-economic effect. One great example came earlier this year, in April 2020, when WTI prices went negative for the first time in history.
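As a rough illustration of what a structural break looks like to code, the sketch below scans a series for the single split point that best separates it into two segments with different means. This is a naive mean-shift scan of my own choosing, not a formal structural break test, and the series is simulated: the function name `most_likely_break`, the `min_size` parameter, and the break location at index 300 are all assumptions made for the example.

```python
import random
import statistics


def most_likely_break(xs, min_size=20):
    """Return (index, score) for the split that best separates xs by mean.

    For each candidate split, score the gap between the left and right
    means, weighted by the segment sizes. Not a formal statistical test.
    """
    n = len(xs)
    best_idx, best_score = None, 0.0
    for t in range(min_size, n - min_size):
        left, right = xs[:t], xs[t:]
        gap = abs(statistics.fmean(left) - statistics.fmean(right))
        score = gap * (len(left) * len(right) / n) ** 0.5
        if score > best_score:
            best_idx, best_score = t, score
    return best_idx, best_score


random.seed(1)  # fixed seed so the sketch is reproducible

# Assumption: mean 0 for 300 observations, then mean 3 for the next 200.
series = [random.gauss(0.0, 1.0) for _ in range(300)]
series += [random.gauss(3.0, 1.0) for _ in range(200)]

idx, score = most_likely_break(series)
print(f"estimated break at index {idx} (true break at 300)")
```

A clean, one-time shift in the mean like this is the easy case. Breaks in real financial series can hit volatility, correlations, or the relationship between features and target, and they rarely announce themselves this clearly.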
Financial ML is a beast of a field
There are five main reasons you should consider financial ML as its own field of study. I do not explain all of these points in this article, but I will likely cover them in a future post. Stay tuned.
- The IID assumption is unrealistic in finance, even though researchers make this assumption after breaking up and transforming a time series.
- Unique data sources are scarce and expensive; common data like quarterly earnings are too common to easily gain an edge.
- Structural breaks are expected and not easily handled.
- Compared to classical econometric approaches, it is easy to overfit an ML model unless specific ML methodologies (feature importance, cross-validation, and evaluation metrics) are carefully adapted to financial applications. If your assembly line is properly constructed, however, an ML model becomes harder to overfit than a classical approach.
- Backtesting is widely used both to create and to test theories, yet it is not a good way to build one.
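The cross-validation point in the list above can be sketched in a few lines. Shuffled k-fold splits let a model train on observations that come after, or sit right next to, the ones it is tested on, which is one way dependent data leads to overfitting. The generator below is a simplified walk-forward split with a gap, written for illustration only: the name `walk_forward_splits` and the `n_folds` and `gap` parameters are hypothetical, and this is not the purged k-fold procedure described in Lopez de Prado’s book.

```python
def walk_forward_splits(n_samples, n_folds=4, gap=5):
    """Yield (train_idx, test_idx) pairs for time-ordered data.

    Training indices always precede the test block, and `gap` observations
    between them are dropped to limit leakage from adjacent samples.
    """
    fold_size = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        test_start = k * fold_size
        # The last fold absorbs any remainder at the end of the series.
        test_end = test_start + fold_size if k < n_folds else n_samples
        train_end = max(0, test_start - gap)
        yield list(range(train_end)), list(range(test_start, test_end))


splits = list(walk_forward_splits(n_samples=100, n_folds=4, gap=5))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")
```

Each fold trains only on the past and tests on a later block, so the evaluation is closer to how a strategy would actually be run. The right size for the gap depends on how far your labels and features reach across time, which this sketch leaves as a free parameter.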
Finance and trading are the most fascinating and exciting applications of machine learning and data science. This field is ripe and calling for innovation.
M. Lopez de Prado, Advances in Financial Machine Learning (2018), Wiley