20 million trees for a new inflation forecast

We present a new, powerful inflation model for the euro area.

Dr. Vincent Stamer

Commerzbank Economic Research

August 16, 2024

The new model is based, among other things, on the machine learning method "Random Forest". It processes a large amount of data and forecasts the components of inflation twelve months ahead. For example, the new forecasting model points to upside risks for goods prices. Inflation is likely to exceed 2% in the coming year.

A new forecasting model opens up new possibilities

The war in Ukraine and the supply chain crisis caused inflation in the euro area to skyrocket in 2022, peaking at over 10%. Forecasting models failed to predict this, partly because inflation had barely moved in the preceding years and the models could not correctly capture the sharp rise in energy prices and the supply chain disruptions. Today, more data and better methods are available. We have therefore developed a new, powerful inflation model that uses machine learning techniques and processes a large amount of data.

Random forests for inflation forecasting thrive on big data

We have developed a purely data-driven inflation model based on the latest research. An Economic Insight explains all calculation steps in detail; the following section summarizes the most important steps and techniques. At its core, the new forecasting model combines a statistical selection procedure with a machine learning method, a combination that has proven itself in similar form in research (see study). Together, these techniques make it possible to process a particularly large number of indicators and to take into account unusual movements in the data, such as jumps between inflation levels.
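Why can a tree-based method handle jumps between inflation levels? A decision tree does not fit a straight line through the data. Instead, it splits the months into groups at a threshold of an indicator and uses each group's average as the prediction. The Python sketch below is our own illustration with made-up numbers, not code from the model: it finds the single best split of one hypothetical indicator, and the names `best_split` and `predict` are hypothetical.

```python
# Hypothetical illustration: one decision-tree split on a single indicator.
def best_split(x, y):
    """Find the threshold on x that minimizes the within-group squared error of y."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal indicator values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, mean of the low group, mean of the high group)

def predict(split, x_new):
    """Assign a new month to a group and return that group's historical average."""
    threshold, left_mean, right_mean = split
    return left_mean if x_new <= threshold else right_mean

# Made-up data: an indicator (e.g. an energy price change, in %) and inflation (in %)
# that jumps from a low regime to a high one.
indicator = [1, 2, 3, 4, 20, 22, 25, 30]
inflation = [1.0, 1.2, 0.9, 1.1, 8.0, 8.4, 7.9, 8.1]

split = best_split(indicator, inflation)
print(split[0])                      # → 12.0 (threshold between the two regimes)
print(round(predict(split, 28), 2))  # → 8.1 (mean of the high-inflation group)
```

The split lands between the two regimes, so the step from roughly 1% to roughly 8% is represented exactly rather than being smoothed over as a linear fit would do.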

First, a selection procedure (Least Absolute Shrinkage and Selection Operator, LASSO) picks the series most relevant for the inflation forecast from up to 75 time series and decides which time lags enter the forecast. The series selected in this way are fed into a Random Forest (RF) model.

This machine learning technique is built on decision trees: in each tree, the months of the past twenty years are sorted into groups according to the variables that explain inflation (indicators such as wages or oil prices), and the average inflation rate of each group is calculated. A new, future month is then assigned to one of these groups according to the classification rules the tree has learned, and the historical average inflation of that group serves as the forecast for the new month. The "learning effect" comes from estimating an enormous number of such decision trees on different combinations of indicators, around 20 million per run in our case.

Each run produces forecasts for the four components of inflation: energy, food and beverages, non-energy industrial goods, and services. Focusing on the components allows us to use specific input series, such as the oil price and producer prices for inputs, to forecast the energy and goods components in a targeted way.
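The two stages described above, LASSO selection of series and lags followed by a random forest, can be sketched in plain Python. This is a toy under stated assumptions, not Commerzbank's implementation: the candidate series (`oil`, `wages`, `pmi`), the lags 1 to 3, the penalty `alpha=0.5`, the tree settings and all function names are hypothetical. LASSO is solved here by coordinate descent on standardized columns, and every (series, lag) pair with a non-zero coefficient is kept. The forest then grows regression trees on bootstrap samples of months, drawing a random subset of indicators at each split, and averages the group means across trees. A real twelve-month-ahead forecast would use lags of at least twelve months and vastly more trees.

```python
import random

# Hypothetical sketch: LASSO picks (series, lag) pairs, then a random forest
# of regression trees is grown on the selected features only.

def make_lagged(series, target, lags):
    """Candidate features (name, lag) -> value series[name][t - lag]; target y[t]."""
    cols = [(name, lag) for name in sorted(series) for lag in lags]
    start = max(lags)
    X = [[series[name][t - lag] for name, lag in cols] for t in range(start, len(target))]
    return cols, X, target[start:]

def lasso(X, y, alpha, sweeps=100):
    """Coordinate descent for (1/2n)||y - Zb||^2 + alpha*||b||_1, Z = standardized X."""
    n, p = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / n for j in range(p)]
    sd = [(sum((r[j] - mu[j]) ** 2 for r in X) / n) ** 0.5 or 1.0 for j in range(p)]
    Z = [[(r[j] - mu[j]) / sd[j] for j in range(p)] for r in X]
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual of the centered target
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of column j with the residual excluding column j's own fit.
            rho = sum(Z[i][j] * (resid[i] + Z[i][j] * b[j]) for i in range(n)) / n
            new = max(abs(rho) - alpha, 0.0) * (1.0 if rho > 0 else -1.0)  # soft threshold
            if new != b[j]:
                for i in range(n):
                    resid[i] -= Z[i][j] * (new - b[j])
                b[j] = new
    return b

def sse(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v)

def grow(rows, y, depth, min_leaf, mtry, rng):
    """Split months into groups by threshold rules; each leaf stores the group's mean."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return sum(y) / len(y)
    best = None
    for j in rng.sample(range(len(rows[0])), mtry):  # random subset of indicators
        vals = sorted(set(r[j] for r in rows))
        for lo, hi in zip(vals, vals[1:]):
            thr = (lo + hi) / 2
            left = [v for r, v in zip(rows, y) if r[j] <= thr]
            right = [v for r, v in zip(rows, y) if r[j] > thr]
            if len(left) >= min_leaf and len(right) >= min_leaf:
                score = sse(left) + sse(right)
                if best is None or score < best[0]:
                    best = (score, j, thr)
    if best is None:
        return sum(y) / len(y)
    _, j, thr = best
    def subset(keep):
        idx = [i for i, r in enumerate(rows) if (r[j] <= thr) == keep]
        return [rows[i] for i in idx], [y[i] for i in idx]
    return (j, thr,
            grow(*subset(True), depth - 1, min_leaf, mtry, rng),
            grow(*subset(False), depth - 1, min_leaf, mtry, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node  # leaf: historical mean of the group the new month falls into

def random_forest(rows, y, n_trees, depth, min_leaf, rng):
    mtry = max(1, len(rows[0]) // 3)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in y]  # bootstrap sample of months
        trees.append(grow([rows[i] for i in idx], [y[i] for i in idx], depth, min_leaf, mtry, rng))
    return trees

def forest_predict(trees, x):
    return sum(predict_tree(t, x) for t in trees) / len(trees)

# Toy data (hypothetical): ten years of monthly series; by construction
# inflation depends only on last month's oil price.
rng = random.Random(0)
n = 120
series = {name: [rng.gauss(0, 10) for _ in range(n)] for name in ("oil", "wages", "pmi")}
inflation = [0.0] + [1.5 + 0.3 * series["oil"][t - 1] for t in range(1, n)]

cols, X, y = make_lagged(series, inflation, lags=[1, 2, 3])
coef = lasso(X, y, alpha=0.5)
selected = [j for j, c in enumerate(coef) if c != 0.0]
print("selected:", [cols[j] for j in selected])

rows = [[r[j] for j in selected] for r in X]
trees = random_forest(rows, y, n_trees=30, depth=4, min_leaf=5, rng=rng)
print("forecast at oil lag 1 = 20:", round(forest_predict(trees, [20.0]), 2))
```

Because the toy target depends only on the one-month oil lag, LASSO keeps just `('oil', 1)` and discards the other eight candidates. Since every leaf is an average of past months, the forest's forecast always stays within the range of inflation rates seen in the training data.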

For the full text, see the attached PDF version.