Learnings from Kaggle Forecasting Competitions: Introduction

Casper Bojer · 2019/06/09 · 7 minute read

This post is the first in a series that attempts to provide an overview of the forecasting competitions held on Kaggle. In contrast to the academically organized M-competitions, which focus on learning about the relative performance of forecast methods under different circumstances, the Kaggle competitions are organized with the sole goal of solving forecasting problems faced by companies. As a consequence, although high-ranking contestants often share their approaches, the lessons learned are fragmented, and it takes significant work for interested academics or practitioners to put the pieces together. Motivated by this, the main purpose of the series is to distill the key findings of the competitions in terms of the approaches that have been successfully applied to forecasting problems.

This first post will provide an overview of the forecasting competitions that have been held on Kaggle throughout the years, while the remaining posts will dive into the competitions one at a time to provide details on the approaches that were successfully used. A final post will attempt to summarize and generalize the findings from the competitions with the hope of adding to the knowledge gained from the academic forecasting competitions.

The Competitions

A variety of forecasting competitions have been held on the Kaggle platform since its inception in 2010. The focus in these posts will be on the forecasting competitions held in the last five years, as a lot has happened in the field of forecasting and machine learning since 2010. This leaves the following forecasting competitions to examine:

  • Walmart Recruiting Store Sales Forecasting
  • Rossmann Store Sales Forecasting
  • Grupo Bimbo Store Sales Forecasting
  • Wikipedia Web Traffic Time Series Forecasting
  • Corporación Favorita Store Sales Forecasting
  • Recruit Restaurant Visitor Forecasting

It is clear from the list that most of the competitions have focused on business forecasting, with 4 of the 6 competitions focusing on forecasting store sales in the retail industry. The one competition that stands out from the rest is the Wikipedia Web Traffic competition, as the quantity to forecast is digital rather than physical. It will therefore be interesting to see whether the findings of this competition resemble those from the competitions in the business domain. Although the retail competitions share a domain, their forecasting tasks still differ significantly. The next sections provide a brief introduction to each competition in terms of the forecasting task and the evaluation setup. If you are short on time and want a quick overview, you can skip to the table at the end of the post.

Walmart Recruiting Store Sales Forecasting

Walmart is a well-known retail chain, primarily based in the United States, that operates a variety of stores. The Walmart competition comprised the task of predicting weekly department sales (in $) by store for 45 Walmart stores, with a focus on correctly predicting weeks with holiday markdowns. Contestants were provided 33 months of data (143 weekly observations) per store and were required to forecast sales for the following 39 weeks. The forecasts were evaluated using the weighted mean absolute error (WMAE), which assigns a weight of five to the special holiday markdown weeks and a weight of one to all other weeks:

\[ WMAE = \frac{1}{\sum_{i=1}^{n}w_i}\sum_{i=1}^{n}w_i|y_i-\hat{y_i}| \]
where \(n\) is the number of observations, \(y_i\) is the actual value, \(\hat{y_i}\) is the forecast and \(w_i\) is the weight of observation \(i\). The forecasts were thus evaluated with an accuracy measure averaged over forecast horizons of 1 to 39 weeks.
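To make the weighting concrete, here is a minimal sketch of WMAE in plain Python. It assumes each observation comes with a boolean holiday flag (a hypothetical input format chosen for illustration) and maps it to the weights of five and one described above.

```python
def wmae(y_true, y_pred, is_holiday):
    """Weighted mean absolute error: weight 5 for holiday weeks, 1 otherwise."""
    if not (len(y_true) == len(y_pred) == len(is_holiday)):
        raise ValueError("inputs must have the same length")
    # Translate the (hypothetical) holiday flag into the competition's weights.
    weights = [5.0 if holiday else 1.0 for holiday in is_holiday]
    weighted_abs_errors = sum(
        w * abs(y - f) for w, y, f in zip(weights, y_true, y_pred)
    )
    # Dividing by the total weight turns the sum into a weighted average.
    return weighted_abs_errors / sum(weights)


# A regular week off by 10 and a holiday week off by 40.
print(wmae([100, 200], [110, 160], [False, True]))  # 35.0 (unweighted MAE: 25.0)
```

The example shows why the holiday weeks dominated modelling effort: a single holiday week counts as much as five ordinary weeks, so its error pulls the score well above the unweighted average.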

Rossmann Store Sales Forecasting

Rossmann is a large drug store chain from Germany with stores in multiple European countries. The Rossmann competition tasked contestants with forecasting daily sales (in $) by store for 1115 Rossmann stores. Contestants were provided 31 months of data (942 daily observations) and were required to forecast sales for the upcoming 48 days. The forecasts were evaluated using the root mean square percentage error (RMSPE):

\[ RMSPE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\frac{y_i-\hat{y_i}}{y_i})^2} \] The forecast accuracy measure was thus averaged over horizons of 1 to 48 days.
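Because each error is divided by the actual value \(y_i\), RMSPE is undefined on days with zero sales (for example, when a store is closed). The sketch below simply skips those observations; as far as I recall, zero-sales days were also excluded from scoring in the Rossmann competition, but treat that as an assumption rather than a statement of the official evaluation code.

```python
import math


def rmspe(y_true, y_pred):
    """Root mean square percentage error over observations with nonzero actuals."""
    # The percentage error is undefined when y == 0, so such observations are skipped.
    squared = [((y - f) / y) ** 2 for y, f in zip(y_true, y_pred) if y != 0]
    if not squared:
        raise ValueError("RMSPE is undefined when all actual values are zero")
    return math.sqrt(sum(squared) / len(squared))


# Two days off by 10% each, plus a zero-sales day that is ignored.
print(rmspe([100, 200, 0], [110, 180, 50]))  # approximately 0.1
```

Since errors are scaled by the actual value, a miss of 10 on a day with sales of 100 costs exactly as much as a miss of 20 on a day with sales of 200.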

Grupo Bimbo Store Sales Forecasting

Grupo Bimbo is a large Mexican bakery product manufacturer with facilities in multiple countries. The Grupo Bimbo competition invited contestants to forecast weekly unit sales of a variety of bakery products by store (i.e., Grupo Bimbo's clients). Contestants were given 7 weeks of data and asked to predict unit sales for the following two weeks for 745,164 stores and 1,522 products. The dataset is therefore very large in terms of the number of time series to predict, but very short in terms of the history available. The forecasts were evaluated using the root mean square logarithmic error (RMSLE):

\[ RMSLE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\ln(\hat{y_i}+1)-\ln(y_i+1))^2} \]
The forecast accuracy was thus averaged over horizons of 1 to 2 weeks.
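A minimal Python sketch of RMSLE follows. It uses `math.log1p`, which computes \(\ln(x+1)\) accurately for small \(x\), and assumes all values are non-negative, as unit sales are; negative forecasts would need to be clipped at zero first, since `math.log1p` raises an error for inputs of -1 or below.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean square logarithmic error; math.log1p(x) computes ln(x + 1)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(math.log1p(f) - math.log1p(y)) ** 2 for y, f in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# The same over-forecast of 10 units costs far more on a small series.
print(rmsle([9], [19]))      # ln(2), roughly 0.693
print(rmsle([999], [1009]))  # ln(1.01), roughly 0.00995
```

Because the measure compares ratios of \(y_i+1\) rather than absolute differences, it effectively scores relative errors, which keeps the many low-volume product and store combinations in this dataset from being drowned out by the few high-volume ones.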

Wikipedia Web Traffic Time Series Forecasting

The Wikipedia Web Traffic competition tasked contestants with forecasting daily visits by page and traffic type (e.g., mobile or desktop) for around 145,000 Wikipedia pages. The competition was run slightly differently from the others, as it was split into a training phase and an evaluation phase. In the training phase, contestants were provided two years of data (730 daily observations) and asked to predict daily page visits for the next 3 months (90 daily observations). This phase only served to let participants test their models before the actual evaluation. The evaluation phase extended the training data with more recent observations, for a total of 32 months of data (970 daily observations), and predictions were required for a one-month period starting 12 days after the last training observation. The forecasts were evaluated using the symmetric mean absolute percentage error (SMAPE):

\[ SMAPE = \frac{100\%}{n}\sum_{i=1}^{n}\frac{|\hat{y_i}-y_i|}{(|y_i| + |\hat{y_i}|)/2} \]
The forecast accuracy was thus averaged over horizons of 12 to 42 days.
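The SMAPE formula needs a convention for the case where both the actual value and the forecast are zero, since the term becomes 0/0. The sketch below treats that case as zero error; I believe the competition specified the same convention, but it is an assumption here. Each term is bounded at 200%, which is reached whenever exactly one of the two values is zero, a situation that occurs on pages with days of no recorded visits.

```python
def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    terms = []
    for y, f in zip(y_true, y_pred):
        denominator = (abs(y) + abs(f)) / 2
        # Both values are zero: the formula gives 0/0, treated here as zero error.
        terms.append(0.0 if denominator == 0 else abs(f - y) / denominator)
    return 100.0 * sum(terms) / len(terms)


print(smape([100, 0], [50, 0]))  # roughly 33.33: 66.67% on the first day, 0% on the second
print(smape([0], [10]))          # 200.0, the maximum, whatever the size of the miss
```

The second example illustrates a well-known quirk of the measure: a forecast of 10 against an actual of 0 scores as badly as a forecast of 10,000 would.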

Corporación Favorita Store Sales Forecasting

Corporación Favorita is a large Ecuadorian grocery chain that operates hundreds of supermarkets. The Corporación Favorita competition tasked contestants with forecasting daily unit sales by store and product for 3901 products and 54 stores. Contestants were given almost 5 years of data (1684 daily observations) and were asked to predict the upcoming 16 days. The forecasts were evaluated using the normalized weighted root mean squared logarithmic error (NWRMSLE):

\[ NWRMSLE = \sqrt{\frac{\sum_{i=1}^{n}w_i(\ln(\hat{y_i}+1)-\ln(y_i+1))^2}{\sum_{i=1}^{n}w_i}} \]
where the weight \(w_i\) is 1.25 for perishable products and 1.0 for non-perishable products. The forecast accuracy measure was thus averaged over horizons of 1 to 16 days.
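NWRMSLE is the RMSLE with per-observation weights, normalized by the total weight. Here is a minimal sketch, assuming a boolean perishable flag is supplied for every observation (a hypothetical input format for illustration, not the competition's file layout):

```python
import math


def nwrmsle(y_true, y_pred, is_perishable):
    """Normalized weighted RMSLE: weight 1.25 for perishable items, 1.0 otherwise."""
    if not (len(y_true) == len(y_pred) == len(is_perishable)) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    weights = [1.25 if perishable else 1.0 for perishable in is_perishable]
    weighted_sq = sum(
        w * (math.log1p(f) - math.log1p(y)) ** 2
        for w, y, f in zip(weights, y_true, y_pred)
    )
    # Normalizing by the total weight keeps the score on the same scale as RMSLE.
    return math.sqrt(weighted_sq / sum(weights))


# A perishable item forecast at (19 + 1) / (9 + 1) = 2 times its actual value,
# plus a perfectly forecast non-perishable item.
print(nwrmsle([9, 5], [19, 5], [True, False]))  # roughly 0.517 (plain RMSLE: roughly 0.490)
```

When all weights are equal, the normalization cancels them out and the measure reduces exactly to the RMSLE used in the Grupo Bimbo and Recruit competitions.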

Recruit Restaurant Visitor Forecasting

Recruit Holdings owns and offers reservation and point-of-sale software for the restaurant industry. The Recruit Restaurant competition challenged contestants to forecast daily visits for 821 restaurants. Contestants were provided approximately 16 months of data (478 daily observations) and asked to predict visits for the following 39 days, so the forecasts have horizons of 1 to 39 days. The forecasts were evaluated using the root mean square logarithmic error (RMSLE):

\[ RMSLE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\ln(\hat{y_i}+1)-\ln(y_i+1))^2} \]
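One practical consequence of this formula, relevant here and for Grupo Bimbo, is that RMSLE is simply the RMSE of \(\ln(y_i+1)\)-transformed values. A model trained with squared error on \(\ln(y_i+1)\) and back-transformed with \(e^x-1\) therefore targets the metric directly. The sketch below checks that identity on made-up visit counts (purely hypothetical numbers) and shows that the RMSLE-optimal constant forecast is \(e^{\bar{z}}-1\), where \(\bar{z}\) is the mean of the transformed values. It illustrates the metric only and is not a description of any contestant's approach.

```python
import math


def rmse(a, b):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def rmsle(y_true, y_pred):
    """Root mean square logarithmic error, as in the formula above."""
    return math.sqrt(
        sum((math.log1p(f) - math.log1p(y)) ** 2 for y, f in zip(y_true, y_pred))
        / len(y_true)
    )


# Hypothetical daily visits for one restaurant and a hypothetical forecast.
visits = [12, 30, 7, 45]
forecast = [10, 33, 9, 40]

# 1) RMSLE on the original scale equals RMSE after a log1p transform.
log_visits = [math.log1p(v) for v in visits]
log_forecast = [math.log1p(f) for f in forecast]
print(math.isclose(rmsle(visits, forecast), rmse(log_visits, log_forecast)))  # True

# 2) The best constant forecast under RMSLE is expm1 of the mean log1p value,
#    which by the AM-GM inequality never exceeds the arithmetic mean.
best_constant = math.expm1(sum(log_visits) / len(log_visits))
arithmetic_mean = sum(visits) / len(visits)
print(round(best_constant, 1), arithmetic_mean)  # 18.6 23.5
```

The gap between the two constants shows that a forecast tuned for RMSE (which favors the mean) will systematically overshoot under RMSLE.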

Summary and Looking Forward

The review above shows that the competition datasets vary considerably in both the length of the series and the number of related time series to forecast. They do share some characteristics, however: all of the series are observed at either a daily or a weekly frequency. The table below summarizes the competitions' characteristics.

In the next post, I will dive into the Walmart competition, focusing on the approaches used by the top 10 contestants. The post will look into the forecast methods, feature engineering and cross-validation strategies used and attempt to provide a summary of what worked well in the competition.

| Competition | Frequency | Target | # Obs. per series | # Time series | Horizon | Accuracy measure |
|---|---|---|---|---|---|---|
| Walmart | Weekly | $ sales by department | 143 | 45 | 1-39 weeks | WMAE |
| Rossmann | Daily | $ sales by store | 942 | 1115 | 1-48 days | RMSPE |
| Grupo Bimbo | Weekly | Unit sales by product & store | 7 | ~6.2M | 1-2 weeks | RMSLE |
| Wikipedia | Daily | Views by page and traffic type | 970 | ~145k | 12-42 days | SMAPE |
| Corporación Favorita | Daily | Unit sales by product & store | 1684 | ~210k | 1-16 days | NWRMSLE |
| Recruit Restaurant | Daily | Visits by restaurant | 478 | 821 | 1-39 days | RMSLE |