Our video about overfitting has received its share of attention since it was published 5 years ago, that is, half a century ago for a startup like Lokad. Years later, we have made a lot of progress, but overfitting remains a tough problem.
In short, overfitting represents the risk that your forecasting model is accurate only at predicting the past, and not at predicting the future. A good forecasting model should be good at predicting the data you do not have.
A common misconception is that there is no way to assess a model except by checking its performance against the historical data. True, the historical data must be leveraged; however, if there is one insight to remember from the Vapnik-Chervonenkis theory, it is that not all models are born equal: some models carry a lot more structural risk, a core concept of the theory, than others. Entire classes of models can be considered safe or unsafe from a purely theoretical perspective, and this distinction translates into very real accuracy improvements.
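To illustrate with a toy simulation (hypothetical data, not Lokad's models): within a least-squares setting, a richer polynomial class always improves the fit on the past, yet typically degrades on fresh data drawn from the same process. Controlling this gap between fit and future performance is precisely what structural risk minimization is about.

```python
# Toy illustration (simulated data): a richer model class always fits
# the past better, yet can predict fresh data worse.
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + rng.normal(scale=0.3, size=x.size)      # noisy linear signal
y_new = 2.0 * x + rng.normal(scale=0.3, size=x.size)  # fresh draw, same process

def errors(degree):
    coeffs = np.polyfit(x, y, degree)         # least-squares fit on the past
    pred = np.polyval(coeffs, x)
    fit = float(np.mean((pred - y) ** 2))     # error on the data already seen
    future = float(np.mean((pred - y_new) ** 2))  # error on fresh data
    return fit, future

fit_simple, future_simple = errors(1)   # small class: low structural risk
fit_rich, future_rich = errors(10)      # rich class: high structural risk
# fit_rich <= fit_simple always holds; future_rich is usually the worse one.
```

The degree-10 class contains the degree-1 class, so its fit can only improve; nothing, however, guarantees that this improvement carries over to data the model has never seen.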
Overfitting issues cannot be avoided entirely, but they can be mitigated nonetheless.
There are several ways to mitigate overfitting. First, the one rule you should never break: a forecasting model should never be assessed against the data that was used to train it in the first place. Many toolkits regress models on the entire history in order to estimate the overall fit afterward. Well, as the name suggests, such a process gives you the fit but nothing more. In particular, the fit should not be interpreted as any kind of expected accuracy; it is not. The fit error is typically much lower than the error observed on data the model has never seen.
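As a minimal sketch of the rule (toy data and a toy averaging model, not an actual Lokad pipeline): split the history chronologically, train on the older part only, and measure accuracy on the held-out recent part.

```python
# Hypothetical weekly demand history; the "model" is a plain average.
import numpy as np

rng = np.random.default_rng(0)
history = 10 + rng.normal(scale=2.0, size=104)  # ~2 years of weekly demand

split = 78                           # chronological split, never a random one
train, test = history[:split], history[split:]

forecast = train.mean()              # trained on the past only

fit_mae = np.mean(np.abs(train - forecast))   # the in-sample "fit"
real_mae = np.mean(np.abs(test - forecast))   # the accuracy that matters
```

Only `real_mae` says anything about how the model will behave on the data you do not have; `fit_mae` merely describes how well the model memorized the past.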
Second, one simple way of mitigating overfitting is to perform extensive back-testing. In practice, this means splitting the input dataset over dozens, if not hundreds, of incremental date thresholds, then re-training and re-assessing all the forecasting models at each threshold. Back-testing requires a lot of processing power; being able to allocate the massive processing power it takes was actually one of the primary reasons why Lokad migrated toward cloud computing in the first place.
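The process can be sketched as a rolling-origin loop (simulated data and a toy moving-average model; a real setup would re-train full models at every threshold, which is where the processing cost comes from):

```python
# Rolling-origin back-testing sketch over ~100 date thresholds.
import numpy as np

rng = np.random.default_rng(1)
history = 20 + rng.normal(scale=3.0, size=156)  # ~3 years of weekly sales

horizon = 4          # forecast 4 weeks ahead at each threshold
errors = []
for threshold in range(52, len(history) - horizon):   # 100 incremental splits
    train = history[:threshold]              # everything known at that date
    actual = history[threshold:threshold + horizon]
    forecast = train[-13:].mean()            # toy model: 13-week moving average
    errors.append(np.mean(np.abs(actual - forecast)))

backtest_mae = float(np.mean(errors))  # accuracy estimated over all thresholds
```

Each iteration re-trains the model from scratch on the data available at that date; averaging the errors over all thresholds yields a far more robust accuracy estimate than a single split.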
Third, even the most extensive back-testing is worth little if your time-series are sparse in the first place, that is, if they represent items with low sales volumes. Indeed, as most of the data points are at zero, the back-testing process learns very little by iterating over zeroes. Unfortunately for commerce, about 90% of the items sold or serviced have a demand history that is statistically sparse. To address this problem, the performance of a model should be assessed from a multiple time-series viewpoint: it is not the performance of the model over a single time-series that matters, but its performance over well-defined clusters of time-series. Selecting the best model then becomes a balance between local and global empirical accuracy.
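To make the idea concrete, here is a sketch (simulated sparse demand, hypothetical models): the error is aggregated over an entire cluster of similar slow movers, so that a per-item (local) model can be compared against a shared (global) one on equal footing.

```python
# Cluster-level accuracy for sparse time-series.
import numpy as np

rng = np.random.default_rng(2)
# 50 slow movers over 52 weeks: mostly zeroes, occasional unit sales.
cluster = rng.binomial(1, 0.1, size=(50, 52)).astype(float)

train, test = cluster[:, :40], cluster[:, 40:]  # chronological split

def cluster_mae(forecasts):
    # one aggregate error over the whole cluster, not per time-series
    return float(np.mean(np.abs(test - forecasts[:, None])))

local_model = train.mean(axis=1)           # one rate per item (noisy)
global_model = np.full(50, train.mean())   # one shared rate (stable)

mae_local = cluster_mae(local_model)
mae_global = cluster_mae(global_model)
```

On a single sparse series, the local estimate is dominated by noise; pooling the assessment over the cluster is what makes the comparison between the local and global models meaningful at all.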
Any questions? Don't hesitate to post them as comments.