Statistical forecasting is a counter-intuitive science. We have said this before, but the point is worth emphasizing again.
We frequently get support requests from people who have just pushed some data to Lokad and found that the forecasts they obtain are flat. In other words, the forecasted values are constant for all steps ahead: for example, identical sales values for each of the next 6 months when we are producing monthly sales forecasts 6 months ahead.
It’s perfectly clear, though, that there is no chance of business sales being perfectly flat for the next 6 months, so why does Lokad keep producing such meaningless results?
Well, we know for sure that business is going to change (at least a little) during the next six months. No question about that. Yet the problem is: how can we produce a forecast that tracks those future changes as closely as possible? If we take the statistical road, then we need a statistical model to produce the forecasts.
The problem is that we need a good forecasting model, and the cardinal rule of statistical forecasting is that the more complex the model, the more data is needed for the model to be reliable. Models producing distinct forecasts for each step ahead are definitely more complex than the ones producing the same value for all steps ahead.
Put the other way around, those more complex models are less reliable on limited datasets, which means that using them on short or sparse histories is very likely to decrease the overall forecasting accuracy.
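To make this trade-off concrete, here is a toy simulation. It is only a sketch under loud assumptions, not a description of the models Lokad actually runs: the synthetic sales process (a level of 100, a trend of +1 per month, Gaussian noise with a standard deviation of 10) and the function names flat_forecast, trend_forecast and mean_absolute_error are all made up for illustration. It compares a flat forecast (the historical mean repeated for every step ahead) with a straight line fitted by least squares, which produces a distinct value for each step ahead.

```python
import random


def flat_forecast(history, horizon):
    """Same value (the historical mean) for every step ahead."""
    level = sum(history) / len(history)
    return [level] * horizon


def trend_forecast(history, horizon):
    """Least-squares straight line fitted to the history, then extrapolated."""
    n = len(history)  # needs at least 2 points
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    slope = sxy / sxx
    # Future time indices are n, n + 1, ..., n + horizon - 1.
    return [y_mean + slope * (n + h - t_mean) for h in range(horizon)]


def mean_absolute_error(model, history_length, horizon=6, trials=2000, seed=0):
    """Average absolute forecast error over many synthetic series.

    Assumed (hypothetical) sales process: level 100, trend of +1 per month,
    Gaussian noise with standard deviation 10.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        series = [100.0 + 1.0 * t + rng.gauss(0.0, 10.0)
                  for t in range(history_length + horizon)]
        history = series[:history_length]
        future = series[history_length:]
        forecast = model(history, horizon)
        total += sum(abs(f - a) for f, a in zip(forecast, future))
    return total / (trials * horizon)


if __name__ == "__main__":
    for months in (3, 36):
        flat = mean_absolute_error(flat_forecast, months)
        trend = mean_absolute_error(trend_forecast, months)
        print(f"{months:>2} months of history: flat MAE={flat:.1f}, trend MAE={trend:.1f}")
```

Under these assumptions, the flat forecast should come out ahead with 3 months of history even though the underlying data really does have a trend, because a slope estimated from three noisy points is mostly noise and gets amplified at every step ahead. With 36 months of history the ranking should flip in favor of the trend model. The exact figures depend on the assumed noise level and trend.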
Back to the situation where people complain about flat forecasts: what is usually happening is simply that the data that has just been uploaded is either very short (like only 3 months of monthly history) or very sparse (like an e-commerce site with only a handful of sales for each product). In those situations, Lokad frequently goes for flat forecasts.
It’s not a bug, it’s an accuracy-improvement feature.
Reader Comments (6)
Hi Johan, the efficient representation of time-series in relational databases is a rather tricky business. In short, it’s very dependent on the length of the time-series you are considering. Then, I doubt that anything close to an “Industry Standard” exists in this area. Then again, for the source code itself, everything is very dependent on the length and number of time-series being considered. Hope it helps.
7 years ago | Joannes Vermorel
We are working on a Time Series Project with a typical Time Series Definition and Observations. Can you possibly assist with the following: 1. Do you have an Industry Standard structure for the Time Series Definition and associated Observations from a DB design perspective? 2. What would be the best practice with regard to the code that represents a Time Series, e.g. [Level 1].[Level 2].[Sequence Number].[Version]? Any guidance and assistance will be appreciated.
7 years ago | Johan Strydom
Hi Izi, that’s an interesting thought :-) I have been thinking of this sort of “attack” right from the start, though it’s not widely advertised on our website. Basically, to do that you would need to know very precisely the data that your target is sending to Lokad. Then, you would also need to know very precisely the type of correlations we are looking at. Simple time-series shifts aren’t “natural” in real business data. Finally, it would cost you some money, since we do not trust data from non-paying users. Concerning our techniques, we use a complex mixture of many models, which happens to include well-known classics. This topic is obviously a bit sensitive for us, but I will try to post some not-too-detailed overview later on. Then, it must be noted that it’s still under rapid evolution.
9 years ago | Joannes Vermorel
So if I upload incorrect data, carefully crafted to correlate to a supplier / partner who I know uses your product, I can screw up his results? That’s interesting, and it seems pretty easy to do too! Otherwise, it would be nice to know more about the techniques you use (ex: ARIMA, ANN, etc.).
9 years ago | Izi
Hi Stephen, good point. The root issue here is that the user might want not only the forecasts themselves, but also the rationale behind them or, at least, a comment. In the future, we will probably try to improve the client apps of Lokad with automated guidelines that will “comment” on the forecasts delivered by Lokad.
9 years ago | Joannes Vermorel
Hi Joannes, do you think that this is useful user feedback in terms of describing what the user expects? Potentially you could constrain the user to only be allowed to perform a forecasting task for a few months ahead, say 3, and if they ask for more than this they would be given a warning about the effect on forecast accuracy. I guess it could also depend on the availability of the historical data, so maybe there would be some way to correlate the two, I’m not sure. Also, I think that from the user’s point of view, even though it might be more correct to return a flat forecast, potentially they could be notified somehow about this case and/or given the option to accept it or accept a more naive result?
9 years ago | Stephen