Statistical forecasting is a highly counter-intuitive field: most assumptions that seem intuitive at first glance turn out to be plain wrong. In this post, we compile a short list of the worst offenders among the statistical oddities that are the bread and butter of Lokad’s business.
1. Advanced forecasting systems DO NOT learn from their errors
Forecasting systems typically refresh their forecasts on a daily or weekly basis. Every time a new batch of forecasts is produced, a forecasting system has the opportunity to compare its older forecasts with the newly acquired data, and possibly learn from this. As a result, it would seem highly reasonable to expect any given forecasting system to learn from its errors, just as a human expert would do. However, this is not the case: an advanced forecasting system will NOT try to learn from its errors. Indeed, a better method is available, namely [backtesting](http://www.lokad.com/backtesting-definition), which offers superior statistical performance. With backtesting, the system re-challenges itself against the entire history available every time a forecast is generated, not just against the latest increment of data.
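The backtesting loop described above can be sketched in a few lines. This is a minimal illustration, not Lokad's actual implementation; the model stub and variable names are invented:

```python
def moving_average_forecast(history, horizon):
    """Naive model stub: forecast the mean of the history, repeated."""
    level = sum(history) / len(history) if history else 0.0
    return [level] * horizon

def backtest(series, horizon, model=moving_average_forecast, min_history=4):
    """Replay the past: truncate the series at every possible cut-off,
    forecast the next `horizon` points, and accumulate absolute errors."""
    errors = []
    for cutoff in range(min_history, len(series) - horizon + 1):
        forecast = model(series[:cutoff], horizon)
        actual = series[cutoff:cutoff + horizon]
        errors.extend(abs(f - a) for f, a in zip(forecast, actual))
    return sum(errors) / len(errors)  # mean absolute error over all cut-offs

demand = [3, 4, 2, 5, 3, 4, 6, 2, 3, 5, 4, 3]
mae = backtest(demand, horizon=2)
```

The point is that every forecasting run re-evaluates the model against the whole history, rather than patching the model based on the latest error alone.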
2. The most important statistical factors are noise and randomness
When practitioners are asked about the dominant factors in their demand, many answer: seasonality, product lifecycle, market pressure, business growth, etc. However, most of the time, there is an elephant in the room: the elephant being the statistical noise found in the observation of demand.
Most of the time, the forecasting challenge is addressed as if, given sufficient effort, demand forecasts could be made accurate. Yet this viewpoint is incorrect: most of the time, forecasts are irreducibly inaccurate. Embracing the randomness found in the demand usually yields better business results than trying to eliminate it.
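A toy simulation (all numbers invented) makes the point concrete: even the perfect forecast, equal to the true mean of the demand, carries a non-zero error floor set by the spread of the demand itself, not by the model.

```python
import random

random.seed(42)

# Toy demand: truly random around a known mean of 3 units per day.
true_mean = 3.0
demand = [random.gauss(true_mean, 1.0) for _ in range(10_000)]

# Even forecasting the *true* mean leaves an irreducible mean absolute error,
# roughly sqrt(2/pi) * sigma for Gaussian noise, i.e. about 0.8 here.
perfect_forecast_error = sum(abs(d - true_mean) for d in demand) / len(demand)
```

No amount of modeling effort can push the error below this floor; only the demand generation process itself could.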
3. Expert corrections generally make forecasts less satisfactory
While it seems reasonable to manually adjust statistical forecasts with industry-specific insights, we have observed on many occasions that this practice does not yield the desired results. Even when manual corrections are performed by an expert in the field, they tend to degrade the overall accuracy, unless the underlying forecasting system is inherently poor; only in that case can manual corrections improve the forecasts.
This is often linked to the fact that human perception is heavily biased towards “patterns”. Frequently, this leads to the false perception of trends which are nothing more than random business fluctuations. Mistakenly interpreting randomness as a “pattern” tends to generate much larger errors than ignoring the pattern in the first place and treating it as mere noise.
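A quick simulation (parameters invented) shows how easily pure noise produces convincing "trends": over short histories, a fitted slope is very often large enough to look like a genuine pattern.

```python
import random

random.seed(0)

def fitted_slope(ys):
    """Ordinary least-squares slope of ys against 0..n-1."""
    n = len(ys)
    mx, my = (n - 1) / 2, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in enumerate(ys))
    var = sum((x - mx) ** 2 for x in range(n))
    return cov / var

# 1000 "business histories" that are pure noise around a flat level of 100.
apparent_trends = 0
for _ in range(1000):
    history = [100 + random.gauss(0, 10) for _ in range(8)]  # 8 periods
    if abs(fitted_slope(history)) > 1.0:  # >1 unit/period looks like a trend
        apparent_trends += 1
share = apparent_trends / 1000
```

Roughly half of these perfectly flat histories exhibit a slope that an expert eyeballing a chart could mistake for a trend worth correcting.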
4. Forecasting error must be measured in Dollars
A more accurate forecast does not necessarily translate into better business results. Indeed, the classic way to look at forecasts consists of optimizing metrics such as MAPE (mean absolute percentage error) that are only weakly correlated with the main business interests. Such metrics are misleading because they stem from the delusional idea that if forecasts were perfectly accurate, the MAPE would be zero. However, a perfectly accurate forecast is not a reasonable scenario, and the whole point of a performance metric is to be aligned with the interests of the business. In other words, forecasting error should be expressed in Dollars, not percentages.
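A tiny example (all numbers invented) shows how the two viewpoints can disagree: the forecast that wins on MAPE can still be the worse one once errors are priced in dollars.

```python
skus = [
    # (actual demand in units, unit price in $)
    (1000, 1.0),   # cheap, high-volume item
    (10, 500.0),   # expensive, slow-moving item
]
forecast_a = [960, 8]    # 4% off on the cheap SKU, 20% off on the expensive one
forecast_b = [700, 10]   # 30% off on the cheap SKU, spot-on for the expensive one

def mape(forecast):
    return sum(abs(f - d) / d for f, (d, _) in zip(forecast, skus)) / len(skus)

def dollar_error(forecast):
    return sum(abs(f - d) * p for f, (d, p) in zip(forecast, skus))

# forecast_a has the better MAPE (12% vs 15%), yet its error in dollars
# ($1040 vs $300) is more than three times worse.
```

Percentages weight every SKU equally; dollars weight each SKU by what its errors actually cost the business.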
5. Daily, weekly and monthly forecasts are not consistent
If forecasts are produced both on a daily basis and on a weekly basis, it would seem highly reasonable to expect that, when the daily forecasts are summed into weekly forecasts, both sets converge on the same values, given that the same technology and the same settings have been used to generate the two sets of forecasts in question.
Unfortunately, this is not true, and the two sets of forecasts will diverge, for very sound statistical reasons too. In short, daily (resp. weekly) forecasts are optimized against a metric expressed at the daily (resp. weekly) level; as these two metrics are different, the outputs of the numerical optimization simply have no reason to match.
6. SKU-level forecasts do not match category-level forecasts
If the same forecasting system is used to forecast demand both at the SKU level and at the category level, one would expect the two sets of forecasts to be consistent: by summing all the forecasts associated with the SKUs that belong to a given category, it would not be unreasonable to expect to end up with the same number as the forecast for the category itself. Yet this is not the case, for the same reasons as those outlined in the previous section.
Even more alarming, it is actually very common to observe rather odd situations where completely divergent patterns exist between forecasts at the SKU level and at the category level. For example, all SKU forecasts might be strictly decreasing, while forecasts at the category level are steadily increasing. Another typical case is seasonality, which is very visible at the category level, but barely noticeable at the SKU level. When such a situation arises, it may be tempting to try to correct SKU-level forecasts in order to align them with category forecasts, but such a technique would only degrade the overall accuracy of the forecast.
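The seasonality case is simple to simulate (all parameters invented): for a single slow-moving SKU the seasonal swing is smaller than the unit-level noise, but summing 200 such SKUs averages the noise away and the pattern re-emerges at the category level.

```python
import math
import random

random.seed(7)

months = range(12)
season = [1 + 0.3 * math.sin(2 * math.pi * m / 12) for m in months]  # +/-30% swing

# 200 slow movers at ~2 units/month: the seasonal amplitude (+/-0.6 units)
# is buried under unit-level noise (std dev of 2 units) for any single SKU...
skus = [[max(0, round(2 * season[m] + random.gauss(0, 2))) for m in months]
        for _ in range(200)]

# ...but the category total exhibits a clear peak-to-trough seasonal swing.
category = [sum(sku[m] for sku in skus) for m in months]
relative_swing = (max(category) - min(category)) / (sum(category) / 12)
```

Neither level is "wrong": the seasonal signal is genuinely present in the category total and genuinely undetectable in each individual SKU.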
7. Changing the unit of measurement does matter
At first glance, the unit used to measure the demand should not have any impact. If demand is counted in inventory units, and all the points in the history are multiplied by 10, then one would expect all forecasts to be multiplied by 10 as well, without any further consequences. However, with a technology such as the one developed by Lokad, the forecasting process does not work this way; at least, not exactly.
Indeed, an advanced demand forecasting technology leverages many tricks involving small numbers. The quantity 1 is not just any quantity: for example, we have observed that, on average, more than 75% of the lines on supermarket and hypermarket receipts carry a quantity equal to 1. As a result, many statistical tricks revolve around “small numbers”, and multiplying a demand history by 10 would simply confuse all the heuristics in place in any advanced commerce forecasting system.
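A hypothetical heuristic, loosely in the spirit described above (the rule and its threshold are invented, not Lokad's actual logic), shows why scaling is not harmless: a router that special-cases histories dominated by quantities of 0 or 1 changes its decision when the same history is rescaled.

```python
# Hypothetical model router: treat a series as "intermittent" when most
# observations are 0 or 1, and send it to a dedicated small-number model.
def looks_intermittent(history, threshold=0.75):
    small = sum(1 for q in history if q <= 1)
    return small / len(history) >= threshold

receipt_lines = [1, 0, 1, 1, 0, 1, 2, 1, 0, 1]   # demand in units
rescaled = [q * 10 for q in receipt_lines]       # same demand, unit changed

# The classification flips: the rescaled series no longer triggers the
# small-number code path, so a different model would be selected.
```

The heuristic is scale-dependent by design, which is exactly why "just multiply everything by 10" is not a neutral operation.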
8. Best promotion forecasts are frequently generated when promotions are ignored
Forecasting promotions is difficult, really difficult. In retail, not only can the demand response to a promotion go from nothing (no uplift) to a 100x uplift, but the factors that influence promotions are complex, diverse and usually not accurately tracked in IT systems. Combining complex business behaviors with inaccurate data is a recipe which is likely to lead to a “Garbage In, Garbage Out” problem.
In fact, we have routinely observed that discarding promotional data altogether was, at least as a very humble initial approach, the least inefficient way to forecast promotional demand. We are not claiming that this method is satisfying or optimal; we are merely pointing out that a naive forecast built on correct but incomplete historical data usually outperforms complex models built on more extensive but partially inaccurate data.
9. The more erratic the history, the “flatter” the forecast
If historical data exhibits strong visual patterns, then one would expect the forecast to exhibit similarly strong patterns. However, whenever historical data exhibits erratic variations, this expectation does not hold, and the reverse happens: the more erratic the demand history, the smoother the forecast.
Again, the root cause here is that the human mind is geared towards the perception of patterns. Erratic fluctuations are not patterns (in the statistical sense) but noise, and a forecasting system, if designed correctly, behaves precisely like a filter for that noise. Once the noise is removed, all that often remains is just a “flattish” forecast.
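The filtering effect can be demonstrated with the simplest possible noise filter, a moving average (the series and window are invented for illustration): fed an erratic history, the rolling forecast is far flatter than the history itself.

```python
import random
import statistics

random.seed(3)

# A demand history that is essentially noise around a stable level of 50.
erratic = [max(0.0, random.gauss(50, 25)) for _ in range(156)]

def forecast_next(history, window=52):
    """The simplest noise filter: the average of the recent past."""
    return sum(history[-window:]) / window

# Rolling one-step-ahead forecasts over the last two years of history.
rolling = [forecast_next(erratic[:t]) for t in range(52, 156)]

# The forecast is dramatically "flatter" than the history it digests:
# the noise is filtered out, and only the stable level remains.
history_spread = statistics.stdev(erratic)
forecast_spread = statistics.stdev(rolling)
```

A jagged forecast on top of a jagged history would mean the model is echoing noise back, which is precisely the failure mode a good forecasting system avoids.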
10. Daily, weekly and monthly forecasts are usually unnecessary
Periodic forecasts are everywhere - from business news to weather bulletins; and yet, they rarely represent an adequate statistical answer to “real-life” business challenges. The problem with these periodic forecasts lies in the fact that instead of directly tackling the business decision that depends on some uncertain future, they are typically leveraged in some indirect way to construct the decision afterward.
A much more effective strategy consists of thinking about the business decisions themselves as being the forecasts. By doing so, it becomes much easier to align the forecasts with specific business needs and priorities, e.g. measuring the forecast error in Dollars rather than percentages, as detailed above.
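As an illustration of a "decision as forecast" (a sketch under invented assumptions, not Lokad's actual method): rather than producing a daily forecast and converting it into an order later, the purchase quantity can be computed directly as a quantile of the lead-time demand.

```python
import random

def reorder_quantity(daily_history, lead_time_days, service_level=0.95):
    """Approximate the lead-time demand distribution by resampling past
    daily observations, then order up to its `service_level` quantile."""
    rng = random.Random(0)
    totals = sorted(
        sum(rng.choice(daily_history) for _ in range(lead_time_days))
        for _ in range(5_000)
    )
    return totals[int(service_level * len(totals))]

# Observed daily demand over the last two weeks (invented numbers).
history = [0, 2, 1, 0, 3, 1, 0, 0, 5, 2, 1, 0, 2, 1]
qty = reorder_quantity(history, lead_time_days=7)
```

The output is not "demand next Tuesday" but the decision itself: how many units to order, with the service-level target baked directly into the computation.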
11. Most of the inventory forecasting literature is of little use
When confronted with any difficult subject, it is reasonable to begin by surveying the peer-reviewed material available in the scientific literature, especially since thousands of papers and articles have been published on demand forecasting and inventory optimization.
Yet, we have found that the near-totality of the methods analysed in this literature simply do not work. Mathematical correctness does not translate into business wisdom, and many models considered all-time classics are just plain dysfunctional. For instance:
- Safety stock formulas are flawed, since they rely on normal distribution assumptions;
- EOQ (economic order quantity) is inaccurate, as it assumes a flat fee per order, which is completely unrealistic;
- Holt-Winters is a forecasting model that is numerically unstable and requires too much historical depth to be tractable;
- ARIMA, the archetype of the mathematically-driven approach, is far too complicated for too little result.
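The safety stock point is easy to verify numerically. The sketch below (all parameters invented) compares the classic normal-distribution reorder point against the actual quantile of a lumpy, skewed demand distribution:

```python
import random
import statistics

random.seed(5)

# Lumpy demand: usually nothing, occasionally a large batch of 20 units.
demand = [random.choice([0, 0, 0, 0, 0, 0, 0, 0, 20]) for _ in range(20_000)]

mu, sigma = statistics.mean(demand), statistics.stdev(demand)
z_95 = 1.645  # normal quantile for a 95% service level
normal_reorder_point = mu + z_95 * sigma  # the textbook safety stock formula

# The actual 95% quantile of the observed demand.
empirical_95 = sorted(demand)[int(0.95 * len(demand))]

# The normal-based reorder point (~12.6 units) falls well short of the true
# 95% quantile (20 units): the promised service level is never reached.
```

The formula is mathematically correct under its normality assumption; it is the assumption that fails for the skewed, intermittent demand that dominates real catalogs.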
Oddities in demand forecasting are (probably) countless. Don’t hesitate to post your own observations in the comments section below.