### Joannes Vermorel

Although we have tried to make Lokad as simple and intuitive as possible, statistical forecasting is a counter-intuitive science with many traps. In this post, I am going to describe one of the most frequent mistakes that I have encountered within many companies. In a nutshell, **it is wrong to make a sum of forecasted values**. Since the problem is quite hard to grasp, let’s start with an example.

Let’s say that you have *3 shops*, and that those 3 shops are selling *coconuts*. Being in charge of the supply chain, you need to forecast how many coconuts must be re-ordered next week. The coconuts will not be delivered to the shops directly but to a single warehouse. Thus there is only a single coconut replenishment order for the 3 shops.

In order to perform your replenishment forecast, it is natural to rely on your historical coconut sales data. In the present situation, you have 3 time-series representing the daily coconut sales for each of the 3 shops. *How can we perform a single replenishment forecast based on those 3 time-series?*

A naive method would consist of making one coconut sales forecast for each time-series (one forecast per shop), and then summing those 3 forecasted values to compute the replenishment order. Unfortunately, this method is *wrong*. A much more accurate approach consists of first aggregating the 3 time-series into one (i.e. summing the 3 time-series day by day) and then performing a forecast directly on the aggregated time-series.
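To make the two orderings concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sales figures are made up, and `forecast_next_week` is a hypothetical placeholder (median daily sales times 7), not the forecasting method used by Lokad. The median is used only because a plain average is linear and would return the same number either way.

```python
from statistics import median

# Hypothetical daily sales for one week, one list per shop (made-up numbers).
shop_sales = {
    "shop_a": [0, 5, 0, 0, 6, 0, 4],
    "shop_b": [3, 0, 0, 7, 0, 0, 5],
    "shop_c": [0, 0, 8, 0, 0, 5, 0],
}


def forecast_next_week(daily_sales):
    """Placeholder forecaster (an assumption): median daily sales times 7 days."""
    return 7 * median(daily_sales)


# Method 1: forecast each shop separately, then sum the 3 forecasts.
forecast_then_sum = sum(forecast_next_week(s) for s in shop_sales.values())

# Method 2: sum the 3 time-series day by day, then forecast the aggregate.
aggregated = [sum(day) for day in zip(*shop_sales.values())]
sum_then_forecast = forecast_next_week(aggregated)

print("forecast then sum:", forecast_then_sum)   # 0
print("sum then forecast:", sum_then_forecast)   # 42
```

On this toy data each shop sells nothing on most days, so every per-shop forecast is zero and the first method orders no coconuts at all. The second method orders 42, close to the 43 coconuts actually sold across the 3 shops during the week.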

You are probably wondering what difference there is between the two methods: forecasting first and then making the sum OR making the sum first and then forecasting. Well, the *true* explanation requires some statistics that are totally beyond the scope of this post, thus I will try to give an intuitive explanation of the phenomenon. Summing forecasts does not improve the accuracy, whereas making a forecast based on a single smoother time-series does improve the accuracy (the sum of the 3 time-series is smoother than each of the initial time-series).
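The smoothness claim can be checked on simulated data. The sketch below is only an illustration under stated assumptions: the 3 shops are simulated independently, each with 20 hypothetical daily visitors who buy a coconut with probability 0.1, and "smoothness" is measured by the coefficient of variation (standard deviation divided by mean), where a lower value means a smoother series. Real shops may be correlated, which would change the size of the effect.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the run is repeatable


def simulate_shop(days=365, customers=20, buy_probability=0.1):
    """Hypothetical shop: each daily visitor buys one coconut with a fixed probability."""
    return [
        sum(rng.random() < buy_probability for _ in range(customers))
        for _ in range(days)
    ]


def coefficient_of_variation(series):
    """Relative variability of a series: standard deviation divided by mean."""
    return pstdev(series) / mean(series)


# Three independent shops (an assumption), then their day-by-day sum.
shops = [simulate_shop() for _ in range(3)]
total = [sum(day) for day in zip(*shops)]

shop_cvs = [coefficient_of_variation(s) for s in shops]
total_cv = coefficient_of_variation(total)

for i, cv in enumerate(shop_cvs, start=1):
    print(f"shop {i} variability: {cv:.2f}")
print(f"aggregated variability: {total_cv:.2f}")
```

Under these assumptions the aggregated series fluctuates less, relative to its mean, than any single shop, which is the sense in which the summed time-series is easier to forecast.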