Joannes Vermorel
Lokad is about time-series forecasting, but as simple as the time-series model may seem (after all, a time-series is nothing more than a list of time-value pairs), there are several subtleties in how time-series are managed. In this post, we will see how the Lokad time-series model distinguishes missing time-value pairs from empty time-value pairs. Since the topic is slightly complex, I would suggest, if you’re not familiar with the Lokad technology, having a look at our User Guide (in particular, the Forecasting tasks section).
A practical situation
Let’s start with a practical real-life situation: let’s assume that we have a time-series that includes 12 time-values, one value for each month of the year 2005 (starting January 2005, ending December 2005). We can imagine that this time-series represents the monthly sales of a web shop. At the time I am writing this post, it’s the beginning of January 2007. What happens if I now insert this time-series into my Lokad account and ask for a monthly forecast? Well, there is an ambiguity in the time-series model, because there are two possibilities:
 Returning a forecast for January 2007 (let’s call it the clock-centric approach): in this case, we would consider that the 12 values for the year 2006 are simply missing. Thus, we skip them and produce a forecast nonetheless, based on the data of the year 2005.
 Returning a forecast for January 2006 (let’s call it the data-centric approach): the forecast is based on the last time-value pair available (i.e. December 2005 in the present situation), which is equivalent to assuming that there are no missing values. In this case, the delivered forecast might refer to a period that is already in the past.
Let’s make things clear: Lokad has chosen the data-centric approach. If you ask for a monthly forecast for your 12 time-values ranging from January 2005 to December 2005, you will get a forecast for January 2006, no matter whether you request the forecast at the beginning of 2006 or in a distant future. Lokad takes the last time-value pair of your time-series as the reference to compute the forecasts. This option has been chosen because we believe it’s closer to the business requirements.
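To make the difference between the two approaches concrete, here is a minimal Python sketch. It is only an illustration and not the Lokad implementation: the function names (`data_centric_target`, `clock_centric_target`) and the sales figures are made up for the example.

```python
from datetime import date


def next_month(d):
    """Return the first day of the month following d."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def data_centric_target(series):
    """First forecast period: the month right after the last time-value pair."""
    last_period = max(period for period, _ in series)
    return next_month(last_period)


def clock_centric_target(today):
    """First forecast period: the current month, whatever the data contains."""
    return date(today.year, today.month, 1)


# Hypothetical monthly sales for 2005 (placeholder values, not real data).
sales_2005 = [(date(2005, month, 1), 100 + month) for month in range(1, 13)]

# The forecast is requested at the beginning of January 2007.
request_day = date(2007, 1, 3)

print(data_centric_target(sales_2005))    # first day of January 2006
print(clock_centric_target(request_day))  # first day of January 2007
```

Note that the data-centric function never looks at the clock: as long as the input series is unchanged, the forecast period is unchanged too.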
Some arguments supporting the data-centric approach
Let’s review the arguments in favor of the data-centric approach:

The data-centric approach has persistent semantics. If the input time-series data does not change, the forecast time-range does not change either (yet the actual values of the forecast may change over time).

The data-centric approach offers the possibility to benchmark the Lokad forecast services. You can import your 2005 product sales data into your Lokad account, get the forecast for 2006, and see how much our forecasts differ from your historical record for 2006.

The data-centric approach assumes that there is no missing data in your time-series after the initial time-value pair. This assumption has a strong advantage: its simplicity. Indeed, in some data mining fields, missing data is very frequent (think of medical surveys, for example), but when it comes to time-series, it’s quite rare.
Yet, this approach involves a minor drawback: you need to handle the lack of data explicitly. For example, in the previous web shop situation, some products of the catalog may not be sold even once in a given month. In such a case, you must explicitly add a zero time-value to your time-series to represent this lack of sales.
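As an illustration, explicit zeros could be added before uploading the data with something like the Python sketch below. The helper names (`month_range`, `fill_missing_months`) and the sample figures are hypothetical, and the sketch assumes that sales are already aggregated per month and keyed by the first day of each month.

```python
from datetime import date


def month_range(first, last):
    """Yield the first day of every month from first to last, inclusive."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def fill_missing_months(sales, first, last):
    """Return a complete monthly series with an explicit 0 for months without sales."""
    return [(period, sales.get(period, 0)) for period in month_range(first, last)]


# Hypothetical product sold in only three months of 2005 (placeholder values).
sparse_sales = {
    date(2005, 1, 1): 3,
    date(2005, 4, 1): 1,
    date(2005, 12, 1): 2,
}

filled = fill_missing_months(sparse_sales, date(2005, 1, 1), date(2005, 12, 1))
for period, quantity in filled:
    print(period.strftime("%Y-%m"), quantity)
```

The filled series contains 12 time-value pairs, nine of which are zeros, so the absence of sales is stated in the data rather than left as a gap.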
Reader Comments (2)
We are using a technology developed internally at Lokad. The main reason being that our requirements are quite different from what is generally available for time-series analysis (especially in terms of scalability). Also, the Lokad technology is anything but “definitive”. We continuously monitor the forecast performance of our algorithms based on the available customer data. Those benchmarks help us to improve our algorithms in a continuous manner.
11 years ago  Joannes Vermorel
Interesting concept: applying social computing to forecasting. I would think the social hurdles are difficult but not insurmountable. Some years back I did some work at a research firm. They did industry forecasts using surveys, then gave the results back to the participants (then they sold reports on the forecasts). What tools are you using? Open Source or something like MATLAB or Maple? Michael
11 years ago  Michael