Notes about concurrent time-series (i.e. multi-parameter time-series)

Published by Joannes Vermorel.

An interesting question has been raised on the ADempiere wiki page about Lokad: does Lokad support multi-parameter time-series? Well, the answer is: to some extent.

In order to answer the question, we first need to clarify the situation. Intuitively, a multi-parameter series can be represented as a list of values [A(t), B(t) .... Z(t)], the variables A, B ... Z being dependent on the time value t but also on the other variables. Thus, we can have, for example, a relationship such as A(t)=B(t)+C(t). This kind of situation is usually called a concurrent time-series model, with implicit or explicit relationships between the variables.
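To make the notation concrete, here is a minimal Python sketch of a multi-parameter series stored as a dictionary of equally long lists, together with a check of the explicit relationship A(t)=B(t)+C(t). The names `series` and `is_sum_of` and the numbers are made up for illustration; nothing here reflects how Lokad stores data.

```python
from typing import Dict, List

# A multi-parameter series: one list of values per variable,
# all indexed by the same time steps t = 0, 1, 2, ...
series: Dict[str, List[float]] = {
    "B": [3.0, 4.0, 5.0, 6.0],
    "C": [1.0, 1.0, 2.0, 2.0],
}
# Build A so that A(t) = B(t) + C(t) holds by construction.
series["A"] = [b + c for b, c in zip(series["B"], series["C"])]


def is_sum_of(a: List[float], b: List[float], c: List[float],
              tol: float = 1e-9) -> bool:
    """Return True if a(t) == b(t) + c(t) at every time step (within tol)."""
    if not (len(a) == len(b) == len(c)):
        return False
    return all(abs(x - (y + z)) <= tol for x, y, z in zip(a, b, c))


print(is_sum_of(series["A"], series["B"], series["C"]))
```

An explicit relationship like this one is something the user knows in advance; the next section is about relationships that are not declared at all.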

Implicit concurrent time-series

Intuitively, the notion of variable relationship is important because if two variables are correlated, then the information of the first variable can be exploited to improve the forecast of the second variable. This insight is basically the heart of the Lokad technology. Yet, it must be noted that the Lokad user interfaces do not provide any way to "tell the system" that two particular time-series are correlated. For example, even if you know that A(t)=B(t)+C(t) in your data, you can't tell Lokad that such a relationship holds.

Instead, Lokad automatically attempts to detect implicit relationships between time-series. Thus, as soon as you enter several time-series in your account, Lokad attempts to exploit the possible correlations between your time-series to increase the overall forecasting accuracy. You do not need to do anything in particular: this is the default behavior of our forecasting systems.

In the future, Lokad might provide some ways for users to make the relationships between their time-series explicit, because users might have some expert knowledge that could be used to improve the forecast accuracy as well. But at this time, there is no such feature provided by Lokad.
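As an illustration of what "detecting implicit relationships" can mean in its simplest form, the sketch below computes the Pearson correlation for every pair of series and keeps the strongly correlated pairs. This is only a textbook technique shown under assumptions: the function names, the 0.8 threshold and the data are hypothetical, and it is not a description of the algorithms Lokad actually runs.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(vx * vy)


def correlated_pairs(series: Dict[str, List[float]],
                     threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """List the pairs of series whose |correlation| reaches the threshold."""
    found = []
    for (name_a, a), (name_b, b) in combinations(series.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            found.append((name_a, name_b, r))
    return found


# Made-up data: sales follow temperature closely, "noise" does not.
account = {
    "sales": [10.0, 12.0, 15.0, 19.0, 22.0],
    "temperature": [18.0, 20.0, 23.0, 27.0, 30.0],
    "noise": [5.0, 1.0, 4.0, 2.0, 5.0],
}
pairs = correlated_pairs(account)
print(pairs)
```

With this data, only the sales/temperature pair passes the threshold. The point is that no relationship had to be declared up front.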

The usage of correlated data

Let's say that you have two time-series:

  • S = the count of ice-creams sold on a daily basis.
  • T = the local daily average temperature around your ice-cream retail point.

Let's assume that the two time-series are correlated, i.e. that people tend to eat more ice-cream when it's hot. As a matter of fact, the only time-series that you are interested in, from a forecasting viewpoint, is the ice-cream sales (S). Indeed, an ice-cream retailer does not really care about producing weather forecasts (T). But, in the case of Lokad, if the data of T is entered into the Lokad account along with S, then Lokad will be able to implicitly exploit the temperature information (T) in order to increase the ice-cream sales forecast accuracy (S).
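The ice-cream example can be sketched with a plain least-squares line that predicts S from T, compared against a baseline that ignores T and always predicts the average of past sales. The figures are invented, the helper names are hypothetical, and the sketch assumes that the temperature of the days to forecast is already known (for instance from a weather forecast). It is meant to show why the extra series helps, not how Lokad computes its forecasts.

```python
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def mean_abs_error(actual: List[float], predicted: List[float]) -> float:
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up daily data: T in degrees Celsius, S in ice-creams sold.
temperature = [15.0, 18.0, 21.0, 24.0, 27.0, 30.0, 20.0, 26.0]
sales = [32.0, 37.0, 44.0, 49.0, 56.0, 61.0, 41.0, 53.0]

# Use the first six days as history and the last two as days to forecast.
t_hist, s_hist = temperature[:6], sales[:6]
t_next, s_next = temperature[6:], sales[6:]

# Baseline: ignore T and predict the historical average of S.
baseline = sum(s_hist) / len(s_hist)
mae_baseline = mean_abs_error(s_next, [baseline] * len(s_next))

# Forecast that exploits T through the fitted line.
slope, intercept = fit_line(t_hist, s_hist)
forecast = [slope * t + intercept for t in t_next]
mae_with_temperature = mean_abs_error(s_next, forecast)

print(mae_baseline, mae_with_temperature)
```

On this toy data the error drops sharply once T is used, even though nobody needs a forecast of T itself.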

This kind of situation is precisely the reason why Lokad charges by the forecasting task instead of charging by the time-series. Indeed, we do not want to overcharge our customers for simply adding the data that is required to improve the forecast accuracy of the time-series that really matter to them.

