Entries in forecasting (26)
Keeping track of errors to improve later on
Lately a couple of customers have been asking whether Lokad was keeping track of its past forecast errors in order to improve its future forecasts.
The answer is simple: yes, we do, but there is more to it than that. In particular, we do not wait for
- the forecasts to be requested,
- the course of events to happen,
- the historical data to be updated,
to finally compare our past forecasts with what really happened. Indeed, such an approach would be way too slow and inefficient.
Instead, we use cross-validation methods adapted to time-series forecasting. The process is simpler than it sounds; let's start with an example.
Let's assume that we have a single time-series worth 1 year of weekly sales data (i.e. 52 points). We want to produce 4-week sales forecasts, but we also want to estimate the forecasting error. The procedure is:
- take the N first points (with N = 10 initially).
- create a forecasting model based on those N points.
- create a 4-week-ahead forecast based on this model.
- compare the forecast with the actual values found in the complete series.
- increment N by 1 point (i.e. 1 week).
- repeat.
With cross-validation, we can accurately estimate the expected forecast error of a forecasting model. In particular, if you have two different models, cross-validation can help you choose the best one (*). Cross-validation can also be used to adjust model parameters, in order to find the parameters that best fit the data.
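The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not Lokad's actual implementation: the two toy models, the mean absolute error metric, the synthetic sales figures and the choice to stop once fewer than 4 actual points remain are all assumptions made for the example.

```python
from statistics import mean

def naive_forecast(history, horizon):
    """Toy model: repeat the last observed value over the whole horizon."""
    return [history[-1]] * horizon

def moving_average_forecast(history, horizon, window=6):
    """Toy model: repeat the mean of the last `window` observations."""
    return [mean(history[-window:])] * horizon

def rolling_origin_error(series, model, horizon=4, initial=10):
    """Mean absolute error over all origins N = initial, initial + 1, ...

    Assumption: we stop as soon as fewer than `horizon` actual points
    remain after the first N points, so every forecast is fully checked.
    """
    errors = []
    n = initial
    while n + horizon <= len(series):
        forecast = model(series[:n], horizon)   # model built on the N first points
        actual = series[n:n + horizon]          # what really happened next
        errors.extend(abs(f - a) for f, a in zip(forecast, actual))
        n += 1                                  # increment N by 1 week and repeat
    return mean(errors)

# Hypothetical weekly sales (52 points): slow upward trend plus a bump every 4th week.
sales = [100 + week + (10 if week % 4 == 0 else 0) for week in range(52)]

for name, model in [("naive", naive_forecast),
                    ("moving average", moving_average_forecast)]:
    print(name, round(rolling_origin_error(sales, model), 2))
```

The model with the lowest cross-validated error would be the preferred candidate, keeping in mind the overfitting caveat (*) when many models are compared.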
The Lokad team continuously monitors accuracy on delivered forecasts with such cross-validation methods and keeps working on more accurate forecasting models. Thus, we do keep track of our forecast errors, but without waiting for them to happen.
(*) If you try too many models, then you are likely to end up with overfitting issues, but this problem is beyond the scope of this post.
Unfortunate "period start" setting name
In all the Lokad applications, both add-ons and the web application itself, there is a mysterious forecasting task setting named the period start.
Mea culpa. Choosing this name was really unfortunate and led to a lot of confusion.
Thus, I have decided to rename this parameter the period reference. Its default value is 2001-01-01 (i.e. Monday, January 1st, 2001), and unless you know why you need to change this value, I would strongly suggest you leave it as it is.
Let's start with a practical example. Let's assume that, as a retailer, you need monthly sales forecasts in order to place your monthly replenishment orders. Yet, your business months start on the 15th of each month. All your suppliers expect your orders to be placed on the 15th, and for years, all your monthly sales analyses have started on the 15th.
In such a situation, it would be a real pain if Lokad arbitrarily decided that a month had to start on the 1st. In order to avoid such a pitfall, Lokad provides an additional setting for the forecasting task definition that lets you adjust when you want the period to start: it's the period reference.
If you want your monthly forecasts to start on the 15th of each month, then you can use 2001-01-15 as your period reference (or 1999-02-15 or 2017-11-15, the result would be the same). This date is used as a reference to infer the starting dates of all the other periods.
Further examples:
- If the period reference is set to 2001-01-15 for a yearly forecast, then all years start on January 15th (instead of January 1st).
- If the period reference is set to 2001-01-16 for a weekly forecast, then all weeks start on Tuesdays (because 2001-01-16 was a Tuesday).
In summary: the period reference, previously named "period start", is a date (past or future, it does not matter) used by Lokad to adjust the period boundaries, both for historical data aggregation and for the forecasts themselves. In particular, it has nothing to do with the starting date of the forecasts.
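To make the alignment logic concrete, here is a small Python sketch of how period boundaries could be inferred from a reference date. It is an illustration only: the function names are hypothetical, this is not the Lokad API, and the monthly version assumes a reference day of 28 or less so that the day exists in every month.

```python
from datetime import date, timedelta

def weekly_period_start(day, reference):
    """Start of the 7-day period containing `day`, aligned on `reference`."""
    # Python's % always returns a non-negative result here,
    # so references in the future work as well as references in the past.
    offset = (day - reference).days % 7
    return day - timedelta(days=offset)

def monthly_period_start(day, reference):
    """Start of the monthly period containing `day`.

    Assumption: reference.day <= 28, so the same day exists in every month.
    """
    if day.day >= reference.day:
        return date(day.year, day.month, reference.day)
    if day.month == 1:
        return date(day.year - 1, 12, reference.day)
    return date(day.year, day.month - 1, reference.day)

some_day = date(2008, 3, 20)
# Only the day of month of the reference matters for monthly periods.
print(monthly_period_start(some_day, date(2001, 1, 15)))
print(monthly_period_start(some_day, date(2017, 11, 15)))
# With the default reference (a Monday), weekly periods start on Mondays.
print(weekly_period_start(some_day, date(2001, 1, 1)))
```

Note that the reference only fixes where the boundaries fall; it says nothing about which period is the first one to be forecast.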
Errors are part of the (web) service
Since its launch in November 2006, Lokad has been providing programmatic access to its forecasts through the Lokad Web API. Since the very first day, we have done our best to ensure the best forecasting accuracy. Yet, the future cannot always be accurately predicted from historical data alone.
It is far better to foresee even without certainty than not to foresee at all. Henri Poincaré in The Foundations of Science, 1946
This situation is typically dealt with through an estimation of the forecast error: in addition to the forecast itself, an estimate of the forecast error is also computed. For example, this is the suggested approach for safety stock calculation.
Thus, we have decided to extend the Web API with estimated forecast errors. For any forecasting task, it is now possible to retrieve the Mean Absolute Percentage Error (MAPE); see the GetMape and GetMapes web methods. The new version of the Web API stays fully compatible with existing applications.
Prediction markets vs. Lokad
In a previous post, I discussed the various worlds of forecasting software, outlining 3 main categories:
- Deterministic simulation software
- Expert insights aggregation software
- Statistical forecasting software
Lokad is clearly a member of the third category. However, those three categories are not really competing with each other, since they are usually not suited to the same types of situations.
In the second category, insight aggregation software, prediction market software seems to attract more and more interest. Jed Christiansen has a very interesting review of prediction market software.
The overview page of Inlink provides an insightful summary of prediction markets:
Prediction markets enable a diverse group of people to predict the answer to a question by buying and selling shares in stocks representing the possible answers. Using a stock market-like mechanism allows people to express their opinion as a "weighted vote" over time in response to new information or a change of opinion. And unlike a poll, a prediction market is asking "what will happen?" vs. "what do you want to happen?"
For example, if we ask the question: "Who will win the singing contest?" The four contestants would be represented as stocks that people buy shares in. If "Contestant A" has a stock price of $56, that means "the crowd" thinks there is a 56% chance that contestant will win. When people buy shares in that contestant, the price goes up. When they sell shares in that contestant, the price goes down. The stock price of an answer represents the probability of that answer being correct, and the highest-priced stock after a period of time is considered the group's answer to the question posed.
The main difference with classical insight aggregation software is that the participants are financially involved in getting the right forecast.
Compared to Lokad (or to any statistical forecasting software), the main benefit of prediction markets is the ability to rationally tackle a forecast that depends on (potentially) irrational customer desires, even when no relevant data is available. The crowd offers a remedy to the small-group-of-experts bias that usually plagues classical prospective methods such as the Delphi method.
Yet, like any insight aggregation method, prediction markets involve quite an expensive forecasting process to get a single question answered. For example, it would not seem a very practical approach for call centers that require 96 quarter-hour forecasts on a daily basis to predict inbound call volumes. If meaningful historical data is available, then statistical forecasts should be as accurate (if not more) and much cheaper.
In its own statistical way, Lokad is also, to some extent, using the wisdom of the crowd, except that instead of considering a panel of people, we are considering a panel of business time-series that we exploit to improve the overall forecasting accuracy. In both cases, leveraging larger input datasets to improve forecasting accuracy is a key idea.
Choosing the right forecast period
Forecasting consists in producing figures that are supposed to reflect the future. But those figures depend heavily on the period chosen for data aggregation. Lokad supports the most frequently used periods: quarter-hour, half-hour, hour, day, week, month, quarter, semester, year ...
Intuitively, the longer the considered period, the easier it is to make an accurate forecast. For example, yearly forecasts eliminate seasonal variations. However, a short forecasting period might provide a false sense of accuracy (e.g. forecasting daily candy sales over the next two years), whereas a long period might be unsuited to operational decisions (e.g. trying to optimize the weekly worker schedules of the candy manufacturing unit based on yearly forecasts).
A careful choice of the forecasting period is essential to make the most of forecasting. Yet, surprisingly, this question is frequently left mostly unanswered in books on forecasting for practitioners (which usually focus on sales or demand forecasting). Typical answers are "most of the manufacturing industry is using monthly forecasts" and "many large retailers are using weekly forecasts".
Yet, simple assumptions can lead to practical quantitative clues to make this choice. If we just assume that forecast errors follow a normal distribution, then the expected increase of the error when switching to a shorter period is:
- year → month: √(12/1) ≈ 3.5 (i.e. error multiplied by 3.5)
- month → week: √(31/7) ≈ 2.1 (assuming a month with 31 days)
- week → day: √(6/1) ≈ 2.4 (assuming 6 business days per week)
- hour → quarter-hour: √(4/1) = 2
Although the normal distribution assumption rarely holds exactly, those figures are quite representative of most situations. They can be used to evaluate whether it is worth changing the forecasting period, either because the forecast error is too high or because the forecast period is too long.
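The square-root rule behind those figures can be reproduced with a few lines of Python. One common reading, stated here as an assumption rather than as the exact reasoning of the post, is that errors on the short periods are independent with equal variance: the absolute error of the aggregated period then grows like √k while the aggregated volume grows like k, so the percentage error grows by a factor of √k when a long period is split into k shorter ones.

```python
from math import sqrt

def error_multiplier(short_periods_per_long_period):
    """Factor applied to the percentage error when switching to the shorter period.

    Assumption: independent errors with equal variance on each short period.
    """
    return sqrt(short_periods_per_long_period)

# Ratios taken from the list above (31-day month, 6 business days per week).
switches = {
    "year -> month": 12 / 1,
    "month -> week": 31 / 7,
    "week -> day": 6 / 1,
    "hour -> quarter-hour": 4 / 1,
}

for label, ratio in switches.items():
    print(f"{label}: error multiplied by {error_multiplier(ratio):.2f}")
```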
Chinese food vs. Sports bar while forecasting sales
Lokad has a pretty unique approach to forecasting where we leverage all the data that we have to perform every single forecast. While talking with customers, I have been asked whether Lokad would mix Chinese food data with sports bar data. Indeed, the customer was worried that we might mix data exhibiting very different sales patterns, even though both belong to the same food and drink retail industry.
In fact, the more abstract question was: how refined is the notion of industry segments within Lokad? Well, the real answer is that we don't have any notion of industry segments in Lokad. And, in my humble opinion, it would be a really poor idea to even try to improve statistical forecasts based on such information:
- No matter how refined your industry segment classification is, it's still a very poor approximation of reality. Industry segments are changing all the time, and who knows whether sales of Thai food exhibit the same patterns as sales of Vietnamese food. In a way, this is why dmoz.org is massively less popular and useful than search engines.
- Even a small point of sale usually generates a dozen time-series (one for each product being sold), with at least 200 working days per year. Thus, one year of history already represents more than 1,000 numbers to be exploited. The information contained in those numbers dwarfs the amount of information contained in the classification, which would typically be represented as just a few numbers.
- Creating a classification that matches the forecasting purposes is probably as hard as, if not harder than, the forecasting task itself. Indeed, an efficient classification would have to be able to tell whether business segments will exhibit the same patterns in the future.
Instead of relying on such a manual classification, Lokad is relying directly on statistical correlations: if some data can be used to improve the considered forecast, then do it; if the data cannot be used to achieve that, then just ignore the data. With proper statistical tools, more data does not hurt and storing data has never been cheaper.
Back to the initial Chinese food vs. sports bar example, the reality is more complex than it seems. Some products sold in both places, let's say ice cream, might exhibit similar sales patterns because they depend a lot on the weather, while some others, let's say beers, might behave very differently. Lokad relies on automated processes to validate the correlations for every single forecast, as opposed to doing it once for a whole industry segment.
Introducing industry segments in Lokad would be like reverting from full text search to a hierarchical directory: time-consuming, and, in the end, much less efficient.
Simplicity vs. Flexibility - Forecasts at stake
One year ago, when we initially designed Lokad, we considered many potential features:
- more powerful data model than time-series.
- more features in the web application (we decided to have less).
- more indicators beside forecasts (confidence interval, estimated errors, ...)
Each time so far, we decided to do less as opposed to doing more. The reason is that we chose simplicity over flexibility (and Lokad isn't the only company doing this). The main issue with traditional statistical software is that you end up with a list of models so long that only a statistician can comprehend and use it.
Simplicity often means faster and cheaper. What can be the benefits of forecasting if the process is too complicated and too time-consuming to be used anyway? Lokad focuses on removing all the technical obstacles that would prevent even small companies from starting to use forecasting.
The No1 never asked question about forecasting
There is a question of utmost importance when it comes to statistical forecasting: what is the error function used during the learning process? Indeed, it is the error function that lets you evaluate whether a forecast is good or bad. It's also the very same error function that drives your learning process when building a statistical model.
Finding an error function isn't hard. Quite the opposite, there are plenty of error functions available: Mean Squared Error (MSE), Mean Absolute Deviation (MAD), Median Absolute Deviation, Mean Absolute Percentage Error (MAPE), ...
Yet, in almost 1 year of existence for Lokad, the question of the choice of error function has never been raised by any customer. Well, this situation is very natural, as Lokad precisely takes charge of the whole forecasting process.
For those who might be interested, the answer is, unfortunately, not simple. Lokad uses several error functions depending on the context. We often use a bounded version of the MAPE (identical to the classical MAPE, but the function gets upper bounded to 1) for the benchmarks. The upper bound is used to make the process more robust against pathological time-series that would have had huge errors otherwise.
Yet, if the data is not too noisy (i.e. not too many outliers), then we often use the MSE function, which tends to be much more practical from a computational viewpoint.
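As an illustration, the error functions mentioned above could be written as follows in Python. This is a sketch under assumptions, not Lokad's code: in particular, the bound is applied here to each point's percentage error before averaging, and a zero actual value with a non-zero forecast is counted as the maximal error; the post does not specify either detail.

```python
def mse(actual, forecast):
    """Mean Squared Error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Classical Mean Absolute Percentage Error (undefined if an actual is 0)."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def bounded_mape(actual, forecast, bound=1.0):
    """MAPE where each point's percentage error is capped at `bound`.

    Assumptions: the cap is applied per point, and a zero actual value
    counts as `bound` unless the forecast is also zero.
    """
    terms = []
    for a, f in zip(actual, forecast):
        if a == 0:
            terms.append(0.0 if f == 0 else bound)
        else:
            terms.append(min(abs(a - f) / abs(a), bound))
    return sum(terms) / len(terms)

# Hypothetical data: the third point is a pathological near-zero actual value.
actual = [100, 120, 1, 80]
forecast = [110, 115, 50, 80]

print("MSE:", mse(actual, forecast))
print("MAPE:", mape(actual, forecast))
print("bounded MAPE:", bounded_mape(actual, forecast))
```

On this toy data a single pathological point dominates the classical MAPE, whereas the bounded version keeps its contribution at 1.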
Perceived quality issues in forecasting
Software quality is a challenge. As a software developer, you have to make sure that your code keeps working in all sorts of unexpected situations. Plenty of methods, tools and processes are available to improve quality. Lokad makes intensive use of those.
But there is another aspect: perceived quality. The user's opinion depends on many (many) purely subjective aspects of your product. For B2C, product design and aesthetics are probably among the top factors in perceived quality (think iPod).
So far, so good: as a product developer, it simply means that you need to invest a certain amount of effort in your product design. But what happens when perceived quality conflicts with actual quality? (think of the devilish method for changing your iPod battery).
In the case of Lokad, where we are delivering time-series forecasts, the situation is even more complicated because statistical forecasting is so counter-intuitive.
For example, we have many customers who actually try out a couple of points to see what they get. Yet, this is really not the way to go to evaluate Lokad. The right way involves a proper training dataset and a testing dataset of your own actual business data (plus many other considerations, but it's beyond the scope of this post).
Unfortunately for us, many customers judge Lokad on the forecast they get after entering a dozen points generated by some function like Cos(x) or Sin(x). Actually, it would be possible to hard-code a few heuristics in Lokad just to detect those attempts (and their underlying mathematical functions). But those heuristics would actually decrease the overall accuracy for the users having real business data in their accounts.
Then, we have another issue: our forecasts are not exactly real time. You can retrieve your forecasts any time, but if you retrieve your forecasts 0.1s after finishing the upload of your data (through our Web Services API), Lokad won't have had the time to try complex/advanced statistical models. As a result, you will get real-time but naive forecasts.
Lokad does its best to provide an end-to-end forecasting service, but to some extent it can't escape the Law of Leaky Abstractions: in order to make the most of Lokad, one needs to understand, at least a little, how statistical forecasting works and how the constraints are handled by Lokad.
Would you pay for moving average?
Many customers are asking THE question: which forecasting models are you using? Indeed, our technology page isn't very specific on the subject.
Disclaimer: I am not really going to answer this question in this post, so please, don't be too disappointed.
Actually, there are two main reasons why we do not disclose this information:
- it's a proprietary technology (like Google search).
- it's a super counter-intuitive technology.
Yet, in order to clarify the situation, I can say that Lokad is not using any silver-bullet forecasting model (i.e. a super-model that would fit all situations), but tons of models instead.
For example, we do use simple moving average (among others naturally) which is probably the most naive forecasting method. Intuitively, simple moving average says: if you want to know the total sales next month, just take the average monthly sales over the last 6 months.
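The intuition above translates almost literally into code. The function name and the sales figures below are hypothetical, chosen only for the illustration.

```python
def simple_moving_average(monthly_sales, window=6):
    """Forecast next month's sales as the average of the last `window` months."""
    if len(monthly_sales) < window:
        raise ValueError("not enough history for the requested window")
    recent = monthly_sales[-window:]
    return sum(recent) / window

# Hypothetical monthly sales, oldest first; only the last 6 values are used.
history = [10, 20, 30, 40, 50, 60, 70]
print(simple_moving_average(history))
```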
At first sight, it might appear shocking to sell forecasts if, in the end, it's a moving average model that gets used. But, in my opinion, it is not.
Indeed, producing forecasts through a statistical model is only the last step of a complicated process. Before that, you need to choose the model to be used. And, this step is very complicated.
Thus, Lokad can indeed produce a forecast based on a moving average model, if we detect the moving average model as being the best available model for this particular situation (in practice, this situation arises for very short or very erratic time-series).
Batteries Included. Python motto.
But the key difficulty of the problem is to understand why the moving average model has been selected. With regular statistical packages, choosing the right model is the user's burden. With Lokad, it's part of the service.
PS: there are more complex variants of the moving average where decreasing coefficients (also called weights) are applied to the time-series, but this is beyond the scope of the discussion.