Improving a forecasting technology

Since Lokad’s creation, our goal has been to relentlessly improve our forecasting technology in order to deliver superior forms of supply chain optimization. Almost a decade ago, I was already pointing out that being a machine learning company is odd: progress is steady, but also non-linear and erratic. Furthermore, most angles that are considered common sense in other domains are plainly misguided where machine learning is concerned. Yet this does not imply that progress is left to chance: there is a method to it.

Improving our forecasting technology starts with improving the data. Without proper data preparation, the process devolves into a garbage-in, garbage-out exercise. Making sure that promotions, stock-outs and lead times are correctly represented in a given dataset takes a lot of time and expertise. In practice, as data complications tend to be domain-specific, it takes a whole team of supply chain scientists at Lokad to consolidate datasets from verticals as diverse as aerospace, fashion and food retail.

Then, when we invent[1] a new statistical method, it usually turns out that the method performs better on some datasets and worse on others. Unfortunately, when this happens, the new method tends to be fragile: it might simply have been lucky, or it might be suffering from overfitting. Thus, while it might be tempting to create a special case for a given client of Lokad because one statistical method appears to fit this client better, we do not operate that way. A decade of experience has taught us that such results invariably turn out to be fragile, and that the supposedly superior method may not remain superior for long. If the client company undergoes substantial changes (which may well be caused by the very actions of Lokad), the new method’s performance may fall apart.

Thus, we focus instead on uncovering statistical methods that deliver superior results across a large variety of situations and many loosely related verticals, ideally delivering a uniform improvement everywhere rather than a mix of improvements and regressions, even if that mix is heavily skewed toward improvements. This methodology is more challenging than feature-engineering[2] a given dataset to death while endlessly recycling the same machine learning algorithm(s), which is what most data-crunching agencies deliver nowadays.
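To make the "uniform improvement" criterion concrete, here is a minimal sketch assuming each method is summarized by one loss value per benchmark dataset (lower is better). The function names, dataset labels and loss values are hypothetical and chosen for illustration only; this is not Lokad's actual evaluation pipeline.

```python
# Hypothetical acceptance rule: a candidate method is adopted only if it
# lowers the loss on every benchmark dataset, however large its average gain.

def relative_changes(baseline, candidate):
    """Relative change in loss per dataset (negative means improvement)."""
    return {name: (candidate[name] - baseline[name]) / baseline[name]
            for name in baseline}

def is_uniform_improvement(baseline, candidate):
    """True only if the candidate strictly improves on every dataset."""
    changes = relative_changes(baseline, candidate)
    return all(change < 0 for change in changes.values())

def mean_change(baseline, candidate):
    """Average relative change across datasets (negative means improvement)."""
    changes = relative_changes(baseline, candidate)
    return sum(changes.values()) / len(changes)

# Made-up loss values (lower is better), for illustration only.
baseline    = {"aerospace": 1.20, "fashion": 0.90, "food_retail": 0.75}
candidate_a = {"aerospace": 1.15, "fashion": 0.88, "food_retail": 0.74}
candidate_b = {"aerospace": 0.80, "fashion": 0.70, "food_retail": 0.78}

for name, cand in [("A", candidate_a), ("B", candidate_b)]:
    print(name, "mean change:", round(mean_change(baseline, cand), 3),
          "uniform improvement:", is_uniform_improvement(baseline, cand))
```

Candidate B has the larger average gain but is rejected because it regresses on the food retail dataset, while candidate A, with smaller but uniform gains, passes. In practice, a small per-dataset noise tolerance would likely be needed; this sketch omits it.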

This approach forces us to revisit the very foundations of statistical forecasting. For example, the transition toward cross entropy as a superior metric for measuring forecasting accuracy was instrumental in making the most of deep learning. More recently, we upgraded to mixture density networks, a powerful yet underused approach[3] for capturing complex tail behaviors in supply chains. These mixture density networks provide a tractable way to reliably estimate the probability of rare events, which is critical in industries such as aerospace.
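As a rough illustration of how cross entropy scores a probabilistic forecast, and of why a mixture helps with tails, the sketch below compares a two-component Poisson mixture (standing in for the kind of distribution an MDN outputs) against a single Poisson with the same mean. Bishop's original MDNs used Gaussian components; the Poisson components, the weights, the rates and the observed demands are all assumptions picked by hand, not learned parameters or real data. Cross entropy is computed here as the mean negative log-likelihood of the observed demands under the forecast distribution.

```python
import math

# Illustrative sketch only: a Poisson mixture stands in for the output of a
# mixture density network. The weights and rates are hand-picked, not learned.

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def mixture_pmf(k, weights, rates):
    """Probability of demand k under a weighted mixture of Poisson components."""
    return sum(w * poisson_pmf(k, lam) for w, lam in zip(weights, rates))

def tail_probability(k, weights, rates):
    """P(demand >= k), i.e. 1 minus the probability of demands 0..k-1.
    For extremely small tails this subtraction loses floating-point precision."""
    return 1.0 - sum(mixture_pmf(j, weights, rates) for j in range(k))

def cross_entropy(weights, rates, observations):
    """Mean negative log-likelihood of observed demands (lower is better)."""
    return -sum(math.log(mixture_pmf(y, weights, rates))
                for y in observations) / len(observations)

# Hypothetical spare part: mostly quiet, occasionally a burst of demand.
mixture = ([0.95, 0.05], [0.2, 8.0])
# Single Poisson with the same mean demand (0.95 * 0.2 + 0.05 * 8.0 = 0.59).
single = ([1.0], [0.59])

observations = [0, 0, 1, 0, 9]  # made-up demand history
for name, (w, r) in [("mixture", mixture), ("single", single)]:
    print(name,
          "cross entropy:", round(cross_entropy(w, r, observations), 3),
          "P(demand >= 5):", round(tail_probability(5, w, r), 5))
```

With these illustrative numbers, the mixture assigns a far larger probability to a demand of 5 units or more, and it achieves a lower cross entropy because the single Poisson treats the burst of 9 units as nearly impossible. A real MDN would instead produce different weights and rates for every item and period from its input features.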

Our forecasting technology remains a work in progress. Many challenges are still imperfectly addressed; for example, cannibalization and the market response to price changes remain very tough. Nevertheless, we are not giving up, and even after 10 years of R&D, we are still making progress.

  1. We stand on the shoulders of giants. The R&D efforts of Lokad are typically variations of insights obtained from the broader machine learning community, which usually works not on supply chain problems but on mainstream problems such as pattern detection, voice recognition or natural language processing.
  2. Feature engineering consists of manually crafting a representation of the dataset that suits a given machine learning algorithm. It is a powerful way to mitigate the known weaknesses of machine learning algorithms.
  3. The original paper, Mixture Density Networks (MDN) by Christopher M. Bishop, dates from 1994. Yet it took almost two decades for hardware to catch up with the possibilities opened by this pioneering work. Unlike the original paper, which was applied to robot inverse kinematics, we use MDNs to deliver probabilistic demand forecasts.