From CRPS to cross-entropy

Published on by Joannes Vermorel.

Our deep learning technology is an important milestone for both us and our clients. Some of the changes associated with deep learning are obvious and tangible, even for the non-expert. For example, the Lokad offices are now littered with Nvidia boxes from relatively high-end gaming products. When I started Lokad back in 2008, I would certainly not have anticipated that so much high-end gaming hardware would be involved in solving supply chain challenges.

Other changes are a lot subtler and yet just as critically important: transitioning from CRPS (continuous ranked probability score) to cross-entropy is one of them.

The systematic use of the CRPS metric at Lokad was introduced at the same time as our 4th generation forecasting engine, our first native probabilistic engine. CRPS had been introduced as a generalization of the pinball loss function, and it served its purpose well. At the time, Lokad would never have cracked its aerospace or fashion challenges, supply chain wise, without this metric. Yet CRPS, which roughly speaking generalizes the mean absolute error to probabilistic forecasts, is not without flaws.

For example, from the CRPS perspective, it’s OK to assign a zero probability to an outcome, as long as the bulk of the probability mass isn’t too far off from the outcome actually observed. This is exactly what you would expect from a generalization of the mean absolute error. Yet, this also implies that probabilistic models may claim with absolute certainty that certain events won’t happen, while those events do indeed happen. This sort of vastly incorrect statistical statement about the future comes with a cost that is structurally under-estimated by CRPS.
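
To make this concrete, here is a minimal sketch of CRPS for a forecast over integer demand levels, computed as the sum of squared gaps between the forecast's cumulative distribution and the step function of the observed value. The function name `crps_discrete`, the demand range 0 to 5 and the probabilities are assumptions chosen for illustration, not Lokad's implementation.

```python
from itertools import accumulate


def crps_discrete(probs, observed):
    """CRPS of a forecast over integer outcomes 0..len(probs)-1.

    Sums the squared gap between the forecast CDF and the step
    function that jumps from 0 to 1 at the observed outcome.
    """
    cdf = list(accumulate(probs))
    return sum(
        (f - (1.0 if k >= observed else 0.0)) ** 2
        for k, f in enumerate(cdf)
    )


# Hypothetical forecast: all probability mass on a demand of 2 units,
# hence a zero probability for every other demand level.
overconfident = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

# The observed demand is 1 unit, an outcome the model declared impossible.
crps_near_miss = crps_discrete(overconfident, observed=1)

# An observed demand of 5 units was also declared impossible, but lies further away.
crps_far_miss = crps_discrete(overconfident, observed=5)

print(crps_near_miss, crps_far_miss)
```

With all the mass on a single value, the score reduces to the absolute error between that value and the observation. The model was "certain" that a demand of 1 would never happen, yet it is only charged for being one unit off.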

Cross-entropy, in contrast, assigns an infinite penalty to a model that is proven wrong after assigning a zero probability to an outcome that does happen nonetheless. Thus, from the cross-entropy perspective, models must embrace the "all futures are possible, just not equally probable" perspective. Assigning a uniform zero probability whenever there isn’t sufficient data for an accurate probability estimate isn’t a valid answer anymore.
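
The infinite penalty can be illustrated with a small sketch, where the cross-entropy for a single observation is the negative logarithm of the probability that the forecast gave to the observed outcome. The function name `cross_entropy`, the demand range and the probabilities are hypothetical values chosen for the example.

```python
import math


def cross_entropy(probs, observed):
    """Negative log-probability assigned to the observed outcome.

    Returns infinity when the forecast gave the outcome a zero probability.
    """
    p = probs[observed]
    return math.inf if p == 0.0 else -math.log(p)


# Hypothetical forecasts over demand levels 0..5, observed demand of 1 unit.
overconfident = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]    # rules out everything but 2
hedged = [0.02, 0.02, 0.90, 0.02, 0.02, 0.02]     # keeps every outcome possible

loss_overconfident = cross_entropy(overconfident, observed=1)
loss_hedged = cross_entropy(hedged, observed=1)

print(loss_overconfident, loss_hedged)
```

The overconfident forecast receives an infinite loss, while the hedged one, which keeps a small probability on every demand level, receives a finite loss of about 3.9. Any averaging over observations is dominated by a single infinite term, which is what pushes a learning process away from zero-probability claims.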

However, cross-entropy is not only superior from a purely theoretical perspective. In practice, using cross-entropy to drive the statistical learning process ultimately yields models that are superior against both metrics, cross-entropy and CRPS, even though CRPS is absent from the optimization process altogether.

Cross-entropy is the fundamental metric driving our 5th generation forecasting engine. This metric substantially departs from the intuition behind our older forecasting engines. For the first time, Lokad adopts a fully Bayesian perspective on statistical learning, while our previous iterations were more grounded in the frequentist perspective.

Check out our latest knowledge base entry about cross-entropy.
