Two KPIs for your OOS detector
A couple of weeks ago, we disclosed our plans concerning Shelfcheck, our upcoming on-the-shelf availability optimizer targeting (physical) retailers. Since then, we have been moving forward steadily, crunching a lot of point-of-sale data.
Lokad isn’t the only company out there trying to tackle the OOS (out-of-shelf) issue, but there is very little literature on how to assess the respective merits of two OOS detectors. In this post, we review two fundamental metrics that define how good a system is at detecting OOS.
Intuitively, an indirect OOS detector (such as Shelfcheck) relies on the divergence between observed sales and expected sales. Since random (aka unpredictable) market fluctuations can always happen, this approach cannot, by construction, be a perfect system (1): it is a tradeoff between sensibility and precision.
(1) Not being perfect does not imply being worthless.
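To make the idea concrete, here is a minimal sketch of an indirect detection rule. It is not Shelfcheck's actual method: it hypothetically assumes that daily unit sales follow a Poisson distribution whose mean is the sales forecast, and raises an alert when the observed sales are improbably low under that forecast. The names `poisson_cdf`, `flag_oos` and `threshold` are illustrative only; `threshold` is where the tradeoff lives, since raising it produces more alerts.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """Return P(X <= k) for X ~ Poisson(lam), summing the terms directly."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) derived from P(X = i - 1)
        total += term
    return total


def flag_oos(observed: int, expected: float, threshold: float) -> bool:
    """Hypothetical rule: alert when selling this little (or less) would be
    this unlikely if the product were actually on the shelf."""
    return poisson_cdf(observed, expected) < threshold


# Hypothetical forecast of 10 units for the day.
print(flag_oos(0, 10.0, 0.01))  # zero sales: alert
print(flag_oos(8, 10.0, 0.01))  # ordinary dip: no alert
print(flag_oos(4, 10.0, 0.01), flag_oos(4, 10.0, 0.05))  # depends on threshold
```

With a forecast of 10 units, zero sales triggers an alert and 8 units does not, while 4 units only triggers one once the threshold is loosened from 0.01 to 0.05: more OOS get caught, but ordinary slow days start raising false alerts too.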
The sensibility (more commonly called sensitivity, or recall) represents the percentage of OOS (aka the positives to be detected) that are captured by the system. This concept is already widely used in areas ranging from medical diagnostics to airline security. The higher the sensibility, the better the coverage of the system.
Yet, by increasing the sensibility, one also decreases the specificity of the system, that is to say, the percentage of non-OOS (aka the negatives, which should not be detected) that are correctly left unflagged. In practice, as the OOS detector pours out more and more alerts, it also produces more and more false alerts, wasting the time of store teams looking for non-issues.
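Both metrics are ratios over the usual confusion counts: true positives (`tp`, OOS that raised an alert), false negatives (`fn`, missed OOS), true negatives (`tn`, non-OOS left alone) and false positives (`fp`, false alerts). The sketch below computes them for two hypothetical settings of the same detector, on made-up counts (1,000 product-days, 80 of them OOS), to show sensibility rising while specificity falls.

```python
def sensibility(tp: int, fn: int) -> float:
    """Share of actual OOS (positives) for which an alert was raised."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of non-OOS (negatives) correctly left without an alert."""
    return tn / (tn + fp)


# Hypothetical counts over 1,000 product-days, 80 of which are actually OOS.
settings = {
    "strict": {"tp": 40, "fn": 40, "tn": 900, "fp": 20},
    "loose": {"tp": 60, "fn": 20, "tn": 840, "fp": 80},
}
for name, c in settings.items():
    print(name, sensibility(c["tp"], c["fn"]), round(specificity(c["tn"], c["fp"]), 3))
```

Loosening the detector catches 20 more OOS (sensibility from 0.5 to 0.75) at the price of 60 more false alerts (specificity from about 0.978 to 0.913).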
However, specificity is not a very practical criterion in retail. Indeed, OOS products represent only a small fraction of all products: several studies quote 8% OOS as being a relatively stable worldwide average. Hence, specificity is typically very high, above 90%, even if the OOS detector happens to be producing pure random guesses, as long as it flags only a small share of products (flagging 8% of products at random still leaves 92% of the non-OOS unflagged). Those high specificity percentages are therefore somewhat misleading, as they mostly reflect the imbalance that exists between OOS and non-OOS in the first place.
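The effect is easy to reproduce with a quick simulation. The sketch below assumes the 8% OOS rate quoted above and a hypothetical detector that ignores the data entirely, flagging each product at random with probability `alert_rate`. Its specificity lands close to `1 - alert_rate`, around 92% here, even though the detector has no skill at all.

```python
import random


def random_detector_specificity(n_products: int, oos_rate: float,
                                alert_rate: float, seed: int = 0) -> float:
    """Specificity of a detector that flags products at random,
    independently of whether they are actually OOS."""
    rng = random.Random(seed)
    tn = fp = 0
    for _ in range(n_products):
        is_oos = rng.random() < oos_rate
        alert = rng.random() < alert_rate  # pure guess, ignores is_oos
        if not is_oos:  # specificity only looks at the negatives
            if alert:
                fp += 1
            else:
                tn += 1
    return tn / (tn + fp)


print(random_detector_specificity(100_000, oos_rate=0.08, alert_rate=0.08))
```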
At Lokad, we prefer precision, which represents the percentage of correctly identified OOS among all alerts produced by the system. Precision directly translates into the amount of effort that will _not_ be wasted by the store staff checking for non-existent problems. For example, if the precision is 50%, then one alert out of two is a false alert.
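Precision only needs the alerts themselves: how many pointed at a real OOS (`tp`) and how many were false (`fp`). A minimal sketch with hypothetical counts, which also turns precision into the expected number of wasted store checks (`wasted_checks` is an illustrative helper, not part of any product):

```python
def precision(tp: int, fp: int) -> float:
    """Share of alerts that correspond to a real OOS."""
    return tp / (tp + fp)


def wasted_checks(n_alerts: int, prec: float) -> float:
    """Expected number of alerts that send staff after a non-existent problem."""
    return n_alerts * (1.0 - prec)


print(precision(tp=10, fp=10))  # 0.5: one alert out of two is false
print(wasted_checks(200, 0.5))  # 100.0 wasted checks out of 200 alerts
```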
100% sensibility and 100% precision cannot both be achieved. Flagging every product as OOS all the time trivially yields 100% sensibility, but precision then drops to the OOS rate itself (about 8%), which is no better than chance. The other way around, the only way to guarantee that no alert is ever wrong is to never produce any alert, which means 0% sensibility. The sensibility vs precision tradeoff cannot be escaped: if you want to detect anything, you need to accept that some of what you detect is incorrect.
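Both extreme detectors can be checked numerically. The sketch below uses hypothetical data (1,000 products, 80 of them OOS, matching the 8% average quoted above). Flagging everything gives full sensibility with a precision equal to the OOS rate; flagging nothing gives zero sensibility, and precision is then undefined (there are no alerts to score), so `evaluate` returns `None` for it.

```python
from typing import Optional


def evaluate(is_oos: list[bool], alerts: list[bool]) -> tuple[float, Optional[float]]:
    """Return (sensibility, precision); precision is None when no alert is raised."""
    tp = sum(1 for o, a in zip(is_oos, alerts) if o and a)
    fp = sum(1 for o, a in zip(is_oos, alerts) if a and not o)
    fn = sum(1 for o, a in zip(is_oos, alerts) if o and not a)
    sens = tp / (tp + fn)
    prec = tp / (tp + fp) if tp + fp else None  # 0/0: nothing to score
    return sens, prec


# Hypothetical data: 1,000 products, 80 of them OOS (the 8% average).
is_oos = [True] * 80 + [False] * 920
print(evaluate(is_oos, [True] * 1000))   # flag everything: (1.0, 0.08)
print(evaluate(is_oos, [False] * 1000))  # flag nothing: (0.0, None)
```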
In order to compare two OOS detectors, one needs to assess their respective sensibility and precision. The tradeoff holds for a given forecasting technology, though: improving sensibility and precision at the same time remains possible by leveraging superior forecasts, which make genuine OOS easier to tell apart from ordinary market fluctuations.
This raises another question, though: how should one compare the following two detectors?
- a detector A with 70% sensibility and 60% precision;
- a detector B with 60% sensibility and 70% precision.
It turns out that this question cannot be addressed in a purely statistical manner: one needs to model the economic costs and benefits in order to assess the optimal choice.
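To see why, here is a deliberately simplified sketch under loudly hypothetical economics: each OOS caught is worth a fixed `gain_per_catch`, each false alert costs a fixed `cost_per_false_alert` of staff time, and both values are made up for illustration. Per 100 actual OOS, detector A catches 70 with about 47 false alerts, while B catches 60 with about 26 false alerts; which one wins depends entirely on the ratio between the two values.

```python
def net_value(sensibility: float, precision: float, n_oos: float,
              gain_per_catch: float, cost_per_false_alert: float) -> float:
    """Hypothetical linear economics: fixed gain per OOS caught,
    fixed cost per false alert."""
    caught = sensibility * n_oos                 # true alerts
    false_alerts = caught / precision - caught  # total alerts minus true ones
    return caught * gain_per_catch - false_alerts * cost_per_false_alert


for gain in (5.0, 1.5):  # value of a caught OOS, in units of one false-alert cost
    a = net_value(0.70, 0.60, 100, gain, 1.0)
    b = net_value(0.60, 0.70, 100, gain, 1.0)
    print(gain, "A" if a > b else "B")
```

Under this toy model, A's 10 extra catches outweigh its roughly 21 extra false alerts only when a caught OOS is worth more than about 2.1 false alerts; below that ratio, B is the better choice.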
Stay tuned for more.