Filtering by Tag: promotion

Probabilistic promotions forecasting

Published by Joannes Vermorel.

Forecasting promotions is notoriously difficult. It involves data challenges, process challenges and optimization challenges. As promotions are present everywhere in the retail sector, they have been a long-term concern for Lokad.

However, while nearly every retailer has its share of promotions, and while nearly every forecasting vendor claims to provide full support for handling promotions, the reality is that nearly all forecasting solutions out there are far from satisfying in this regard. Worse still, our experience indicates that most such solutions actually achieve poorer results, as far as forecasting accuracy is concerned, than the naive approach of simply ignoring promotions altogether.

What makes promotions so challenging is the degree of uncertainty that is routinely observed when working with them. From the classic forecasting perspective, which only considers the mean or median future demand, this extra uncertainty is very damaging to the forecasting process. In fact, the numerical outputs of such forecasting solutions are so unreliable that they cannot reasonably be used to optimize the supply chain.

Yet, at Lokad, over the years, we have become quite good at dealing with uncertain futures. In particular, with our 4th generation probabilistic forecasting engine, we now have a technology that is completely geared towards the precise quantification of very uncertain situations. The probabilistic viewpoint does not make the uncertainty go away. However, instead of dismissing the case entirely, it provides a precise quantitative analysis of the extent of this uncertainty.

Our probabilistic forecasting engine has recently been upgraded to be able to natively support promotions. When promotional data is provided to Lokad, we expect both past and future promotions to be flagged as such. Past promotions are used to assess the quantitative uplift, as well as to correctly factor in the demand distortions introduced by the promotions themselves. Future promotions are used to anticipate the demand uplift and adjust the forecasts accordingly.

Unlike most classic forecasting solutions, our forecasting engine does not expect the historical data to be “cleaned” of the promotional spikes in any way. Indeed, no one will ever know for sure what would have happened if a promotion had not taken place.

Finally, regardless of the amount of machine learning and advanced statistical efforts that Lokad is capable of delivering in order to forecast promotions, careful data preparation remains as critical as ever. End-to-end promotion forecasts are fully supported as part of our inventory optimization as a service package.

Tags: forecasting promotion insights

Price elasticity is a poor angle for looking at demand planning

Published by Joannes Vermorel.

Lokad regularly gets asked to leverage an approach based on the price elasticity of demand for demand planning purposes; most notably to handle promotions. Unfortunately, statistical forecasting is counter-intuitive, and while leveraging demand elasticity might feel like a “good” approach, our extensive experience with promotions indicates that this approach is misguided and nearly always does more harm than good. Let’s briefly review what goes wrong with price elasticity.

A local indicator

Price elasticity is fundamentally a local indicator - in a mathematical sense. So while it is possible to compute the local coefficient of the price elasticity of demand, there is no guarantee that this local coefficient has any similarity with the coefficients that would be computed for other prices.

For example, it might make sense for McDonald’s to assess the elasticity coefficient for, say, the Big Mac moving from $3.99 to $3.89 because it’s a small price move - of about 2.5% in amplitude - and the new price remains very close to the old price. And given McDonald’s scale of activity, it’s not unreasonable to assume that the function of demand is relatively smooth with respect to the price.

At the other end of the spectrum, promotions, especially promotions in the FMCG (fast moving consumer goods) and general merchandise sectors, are completely unlike the McDonald’s case described above. A promotion typically shifts the price by more than 20%, which is an entirely non-local move. Such a move yields very erratic results, completely unlike the smooth macro-effects that may be observed for McDonald's and its Big Mac.

Thresholds all over the place

The price elasticity insight is fundamentally geared towards smooth differentiable functions of demand. Oh yes, it is theoretically possible to approximate even a very rugged function with a differentiable one, but in practice, the numerical performance of this viewpoint is very poor. Indeed, markets are full of threshold effects: if customers are very price sensitive, then being able to offer them a price just a little bit lower than any competitors can alter the market share rather dramatically. In such markets, it’s unreasonable to assume that demand will smoothly respond to price changes. On the contrary, demand responses should be expected to be swift and erratic.

Hidden co-variables

Last but not least, one fundamental issue with using price elasticity for demand planning in the context of promotions is that it puts too much emphasis on the pricing aspect of demand. There are other variables, the so-called co-variables, that have a deep influence on the overall level of demand. These co-variables too often remain hidden, even though identifying them is very much feasible.

Indeed, a promotion is first and foremost a negotiation that takes place between a supplier and a distributor. The expected increase in demand does certainly depend on the price, but our observations indicate that changes in demand primarily depend on the way a given promotion is executed by the distributor. Indeed, the commitment on extra volume, a strong promotional message, additional or better-located shelf space and the potential temporary de-emphasis of competing products typically impact demand in ways that dwarf the pricing impact when it's examined on its own.

Reducing the promotional uplift to a matter of price elasticity is frequently a misguided numerical approach standing in the way of better demand planning. A deep understanding of the structure of promotions is more important than the prices.

Tags: promotion forecasting

Promotion planning in general merchandise retail – Optimization challenges

Published by Joannes Vermorel.

So far, we covered data challenges and process challenges in the context of promotional forecasts. In this post, the last of the series, we cover the very notion of quantitative optimization when considering promotions. Indeed, the choice of the methodological framework that is used to produce the promotion forecasts and measure their quantitative performance is critically important and yet usually (almost) completely dismissed.

As the old saying goes, there is no optimization without measurement. Yet, in case of promotions, what are you actually measuring?

Quantifying the performance of promotions

Even the most advanced predictive statistics remain rather dumb, in the sense that they amount to nothing but the minimization of some mathematical error function. As a consequence, if the error function is not deeply aligned with the business, no improvement is possible, because the measure of the improvement itself is off.

It doesn’t matter that you can move faster as long as you don’t even know whether you’re moving in the right direction.

When it comes to promotions, it’s not just the plain usual inventory economic forces:

  • inventory costs money; however, compared to permanent inventory, it can cost more money if the goods are not usually sold in the store, because any left-over after the end of the promotion will clutter the shelves.
  • promotions are an opportunity to increase your market share, but typically at the expense of the retailer's margin; a key profitability driver is the stickiness of the impulse given to customers.
  • promotions are negotiated rather than merely planned; a better negotiation with the supplier can yield more profits than a better planning.

All those forces need to be accounted for quantitatively; and here lies the great difficulty: nobody wants to be quantitatively responsible for a process as erratic and uncertain as promotions. Yet, without quantitative accountability, it’s unclear whether a given promotion creates any value, and if it does, what can be improved for the next round.

A quantitative assessment requires a somewhat holistic measure, starting with the negotiation with the supplier, and ending with the far reaching consequences of imperfect inventory allocation at the store level.

Toward risk analysis with quantiles

Holistic measurements, while desirable, are typically out of reach for most retail organizations that rely on median forecasts to produce the promotion planning. Indeed, median forecasts are implicitly equivalent to minimizing the Mean Absolute Error (MAE), which, without being wrong, remains the archetype of a metric that is strictly agnostic of all the economic forces at play.

But how could improving the MAE be wrong? As usual, statistics are deceptive. Let’s consider a relatively erratic promoted item to be sold in 100 stores. The stores are assumed to be similar, and in each store the item has a 1/3 chance of facing a demand of 6 units and a 2/3 chance of facing a demand of zero units. The best median forecast here is zero units. A forecast of 2 units per store would not be the best median forecast but the best mean forecast, that is, the forecast that minimizes the MSE (Mean Squared Error). Obviously, forecasting zero demand across all stores is absurd. This example illustrates how badly MAE can mismatch business forces. MSE shows similar dysfunctions in other situations. There is no free lunch: you can't get a metric that is both ignorant of the business and aligned with the business.
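The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the 100-store example above; the function names are illustrative, and candidate forecasts are restricted to whole units between 0 and 6.

```python
# Demand per store: 6 units with probability 1/3, 0 units with probability 2/3.
outcomes = [(6, 1 / 3), (0, 2 / 3)]

def expected_abs_error(forecast):
    """Expected absolute error (the MAE criterion) of a given forecast."""
    return sum(p * abs(demand - forecast) for demand, p in outcomes)

def expected_sq_error(forecast):
    """Expected squared error (the MSE criterion) of a given forecast."""
    return sum(p * (demand - forecast) ** 2 for demand, p in outcomes)

candidates = range(0, 7)  # whole-unit forecasts from 0 to 6
best_mae = min(candidates, key=expected_abs_error)  # the median of the distribution
best_mse = min(candidates, key=expected_sq_error)   # the mean of the distribution

print("forecast minimizing MAE:", best_mae)
print("forecast minimizing MSE:", best_mse)
```

Neither criterion knows anything about stock-out costs or left-over inventory, which is the point made above.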

Quantile forecasts represent a first step in producing more reasonable results for promotion forecasts because it becomes possible to perform risk analysis, addressing questions such as:

  • In the upper 90% best case, how many stores will face a stock-out before the end of the promotion?
  • In the lower 10% worst case, how many stores will be left with more than 2 months of inventory?

The design of the promotion can be decomposed as a risk analysis, integrating economic forces, sitting on top of quantile forecasts. From a practical viewpoint, the method has the considerable advantage of keeping the forecast strictly decoupled from the risk analysis, which is an immense simplification as far as the statistical analysis is concerned.
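As a rough illustration of what such a risk analysis might look like, the sketch below simulates many scenarios for a 100-store network and reads quantiles off the number of stores facing a stock-out. The demand model (6 units with probability 1/3, otherwise zero, reused from the earlier example), the stock levels and all function names are assumptions made for this sketch, not a description of Lokad's engine.

```python
import random

def simulate_stockouts(n_stores, stock_per_store, n_scenarios=2000, seed=0):
    """Return, for each simulated scenario, the number of stores where demand exceeds stock."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_scenarios):
        # Hypothetical demand model: 6 units with probability 1/3, else 0.
        demands = [6 if rng.random() < 1 / 3 else 0 for _ in range(n_stores)]
        counts.append(sum(d > stock_per_store for d in demands))
    return counts

def quantile(values, q):
    """Simple empirical quantile: the value found at rank q in the sorted sample."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]

# Stocking the mean forecast (2 units) versus covering the full demand (6 units).
counts_mean_stock = simulate_stockouts(n_stores=100, stock_per_store=2)
counts_full_stock = simulate_stockouts(n_stores=100, stock_per_store=6)

print("stores in stock-out, 10% quantile:", quantile(counts_mean_stock, 0.10))
print("stores in stock-out, 90% quantile:", quantile(counts_mean_stock, 0.90))
print("with 6 units per store, 90% quantile:", quantile(counts_full_stock, 0.90))
```

The forecast (here, the demand model) stays separate from the economic reading of the quantiles, so costs for stock-outs and left-over inventory can be layered on top without touching the statistical part.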

Couple both pricing and demand analysis

While a quantitative risk analysis already outperforms a plain median forecast, it remains relatively limited by design in its capacity to reflect the supplier negotiation forces.

Indeed, a retailer could be tempted to regenerate the promotion forecasts many times, varying the promotional conditions to reflect the scenarios negotiated with the supplier. However, such a usage of the forecasting system would lead to significant overfitting.

Simply put, if a forecasting system is repeatedly used to seek the maximization of a function built on top of the forecasts, i.e. finding the best promotional plan considering the forecasted demand, then, the most extreme value produced by the system is very likely to be a statistical fluke.
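This selection effect can be reproduced with a small simulation. In the sketch below, every scenario has the same true value, and each forecast is that value plus random noise; the figures (a true value of 100, a noise standard deviation of 20) and the function name are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

def average_best_forecast(n_scenarios, true_value=100.0, noise_sd=20.0,
                          n_trials=2000, seed=0):
    """Average value of the highest forecast when n_scenarios equally good plans are compared."""
    rng = random.Random(seed)
    picked = []
    for _ in range(n_trials):
        # Every scenario is truly worth `true_value`; only the forecast noise differs.
        forecasts = [rng.gauss(true_value, noise_sd) for _ in range(n_scenarios)]
        picked.append(max(forecasts))
    return statistics.mean(picked)

for n in (1, 5, 20):
    print(n, "scenarios compared -> average forecast of the selected plan:",
          round(average_best_forecast(n), 1))
```

The more scenarios are compared, the further the forecast of the selected plan drifts above its true value, even though no plan is actually better than the others.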

Instead, the optimization process needs to be integrated into the system, analyzing at once both the demand elasticity and the supplier's varying conditions (i.e. the bigger the deal, the more favorable the supplier conditions).

Obviously, designing such a system is vastly more complicated than designing a plain median promotion forecasting system. However, not striving to implement such a system in any large retail network can be seen as a streetlight effect.

A policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look under the streetlight together. After a few minutes the policeman asks if he is sure he lost them here, and the drunk replies, no, that he lost them in the park. The policeman asks why he is searching here, and the drunk replies, "this is where the light is".

The packaged technology of Lokad offers limited support to handle promotions, but this is an area that we address extensively with several large retailers, albeit in a more ad hoc fashion. Don’t hesitate to contact us, we can help.

Categories: insights, forecasting Tags: promotion forecasting optimization insights

Promotion planning in general merchandise retail – Process challenges

Published by Joannes Vermorel.

In our previous post, we covered data challenges in promotion forecasts. In this post, we cover process challenges: When are forecasts produced? How are they used? Etc. Indeed, while getting accurate forecasts is tough already, retailers frequently do not leverage forecasts the way they should, leading to sub-optimal uses of the numerical results available. As usual, statistical forecasting turns out to be a counter-intuitive science, and it’s too easy to take all the wrong turns.

Do not negotiate the forecast results

The purchasing department usually supervises the promotion planning process. Yet, as much as haggling can be a tremendously powerful way to obtain good prices from suppliers, haggling over forecasts doesn’t work. Period. Still, we routinely observe that promotion forecasts tend to be some kind of tradeoff negotiated between Purchasing and Supply Chain, or between Purchasing and IT, or between Purchasing and Planning, etc.

Assuming a forecasting process exists - which may or may not be accurate (this aspect is a separate concern) - then, forecasts are not up to negotiation. The forecasts are just the best statistical estimate that can be produced for the company to anticipate the demand for the promoted items. If one of the negotiating parties has a provably better forecasting method available, then this method should become the reference; but again, no negotiation involved.

The rampant misconception here is the lack of separation of concerns between forecasting and risk analysis. From a risk analysis perspective, it’s probably fine to order a 5x bigger volume than the forecast if the supplier is providing an exceptional deal for a long lived product that is already sold in the network outside the promotional event. When people “negotiate” over a forecast, it’s an untold risk analysis that is taking place. However, better results are obtained if the forecasting and risk analysis are kept separate, at least from a methodological viewpoint.

Remove manual interventions from the forecasts

In general merchandise retail, all data processes involving manual operations are costly to scale at the level of the network: too many items, too many stores, too frequent promotions. Thus, from the start, the goal should be an end-to-end automated forecasting process.

Yet, while (nearly) all software vendors promise fully automated solutions, manpower requirements creep in all over the place. For example, special hierarchies between items may have to be maintained just for the sake of the forecasting systems. This could involve special item groups dedicated to seasonality analysis, or lists of "paired" products where the sales history of the old product is used as a substitute when the new product has no sales history in the store.

Also, the fine-tuning of the forecasting models themselves might be very demanding, and while it is supposedly a one-off operation, it should be accounted for as an ongoing operational cost.

As a small tip, for store networks, beware of any vendors that promise to visualize forecasts: spending as much as 10s per data point to look at them is hideously expensive for any fairly sized retail network.

The time spent by employees should be directed to the areas where the investment is capitalized over time - continuously improving the promotional planning - rather than consumed merely to sustain the planning activity itself.

Don’t omit whole levels from the initiative

The most inaccurate forecasts that retailers produce are the implicit ones: decisions that reflect some kind of underlying forecast but that nobody has identified as such. For promotion forecasts, there are typically three distinct levels of forecasts:

  • national forecasts used to size the overall order passed to the supplier for the whole retail network.
  • regional forecasts used to distribute the national quantities between the warehouses.
  • local forecasts used to distribute the regional quantities between the stores.

We frequently observe that distinct entities within the retailer’s organization end up being separately responsible for parts of the overall planning initiative: Purchasing handles the national forecasts, Supply Chain handles the regional forecasts and Store Managers handle the local forecasts. The situation is then made worse when parties start to haggle over the numbers.

When the forecasting process is split over multiple entities, nobody is clearly accountable for the (in)effectiveness of the promotional planning. It’s hard to quantify the improvement brought by any specific initiative because results are mitigated or amplified by interfering initiatives carried out by other parties. In practice, this complicates attempts at continuously improving the process.

Forecast as late as you can

A common delusion about statistical forecasting is the hope that, somehow, the forecasts will get perfectly accurate at some point. However, promotion forecasts won’t ever be even close to what people would commonly perceive as very accurate.

For example, across Western markets, we observe that for the majority of promoted items at the supermarket level, less than 10 units are sold per week for the duration of the promotion. However, forecasting 6 units and selling 9 units already yields a forecast error of 50%. There is no hope of achieving less than 30% error at the supermarket level in practice.
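One way to get a feel for these orders of magnitude is sketched below. The first function reproduces the 6-versus-9 arithmetic, measuring the error relative to the forecast. The second assumes, purely for illustration, that weekly demand follows a Poisson distribution (an assumption that is not made in the text above): in that case, even a forecast that gets the mean exactly right faces a relative standard deviation of 1/sqrt(mean), which stays above 30% as long as the mean is at most 10 units.

```python
import math

def relative_error(forecast, actual):
    """Absolute error expressed as a fraction of the forecast."""
    return abs(actual - forecast) / forecast

def poisson_relative_std(mean_demand):
    """Relative standard deviation of a Poisson demand: sqrt(mean) / mean."""
    return 1 / math.sqrt(mean_demand)

print("forecast 6, sold 9:", relative_error(6, 9))
for mean in (2, 5, 10):
    # Assumed Poisson demand; the figures are illustrative only.
    print("mean of", mean, "units/week -> relative std:",
          round(poisson_relative_std(mean), 3))
```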

Yet, while the forecasts are bound to an irreducible level of inaccuracy, some retailers (and not just retailers, actually) exacerbate the problem by forecasting further into the future than is required.

For example, national forecasts are typically needed up to 20 weeks in advance, especially when importing goods from Asia. However, neither regional nor local forecasts need to be established so long in advance. At the warehouse level, planning can typically happen only 4 to 6 weeks in advance, and then, as far as stores are concerned, the quantitative details of the planning can be finalized only 1 week before the start of the promotion.

However, as the forecasting process is typically co-handled by various parties, a consensus emerges for a date that fits the constraints of all parties, that is, the earliest date proposed by any of the parties. This frequently results in forecasting demand at the store level up to 20 weeks in advance, generating wildly inaccurate forecasts that could have been avoided altogether by postponing the forecasts.

Thus, we recommend tailoring the planning of the promotions so that quantitative decisions are left pending until the last moment when final forecasts are finally produced, benefiting from the latest data.

Leverage the first day(s) of promotional sales at the store level

Forecasting promotional demand at the store level is hard. However, once the first day of sales is observed, forecasting the demand for the rest of the promotion can be performed with a much higher accuracy than any forecasts produced before the start of the promotion.

Thus, promotion planning can be improved significantly by not pushing all goods to the stores upfront, but only a fraction, keeping reserves in the warehouse. Then, after one or two days of sales, promotion forecasts should be revised with the initial sales to adjust how the rest of the inventory should be pushed to the stores.
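A deliberately naive sketch of this two-step approach follows. The 50% upfront fraction, the rule of splitting the reserve in proportion to early sales, and the function names are all assumptions made for illustration; a real system would revise the store-level forecasts rather than apply a fixed proportional rule.

```python
def split_upfront_and_reserve(total_units, upfront_fraction=0.5):
    """Decide how many units are pushed before the promotion and how many stay in the warehouse."""
    upfront = round(total_units * upfront_fraction)
    return upfront, total_units - upfront

def allocate_reserve(reserve_units, first_days_sales):
    """Split the warehouse reserve between stores in proportion to their early sales.

    Hypothetical rule for illustration; units lost to rounding down stay in the warehouse.
    """
    total_sold = sum(first_days_sales.values())
    if total_sold == 0:
        return {store: 0 for store in first_days_sales}
    return {store: reserve_units * sold // total_sold
            for store, sold in first_days_sales.items()}

upfront, reserve = split_upfront_and_reserve(total_units=200)
early_sales = {"store_A": 9, "store_B": 1, "store_C": 0}  # made-up first-day sales
print(upfront, reserve)
print(allocate_reserve(reserve, early_sales))
```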

Don’t tune your forecasts after each operation

One of the frequent questions we get from retailers is if we revise our forecasting models after observing the outcome of a new promotion. While this seems a reasonable approach, in the specific case of promotion forecasts, there is a catch and a naive application of this idea can backfire.

Indeed, we observe that, for most retailers, promotional operations (that is, sets of products promoted over the same period, typically with some unified promotional message) come with strong endogenous correlations between the uplifts. Simply put, some operations work better than others, and the discrepancy between the lowest performing operations and the highest performing operations is no less than a factor of 10 in sales volume.

As a result, after the end of each operation, it’s tempting to revise all forecasting models upward or downward based on the latest observations. Yet, it creates significant overfitting problems: revised historical forecasts are artificially made more accurate than they really are.

In order to mitigate overfitting problems, it’s important to only revise the promotion forecasting models as part of an extensive backtesting process. Backtesting is the process of replaying the whole history, iteratively re-generating all forecasts up to the last, newly added promotional operation. Extensive backtesting mitigates large amplitude swings in the anticipated uplifts of the promotions.
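The replay logic can be sketched as follows. The data layout (one dictionary per operation with an "uplift" field) and the toy model that predicts the average uplift seen so far are assumptions for illustration only; the point is that each operation is forecast using strictly earlier operations.

```python
def backtest(operations, fit, predict):
    """Replay the history: forecast each operation using only the operations before it."""
    if len(operations) < 2:
        raise ValueError("backtesting needs at least two past operations")
    errors = []
    for i in range(1, len(operations)):
        model = fit(operations[:i])               # train on the past only
        forecast = predict(model, operations[i])  # forecast the next operation
        errors.append(abs(forecast - operations[i]["uplift"]))
    return sum(errors) / len(errors)              # mean absolute error over the replay

# Hypothetical toy model: predict the average uplift observed so far.
def fit_mean(history):
    return sum(op["uplift"] for op in history) / len(history)

def predict_mean(model, operation):
    return model

history = [{"uplift": 2.0}, {"uplift": 4.0}, {"uplift": 3.0}]  # made-up uplift factors
print("backtested error:", backtest(history, fit_mean, predict_mean))
```

A model change would then be accepted only if it improves the error over the whole replay, not just on the latest operation.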

Validate “ex post” promotion records

As discussed in the first post of this series, data quality is an essential ingredient to produce sound promotion forecasts. Yet, figuring out oddities of promotions months after they ended is impractical. Thus, we suggest not delaying the review of the promotion data and doing it at the very end of each operation, while the operation is still fresh in the mind of the relevant people (store managers, suppliers, purchasers, etc).

In particular, we suggest looking for outliers such as zeroes and surprising volumes. Zeroes reflect either that the operation has not been carried out or that the merchandise has not been delivered to the stores. Either way, a few phone calls can go a long way towards pinpointing the problem and then applying proper data corrections.

Similarly, unexpected extreme volumes can reflect factors that have not been properly accounted for. For example, some stores might have allotted display space at their entrance, while the initial plan was to keep the merchandise in the aisles. Naturally, sales volumes are much higher, but that is merely a consequence of an alternative facing.
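A first pass over the records of a finished operation might look like the sketch below. The threshold (five times the median store volume) and the store identifiers are arbitrary assumptions; the output is only a list of stores worth a phone call, not an automatic correction.

```python
import statistics

def flag_suspicious_stores(sales_by_store, high_factor=5.0):
    """Flag stores with zero sales or with volumes far above the median store."""
    median = statistics.median(sales_by_store.values())
    flags = {}
    for store, units in sales_by_store.items():
        if units == 0:
            # Operation not carried out, or merchandise never delivered?
            flags[store] = "zero sales"
        elif median > 0 and units > high_factor * median:
            # Possibly an unplanned facing, e.g. display space at the entrance.
            flags[store] = "unexpectedly high"
    return flags

# Made-up promotional sales for one item across five stores.
promo_sales = {"S1": 8, "S2": 0, "S3": 10, "S4": 120, "S5": 9}
print(flag_suspicious_stores(promo_sales))
```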

Stay tuned: next time, we will discuss the optimization challenges in promotion planning.

Categories: accuracy, insights, forecasting Tags: promotion forecasting insights

Promotion planning in general merchandise retail – Data Challenges

Published by Joannes Vermorel.


Forecasting is almost always a difficult exercise, but there is one area in general merchandise retail considered as one order of magnitude more complicated than the rest: promotion planning. At Lokad, promotion planning is one of the frequent challenges we tackle for our largest clients, typically through ad-hoc Big Data missions.

This post is the first of a series on promotion planning. We are going to cover the various challenges that are faced by retailers when forecasting promotional demand, and give some insights into the solutions we propose.

The first challenge faced by retailers when tackling promotions is the quality of the data. This problem is usually vastly underestimated, by mid-size and large retailers alike. Yet, without high-quality data about past promotions, the whole planning initiative faces a Garbage In Garbage Out problem.

Data quality problems among promotion records

The quality of promotion data is typically poor - or at least much worse than the quality of the regular sales data. A promotional record, at the most disaggregated level, consists of an item identifier, a store identifier, a start date, an end date, plus all the dimensions describing the promotion itself.

Those promotional records have numerous problems:
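For concreteness, such a record could be represented as below. The field names and the free-form `attributes` dictionary are hypothetical choices for this sketch, not a description of any particular retailer's schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass(frozen=True)
class PromotionRecord:
    """One promotion for one item in one store (hypothetical layout)."""
    item_id: str
    store_id: str
    start_date: date
    end_date: date
    # Dimensions describing the promotion itself, e.g. facing, packaging, mechanism.
    attributes: dict = field(default_factory=dict)

    def duration_days(self):
        """Number of days covered, both start and end dates included."""
        return (self.end_date - self.start_date).days + 1

record = PromotionRecord(
    item_id="item-001",    # made-up identifiers
    store_id="store-042",
    start_date=date(2024, 3, 4),
    end_date=date(2024, 3, 10),
    attributes={"facing": "end of aisle", "mechanism": "price cut"},
)
print(record.duration_days())
```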

  • Records exist, but the store did not fully implement the promotion plan, especially with regard to the facing.
  • Records exist, but the promotion never happened anywhere in the network. Indeed, promotion deals are typically negotiated 3 to 6 months in advance with suppliers. Sometimes a deal gets canceled with only a few weeks’ notice, but the corresponding promotional data is never cleaned up.
  • Off-the-record initiatives from stores, such as moving an overstocked item to an end-of-aisle shelf, are not recorded. Facing is one of the strongest factors driving the promotional uplift, and should not be underestimated.
  • Details of the promotion mechanisms are not accurately recorded. For example, the presence of custom packaging and the structured description of that packaging are rarely preserved.

After having observed similar issues on many retailers' datasets, we believe that the explanation is simple: there are few or no operational imperatives to correct promotional records. Indeed, if the sales data are off, it creates so many operational and accounting problems that fixing the problem becomes the No. 1 priority very quickly.

In contrast, promotional records can remain wildly inaccurate for years. As long as nobody attempts to produce some kind of forecasting model based on those records, inaccurate records have a negligible negative impact on the retailer's operations.

The primary solution to those data quality problems is to put data quality processes in place, and to validate empirically how resilient those processes are when facing live store conditions.

However, the best process cannot fix broken past data. As 2 years of good promotional data is typically required to get decent results, it’s important to invest early and aggressively in the historization of promotional records.

Structural data problems

Beyond issues with promotional records, the accurate planning of promotions also suffers from broader and more insidious problems related to the way the information is collected in retail.

Truncating the history: Most retailers do not indefinitely preserve their sales history. Usually "old" data get deleted following two rules:

  • if the record is older than 3 years, then delete the record.
  • if the item has not been sold for 1 year, then delete the item, and delete all the associated sales records.

Obviously, depending on the retailer, thresholds might differ, but while most large retailers have been around for decades, it’s exceptional to find a non-truncated 5-year sales history. Those truncations are typically based on two false assumptions:

  • storing old data is expensive: Storing the entire 10-year sales history (down to the receipt level) of Walmart – and your company is certainly smaller than Walmart – can be done for less than 1000 USD of storage per month. Data storage is not just ridiculously cheap now, it was already ridiculously cheap 10 years ago, as far as retail networks are concerned.
  • old data serve no purpose: While 10-year-old data certainly serve no operational purpose, from a statistical viewpoint, even 10-year-old data can be useful to refine the analysis of many problems. Simply put, a long history gives a much broader range of possibilities to validate the performance of forecasting models and to avoid overfitting problems.

Replacing GTINs by in-house product codes: Many retailers preserve their sales history encoded with alternative item identifiers instead of the native GTINs (aka UPC or EAN13, depending on whether you are in North America or Europe). Replacing GTINs with ad-hoc identification codes is frequently believed to make it easier to track GTIN substitutions and to help avoid a segmented history.

Yet, GTIN substitutions are not always accurate, and incorrect entries become near-impossible to track down. Worse, once two GTINs have been merged, the former data are lost: it’s no longer possible to reconstruct the two original sets of sales records.

Instead, it’s a much better practice to preserve GTIN entries, because GTINs represent the physical reality of the information being collected by the POS (point of sales). Then, the hints for GTIN substitutions should be persisted separately, making it possible to revise associations later on - if the need arises.
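The idea can be sketched in a few lines: raw sales stay keyed by the GTIN scanned at the POS, while substitution hints live in a separate table that is only applied at analysis time. The placeholder GTIN values and the function name below are made up for illustration.

```python
from collections import defaultdict

# Raw sales as collected by the POS, keyed by GTIN (placeholder values).
raw_sales = [("GTIN_A", 5), ("GTIN_B", 3), ("GTIN_C", 7)]

# Substitution hints persisted separately: old GTIN -> replacing GTIN.
substitutions = {"GTIN_A": "GTIN_B"}

def merged_history(sales, substitutions):
    """Aggregate sales after applying substitution hints, leaving the raw records untouched."""
    totals = defaultdict(int)
    for gtin, units in sales:
        totals[substitutions.get(gtin, gtin)] += units
    return dict(totals)

print(merged_history(raw_sales, substitutions))  # merged view for analysis
print(merged_history(raw_sales, {}))             # original view, still recoverable
```

If a hint later proves wrong, it can be removed from the substitution table and the original per-GTIN history is intact.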

Not preserving the packaging information: In food retail, many products come in a variety of distinct formats: from individual portions to family portions, from single bottles to packs, from regular format to +25% promotional formats, etc.

Preserving the information about those formats is important because, for many customers, an alternative format of the same product is frequently a good substitute when the other format is missing.

Yet again, while it might be tempting to merge the sales into some kind of meta-GTIN where all size variants have been merged, there might be exceptions, and not all sizes are equal substitutes (e.g. 18g Nutella vs 5kg Nutella). Thus, the packaging information should be preserved, but kept apart from the raw sales.

Data quality, a vastly profitable investment

Data quality is one of the few areas where investments are typically rewarded tenfold in retail. Better data improve all downstream results, from the most naïve to the most advanced methods. In theory, data quality should suffer from the principle of diminishing returns. However, our own observations indicate that, except for a few rising stars of online commerce, most retailers are very far from the point where investing more in data quality would not be vastly profitable.

Moreover, unlike building advanced predictive models, data quality does not require complicated technologies, but a lot of common sense and a strong sense of simplicity.

Stay tuned: next time, we will discuss process challenges for promotion planning.

Tags: promotion forecasting retail data

What's wrong with promotion forecasts?

Published by Joannes Vermorel.

It has been a little while since we posted some technical content about our forecasting methods. Let's discuss further the case of promotion forecasts.

Promotion forecasts are especially interesting for two reasons:

  • promotions represent a heavy part of the business for many companies.
  • forecasts are likely to end up really wrong when ad-hoc methods are used.

In order to clarify the traps looming behind promotion forecasts, let's have a look at the following schema. The two black curves represent fictitious product sales - illustrating typical consumer response to a promotion.

Model for retail promotion forecasts

Disclaimer: Those behaviors have been measured on Lokad databases. Yet, market responses vary a lot from one promotion to the next and from one business to the next. Our purpose here is not to provide accurate quantitative models, but rather to focus on what we believe to be key insights for promotion forecasts.

1. Primary promotion impact

The most important effect of a promotion is usually a large sales increase of the promoted product for the duration of the operation. This effect is illustrated with (1) in our schema.

Guessing that the sales are likely to increase is trivial, yet precisely estimating the impact of the promotion - that is to say the extra sales generated by the promotion itself - is a complicated process. Lokad is using its tags+events framework to do this.

Note that classical forecasting methods such as exponential smoothing completely miss promotional patterns.

2. Demand drop due to market saturation

The second effect of a promotion - which is too frequently overlooked - is the demand drop that comes just after the end of the promotion. See point (2) on the schema.

Indeed, by observing thousands of promotional operations, we have found that, just after the end of a promotion, sales levels frequently drop below the initial pre-promotion sales levels.

This drop reflects the temporary market saturation caused by the promotion itself. Basically, people who would have bought the product anyway have hurried their decision, resulting in fewer sales when the promotion stops.

When this drop is overlooked AND combined with a naive forecasting method, the result can be extremely poor inventory management and overstocks. Indeed, the delayed response of exponential smoothing is likely to suggest inventory replenishments precisely when the demand is about to drop.
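This lag can be illustrated with a minimal sketch. The sales figures, the smoothing factor alpha = 0.3 and the helper name `exponential_smoothing` are all assumptions made up for the illustration; they are neither Lokad measurements nor Lokad's method. The series has a flat baseline, a three-period promotion, then a post-promotion dip:

```python
def exponential_smoothing(series, alpha):
    """One-step-ahead forecasts: forecasts[t] is the forecast made for series[t]."""
    forecasts = [series[0]]  # initialize with the first observation
    for observed in series[:-1]:
        forecasts.append(alpha * observed + (1 - alpha) * forecasts[-1])
    return forecasts

# Fictitious sales: baseline 100, promotion at 250, post-promotion dip at 70.
sales = [100] * 6 + [250] * 3 + [70] * 3 + [100] * 4
forecasts = exponential_smoothing(sales, alpha=0.3)

for period, (actual, forecast) in enumerate(zip(sales, forecasts)):
    print(f"period {period:2d}: actual={actual:3d} forecast={forecast:6.1f}")
```

In this toy series, the forecast for the first promoted period (period 6) is still at the baseline, while the forecast for the first post-promotion period (period 9) is far above the actual sales: the model suggests replenishing exactly when the demand drops.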

3. Mechanical echo due to customer synchronization

The third pattern comes from the synchronization effect of the promotion on customer replenishments. This point is illustrated with (3) on the schema.

All products (consumable or not) tend to have a lifecycle of their own. The promotion synchronizes, to some extent, the consumption patterns of customers. As a result, after the initial shock (the promotion itself), there are likely to be decreasing echoes in customer demand.

In our measurements, it appears that most of the time only the first echo can be measured with enough confidence to be of any use for forecasting purposes. Subsequent echoes do exist but they are too diffuse to be used to refine forecasts.

Again, if the demand bounce (3) happens right after the demand drop (2), then the exponential smoothing model is likely to cause frequent inventory shortages, because it completely fails to anticipate the bounce.

4. Demand drift due to market evolution

The fourth effect is the demand drift caused by the promotion. This effect is illustrated by point (4) in the schema.

A promotion acts as a market perturbation, and the base sales level is likely to be different after the promotion: hopefully higher, but our measurements show that the opposite, i.e. lower sales, also happens, albeit less frequently.

In our experience, establishing a quantitative forecast for the drift is really hard. Yet, the drift can be taken into account in order to speed up forecast adjustments after the end of the promotion.

5. Sales cannibalization

Sales cannibalization is typically the second major effect of a promotion in terms of absolute sales volumes. Cannibalized products end up with reduced sales during the promotion as illustrated by (5) on the schema.

Yet cannibalizations are harder to model, because the effect can be diffuse and potentially impact a wide range of unrelated products. Indeed, depending on the situation, some customers may have a fixed threshold for their spending. The immediate consequence of the threshold is that the opportunity spending offered by the promotion ends up compensated by restrictions on somewhat unrelated products.

Lokad is trying hard to provide fine-grained promotion forecasts, taking into account all the effects that we can measure. If you decide to go for in-house forecasts (which we don't recommend, but well), we suggest making sure that neither (2) nor (5) gets overlooked.

Categories: accuracy, forecasting, insights, time series Tags: forecasting issue practices promotion

Better promotion forecasts in retail

Published on by Joannes Vermorel.

Since our major Tags+Events upgrade last fall, we have been very actively working on promotion forecasting for retail. We now have thousands of promotional events in our databases, and the analysis of those events has led us to very interesting findings.

Although it’s hardly surprising, we have found that:

  • promotion forecasts performed manually by practitioners usually involve forecast errors above 60% on average. Your mileage may vary, but typical sales forecast errors in retail are usually closer to 20%.
  • including promotion data through tags and events reduces the average forecast error by roughly 50%. Again, your mileage may vary depending on the amount of data that you have on your promotional events.

As a less intuitive result, we have also found that rule-based methods and linear methods, although widely advertised by some experts and some software tools, are very weak against overfitting, and can distort the evaluation of the forecast error, leading to a false impression of performance in promotion forecasting.

Also, note that this 50% improvement has usually been achieved with quite a limited amount of information, typically no more than 2 or 3 binary descriptors per promotion.

Even crude data about your promotions leads to significant forecast improvements, which turn into significant working capital savings.

The first step to improve your promotion forecasts consists of gathering accurate promotion data. In our experience, this step is the most difficult and the most costly one. If you do not have accurate records of your promotions, then there is little hope of getting accurate forecasts. As the saying goes: Garbage In, Garbage Out.

Yet, we did notice that even a single promotion descriptor, a binary variable that just indicates whether the article is currently promoted or not, can lead to a significant forecast improvement. Thus, although your records need to be accurate, they don’t need to be detailed to improve your forecasts.

Thus, we advise you to keep precise track of the timing of your promotions: when did each one start? When did it end? Note that for eCommerce, a front page display often has an effect comparable to a product promotion, so you need to keep track of the evolution of your front page as well.
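As a minimal sketch of how start and end dates turn into the single binary descriptor mentioned above, the snippet below builds a per-day "promoted or not" flag. The function name `promotion_flags`, the dates and the daily granularity are assumptions chosen for illustration, not a Lokad data format:

```python
from datetime import date, timedelta

def promotion_flags(first_day, last_day, promotions):
    """Return {day: bool} telling whether the article was promoted on each day.

    `promotions` is a list of (start, end) date pairs, both ends inclusive.
    """
    flags = {}
    day = first_day
    while day <= last_day:
        flags[day] = any(start <= day <= end for start, end in promotions)
        day += timedelta(days=1)
    return flags

# Hypothetical record: one promotion running from March 5 to March 7.
promotions = [(date(2010, 3, 5), date(2010, 3, 7))]
flags = promotion_flags(date(2010, 3, 1), date(2010, 3, 10), promotions)

promoted_days = [day.isoformat() for day, promoted in flags.items() if promoted]
print(promoted_days)
```

The same structure can hold front page display periods for an online store, simply by recording them as additional (start, end) pairs.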

Then, article description matters. Indeed, in our experience, even the most frequently promoted articles are not going to have more than a dozen promotions in their market lifetime. On average, the number of known past promotions for a given article is ridiculously low, ranging from zero to one. As a result, you can’t expect any reliable results by focusing on the past promotions of a single product at a time because, most of the time, there aren’t any.

So instead, you have to focus on articles that resemble the article that you are planning to promote. With Lokad, you can do that by associating tags with your sales. Typically, retailers use a hierarchy to organize their catalog. Think of an article hierarchy with families, sub-families, articles, variants, etc.
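To give an idea of why similar articles help, here is a deliberately crude sketch that pools past promotions across articles sharing at least one tag. It is not how Lokad computes its forecasts; the function `pooled_uplift`, the tags and the uplift figures are all hypothetical:

```python
def pooled_uplift(target_tags, past_promotions):
    """Average the uplift observed on past promotions that share a tag with the target.

    `past_promotions` is a list of (tags, uplift) pairs, where uplift is the
    ratio of promoted sales to baseline sales. Returns None when nothing matches.
    """
    target = set(target_tags)
    similar = [uplift for tags, uplift in past_promotions if target & set(tags)]
    if not similar:
        return None
    return sum(similar) / len(similar)

# Hypothetical history: the article to promote has never been promoted itself.
history = [
    (["LOLLIPOPS", "CHERRY", "MEDIUM"], 2.0),
    (["LOLLIPOPS", "LEMON", "SMALL"], 3.0),
    (["CHOCOLATES", "DARK"], 1.5),
]
print(pooled_uplift(["LOLLIPOPS", "LEMON", "MEDIUM"], history))  # 2.5
```

Even though the target article has no promotion of its own, two promotions of look-alike articles provide an estimate; the unrelated chocolate promotion is ignored.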

Translating a hierarchical catalog into tags can be done quite simply following the process illustrated below for a fictitious candy reseller:

The tags associated with the sales history of medium lemon lollipops would be LOLLIPOPS, LEMON, MEDIUM

This process will typically create 2 to 6 tags per article in your catalog - depending on the complexity of your catalog.
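The translation from hierarchy to tags can be sketched in a few lines. The nested-dictionary catalog and the function name `catalog_to_tags` are assumptions made for the example; any representation of your hierarchy would do:

```python
def catalog_to_tags(node, path=()):
    """Walk a nested catalog and return {article_path: [tags]}.

    Inner levels are dicts; the last level is a list of variants.
    Each level of the hierarchy above an article becomes one tag.
    """
    result = {}
    if isinstance(node, dict):
        for name, child in node.items():
            result.update(catalog_to_tags(child, path + (name,)))
    else:
        for variant in node:
            full_path = path + (variant,)
            result[full_path] = [level.upper() for level in full_path]
    return result

# Fictitious candy catalog: family -> flavor -> size.
catalog = {
    "lollipops": {"lemon": ["small", "medium", "large"], "cherry": ["small", "medium"]},
    "chocolates": {"dark": ["bar"]},
}
tags = catalog_to_tags(catalog)
print(tags[("lollipops", "lemon", "medium")])  # ['LOLLIPOPS', 'LEMON', 'MEDIUM']
```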

We have said that even very limited information about your promotions could be used to improve your sales forecasts right away. Yet, more detailed promotion information clearly improves the forecast accuracy.

We have found that two items are very valuable to improve the forecast accuracy:

  • the mechanism that describes the nature of the discount offered to your customers. Typical mechanisms are flat discounts (e.g. -20%), but there are many other mechanisms such as free shipping or discounts for larger quantities (e.g. buy one, get one free).
  • the communication that describes how your customers get notified of the promotional event. Typically, communication includes marketing operations such as radio, newspaper or local ads, but also the custom packaging (if any) and the visibility of promoted articles within the point of sale.

In the case of larger distribution networks, the overall availability of the promotion should also be described if articles aren’t promoted everywhere. Such a situation typically arises if point-of-sale managers can opt out of promotional operations.

Discussing with professionals, we have found that many retailers are expecting a set of rules to be produced by Lokad; and those rules are expected to explain promotions such as


Basically, those expected rules always follow more or less the same patterns:

  • A set of binary conditions that defines the scope of the rule.
  • A set of linear coefficients to estimate the effect of the rule.
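To make this pattern concrete, here is a minimal sketch of what such a rule looks like once written as code. The descriptor names, the coefficients and the "first matching rule wins" policy are invented for the illustration; they do not come from any real rule set:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    conditions: dict    # descriptor name -> required boolean value (the scope)
    coefficient: float  # multiplier applied to the baseline when the rule matches

    def matches(self, promotion):
        return all(promotion.get(name) == value
                   for name, value in self.conditions.items())

def forecast(baseline, promotion, rules):
    """Apply the coefficient of the first matching rule, else keep the baseline."""
    for rule in rules:
        if rule.matches(promotion):
            return baseline * rule.coefficient
    return baseline

# Hypothetical rules, most specific first.
rules = [
    Rule({"flat_discount": True, "radio_ad": True}, 2.0),
    Rule({"flat_discount": True}, 1.5),
]
print(forecast(100, {"flat_discount": True, "radio_ad": False}, rules))  # 150.0
```

Note how easy it is to add yet another, narrower rule to explain one more past promotion; this is precisely what makes the approach fragile.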

We have found that many tools in the software market are available to help you discover those rules in your data, which, seemingly, has led many people to believe that this approach is the only one available.

Yet, according to our experiments, rule-based methods are far from optimal. Worse, those rules are really weak against overfitting. This weakness frequently leads to painful situations where there is a significant gap between estimated forecast accuracy and real forecast accuracy.

Overfitting is a very subtle, and yet very important, phenomenon in statistical forecasting. Basically, the central issue in forecasting is that you want to build a model that is very accurate against the data you don’t have.

In particular, statistical theory indicates that it is possible to build models that happen to be very accurate when applied to the historical data, and yet very inaccurate when predicting the future. The problem is that, in practice, if you do not carefully think about the overfitting problem beforehand, building such a model is not a mere possibility, but the most probable outcome of your process.

Thus, you really need to optimize your model against the data you don’t have. Yet, this problem looks like a complete paradox because, by definition, you can’t measure anything if you don’t have the corresponding data. And we have found that many professionals gave up on this issue, because it doesn’t look like a tractable problem anyway.

Our advice is: DON’T GIVE UP

The core issue with those rules is that they perform too well on historical data. Each rule you add mechanically reduces the forecast error that you are measuring on your historical data. If you add enough rules, you end up with an apparent near-zero forecasting error. Yet, the empirical error that you measure on your historical data is an artifact of the process used to build the rules in the first place. Zero forecast error on historical data does not translate into zero forecast error on future promotions. Quite the opposite in fact, as such models tend to perform very poorly on future promotions.
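This mechanism can be reproduced on synthetic data. The sketch below is a toy experiment under stated assumptions, not a Lokad benchmark: promotions have 8 random binary descriptors, only the first one truly affects the uplift, and a "rule" is simply the mean uplift observed for one combination of the first k descriptors. All names and figures are invented:

```python
import random
from collections import defaultdict

random.seed(0)
N_DESCRIPTORS = 8

def make_promotions(n):
    """Synthetic promotions: only descriptor 0 truly changes the uplift."""
    promos = []
    for _ in range(n):
        descriptors = tuple(random.randint(0, 1) for _ in range(N_DESCRIPTORS))
        uplift = 2.0 + descriptors[0] + random.gauss(0, 0.5)
        promos.append((descriptors, uplift))
    return promos

def fit_rules(train, k):
    """One rule per observed combination of the first k descriptors."""
    cells = defaultdict(list)
    for descriptors, uplift in train:
        cells[descriptors[:k]].append(uplift)
    return {key: sum(values) / len(values) for key, values in cells.items()}

def mse(rules, fallback, data, k):
    """Mean squared error; combinations without a rule use the fallback."""
    errors = [(rules.get(d[:k], fallback) - u) ** 2 for d, u in data]
    return sum(errors) / len(errors)

train, test = make_promotions(40), make_promotions(2000)
global_mean = sum(u for _, u in train) / len(train)

results = []
for k in range(N_DESCRIPTORS + 1):
    rules = fit_rules(train, k)
    results.append((k, mse(rules, global_mean, train, k),
                    mse(rules, global_mean, test, k)))
    print(f"{k} conditions: historical MSE={results[-1][1]:.3f}, "
          f"future MSE={results[-1][2]:.3f}")
```

The error measured on the 40 historical promotions can only go down as conditions are added, because each refinement fits the history more closely. The error on the 2000 unseen promotions does not follow: with all 8 conditions, nearly every rule is backed by a single past promotion, and the model mostly memorizes noise.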

Although optimizing for the data you don’t have is hard, statistical learning theory offers both theoretical understanding and practical solutions to this problem. The central idea consists of introducing the notion of structural risk minimization, which balances the empirical error against the complexity of the model.
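One common way to write this balance, given here as a generic textbook-style sketch rather than as the formulation used by Lokad, is to select the model that minimizes the empirical error plus a penalty that grows with the complexity of the model. The symbols below are notation introduced for this illustration only:

```latex
\hat{f} = \arg\min_{f \in \mathcal{F}} \Big[ \hat{R}_n(f) + \lambda \, \Omega(f) \Big],
\qquad
\hat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i), y_i\big)
```

Here $\hat{R}_n(f)$ is the error of model $f$ measured on the $n$ historical promotions $(x_i, y_i)$ with a loss $\ell$, $\Omega(f)$ measures the complexity of the model (for instance, the number of rules), and $\lambda \ge 0$ sets the trade-off. With $\lambda = 0$, only the historical error counts, which is exactly the situation where adding rules always looks like an improvement.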

This will be discussed in a later post, stay tuned.

(Shameless plug) Many of those modern solutions, i.e. mathematical models that happen to be careful about the overfitting issue, have been implemented by Lokad, so that you don’t have to hire a team of experts to benefit from them.

Categories: business, forecasting, insights, retail, sales, supply chain Tags: forecasting insights promotion retail theory