
Entropy analysis for supply chain IT system discovery

Published by Joannes Vermorel.

The IT landscape of supply chains is nearly always complex. Indeed, by nature, supply chains involve multiple actors, multiple sites, multiple systems, etc. As a result, building data-driven insights in supply chains is a challenge due to the sheer heterogeneity of the IT landscape. Too frequently, supply chain analytics deliver nonsensical results precisely because of underlying garbage in, garbage out problems.

At Lokad, we have not only developed a practice that thoroughly surveys the IT landscape and the datasets inhabiting it, but we have also created some bits of technology to facilitate the surveying operations themselves. In this post, we detail one step of our surveying methodology, which is based on Shannon entropy. We have successfully leveraged entropy analysis for several large-scale supply chain initiatives.

Our surveying process starts by reviewing all the database tables that are understood to be relevant for the supply chain initiative. Supply chains are complex and, consequently, the IT systems that operate them reflect this complexity. Moreover, those IT systems may have been evolving for several decades, and layers of complexity tend to accumulate in them. As a result, it’s not uncommon to identify dozens of database tables, with each table having dozens of columns, i.e. fields in database terminology.

For large supply chains, we have observed situations where the total number of distinct fields is above 10,000. By leveraging entropy analysis, we are able to remove half of the columns from the picture immediately, and consequently reduce the remaining amount of work significantly.

Dealing with that many columns is a major undertaking. The problem is not data-processing capabilities: with cloud computing and adequate data storage, it’s relatively straightforward to process thousands of columns. The real challenge is to make sense of all those fields. As a rule of thumb, we estimate that well-written documentation of a field takes up about one page, once interesting use cases and edge cases are covered. Without proper documentation of the data, data semantics are lost, and the odds are very high that any complex analysis performed on the data will suffer from massive garbage in, garbage out headaches. Thus, with 10,000 fields, we are faced with the production of a 10,000-page manual, which demands a truly monumental effort.

Yet, we have observed that those large IT systems also carry a massive amount of dead weight. While the raw number of fields appears to be very high, in practice it does not mean that every column found in the system contains meaningful data. At the extreme, a column might be entirely empty or constant and, thus, contain no information whatsoever. A few fields can be immediately discarded because they are truly empty. However, we have observed that fully empty fields are actually pretty rare. Sometimes, the only non-constant information in the column dates from the day the system was turned on; the field was never used again afterward. While truly empty fields are relatively rare, we usually observe that degenerate fields are extremely numerous: columns with almost no data, well below any reasonable threshold to leverage this data for production purposes.

For example, a PurchaseOrders table containing over one million rows might have an Incoterms column that is non-empty in only 100 rows; furthermore, all those rows are more than five years old, and 90 of them contain the entry "thisisatest". In this case, the Incoterms field is clearly degenerate, and there is no point in even trying to make sense of this data. Yet, a naive SQL filter will fail to identify such a column as degenerate.

Thus, a tool to identify degenerate columns is needed. It turns out that Shannon entropy is an excellent candidate. Shannon entropy is a mathematical tool to measure the quantity of information contained in a message. The entropy is measured in shannons, a unit of measurement somewhat akin to bits of information. By treating the values found in a column as the message, Shannon entropy gives us a measure of the information contained in that column, expressed in shannons.

While all of this might sound highly theoretical, putting this insight into practice is extremely straightforward. All it takes is to use the entropy() aggregator provided by Envision. The tiny script below illustrates how we can use Envision to produce the entropy analysis of a table with 3 fields.

read "data.csv" as T[*]
show table "List of entropies" with
  entropy(T.Field1)
  entropy(T.Field2)
  entropy(T.Field3)


An entropy lower than 0.1 is a very good indicator of a degenerate column. If the entropy is lower than 0.01, the column is guaranteed to be degenerate.
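
To make the measurement concrete, here is a minimal sketch in plain Python of what such an entropy measurement computes; the production analysis relies on Envision's entropy() aggregator, and the degenerate column below is a hypothetical one mirroring the Incoterms example above.

from collections import Counter
from math import log2

def shannon_entropy(column):
    # Entropy, in shannons (bits), of the empirical distribution of the column's values.
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Hypothetical degenerate column: 1,000,000 rows, only 100 of them non-empty.
column = [""] * 999_900 + ["thisisatest"] * 90 + ["FOB"] * 10
print(shannon_entropy(column))  # ~0.0015, far below the 0.1 threshold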

Our experience indicates that performing an initial filtering based on entropy measurements reliably eliminates between one-third and two-thirds of the initial fields from the scope of interest. The savings in time and effort are very substantial: for large supply chain projects, we are talking about man-years being saved through this analysis.

We unintentionally discovered a positive side effect of entropy filtering: it lowers the IT fatigue associated with the (re)discovery of the IT systems. Indeed, investigating a degenerate field typically proves to be an exhausting task. As the field is not used - sometimes not used any more - nobody is quite sure whether the field is truly degenerate or if the field is playing a critical, but obscure, role in the supply chain processes. Because of the complexities of supply chains, there is frequently nobody who can positively affirm that a given field is not used. Entropy filtering immediately eliminates the worst offenders that are guaranteed to lead us on a wild-goose chase.

Tags: supply chain, envision, insights

Markdown tile and Summary tile

Published by Joannes Vermorel.

The dashboards produced by Lokad are composite: they are built of tiles that can be rearranged as you see fit. We have many different tiles available: linechart, barchart, piechart, table, histogram, etc. This tile approach offers great flexibility when it comes to crafting a dashboard that contains the exact figures your company needs. Recently, we have introduced two extra tiles in order to help fine-tune your dashboards even further.


The Summary tile offers a compact approach for displaying KPIs (key performance indicators). While it was already possible to use the Table tile for a similar purpose, that approach required one tile for every KPI. As a result, dashboards containing a dozen or more KPIs were needlessly large. In contrast, the Summary tile offers a more practical way of gathering a handful of key figures in one place. As usual, the real challenge is not to present thousands of numbers to the supply chain practitioner - that part is easy - but rather to present the 10 numbers that are worth reading - and that part is hard; the Summary tile happens to be the best tile to gather those 10 numbers.

The Markdown tile offers the possibility to display simply formatted text in the dashboard. As the name suggests, the text gets formatted using the Markdown syntax, which is rather straightforward. One of the most pressing needs addressed by the Markdown tile is the possibility to embed detailed legends within dashboards. Indeed, when composing complex tables, such as suggested purchase quantities, it is important to make sure there is no remaining ambiguity concerning the semantics of each table column. The Markdown tile represents a practical way of delivering contextual documentation and making sure that no numbers get misinterpreted. It also provides an opportunity to document the intent behind the numbers, which is too frequently lost amid technicalities: the documentation can outline why a number is shown on the dashboard in the first place.

Tags: envision, release

Working with uncertain futures

Published by Joannes Vermorel.

The future is uncertain. Yet, nearly all predictive supply chain solutions make the opposite assumption: they assume that their forecasts are correct, and hence roll out their simulations based on those forecasts. Implicitly, the future is assumed to be certain and complications ensue.

From a historical perspective, software engineers did not make those assumptions without a reason: a deterministic future was the only option that early - and not so early - computers could realistically process. Thus, while dealing with an uncertain future was known to be the better approach in theory, in practice it was not even an option.

In addition, a few mathematical tricks were found early in the 20th century to circumvent this problem. For example, the classic safety stock analysis assumes that both the lead times and the demand follow a normal distribution. The normal distribution assumption is convenient from a computing viewpoint because it takes only two parameters to model the future: the mean and the variance.
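
For reference, here is a minimal sketch - in Python rather than Envision - of the classic reorder point computation under that normality assumption; the demand and lead time figures are hypothetical.

from math import sqrt
from statistics import NormalDist

def reorder_point(mean_demand, std_demand, mean_lead_time, std_lead_time, service_level=0.95):
    # Classic formula: mean demand over the lead time, plus a safety stock proportional
    # to the combined standard deviation of demand and lead time.
    z = NormalDist().inv_cdf(service_level)  # safety factor for the target service level
    combined_std = sqrt(mean_lead_time * std_demand ** 2 + (mean_demand * std_lead_time) ** 2)
    return mean_demand * mean_lead_time + z * combined_std

print(reorder_point(mean_demand=20, std_demand=5, mean_lead_time=7, std_lead_time=2))  # ~209 units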

Yet again, the normal distribution assumption - both for the lead times and the demand - proved to be incorrect in all but a few situations, and complications ensued.

Back in 2012 at Lokad, we realized that the classic inventory forecasting approach was simply not working: mean or median forecasts were not addressing the right problem. No matter how much technology we poured into the case, it was not going to work satisfactorily.

Thus, we shifted to quantile forecasts, which can be interpreted as forecasting the future with an intended bias. Soon we realized that quantiles were invariably superior to the classic safety stock analysis, if only because quantiles were zooming in on what really mattered from a supply chain perspective.

However, while going quantile, we realized that we had lost quite a few things in the process. Indeed, unlike classic mean forecasts, quantile forecasts are not additive, so it was not possible to make sense of a sum of quantiles, for example. In practice, the loss wasn’t too great: since classic forecasts weren’t making much sense in the first place, summing them up wasn’t a reasonable option anyway.

Over the years, while working with quantiles, we realized that so many of the things we took for granted had become a lot more complicated: demand quantities could no longer be summed or subtracted or linearly adjusted. In short, while moving towards an uncertain future, we had lost the tools to operate on this uncertain future.

Back in 2015, we introduced quantile grids. While quantile grids were not yet exactly the same as our full-fledged probabilistic forecasts, our forecasting engine was already starting to deliver probabilities instead of quantile estimates. Distributions of probabilities are much more expressive than simple quantile estimates, and it turns out that it is possible to define an algebra over distributions.

While the term algebra might sound technical, it’s not that complicated; it means that simple operations such as the sum, the product or the difference can be defined in ways that are not only mathematically consistent, but also highly relevant from a supply chain perspective.

As a result, just a few weeks ago, we integrated an algebra of distributions right into Envision, our domain-specific language dedicated to commerce optimization. Thanks to this algebra of distributions, it becomes straightforward to carry out seemingly simple operations such as summing two uncertain lead times (say, an uncertain production lead time plus an uncertain transport lead time). The sum of those two lead times is carried out through an operation known as a convolution. While the calculation itself is fairly technical, in Envision all it takes is to write A = B +* C, where +* is the convolution operator used to sum up independent random variables (*).
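
As an illustration of what happens behind the +* operator, here is a conceptual sketch in plain Python (not Envision) that convolves two independent, discrete lead time distributions; the probabilities are hypothetical.

def convolve(p, q):
    # Distribution of X + Y for independent X ~ p and Y ~ q, each given as a dict value -> probability.
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out

production = {5: 0.2, 6: 0.5, 7: 0.3}  # production lead time, in days
transport = {2: 0.6, 3: 0.4}           # transport lead time, in days
total = convolve(production, transport)
print({days: round(p, 4) for days, p in total.items()})  # {7: 0.12, 8: 0.38, 9: 0.38, 10: 0.12}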

Through this algebra of distributions, most of the “intuitive” operations that were possible with classic forecasts are back: random variables can be summed, multiplied, stretched, exponentiated, etc. And while relatively complex calculations take place behind the scenes, probabilistic formulas are, from the Envision perspective, no more complicated than plain Excel formulas.

Instead of wishing for the forecasts to be perfectly accurate, this algebra of distributions lets us embrace uncertain futures: supplier lead times tend to vary, quantities delivered may differ from quantities ordered, customer demand changes, products get returned, inventory may get lost or damaged … Through this algebra of distributions it becomes much more straightforward to model most of those uncertain events with minimal coding efforts.

Under the hood, processing distributions is quite intensive; and once again, we would never have ventured into those territories without a cloud computing platform that handles this type of workload - Microsoft Azure in our case. Nevertheless, computing resources have never been cheaper, and your company’s next $100k purchase order is probably well worth spending a few CPU hours - costing less than $1 and executed in just a few minutes - to make sure that the ordered quantities are sound.

(*) A random variable can be seen as a distribution whose total mass equals 1; it is a special type of distribution. Envision can process probability distributions (aka random variables), but more general distributions as well.

Tags: insights, forecasting, envision

Autocomplete file paths with Envision

Published by Joannes Vermorel.

When data scientists work with Envision, our domain-specific language tailored for the quantitative optimization of commerce, we want to ensure that they are as productive as possible. Indeed, data scientists don't grow on trees, and when you happen to have one available, you want to make the most of their time.

A data analysis begins by loading input data, which happens to be stored as flat files within Lokad. Therefore, an Envision script always starts with a few statements such as:

read "/sample/Lokad_Items.tsv"
read "/sample/Lokad_Orders.tsv" as Orders
read "/sample/Lokad_PurchaseOrders.tsv" as PurchaseOrders

While the Envision syntax is compact and straightforward, file names may, on the other hand, be fairly complex. From the beginning, our source code editor has offered autocompletion; however, until recently, autocompletion did not provide suggestions for file names. A few days ago, the code editor was upgraded, and file paths are now suggested as you type.

This feature was part of a larger upgrade which also made the Envision code source editor more responsive and more suitable for dealing with large scripts.

Tags: envision, release, features

Joining tables with Envision

Published by Joannes Vermorel.

When it comes to supply chain optimization, it’s important to accommodate the challenges while minimizing the amount of reality distortion that gets introduced in the process. The tools should embrace the challenge as it stands instead of distorting the challenge to make it fit within the tools.

Two years ago, we introduced Envision, a domain-specific language, precisely intended as a way to accommodate the incredibly diverse range of situations found in supply chains. From day 1, Envision offered a programmatic expressiveness that was a significant step forward compared to traditional supply chain tools. However, this flexibility was still limited by the viewpoint taken by Envision itself on the supply chain data.

A few months ago, we introduced a generic JOIN mechanism in Envision. Envision is no longer limited to natural joins as it was initially, and now offers the possibility to process a much broader range of tabular data. In supply chains, arbitrary table joins are particularly useful to accommodate complex scenarios such as multi-sourcing, one-way compatibilities, multi-channel setups, etc.

For readers who may already be familiar with SQL, joining tables feels like a rather elementary operation; however, in SQL, combining complex numeric calculations with table joins rapidly ends up producing source code that is obscure and verbose. Moreover, joining large tables also raises quite a few performance issues that need to be carefully addressed, either by adjusting the SQL queries themselves, or by adjusting the database itself through the introduction of table indexes.

One of the key design goals for Envision was to give up on some of the capabilities of SQL in exchange for a much lower coding overhead when facing supply chain optimization challenges. As a result, the initial Envision was solely based on natural joins, which removed almost entirely the coding overhead associated with JOIN operations as they are usually done in SQL.

Natural joins have their limits, however, and we lifted those limits by introducing the left-by syntax within Envision. Through left-by statements, it becomes possible to join arbitrary tables within Envision. Under the hood, Envision takes care of creating optimized indexes to keep the calculations fast, even when dealing with giganormous data files.
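
As a conceptual illustration only - using Python and pandas rather than Envision's actual left-by syntax - here is what an arbitrary, non-natural join looks like, e.g. attaching supplier attributes to purchase order lines in a multi-sourcing setup; the table and column names are hypothetical.

import pandas as pd

purchase_orders = pd.DataFrame({
    "OrderId": [1, 2, 3],
    "SupplierRef": ["SUP-A", "SUP-B", "SUP-A"],  # the key is named differently than in Suppliers
    "Quantity": [100, 40, 250],
})
suppliers = pd.DataFrame({
    "Code": ["SUP-A", "SUP-B"],
    "LeadTimeDays": [12, 30],
})

# Left join: every purchase order line is kept, and supplier attributes are attached to it.
enriched = purchase_orders.merge(suppliers, left_on="SupplierRef", right_on="Code", how="left")
print(enriched[["OrderId", "Quantity", "LeadTimeDays"]])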

From a pure syntax perspective, left-by is a minor addition to the Envision language; from a supply chain perspective, however, this one feature significantly improved Lokad's capacity to accommodate the most complex situations.

If you don’t have an in-house data scientist who happens to be a supply chain expert too, we do. Lokad provides an end-to-end service where we take care of implementing your supply chain solution.

Tags: envision, technical, release

Solving the general MOQ problem

Published by Joannes Vermorel.

Minimal Order Quantities (MOQs) are ubiquitous in supply chain. At a fundamental level, MOQs represent a simple way for the supplier to indicate that there are savings to be made when products are ordered in batches rather than being ordered unit by unit. From the buyer's perspective, however, dealing with MOQs is far from being a trivial matter. The goal is not merely to satisfy the MOQs - which is easy, just order more - but to satisfy the MOQs while maximizing the ROI.
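
To make the trade-off concrete, here is a deliberately tiny, brute-force sketch in Python; all figures are hypothetical, the per-unit return is kept constant for simplicity (the real problem features diminishing returns per extra unit, which is part of what makes it hard), and this is in no way representative of how Lokad's solver works internally.

from itertools import product

# (sku, moq, unit_cost, expected_return_per_unit) - hypothetical figures
skus = [("A", 50, 2.0, 0.9), ("B", 20, 5.0, 1.5), ("C", 100, 1.0, 0.3)]
budget = 300.0

best_value, best_plan = 0.0, {}
# For each SKU, either skip it or order exactly the MOQ; keep the best plan within budget.
for picks in product([0, 1], repeat=len(skus)):
    plan = {sku: moq * pick for (sku, moq, _, _), pick in zip(skus, picks)}
    cost = sum(plan[sku] * unit_cost for sku, _, unit_cost, _ in skus)
    value = sum(plan[sku] * unit_return for sku, _, _, unit_return in skus)
    if cost <= budget and value > best_value:
        best_value, best_plan = value, plan

print(best_plan, best_value)  # orders A, B and C at their MOQs for a total cost of 300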

Lokad has been dealing with MOQs for years already. Yet, until now, we were relying on numerical heuristics implemented through Envision whenever MOQs were involved. Unfortunately, those heuristics were somewhat tedious to implement repeatedly, and the results we were obtaining were not always as good as we wanted them to be - albeit already a lot better than their "manual" counterparts.

Thus, we finally decided to roll our own non-linear solver for the general MOQ problem. This solver can be accessed through a function named moqsolv in Envision. Solving the general MOQ problem is hard - really hard - and under the hood, a fairly complex piece of software is at work. However, through this solver, Lokad now offers a simple and uniform way to deal with all types of MOQs commonly found in commerce or manufacturing.

Tags: insights, supply chain, envision

The Stock Reward Function

Published by Joannes Vermorel.

The classic way of thinking about replenishment consists of establishing one target quantity per SKU. This target quantity typically takes the form of a reorder point, which is dynamically adjusted based on the demand forecast for the SKU. However, over the years at Lokad, we have realized that this approach was very weak in practice, no matter how good the (classic) forecasts were.

Savvy supply chain practitioners usually tend to outperform this (classic) approach with a simple trick: instead of looking at SKUs in isolation, they step back and look at the bigger picture, taking into consideration the fact that all SKUs compete for the same budget. Then, practitioners choose the SKUs that seem the most pressing. This approach outperforms the usual reorder point method because, unlike the latter, it gives priority to certain replenishments. And as any business manager knows, even very basic task prioritization is better than no prioritization at all.

In order to reproduce this nice “trick”, in early 2015 we upgraded Lokad towards a more powerful form of ordering policy known as prioritized ordering. This policy precisely adopts the viewpoint that all SKUs compete for the next unit to be bought. Thanks to this policy, we get the best of both worlds: advanced statistical forecasts combined with the sort of domain expertise that had been unavailable to the software so far.

However, the prioritized ordering policy requires a scoring function to operate. Simply put, this function converts the forecasts plus a set of economic variables into a score value. By assigning a specific score to every SKU and every unit of these SKUs, the scoring function makes it possible to rank all “atomic” purchase decisions; by atomic, we refer to the purchase of 1 extra unit for 1 SKU. As a result, the scoring function should be as aligned with the business drivers as possible. However, while crafting approximate “rule-of-thumb” scoring functions is reasonably simple, defining a proper scoring function is a non-trivial exercise. Without getting too far into the technicalities, the main challenge lies in the “iterated” aspect of the replenishments, where the carrying costs keep accruing charges until units get sold. Calculating 1 step ahead is easy, 2 steps ahead a little harder, and N steps ahead is pretty complicated.

Not so long ago, we managed to crack this problem with the stock reward function. This function breaks down the challenge into three economic variables: the per-unit profit margin, the per-unit stock-out cost and the per-unit carrying cost. Through the stock reward function, one gets the actual economic impact broken down into margins, stock-outs and carrying costs.
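
As a rough illustration only - the actual stock reward function iterates over future periods, which this one-step Python sketch deliberately ignores - here is how such a per-unit score can rank atomic purchase decisions across SKUs; the probabilities and economic figures are hypothetical.

def unit_score(p_demand_ge_k, margin, stockout_cost, carrying_cost):
    # One-step approximation: the k-th unit earns the margin and avoids a stock-out
    # if demand reaches k units; otherwise it incurs a carrying cost.
    return p_demand_ge_k * (margin + stockout_cost) - (1 - p_demand_ge_k) * carrying_cost

# P(demand >= k) for k = 1, 2, 3, 4 per SKU, taken from a probabilistic demand forecast.
forecasts = {"SKU-1": [0.95, 0.70, 0.30, 0.05], "SKU-2": [0.80, 0.40, 0.10, 0.01]}
economics = {"SKU-1": (12.0, 8.0, 1.0), "SKU-2": (30.0, 5.0, 2.0)}  # margin, stock-out, carrying

# Score every atomic decision "buy the k-th unit of this SKU" and rank them all.
decisions = [(unit_score(p, *economics[sku]), sku, k + 1)
             for sku, probs in forecasts.items() for k, p in enumerate(probs)]
for score, sku, k in sorted(decisions, reverse=True):
    print(f"{sku} unit #{k}: score {score:.2f}")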

The stock reward function represents a superior alternative to all the scoring functions we have used so far. Actually, it can even be considered a mini framework that can be adjusted with a small (but quite expressive) set of economic variables in order to best tackle the strategic goals of merchants, manufacturers or wholesalers. We recommend using this function whenever probabilistic forecasts are involved.

Over the course of the coming weeks, we will gradually update all our Envision templates and documentation materials to reflect this new Lokad capability.

Tags: insights, supply chain, envision

Currency exchange rates with Envision

Published by Joannes Vermorel.

Merchants frequently buy in one currency and sell in another. As online commerce is becoming more and more global, it's not unusual to encounter merchants who are buying in multiple currencies, and selling in multiple currencies as well. From a business analytics viewpoint, it soon becomes rather complicated to figure out where the margins stand exactly. In particular, margins depend not only on the present currency conversion rates, but also on those that were in place 6 months ago.

As part of our commerce analytics technology, we have recently introduced a new forex() function that is precisely aimed at taking into account historical currency conversion rates for almost 30 currencies - including all the major ones.

Lokad's built-in dashboards have already been updated to take advantage of this function. Now, when Lokad carries out a gross-margin analysis, for example, all the sales orders and purchase orders are converted into a single currency, applying the historical rates in place at the time the transactions were made.
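
Conceptually - the exact signature of forex() is not reproduced here - the conversion boils down to applying, for each transaction, the rate of its own date rather than today's rate; a minimal Python sketch with hypothetical rates:

from datetime import date

# Hypothetical historical rates to USD, keyed by (currency, date).
rates_to_usd = {
    ("EUR", date(2016, 1, 15)): 1.09,
    ("EUR", date(2016, 6, 15)): 1.12,
    ("GBP", date(2016, 6, 15)): 1.42,
}

orders = [
    {"amount": 250.0, "currency": "EUR", "date": date(2016, 1, 15)},
    {"amount": 480.0, "currency": "GBP", "date": date(2016, 6, 15)},
]

# Convert each order with the rate that prevailed on its transaction date, then aggregate.
total_usd = sum(o["amount"] * rates_to_usd[(o["currency"], o["date"])] for o in orders)
print(round(total_usd, 2))  # 954.1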

Tags: envision