Filtering by Tag: technology

Text mining for better demand forecasts

Published on by Joannes Vermorel.

We are proud to announce that Lokad is now featuring text mining capabilities that assist its forecasting engine in delivering accurate demand forecasts, even when looking at products associated with sparse and intermittent demand that do not benefit from attributes such as categories and hierarchies. This feature is live, check out the label option of our forecasting engine.

The primary forecasting challenge faced by supply chains is the sparsity of the data: most products don’t have a decade’s worth of relevant historical data and aren’t served by thousands of units when considering the edges of the supply chain network. Traditional forecasting methods, which rely on the assumption that the time series are both long and non-sparse, perform poorly for this very reason.

Lokad is looking at supply chain historical data from another angle: instead of looking at the depth of the data, which tends to be nonexistent, we are looking at the width of the data, that is, all the correlations that exist between the products. As there are frequently thousands of products, many correlations can be leveraged to improve the forecasting accuracy significantly. Yet, when establishing those correlations, we cannot count on relying on the demand history because many products, such as the products that are about to be launched, don’t even have historical data yet. Thus, the forecasting engine of Lokad has introduced a mechanism to leverage categories and hierarchies instead.

Leveraging categories and hierarchies for increased forecasting accuracy works great. However, this approach suffers from one specific limitation: it relies on the availability of categories and hierarchies. Indeed, many companies haven’t invested much in master data setups, and, as a result, cannot benefit from much fine-grained information about the products that flow through the supply chain. Previously, when no category and no hierarchy were available, our forecasting engine was essentially crippled in its capability to cope with sparse and intermittent demand.

The new text mining capabilities of the Lokad forecasting engine is a game changer: the engine is now capable of processing the plain-text description of products to establish the correlations between the products. In practice, we observe that while companies may be lacking proper categorizations for their products, a plain-text description of the products is nearly always available, dramatically improving the applicability of the width-first forecasting perspective of Lokad.

For example, if a diverse set of products happens to be named Something Christmas, and all those products exhibit a consistent seasonal spike before Christmas, then the forecasting engine can identify this pattern and automatically apply the inferred seasonality to a new product that has the keyword Christmas in its description. This is exactly what happens under the hood at Lokad when plain-text labels are fed to the forecasting engine.

Our example above is simplistic, but, in practice, text mining involves uncovering complex relationships between words and demand patterns that can be observed in the historical data. Products sharing similar descriptions may share similar trends, similar life-cycles, similar seasonalities. However, two products with similar descriptions may share the same trend but not the same seasonality, etc. The forecasting engine of Lokad is based on machine learning algorithms that automatically identify the relevant information from the plain-text descriptions of the products. The engine requires no preprocessing of the product descriptions.

Our motto is to make the most of the data you have. With text mining capabilities, we are once again lowering the requirements to bring your company to age of quantitative supply chains. Any question? Just drop us a line at

Categories: Tags: forecasting insights technology release No Comments

Ionic data storage for high scalability in supply chain

Published on by Joannes Vermorel.

Supply chains moved quite early on towards computer-based management systems. Yet, as a result, many large companies have decade-old supply chain systems which tend to be sluggish when it comes to crunching a lot of data. Certainly, tons of Big Data technologies are available nowadays, but companies are treading carefully. Many, if not most, of those Big Data companies are critically dependent on top-notch engineering talent to get their technologies working smoothly; and not all companies succeed, unlike Facebook, in rewriting layers of Big Data technologies for making them work.

Being able to process vast amounts of data has been a long-standing commitment of Lokad. Indeed, optimizing a whole supply chain typically requires hundreds of incremental adjustments. As hypotheses get refined, it’s typically the entire chain of calculations that needs to be re-executed. Getting results that encompass the whole supply chain network in minutes rather than hours lets you complete a project in a few weeks while it would have dragged on for a year otherwise.

And this is why we started our migration towards cloud computing back in 2009. However, merely running on top of a cloud computing platform does not guarantee that vast amount of data can be processed swiftly. Worse still, while using many machines offers the possibility to process more data, it also tends to make data processing slower, not faster. In fact, delays tend to take place when data is moved around from one machine to the next, and also when machines need to coordinate their work.

As a result, merely throwing more machines at a data processing problem does not reduce any further the data processing time. The algorithms need to be made smarter, and every single machine should be able to do more with no more computing resources.

A few weeks ago, we have released a new high-performance column storage format code-named Ionic thatis heavily optimized for high-speed concurrent data processing. This format is also geared towards supply chain optimization as it natively supports the handling of storage distributions of probabilities. And these distributions are critical in order to be able to take advantage of probabilistic forecasts. Ionic is not intended to be used as an exchange format between Lokad and its clients. For data exchange, using flat text file format, such as CSV, is just fine. The Ionic format is intended to be used as internal data format to speed-up everything that happens within Lokad. Thanks to Ionic, Lokad can now process hundreds of gigabytes worth of input data with relative ease.

In particular, the columnar aspect of the Ionic format ensures that columns can be loaded and processed separately. When addressing supply chain problems, we are routinely facing ERP extractions where tables have over 100 columns, and up to 500 columns for the worst offenders. Ionic delivers a massive performance boost when it comes to dealing with that many columns.

From Lokad’s perspective, we are increasingly perceiving data processing capabilities as a critical success factor in the implementation of supply chain optimization projects. Longer processing time means that less gets done every single day, which is problematic since ultimately every company operates under tight deadlines.

The Ionic storage format is one more step into our Big Data journey.

Categories: Tags: technology release supply chain cloud computing bigdata No Comments

WinZip and 7z file formats now supported

Published on by Joannes Vermorel.

File formats are staggeringly diverse. At Lokad, our ambition is to support all the (reasonable) tabular file formats. We were already supporting CSV (comma-separated values) files with all their variants - which can involve varying separators or varying line returns.

However, tabular files can become very large, and in order to make the file transfer to Lokad faster, these files can be compressed. Lossless compression of flat text files works very well, frequently yielding a compression ratio below 10%, i.e. the resulting compressed file is less than 10% of the original file.

Then again, compression formats are staggeringly diverse as well. So far, we were only supporting the venerable and ubiquitous GZip - the compression format used to compress web pages for example.

The two formats WinZip - famous for its .zip file extension - and 7z - one of the most efficient compression algorithms available on the market - are now supported by Lokad. In both cases, the file formats are archive formats, hence, a single .zip file can contain many files within the archive. For now, Lokad only supports single-file archives.

This choice makes sense in practice because if the flat file is so large that it requires compression in the first place, producing an even bigger archive gathering multiple large files tends to be impractical. Instead, we suggest to use incremental file uploads.

Check out our documentation about how to read files in Envision.

Categories: Tags: release technology No Comments

Full automation ahead

Published on by Noora Kekkonen.

Lokad uses advanced forecasting methods in order to produce the most accurate forecasts possible, and while that accuracy is greater than with classic methods, many large reports can’t be computed instantly in real time. Executing multiple operations in a specific order and retrieving data from other apps can sometimes be time consuming, therefore, Lokad now provides an automation feature which allows the full control of all the operations needed to produce the numbers your company needs.

From simple scheduling to fully controlled sequences

Since being able to schedule operations is a must-have feature in advanced analytics, Lokad already provided this option in the project configuration, but it was quite limited and required an account on a third party scheduling service. Therefore, we have now launched a native automation feature which offers both orchestration and scheduling possibilities.

Lokad project orchestration showing multiple projects scheduled to run at 0330 UTC daily. The first step (data import) will be skipped if it has been run within the past 6 hours, and if the third step fails the sequence will continue.

Lokad project orchestration showing multiple projects scheduled to run at 0330 UTC daily. The first step (data import) will be skipped if it has been run within the past 6 hours, and if the third step fails the sequence will continue.

Orchestration and scheduling - the two pillars of advanced analytics

With the new automation feature, you can define a specific order for running projects. In this way, the updated data from previous runs can be applied to other projects, as the run will only start when the previous one is completed. The “skip if more recent” option is useful when dealing with long processes. For example, you can set the sequence to auto-skip one or more steps if they have already been run in the last 12 hours.

Scheduling operations allows you to have your reports ready when your company needs them - whether it is on a daily or weekly basis. Some operations require a large amount of data and their execution can take a while. Therefore, Lokad also allows you to set a specific time to start running the sequences. We particularly suggest running projects during the night. In this way you will always have your numbers ready in the morning, without waiting.

Categories: Tags: release technology No Comments

Width vs. Depth, Rotate your sales forecasts by 90 degrees

Published on by Joannes Vermorel.

We have already discussed why Lokad did not care much about forecasting Chinese food rather than Sport Bar beverages. Another way of thinking our technology consists of rotating your sales forecasts by 90 degrees.

We are observing that a consumer product has, on average, 3 years lifecycle. This means that on average the amount of data available for every single product about 18 months. When, we look at the sales history with a monthly aggregation, 18 months of data means 18 points.

With 18 data points, no matter how smart or advanced is your forecasting theory, you can't do much simply because we face an utter lack of data to perform any robust statistical analysis. With 18 points, even a pattern has obviously as seasonality becomes a challenge to observe because we don't even have 2 complete seasonal observation.

Your mileage may vary from one industry to the next, but unless your products stay in the market for decades, you are most likely to face this issue.

As a direct consequence, classical forecasting toolkits require statisticians to tweak forecasting models for every single product because no non-trivial statistical model can be robustly fit with only 18 points as input data.

Yet, Lokad does not require any statistician, and the magic lies in the 90 degrees rotation: our models do not iterate over data a single time-series at a time, but against all time-series at once. Thus, we have a lot more input data available, and consequently we can succeed with rather advance models.

This approach is just common sense: if you want to forecast the seasonality of your new chocolate bar, the seasonality of the other chocolate bars seems like a good candidate. Why should you treat each chocolate bar in strict isolation from the others?

Yet, from a computational perspective, the problem has just become a lot harder: if you have 10,000 SKUs the number of associations between two SKUs is roughly 100 millions (and 10,000 SKU is nowhere a large number). That's precisely where the cloud kicks in: even if your algorithms are well-designed not to suffer a strict quadratic complexity, you're still going to need a lot of processing power. The cloud just happens to make this processing power available on demand at a very low price.

Without the cloud, it is simply not possible to deliver this kind of technology.

Categories: forecasting, insights Tags: cloud computing depth forecasting insights statistics technology width No Comments

Internet is needed for your forecasts

Published on by Joannes Vermorel.

Ethernet cable illustration Do I really need an Internet connection to get your forecasts? is a question frequently asked by prospects having a look at our forecasting technology.

Well, the answer is YES. With Lokad, there is no work-around. Our forecasting engine does not come as an on-premises solution.

But why should we need an internet connection for an algorithmic processing such as forecasting?

The answer to this question is one of the core reason that have lead to the very existence of Lokad in the first place.

When we started working on the Lokad project - back in 2006 -  we quickly realized that forecasting, despite appearances, was a total misfit for local processing.

1. Your can't get your forecasts right without having the data at hand. Researchers have been looking for decades for a universal forecasting model, but the consensus among the community is that there is no free lunch; universal models do not exist, or rather, they tend to perform poorly. This is the primary reasons why forecasting toolkits feature so many models (don't click this link, it's 3000 pages manual for a popular toolkit). With Lokad, the process is much simpler because the data is made available to Lokad. Hence, it does not matter any more if thousands of parameters are needed, as parameters are handled by Lokad directly.

2. Advanced forecasting is quite resource intensive but the need to forecast is only intermittent. Even a small retailer with 10 point of sales and 10k product references represents already 100k time-series to be forecasted. If we consider a typical performance of 10k/series per hour for a single CPU (which is already quite optimistic for complex models), then computing sales forecasts for the 10 points of sales take a total 10h of CPU time. Obviously, retailers prefer not to wait for 10h to get their forecasts. Buying an amazingly powerful workstation is possible, but then does it make sense to have so much processing power staying idle 99% of the time when forecasts are made only once a week? Outsourcing the processing power is the obvious cost-effective approach here.

3. Forecasting is still under fast paced evolution. Since our launch about 3 years ago, Lokad has been upgraded every month or so. Our forecasting technology is not some indisputable achievement carved in stone, but on the contrary, is still undergoing a rapid evolution. Every month, the statistical learning research community moves forward with loads of fresh ideas.  In such context, on-premise solutions undergo a rapid decay until the day the discrepancy between the performance of current version and the performance of the deployed version is so great that the company has no choice but to rush an upgrade. Aggressively developed SaaS ensure that customers benefit from the latest improvements without having to even worry about it.

In our opinion, going for an on-premise solution for your forecasts is like entering a golf competition with a large handicap. It might make the game more interesting, but it does not maximize your chances. Don't expect your competitors to be fair enough to start with the same handicap just because you do.

Categories: business, forecasting, insights Tags: business forecasting insight technology No Comments

What's your statistical model?

Published on by Joannes Vermorel.

We have already disclosed a few insights about what's being used at Lokad. Yet, a frequent support request remains what's your model, precisely?

We‘re looking through various forecasting statistical packages with the intent on selecting one at some point in the near future. One thing I find lacking in Lokad is to see which statistical model was used. I understand that the selection of which model is used is a trade secret, but I would like to verify the final selection, in the trial that is, with our in-house mathematician before we trust you with our actual forecasts. Most software vendors operating in this space provide the model selected. Is it possible to get that result with Lokad?

Well, unfortunately, the correct answer is that Lokad isn't a statistical package. In particular, we don't deliver models, we deliver forecasts.

The whole architecture of Lokad has been designed around this very assumption, which unfortunately is very ill-suited to deliver any information about our models.

Our forecast flow, which grabs input data and outputs forecasts, is:

  • vastly more complex compared to models shipped with statistical packages. Forecasts cannot be associated with well-known models.
  • tailored for distributed computing in the clouds, thus, the design feels very alien when compared to classic toolkits.
  • subject to ongoing changes, as we are carrying experiments on a daily basis with agile deployment strategies.

But this design has very specific benefits too:

  • no need to tune complex forecasting parameters.
  • no need to constantly watch your parameters, we monitor the results.
  • scales up as much as you need to, up to millions of forecasts.
  • handles complex patterns that are way beyond classical toolkits.

Then, we don't ask anyone to take our results for granted. Just go and see for yourself, our trial is free for 30 days.

Categories: forecasting, insights Tags: forecasting technology 2 Comments

Forecasting in the clouds and Lokad.Cloud

Published on by Joannes Vermorel.

Cloud computing will be the no1 buzzword in software industry in 2009. Among forums, blogs, even traditional news papers, cloud computing is the new rage; and we, at Lokad, are no exception.

Yet, for us, cloud computing is not a buzzword, it's a very real technology addressing a very critical aspect of our technology: scalability.

In short, delivering forecasts is a rather bumpy process: for one week, we wait, and then, suddenly, a customer sends us a (very) large amount and (rightfully) expects forecasts to be delivered within 1h.

Traditional computing infrastructures do not deal efficiently with those sorts of needs: servers are rented for at least one month with strong pricing incentive toward longer engagements. For Lokad, traditional server hosting means that our processing power is vastly underused; yet during peaks, there is never enough processing power available.

Thus, we have started migrating toward the cloud, and more specifically toward Windows Azure (special thanks to Steve Marx and Yi-Lun Luo from Microsoft for their assistance to get us started with Azure).

For those who rely on us, cloud computing means that we will be able to serve you better through:

  • unrivaled and unlimited scalability: no matter how large your data, we will address your needs, on demand, real time.
  • better forecasts through more complex statistical models that are presently too expensive CPU-wise to be put in production.

Lokad.Cloud big picture

Although, cloud computing is still a rough field lacking many commodities usually taken for granted by developers when dealing with non-cloud apps. This is why we have started a new open source project named Lokad.Cloud that provides a .NET framework to speedup the development of back-office apps built on top of Windows Azure. We expect an alpha release in July. Stay tuned.

Categories: developers, open source, roadmap, technical Tags: cloud computing opensource scalability technology No Comments

Machine learning company, what’s so special?

Published on by Joannes Vermorel.

Machine learning is the subfield of artificial intelligence that is concerned with the design and development of algorithms that allow computers to improve their performance over time based on data, such as from sensor data or databases. Wikipedia.

Ten years ago, machine learning companies were virtually non-existent, or say, marginal at most. The main reason for that situation was simply that there weren’t that many algorithms actually working and delivering business value at the time. Automated translation, for example, is still barely working, and very far from being usable in most businesses.

Lokad fits into the broad machine learning field, with a specific interest for statistical learning. Personally, I have been working in the machine learning field for now almost a decade, and it’s still surprising to see how things are deeply different is this field compared to the typical shrinkwrap software world. Machine learning is a software world of its own.

Scientific progress on areas that looked like artificial intelligence has been slow, very slow compared to most other software areas. But a fact that is too little known is also that scientific progress has been steady; and, at present day, there are quite a few successful machine learning companies around:

  • Smart spam filter: damn, akismet caught more than 71 000 blog spam comments on my blog, with virtually zero false positive as far I can tell.
  • Voice recognition: Dragon Dictate is now doing quite an impressive job just after a few minutes of user tuning.
  • Handwriting recognition and even equation recognition are built in Windows 7.

Machine learning has become mainstream.

1. Product changes but user interface stays

For most software businesses, bringing something new to the customer eyes is THE way to get recurrent revenues. SaaS is slowly changing this financial aspect, but still, for most SaaS products, evolution comes with very tangible changes on the user interface.

On the contrary, in machine learning, development usually doesn’t mean adding any new feature. Most of the evolution happens deep inside with very little or no surfacing changes. Google Search - probably the most successful of all machine learning products - is notoriously simple, and has been that way for a decade now. Lately, ranking customization based on user preferences has been added, but this change occurred almost 10 years after the launch, and I would guess, is still unnoticed by most users.

Yet, it doesn't mean that Google folks have been staying idle for the last 10 years. Quite the opposite actually, Google teams have been furiously improving their technology winning battle after battle against web spammers who are now using very clever tricks.

2. Ten orders of magnitude in performance

When, it comes to software performance, usual shrinkwrap operations happen within 100ms. For example, I suspect that usual computation times, server side, needed to generate a web page application are ranging from 5ms for the most optimized apps to 500ms for the slowest ones. Be slower than that, and your users will give up on visiting your website. Although, it’s hardly verifiable, I would suspect this performance range holds true for 99% of the web applications.

But it comes to machine learning, typical computational costs are varying for more than 10 orders of magnitude, from milliseconds to weeks.

At present day, the price of 1 month of CPU at 2Ghz has dropped to $10, and I expect this price drop under $1 in the next 5 years. Also, one month of CPU can be compressed within a few hours of wall time through large scale parallelization. For most machine learning algorithms, accuracy can be improved by dedicating more CPU to the task at hand.

Thus, gaining 1% in accuracy with a 1 month CPU investment ($10) can be massively profitable, but that sort of reasoning is just plain insanity for most, if not all, software areas outside machine learning.

3. Hard core scalability challenges

Scaling-up a Web2.0 such as say Twitter is a challenge indeed, but, in the end, 90% of the solution lies into a single technique: in-memory caching of the most frequently viewed items.

On the contrary, scaling up machine learning algorithms is usually a terrifyingly complicated task. It took Google several years to manage to perform large scale sparse matrix diagonalization; and linear algebra is clearly not the most challenging area of mathematics when it comes to machine learning problems.

The core problem of machine learning is that the most efficient way to improve your accuracy consists in adding more input data. For example, if you want to improve the accuracy of your spam filter, you can try to improve your algorithm, but you can also use a larger input database where emails are already flagged as spam or not spam. Actually, as long as you have enough processing power, it’s frequently way easier to improve your accuracy through larger input data than through smarter algorithms.

Yet, handling large amount of data in machine learning is a complicated problem because you can’t naively partition your data. Naïve partitioning is equivalent of discarding input data and of performing local computations that are not leveraging all the data available. Bottom line: machine learning needs very clever ways of distributing its algorithms.

4. User feedback is usually plain wrong

Smart people advise to do hallway usability testing. This also apply to whatever user interface you put on your machine learning product, but when it comes to improve the core of your technology, user feedback is virtually useless when not simply harmful if actually implemented.

The main issue is that, in machine learning, most good / correct / expected behaviors are unfortunately counter intuitive. For example, at Lokad, a frequent customer’s complain is that we deliver flat forecasts which are perceived as incorrect. Yet, those flat forecasts are just in the best interest of those customers, because they happen to be more accurate.

Although being knowledgeable about spam filtering, I am pretty sure that 99% of the suggestions that I come up with and send to the akismet folks would be just junk to them, simply because the challenge in spam filtering is not how do I filter spam, but how do I filter spam, without filtering legit emails. And yes, the folks at Pfizer have the right to discuss by email of Sildenafil citrate compounds without having all their emails filtered.

5. But user data holds the truth

Mock data and scenarios mostly make no sense in machine learning. Real data happens to be surprising in many unexpected ways. Working in this field for 10 years now, and each new dataset that I have ever investigated has been surprising in many ways. It’s completely useless to work on your own made-up data. Without real customer data at hand, you can’t do anything in machine learning.

This particular aspect frequently leads to chicken-egg problem in machine learning: if you want to start optimizing contextual ads display, you need loads of advertisers and publishers. Yet, without loads of advertisers and publishers, you can’t refine your technology and consequently, you can’t convince loads of advertisers and publishers to join.

6. Tuning vs. Mathematics, Evolution vs. Revolution

Smart people advise that rewriting from scratch is the type of strategic mistake that frequently kills software companies. Yet, in machine learning, rewriting from scratch is frequently the only way to save your company.

Somewhere at the end of nineties, Altavista, the leading search engine, did not took the time to rewrite their ranking technology based on the crazy mathematical ideas based on large scale diagonalization. As a result, they got overwhelmed by a small company lead by a bunch inexperienced people.

Tuning and incremental improvement is the heart of classical software engineering, and it’s also hold true for machine learning - most of the time. Gaining the next percent of accuracy is frequently achieved by finely tuning and refining an existing algorithm, designing tons of ad-hoc reporting mechanisms in the process to get deeper insights in the algorithm behavior.

Yet, each new percent of accuracy that way costs you tenfold as much of efforts than the previous one; and after a couple of months or years your technology is just stuck in a dead-end.

That’s where hard core mathematics come into play. Mathematics is critical to jump on the next stage of performance, the kind of jump were you make a 10% improvement which seemed not even possible with the previous approach. Then, trying new theories is like playing roulette: most of the time, you lose, and the new theory is not bringing any additional improvements.

In the end, making progress in machine learning means very frequently trying approaches that are doomed to fail with a high probability. But once in a while something actually happens to work and the technology leaps forward.

Categories: developers, insights, technical Tags: business insight machine learning software technology 1 Comment