Filtering by Tag: bigdata

Ionic data storage for high scalability in supply chain

Published on by Joannes Vermorel.

Supply chains moved towards computer-based management systems quite early on. Yet, as a result, many large companies run decades-old supply chain systems which tend to be sluggish when it comes to crunching a lot of data. Certainly, tons of Big Data technologies are available nowadays, but companies are treading carefully. Many, if not most, of those Big Data technologies are critically dependent on top-notch engineering talent to work smoothly; and few companies succeed, unlike Facebook, in rewriting entire layers of Big Data technologies to make them work.

Being able to process vast amounts of data has been a long-standing commitment of Lokad. Indeed, optimizing a whole supply chain typically requires hundreds of incremental adjustments. As hypotheses get refined, it's typically the entire chain of calculations that needs to be re-executed. Getting results that encompass the whole supply chain network in minutes rather than hours lets you complete in a few weeks a project that would otherwise have dragged on for a year.

And this is why we started our migration towards cloud computing back in 2009. However, merely running on top of a cloud computing platform does not guarantee that vast amounts of data can be processed swiftly. Worse still, while using many machines offers the possibility to process more data, it also tends to make data processing slower, not faster. In fact, delays tend to take place when data is moved around from one machine to the next, and also when machines need to coordinate their work.

As a result, merely throwing more machines at a data processing problem does not further reduce the processing time. The algorithms need to be made smarter, and every single machine should be able to do more with the same computing resources.

A few weeks ago, we released a new high-performance column storage format code-named Ionic that is heavily optimized for high-speed concurrent data processing. This format is also geared towards supply chain optimization as it natively supports the storage of probability distributions. These distributions are critical in order to take advantage of probabilistic forecasts. Ionic is not intended to be used as an exchange format between Lokad and its clients. For data exchange, flat text file formats, such as CSV, are just fine. The Ionic format is intended to be used as an internal data format to speed up everything that happens within Lokad. Thanks to Ionic, Lokad can now process hundreds of gigabytes worth of input data with relative ease.

In particular, the columnar aspect of the Ionic format ensures that columns can be loaded and processed separately. When addressing supply chain problems, we are routinely facing ERP extractions where tables have over 100 columns, and up to 500 columns for the worst offenders. Ionic delivers a massive performance boost when it comes to dealing with that many columns.
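To illustrate why a columnar layout pays off with such wide tables, here is a toy columnar store - emphatically not the actual Ionic format - where each column lives in its own file, so reading one column never touches the hundreds of others:

```python
import json
import os
import tempfile

def write_columnar(dirpath, table):
    """Persist a table column by column; `table` maps column name -> list of values."""
    os.makedirs(dirpath, exist_ok=True)
    for name, values in table.items():
        with open(os.path.join(dirpath, name + ".json"), "w") as f:
            json.dump(values, f)

def read_column(dirpath, name):
    """Load only the requested column; the other columns stay on disk, untouched."""
    with open(os.path.join(dirpath, name + ".json")) as f:
        return json.load(f)

store = tempfile.mkdtemp()
write_columnar(store, {"qty": [1, 2, 3], "price": [9.9, 5.0, 7.5]})
print(read_column(store, "qty"))  # [1, 2, 3]
```

With a 500-column ERP extraction, a query touching 3 columns reads roughly 3/500th of the data in such a layout, whereas a row-oriented file forces a scan of everything.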

From Lokad’s perspective, we are increasingly perceiving data processing capabilities as a critical success factor in the implementation of supply chain optimization projects. Longer processing time means that less gets done every single day, which is problematic since ultimately every company operates under tight deadlines.

The Ionic storage format is one more step into our Big Data journey.

Categories: Tags: technology release supply chain cloud computing bigdata No Comments

Data qualification is critical

Published on by Joannes Vermorel.

Wikipedia lists seven steps for a data analysis process: data requirements, data collection, data processing, data cleaning, exploratory data analysis, data modeling, and finally the generation of production results. When Lokad forecasts inventory, optimizes prices, or tackles any kind of commerce optimization, our process is very similar to the one described above. However, there is one more vital step that typically accounts for more than half of all the effort applied by Lokad's team, and yet is not even part of the list above. This step is data qualification.

Now that “Big Data” has become a buzzword, myriads of companies are trying to do more with their data. Data qualification is probably the second largest cause of project failures, right after unclear or unwise business goals - which happens anytime an initiative starts from the “solution” rather than starting from the “problem”. Let’s shed some light on this mysterious “data qualification” step.

Data as a by-product of business apps

The vast majority of business software is designed to help operate companies: the Point-Of-Sale system is there to let clients pay; the Warehouse Management System is there to pick and store products; the Web Conferencing software lets people carry out their meetings online, etc. Such software might be producing data too, but data is only a secondary by-product of the primary purpose of this software.

The systems mentioned are designed to operate the business, and as a result, whenever a practitioner has to choose between better operations or better data, better operations will always be favored. For example, if a barcode fails to scan at the point of sale of your local hypermarket, the cashier will invariably pick a product that happens to have the same price and scan it twice; sometimes cashiers even keep a cheat sheet of barcodes gathered on a piece of paper. The cashier is right: the No. 1 priority is to let the client pay, no matter what. Generating accurate stock records is not an immediate goal when compared to the urgent need of servicing a line of clients.

One might argue that the barcode scanning issue is actually a data cleaning issue. However, the situation is quite subtle: records remain accurate to some extent since the amount charged to the client remains correct and so does the count of items in the basket. Naively filtering out all the suspicious records would do more harm than good for most analysis.

Yet, we observe that too often, companies – and their software vendors too – enthusiastically ignore this fundamental pattern for nearly all business data that are generated, jumping straight from data processing to data cleaning.

Data qualification relates to the semantics of the data

The goal of the data qualification step is to clarify and thoroughly document the semantics of the data. Most of the time, when (large) companies send tabular data files to Lokad, they also send us an Excel sheet, where each column found in the files gets a short line of documentation, typically like: Price: the price of the product. However, such a brief documentation line leaves a myriad of questions open:

  • what is the currency applicable for the product?
  • is it a price with or without tax?
  • is there some other variable (like a discount) that impacts the actual price?
  • is it really the same price for the product across all channels?
  • is the price value supposed to make sense for products that are not yet sold?
  • are there edge-case situations like zeros to reflect missing values?
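These questions cannot be answered by code alone, but a first-pass profiling of the column can surface the oddities that should trigger them. A hypothetical sketch:

```python
def profile_prices(prices):
    """Profile a raw 'Price' column to surface semantic oddities worth documenting."""
    nonzero = [p for p in prices if p > 0]
    return {
        "count": len(prices),
        "zeros": sum(1 for p in prices if p == 0),    # missing values in disguise?
        "negatives": sum(1 for p in prices if p < 0), # refunds? data entry errors?
        "min_nonzero": min(nonzero) if nonzero else None,
        "max": max(prices) if prices else None,       # outlier, or a different currency?
    }

# Made-up sample: one zero and one negative value, both worth a question to the client.
print(profile_prices([9.9, 0.0, 5.0, -3.0, 250.0]))
```

Each flagged count becomes a concrete question for the people who own the source system, which is where the actual qualification work happens.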

Dates are also excellent candidates for semantic ambiguity. When an orders table contains a date column, the date-time can refer to the time of:

  • the basket validation
  • the payment entry
  • the payment clearance
  • the creation of the order in the accounting package
  • the dispatch
  • the delivery
  • the closing of the order

However, such a shortlist hardly covers the actual oddities encountered in real-life situations. Recently, for example, while working for one of the largest European online businesses, we realized that the dates associated with purchase orders did not have the same meaning depending on the originating country of the supplier factories. European suppliers were shipping using trucks and the date reflected the arrival at the warehouse; while Asian suppliers were shipping using, well, ships, and the date reflected the arrival at the port. This little twist typically accounted for more than 10 days of difference in the lead time calculation.
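The lead-time twist can be made concrete with a hypothetical example; the 10-day inland transit and the dates below are assumptions for illustration only:

```python
from datetime import date, timedelta

# Assumed inland transit from port to warehouse for sea freight.
PORT_TO_WAREHOUSE = timedelta(days=10)

def lead_time(order_date, delivery_date, mode):
    """Days from order to warehouse arrival, correcting the semantic gap:
    for sea freight, `delivery_date` means arrival at the port, not the warehouse."""
    effective = delivery_date
    if mode == "sea":
        effective += PORT_TO_WAREHOUSE
    return (effective - order_date).days

# Same raw dates, two different meanings: 40 days by road, 50 by sea.
print(lead_time(date(2015, 3, 1), date(2015, 4, 10), "road"))  # 40
print(lead_time(date(2015, 3, 1), date(2015, 4, 10), "sea"))   # 50
```

Without the documented semantics, both suppliers would be credited with a 40-day lead time, and the sea supplier's safety stock would be systematically undersized.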

For business-related datasets, the semantics of the data are nearly always dependent on the underlying company processes and practices. Documentation relating to such processes, when it exists at all, typically focuses on what is of interest to the management or the auditors, but very rarely on the myriad of tiny elements that exist within the company IT landscape. Yet, the devil is in the details.

Data qualification is not data cleaning

Data cleaning (or cleansing) makes most sense in experimental sciences where certain data points (outliers) need to be removed because they would incorrectly "bend" the experiments. For example, aberrant measurements in an optics experiment might simply reflect a defect in the optical sensor rather than anything actually relevant to the study.

However, this process does not reflect what is typically needed while analyzing business data. Outliers might be encountered when dealing with the leftovers of a botched database recovery, but mostly, outliers are marginal. The (business-wise) integrity of the vast majority of databases currently in production is excellent. Erroneous entries exist, but most modern systems do a good job at preventing the most frequent ones, and are quite supportive when it comes to fixing them afterwards as well. Data qualification is very different in the sense that the goal is neither to remove nor to correct data points, but rather to shed light on the data as a whole, so that subsequent analysis truly makes sense. The only thing that gets "altered" by the data qualification process is the original data documentation.

Data qualification is the bulk of the effort

While working on dozens of data-driven projects related to commerce, aerospace, hospitality, bioinformatics and energy, we have observed that data qualification has always been the most demanding step of the project. Machine learning algorithms might appear sophisticated, but as long as the initiative remains within the well-known boundaries of regression or classification problems, success in machine learning is mostly a matter of prior domain knowledge. The same goes for Big Data processing.

Data qualification problems are insidious because you don’t know what you’re missing: this is the semantic gap between the “true” semantic as it should be understood in terms of the data produced by the systems in place, and the “actual” semantic, as perceived by the people carrying out data analysis. What you don’t know can hurt you. Sometimes, the semantic gap completely invalidates the entire analysis.

We observe that most IT practitioners vastly under-estimate the depth of peculiarities that come with most real-life business datasets. Most businesses don't even have a full line of documentation per table field. Yet, we typically find that even with half a page of documentation per field, the documentation is still far from thorough.

One of the (many) challenges faced by Lokad is that it is difficult to charge for something that is not even perceived as a need in the first place. Thus, we frequently shovel data qualification work under the guise of more noble tasks like “statistical algorithm tuning” or similar scientific-sounding tasks.

The reality of the work, however, is that data qualification is not only intensive from a manpower perspective, it's also a truly challenging task in itself. It's a mix of understanding the business, understanding how processes spread over many systems - some of them invariably of the legacy kind - and bridging the gap between the data as it exists and the expectations of the machine learning pipeline.

Most companies vastly underinvest in data qualification. In addition to being an underestimated challenge, investing talent on data qualification does not result in a flashy demo or even actual numbers. As a result, companies rush to the later stages of the data analysis process only to find themselves swimming in molasses because nothing really works as expected. There is no quick-fix for an actual understanding of the data.

Categories: Tags: insights bigdata No Comments

Priceforge released, pricing optimization for commerce

Published on by Joannes Vermorel.

We are proud to announce that Priceforge, our latest app, is immediately available as a public beta. Priceforge helps merchants gain better insights into their data - sales, inventory, prices, web traffic - and helps them elaborate better pricing strategies; and more.

Priceforge is first a dashboarding engine to compose powerful dashboards where numbers that matter - and only numbers that matter - are gathered into a single page. Unlike widespread BI tools, Priceforge is commerce-native, focusing on what truly matters for commerce.

Second, Priceforge is a pricing engine to design both very simple and very advanced pricing strategies. Let's replace dumb prices with smart pricing. Pricing is a message sent to the market, and unlike demand forecasting, it's not the sort of problem that can be addressed by pure numerical optimization. Priceforge embraces this vision: instead of forcing prices upon merchants, Priceforge empowers them to improve the prices they have.

Commerce insights

For years, Lokad had not been delivering any data visualization capabilities. The thinking went: yes, data visualization is critical but with hundreds of Business Intelligence (BI) tools out there, surely there must be some great stuff for commerce.

Our customers proved us wrong.

The market is certainly not short of vendors, but every time we ventured into our clients' IT landscape, we observed that BI projects were nothing but painful and expensive ventures.

Enumerating pitfalls would be tedious. Let's just say that almost all solutions require at least one full-time software developer to be of any use; and the solutions that did not require a developer felt like toys when compared to Microsoft Excel.

Thus, we decided to venture into business intelligence ourselves. However, delivering yet another jack-of-all-trades solution supposedly equally suitable for FOREX trading and car rental was an obvious pitfall we were committed to avoiding.

Priceforge would be tailored for commerce.

Simplicity, power and reliability

Excel is a fantastically powerful tool, and yet, at the same time, because data and logic end up intricately mixed, it leads to unreasonably fragile processes where every refresh puts the company at risk of silently breaking the logic buried in the middle of the data.

Merchants needed an approach that would bring both the power of Excel and the reliability of an industrial-grade process where the logic can be audited in depth and incrementally improved through trial and error. We decided to go for a tiny scripting language named Envision.

The syntax of Envision is largely inspired by the Excel formula syntax, and it's orders of magnitude simpler than a general-purpose programming language.

Want to know more? Check out our tutorial to devise your first pricing strategy with Priceforge.

Categories: Tags: priceforge pricing commerce data bigdata No Comments

Senior developer job at Lokad

Published on by Joannes Vermorel.

We are hiring! Below, a copy of our LinkedIn job post.

Lokad is a team of talented and passionate developers. Business is growing, commerce is more demanding than ever for innovative technologies. We are committed to delivering such technologies.

As a senior developer, you will lead the development of one of our Big Data apps (check for more insights into what we do). You will be in charge of bringing our technology to the next level, not of cleaning up technical debt.

Challenges are numerous:

  • total reliability, because nobody likes crashing a 1000 store network,
  • vast scalability, because 1000 stores is a lot,
  • high accuracy, because we deliver the best numbers.

We expect you to bring a significant expertise to Lokad, but you will benefit from a team capable of coaching you toward your next level of craftsmanship in software design.

We happen to use C#/.NET/MVC on top of Windows Azure, combined with event stores and NoSQL persistence strategies. We expect you to be (or willing to become) extremely proficient in this environment.

We are located 50m from Place d'Italie (Paris 13).

Categories: Tags: hiring job bigdata No Comments

FTP hosting, push your files to Salescast

Published on by Joannes Vermorel.

Since December 2012, Salescast supports importing TSV files. However, until now, Salescast was expecting you to plug in your own FTP server to retrieve those files. We felt this was an unnecessary complication.

Indeed, while there is a myriad of file hosting services available on the web, we have found that most of them are simply not good at supporting business data transfers; annoying limits are encountered with:

  • the maximal number of concurrent connections, 
  • the maximal file size,  
  • the maximal bandwidth, 
  • ... 

Thus, we decided to roll our own.

We are proud to announce the immediate availability of our FTP hosting service. Upload and download files from Lokad. The Express Plan comes with 1GB of free storage and 1GB of free bandwidth (per month). This service is compatible with Salescast and the other apps of Lokad.

Technical nugget: In order to deliver maximum scalability and reliability, this service is built on top of Windows Azure - like all the other technologies developed at Lokad. The architecture schema below illustrates how we scale out the workload on multiple virtual machines.

Categories: bigdata, release Tags: bigdata files ftp hosting No Comments

Machine Learning and Big Data talk at TechDays 2013

Published on by Joannes Vermorel.

Last week, we had the chance to speak to an audience of roughly 3000 people attending the Machine Learning and Big Data keynote at the Microsoft TechDays 2013 in Paris. A special thanks to Bernard Ourghanlian for making this possible.

Our client, Pierre-Noel Luiggi (Founder and CEO of Oscaro) was also present - and a formidable support.

For those who could not attend the event, check the video of the session (15min, in French).

Categories: bigdata, video Tags: bigdata events video No Comments

An exciting vision cast into a new product: Introducing BIG DATA PLATFORM [Infographic]

Published on by Joannes Vermorel.

It seems to me that as we grow, our pace of innovation continues to accelerate. We are currently in somewhat of a frenzy. More clients means much more exposure to high-priority problems in eCommerce and retail, which is our food for innovation.

The latest addition to our portfolio of Big Data Commerce solutions is a cloud-based BIG DATA PLATFORM. It is a truly exciting vision that has been cast into a concept and product: make the capturing, storing and exploiting of all of your company's transactional data in a fast, reliable and agile data platform simple, efficient and low cost. Combine this with smart applications that exploit this data in order to make smarter, faster operative decisions that address specific problems in the company.

Couponing, inventory optimization, pricing, store assortment optimization and personalization of online and offline customer communication are all examples of what can be accomplished with such a system in an efficient and low-cost manner. Customer satisfaction, rapid ROI and extreme profitability are the core of what makes us so excited. Enough said, we chose to use this announcement to try our luck on our very first.... INFOGRAPHIC.

Do you share the excitement of this vision? Like or hate our infographic? Please get in touch or post in the comments.  


Categories: Tags: bigdata cloud computing inventory optimization pricing No Comments

Data Days 2012: Meet us in Berlin!

Published on by Joannes Vermorel.

The 2012 Data Days are taking shape, and we are looking forward to participating in the panel of speakers. With Big Data Intelligence experience in both eCommerce and physical retail, we have been invited to bring the perspective of two very different worlds in terms of big data availability and exploitation to an otherwise rather eCommerce-focused event.

In four tracks the topics of Data, Relevance, Innovation and Privacy will be discussed. On the second day, the Data Pioneers start-up competition is looking for creative and innovative data business ideas. The program is currently being finalized. 

Are you coming to the event? Please connect with us prior to the event or simply grab us on the day. We are looking forward to seeing you in Berlin!

Categories: business, community Tags: berlin bigdata events No Comments

Spare Parts Inventory Management with Quantile Technology

Published on by Joannes Vermorel.

The management of spare and service parts is as strategically important as it is difficult. In a world where most equipment manufacturers and retailers are operating in fiercely competitive markets, a high service level to the existing customer base is a strategic priority for many players.

Not only does a high spare part availability help build a loyal base of customers, product/equipment companies have also discovered services as an often very profitable and recurring revenue stream that is typically more resilient to economic cycles than equipment sales.

However, managing a spare parts inventory efficiently still poses a huge challenge. Despite a forecasting and inventory planning technology industry that is several decades old, spare parts management has remained difficult for a number of reasons:

  • Large number of parts: Even smaller equipment manufacturers can easily be confronted with managing more than a hundred thousand spare parts.
  • High service level requirement: Stock-outs are often very costly; high to very high service levels are therefore paramount in many industries.
  • Infrequent demand: The demand for spare parts is typically sparse and intermittent, meaning that only very low volumes are required occasionally.

Why standard forecasting technology performs poorly

Unfortunately, the combination of these factors makes standard inventory and forecasting technology ill-suited for spare parts planning. In classic forecasting and inventory planning theory, a forecast is produced by applying models such as moving average, linear regression and Holt-Winters, and a great deal of attention is given to the forecasting error, which is optimized by measuring MAPE or similar indicators. The transformation into a suggested stock level is done in a second step via classic safety stock analysis.

In the case of sparse time series (also called slow movers: low unit volumes and infrequent sales), this methodology fails. The main issue with forecasting slow movers is that what we are essentially forecasting are zeros. This is intuitively obvious when looking at the demand history of a typical spare parts portfolio on a daily, weekly, or even monthly basis: by far the most frequent data point is zero, which can in some cases make up more than 50% of all recorded data points.

The challenge of forecasting slow movers: Good statistical performance and good inventory practice are not the same.

When applying classic forecasting theory to this type of data set, the best forecast for a slow moving product is by definition a zero. A 'good' forecast from a statistical point of view will return mostly zeros, which is optimal in terms of math, but not useful in terms of inventory optimization.
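A toy illustration of this point, using a made-up slow mover series: the forecast that minimizes the usual squared error is the mean of the series, which is close to zero and rounds down to an empty shelf.

```python
# A made-up slow mover: weekly demand, mostly zeros.
weekly_demand = [0, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 3]

# The squared-error-optimal constant forecast is simply the mean of the series.
mean_forecast = sum(weekly_demand) / len(weekly_demand)
print(mean_forecast)  # 0.5 -- statistically "good", but rounds down to zero stock
```

A forecast of 0.5 units per week is unimpeachable in terms of error metrics, yet stocking zero units guarantees a stock-out on every one of the six weeks with actual demand.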

The classic method completely separates the forecast from the replenishment. The problem is, the situation can hardly be improved with a "better" forecast. What actually matters in practice is the accuracy of the resulting inventory level (reorder point), which is neither measured nor optimized.

Changing the vision from Forecast Accuracy to Risk Management

When dealing with slow movers, we believe the right approach is not to treat the problem as a forecasting issue and try to forecast demand (which is mostly zero). Rather, the analysis should answer the question of how much inventory is needed in order to ensure the desired service level. The whole point of the analysis is not a more accurate demand forecast, but a better risk analysis. We fundamentally change the vision here.

Determining and optimizing directly the Reorder Point

Quantile forecasts determine directly the inventory level that delivers the desired service level: a bias is introduced on purpose from the start in order to alter the odds of over- and under-forecasting.
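The shift can be sketched in a few lines: instead of producing a mean forecast, pick the demand level that covers the target service level over observed lead-time demand scenarios. This is a simplified empirical quantile for illustration, not Lokad's actual quantile forecasting technology.

```python
def reorder_point(lead_time_demands, service_level):
    """Smallest demand level covering `service_level` of the observed scenarios."""
    s = sorted(lead_time_demands)
    idx = min(len(s) - 1, int(service_level * len(s)))
    return s[idx]

# Made-up history: units sold per lead-time window for one slow-moving part.
history = [0, 0, 1, 0, 2, 0, 0, 3, 0, 1]

print(reorder_point(history, 0.50))  # 0: the median says stock nothing
print(reorder_point(history, 0.95))  # 3: cover ~95% of the demand scenarios
```

Note how the median (the "unbiased" answer) is zero, while the 95% quantile yields a non-trivial stock level; the deliberate bias is the whole point.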

Benchmarks against classic forecasting technology in food, non-food, hardware, luxury and spare parts consistently show that quantile forecasts bring a performance improvement of over 25%, that is, either more than 25% less inventory or 25% fewer stock-outs.

In our opinion, by solving the problem of forecasting intermittent and sparse demand in spare parts management, quantile technology not only provides a strong performance increase, but also makes classic forecasts plain obsolete.

Whitepaper spare parts management available for download

Download the whitepaper Spare Parts Inventory Management with Quantile Technology for an in-depth discussion of the topic. Further whitepapers and resources on quantile forecasting and inventory management are available on our resources page.

Do you have comments, questions or experiences regarding spare parts management to share? Please participate in the comments below, your contribution is highly valuable to our team.


Categories: Tags: bigdata forecasting quantiles slow movers spare parts whitepaper 1 Comment

Whitepaper: How Big Data will transform retail marketing

Published on by Joannes Vermorel.

Download the whitepaper

Retailers and eCommerce businesses today record large amounts of sales and client data, which provide a rich source of information for marketers. However, the exploitation of this data to date has remained a rather costly, semi-manual and high-level exercise that is only scraping the surface of what promises to be a gold mine for marketing.

We believe Big Data technology will play an important role in the future of retail marketing by helping to address much more effectively the single most important goal of marketing: providing the most relevant communication to the individual client at the right moment in time.

This whitepaper examines how Big Data technology in the coming years will give marketers the tools to transform the effectiveness and ROI of their activities.

  • Targeting: How individual client analysis will replace customer segmentation with true 1-to-1 marketing
  • Measurement: Reliable measurement of promotion conversion, cannibalization, and long term effects through client history and basket analysis
  • Performance: How a closed feedback loop creates a learning system
  • Cost: Full automation enables a massive scalability at low cost
  • Innovation: Intelligent client applications powered by Big Data technology

The stage is set for what we believe will be fundamental improvements in the way marketing can target, assess, optimize and ultimately convert their campaigns into profits for the company. Exciting times are ahead!

Categories: bigdata Tags: bigdata whitepaper No Comments

How Big Data will transform retail marketing

Published on by Joannes Vermorel.

Retail marketers have long tried to approximate the idea of one-to-one marketing. In an ideal world, marketers would deliver to the right customer, at the right time, the most relevant communication.

Digital technologies have greatly increased the number of 1-to-1 communication channels. Mailings are individualized at print, check-out coupons are issued in real time, and websites, online shopping portals and smartphone apps create new touch points with the customers. Yet, the challenge of determining the ‘right’ communication for the individual client remains huge.

Retail marketing today is constrained by customer segmentation

Consumer goods companies and retailers alike use market and customer segmentations to determine consumer needs, product preferences and usage occasions in order to design and target their marketing campaigns. Unfortunately for retailers, the challenge is huge since they are serving a very wide range of customer groups across a huge product portfolio; and a rather dynamic market with often pronounced local differences adds to the complexity.

Strategy consultancy McKinsey suggests that segmentation efforts will only be practicable and sustainable if the number of segments stays below 10. No wonder marketers struggle to bridge the gap between the needs of millions of customers and a single-digit number of customer segments.

Targeting: Big Data technologies replace customer segmentation with individual client analysis

Big Data technology will bring marketers a big step closer to the ideal of true 1-to-1 communication, where instead of fitting customers into a campaign, the marketing effort starts with the individual client and her needs.

The core idea is rather intuitive. By looking at a customer's basket (at check-out) or, better, purchase history (through loyalty card data) in relation to the baskets and purchase histories of millions of other customers, 'similar' customer profiles can be identified and used to 'learn' which other products and services will appeal to the customer at hand.
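As a minimal sketch of that core idea (the similarity measure and the product sets below are illustrative assumptions, not a production algorithm):

```python
def jaccard(a, b):
    """Overlap between two purchase histories, as sets of products."""
    return len(a & b) / len(a | b)

def recommend(target, others, k=1):
    """Recommend products bought by the k most similar customers
    that the target customer has not bought yet."""
    ranked = sorted(others, key=lambda h: jaccard(target, h), reverse=True)
    candidates = set()
    for history in ranked[:k]:
        candidates |= history - target
    return candidates

me = {"coffee", "milk", "cereal"}
peers = [{"coffee", "milk", "bread"}, {"wine", "cheese"}]
print(recommend(me, peers))  # {'bread'}
```

At retail scale, the brute-force similarity scan above would be replaced by indexing or approximate nearest-neighbor techniques, but the logic stays the same: similar histories predict the next purchase.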

The basis for this advanced behavioral analysis is point-of-sale (receipt) data, plus loyalty card data if available.

The near real-time capability of such systems can today be met without high cost, through a smart architecture that allows efficient storage and rapid retrieval of huge amounts of data. In a recent whitepaper, we have shown that all receipt data of even the largest retail networks can be processed on a smartphone.

Measurement: Client history and basket analysis allow conversion, uplift and cannibalization measurement

Quantifying the ROI of a marketing initiative is key, yet very complex in retail, given that a promotion or voucher often impacts not only the promoted product, but the wider product portfolio. Substitution, cannibalization and long-term impact all play an important role for the ROI.

Marketers today are deprived of a direct feedback loop on their initiatives, and need to operate largely 'one-eyed'. Big Data technologies make it possible to solve this problem by analyzing a client's purchase history and basket in relation to past and ongoing promotions and vouchers.

Performance: A closed feedback loop creates a learning system

Just as marketers crave transparency on conversion and marketing ROI, recommendation engines will vastly improve if they can directly measure the quality of the applied behavioral analytics. Through iterations, the algorithm will improve, or 'learn'. Statistical learning theory is an invaluable ingredient here.

The prerequisites are clearly defined success/failure characteristics (markers) that can be tracked automatically in order to provide the system with the direct feedback it requires. Luckily, vouchers that are used, promotions that are taken up, events that are attended, and other forms of client action can be unmistakably linked to the marketing communication and automatically measured, and therefore serve well for such a purpose.

Big Data is setting the stage

Big Data has set the stage for what we believe will be fundamental improvements in the way marketing can target, assess, optimize and ultimately convert their campaigns into profits for the company.

Furthermore, the technology barriers for innovation in retail have never been lower, and we believe first movers will create competitive pressures that will transform the industry. Exciting times ahead!

Categories: bigdata, insights, market Tags: bigdata insights No Comments

Big data in retail, a reality check

Published on by Joannes Vermorel.

Cloud computing being so 2011, big data is going to be a key IT buzzword for 2012. Yet, as far as we understand our retail clients, there is one data source that holds over 90% of the total information value in their possession: market basket data (tagged with loyalty card information when available).

For any mid-to-large retail network, the informational value of market basket data simply dwarfs just about all other alternative data sources, be it:

  • In-store video data, which remains difficult to process, and is primarily focused on security.
  • Social media data, which are very noisy and reflect bot behavior as much as human behavior.
  • Market analyst’s reports, which require the scarcest resource of all: management attention.

Yet, besides basic sales projections (aka sales per product, per store, per region, per week...), we observe that, as of January 2012, most retailers are doing very little with their market basket data. Even forecasting for inventory optimization is typically nothing more than a moving average variant at the store level. More elaborate methods are used for warehouses, but then retailers are no longer leveraging basket data, only past warehouse shipments.

Big Data vendors promise to bring an unprecedented level of data processing power to their clients to let them harness all the potential of their big data. Yet, is this going to bring profitable changes to retailers? Not necessarily so.

The storage capacity sitting on the shelves of an average hypermarket, with 20+ external drives on display (assuming 500GB per drive), typically exceeds the raw storage needed to persist a whole 3 years of history of a 1000-store network (i.e. 10TB of market basket data). Hence, raw data storage is not a problem, or at least not an expensive problem. Then, data I/O (input/output) is a more challenging matter, but again, by choosing an adequate data representation (the details would go beyond the scope of this post), it's hardly a challenge as of 2012.
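A back-of-envelope check of the 10TB figure; the per-store volumes (baskets per day, lines per basket, bytes per receipt line) are guesses for the sake of the calculation:

```python
# Assumed order-of-magnitude inputs for a 1000-store network.
stores = 1000
baskets_per_store_per_day = 4_000
lines_per_basket = 15
bytes_per_line = 150  # product id, quantity, price, timestamp, basket id
days = 3 * 365        # three years of history

total_bytes = stores * baskets_per_store_per_day * lines_per_basket * bytes_per_line * days
print(total_bytes / 10**12, "TB")  # just under 10 TB, in line with the figure above
```

With a compact binary encoding instead of 150 bytes per line, the footprint would shrink by a further large factor, which is precisely why a wall of consumer drives covers it.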

We observe that the biggest challenge posed by Big Data is simply the manpower required to do anything operational with it. Indeed, the data is primarily big in the sense that the company resources needed to run the Big Data software, and to implement whatever suggestions come out of it, are thin.

Producing a wall of metrics out of market basket data is easy; it is much harder to build a set of metrics worth the time spent reading them, considering the hourly cost of employees.

As far as we understand our retail clients, the manpower constraint alone explains why so little is being done with market basket data on an ongoing basis: while CPU has never been so cheap, staffing has never been so expensive.

Thus, we believe that Big Data successes in retail will be encountered by lean solutions that treat, not processing power, but people, as the scarcest resource of all.

Categories: business, insights, market Tags: bigdata insights retail 1 Comment