Beyond in-memory databases

February 15, 2018

technology

Joannes Vermorel

Most IT buzzwords age poorly, and for a good reason: most tech that once held a competitive advantage gets superseded by superior alternatives within a decade or less. Thus, if a software vendor keeps pressing a buzzword past its expiration date (1), the simplest explanation is that its R&D team has not even realized that the world has moved on.

Anecdotally, multiple venture capitalists have also told me that they were wary of investing in any software company that was more than a few years old, because most companies never manage to decouple their own tech from the tech landscape that defined them when they started.

(1) The tech market itself defines when technology “expires”. When looking at a given piece of technology, at best you can only guesstimate how long it will remain reasonably close to the state of the art.

In-memory databases used to be an IT buzzword, and this did not age well: any software company that markets itself nowadays as delivering “in-memory” computing or “in-memory” databases is pushing outdated pieces of technology to the market (2). Don’t get me wrong though: making the most of the “memory” - more on this later - has never been more important; however, the computing landscape is now more complex than it used to be.

(2) In theory, it could just be the marketing team that happens to be lagging behind, while the tech team has leaped forward already. However, I have never met any software company that was suffering from this problem. You can safely assume that marketing is always ahead of tech.

In the late 90s and early 00s, a specific type of volatile memory, colloquially referred to as RAM, had become affordable enough that increasingly interesting and valuable datasets could fit “in-memory”. At the time, most software was engineered around the idea that RAM was so expensive and limited that it was worth going to great lengths, and accepting considerable complications, just to keep the pressure on RAM as low as possible. By simply revisiting most problems from a fresh, unconstrained angle, i.e. “in-memory” computing, many software vendors achieved tremendous speed-ups over older products, which relied exclusively on spinning disks.

Fast forward to 2018, in-memory databases are an outdated perspective, and it has been that way for years already. There are many types of data storage:

L1 CPU cache
L2/L3 CPU cache
Local RAM
Local GPU RAM
Local SSD
Local HDD
Remote RAM
Remote SSD
Remote HDD
Tape or Optical storage

I am not even listing newer storage technologies like Intel Optane, which almost represent a class of devices of their own.

Vendors promoting “in-memory” computing are hinting that their software technology is predominantly geared toward exploiting two types of memory: the local RAM and the remote RAM. While making the most of the RAM, both local and remote, is certainly a good thing, it also points to engineering approaches that underuse the alternatives.

For example, over the last two decades, the CPU cache has gone from 512KB to over 60MB for high-end CPUs. With that much CPU cache, it’s now possible to do “in-cache computing”, bringing massive speed-ups over plain “in-memory computing”. However, leveraging the cache does require shrinking many data structures, or even smarter strategies, well beyond what is considered necessary or even desirable from the RAM perspective.
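To give a feel for what shrinking a data structure can look like, here is a minimal sketch of dictionary encoding, one common way to make a column of repeated values far more compact. It is only an illustration, not a description of what Lokad does: the function name `dictionary_encode` and the sample data are hypothetical, and Python is used for readability rather than because it is the language you would pick for cache-sensitive code.

```python
from array import array

def dictionary_encode(values):
    """Replace repeated values with one-byte codes plus a lookup table.

    Hypothetical helper for illustration: it supports at most 256 distinct values.
    """
    lookup = {}          # value -> small integer code
    codes = array("B")   # unsigned 8-bit integers, one byte per entry
    for v in values:
        # A value seen for the first time gets the next free code.
        code = lookup.setdefault(v, len(lookup))
        if code > 255:
            raise ValueError("more than 256 distinct values")
        codes.append(code)
    # Invert the lookup so that codes can be turned back into values.
    table = [None] * len(lookup)
    for v, c in lookup.items():
        table[c] = v
    return codes, table

countries = ["FR", "US", "FR", "DE", "US", "FR"]
codes, table = dictionary_encode(countries)
decoded = [table[c] for c in codes]
```

Each entry of `codes` takes a single byte, so a long column of repeated labels becomes a dense run of bytes plus a tiny table, which is much more likely to stay in cache than a list of separate string objects.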

However, only pointing out that CPU cache is faster than local RAM would be missing the point. Nowadays, good software engineering involves maxing out the respective capabilities of all those classes of data storage. Thanks to cloud computing, assembling an ad-hoc mix of computing resources has never been easier.

Thus, Lokad is not delivering “in-memory” software technology, because it would prevent us from taking advantage of the other options that are presently available to us. For example, while we could rent machines with up to 2TB of RAM, it would be needlessly expensive for our clients. There are many algorithms that can be entirely streamed; thus processing TBs of data does not require TBs of RAM, and considering that 1GB of RAM is about 1000x more expensive than 1GB of HDD, this is not a mere implementation detail.
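To make the streaming point concrete, here is a minimal sketch of a single-pass computation whose memory footprint stays constant no matter how large the input is. It is an assumption-laden toy, not Lokad's implementation: the function name `stream_stats` and the one-number-per-line input format are hypothetical, and an in-memory `io.StringIO` stands in for what would be a file on disk.

```python
import io
from typing import Iterable, Tuple

def stream_stats(lines: Iterable[str]) -> Tuple[int, float, float]:
    """Return (count, mean, max) in one pass, holding one value at a time."""
    count = 0
    mean = 0.0
    largest = float("-inf")
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value = float(line)
        count += 1
        # Incremental mean update: no need to keep earlier values around.
        mean += (value - mean) / count
        if value > largest:
            largest = value
    return count, mean, largest

# Stand-in for a large file; an open file object can be iterated the same way.
sample = io.StringIO("3\n5\n10\n")
result = stream_stats(sample)
```

Because the loop only ever keeps three numbers in memory, the same function would work on a file of a few kilobytes or a few terabytes; only the running time changes.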

Our goal is not to adhere to some rigid perspective on software engineering, but to stick to the broader engineering perspective, which consists of doing the most you can with the budget you have. In other words, we are inclined to use in-memory processing whenever it outcompetes the alternatives, but no more.

Then, as computing resources are not completely adjustable on-demand - e.g. you cannot realistically rent a machine without a CPU cache - at Lokad, we strive to make the most of all the resources that are being paid for, even if those resources were not strictly requested in the first place.
