You asked for it, we’ve got it. So we’re giving it to you, fast.
SSD storage is now 20x RAM on all new Qbox Supergiant Elasticsearch clusters. Provision, migrate, or resize your Elasticsearch cluster on Supergiant, and get enough disk storage to ensure the thought of storage space never crosses your mind again.
Why the ratio, and why is RAM the factor? Elasticsearch, like most datastores, persists data on storage devices (disks and drives of various types). RAM serves as working memory for that data: an ephemeral place to do things like sort, score, aggregate, cache, and buffer incoming writes.
After some time running clusters in production, usage patterns emerge, and it becomes natural to think in terms of a disk:memory ratio for physical nodes, or more specifically a dataset:memory ratio. With more than three years of experience running thousands of nodes, we can confidently say:
In terms of data:RAM, optimal performance comes at a ratio of 1:1 or lower, and the practical upper bound is generally around 4:1.
However, this is a general rule of thumb for what we call a “typical search application”: one that serves a few main (larger) indices, handles moderate search volume, and makes moderate use of highly RAM-intensive aggregations. These typical search applications represent only a portion of possible Elasticsearch applications. Another significant portion involves Logstash or other time-based indices, which divide the total dataset into multiple indices that are searched individually. These applications tend to have lower search volumes than write volumes and less intensive text analysis, so they use less RAM than a “typical search application” with the same dataset size.
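To make the rule of thumb concrete, here is a minimal Python sketch that computes a node's data:RAM ratio and compares it with the 1:1 and 4:1 figures quoted above. The function name and the three labels are hypothetical, chosen only for illustration. The thresholds apply to the "typical search application" case, so treat the result as a rough guide rather than a sizing tool.

```python
# Thresholds taken from the rule of thumb in the text (data:RAM).
OPTIMAL_RATIO = 1.0      # optimal performance at 1:1 or lower
UPPER_BOUND_RATIO = 4.0  # general upper bound around 4:1


def classify_data_to_ram(dataset_gb: float, ram_gb: float) -> str:
    """Return a rough label for a node's data:RAM ratio.

    The labels are illustrative assumptions, not Qbox terminology.
    """
    if ram_gb <= 0:
        raise ValueError("ram_gb must be positive")
    ratio = dataset_gb / ram_gb
    if ratio <= OPTIMAL_RATIO:
        return "optimal"
    if ratio <= UPPER_BOUND_RATIO:
        return "within usual bound"
    return "beyond usual bound"


if __name__ == "__main__":
    # Example node with 16 GB of RAM and three dataset sizes (in GB).
    for dataset_gb in (12, 48, 200):
        print(dataset_gb, classify_data_to_ram(dataset_gb, ram_gb=16))
```

A log-heavy, time-based workload of the kind described above may run comfortably at ratios this sketch labels "beyond usual bound", which is exactly why a single threshold does not fit every deployment.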
With all that said, not all Elasticsearch applications look the same. Yes, for ~80% of deployments, the (previously imposed) disk size constraint of 10x RAM was a reliable way to protect users from catastrophic instability with a growing dataset. But, for the other ~20%, this constraint was an arbitrary burden.
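As a rough illustration of what the change means in practice, the Python sketch below compares the disk allowance under the earlier 10x RAM constraint with the 20x RAM allowance announced here. The function names are hypothetical, and the calculation is deliberately simplified: it assumes the multiplier applies directly to a node's RAM and ignores replicas and any free-space headroom a cluster needs.

```python
OLD_DISK_MULTIPLIER = 10  # previously imposed constraint: disk = 10x RAM
NEW_DISK_MULTIPLIER = 20  # announced allowance: disk = 20x RAM


def disk_limit_gb(ram_gb: float, multiplier: int) -> float:
    """Disk allowance in GB for a node with the given RAM (simplified)."""
    return ram_gb * multiplier


def fits_on_disk(dataset_gb: float, ram_gb: float, multiplier: int) -> bool:
    """True if the dataset fits within the allowance, ignoring overhead."""
    return dataset_gb <= disk_limit_gb(ram_gb, multiplier)


if __name__ == "__main__":
    ram_gb = 16
    dataset_gb = 200  # e.g. a log-heavy dataset with a high data:RAM ratio
    print("old limit:", disk_limit_gb(ram_gb, OLD_DISK_MULTIPLIER), "GB")
    print("new limit:", disk_limit_gb(ram_gb, NEW_DISK_MULTIPLIER), "GB")
    print("fits at 10x:", fits_on_disk(dataset_gb, ram_gb, OLD_DISK_MULTIPLIER))
    print("fits at 20x:", fits_on_disk(dataset_gb, ram_gb, NEW_DISK_MULTIPLIER))
```

In this made-up example, a 200 GB dataset on a 16 GB node would have exceeded the old 160 GB allowance but sits inside the new 320 GB one. Whether the node performs well at that ratio still depends on the workload.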
It would be disingenuous of us to pretend this change in our product is the holy grail of Elasticsearch performance. We know the real challenge is matching RAM and replicas to your particular dataset. However, storage prices have dropped industry-wide, so as our engineering team makes infrastructure strides, we happily pass our cost savings on to you. And at large disk volumes, Supergiant still enables the best Elasticsearch storage and retrieval performance available without manual intervention. It’s the difference Qbox’s focused cluster management and engineering makes.