Qbox to join forces with Instaclustr. Read about it on our blog post here.

Qbox has become the premier provider of hosted Elasticsearch services. The project began as a more economical solution to Amazon’s Cloudsearch, but we quickly got caught up in the flexibility and power of Elasticsearch.

Now, several years later, we’re enjoying steady growth as we help Elasticsearch developers and ops staff reduce their aches and pains — while lowering their overall operating costs. We invite you to read this article for an overview of the major considerations of launching and scaling Elasticsearch clusters.


Scale can be an intimidating and sometimes frustrating concept for developers, and few developers would say that it’s an enjoyable aspect of application development.

But allow us to share a bit of encouragement: Elasticsearch scales beautifully — and we’ve built an array of functional and architectural upgrades into Qbox over the past three years so our customers are now enjoying more moments of sanity and equilibrium.

The feedback we get from many developers using our systems is that they really appreciate smoother processes for launching, managing, and scaling their ES clusters. Just to give a glimpse, here’s a snapshot of our cluster management dashboard:


The goal of this article is to provide some guidance on launching and scaling an Elasticsearch cluster. This is a deep subject, and we are only providing an overview here. We list more resources near the end of this article.

Knowing your Application Requirements

We recommend that you estimate the load of your application before you launch a cluster and index your data. No one enjoys experimenting in a production environment, so estimation can save you lots of trouble.

A cluster that will insert thousands of logs every minute and support searches many times per hour will surely require different hardware than a cluster hosting a small dataset for a high-traffic ecommerce site. It’s critcally important that you take time to think about volume and frequency of both search traffic and document writes. Having a general idea of the read/write balance of your application is extremely helpful in preparing for cluster deployments.

Deciding on Hardware

“Bigger nodes or more nodes?” is a question that we frequently hear.

An exhaustive answer would require a lengthy whitepaper. Our team at Qbox recommends starting with at least a medium/large box (2 cores and 3-4gb RAM in a 2-node setup. Our experience continues to validate this seemingly arbitrary suggestion.

Why bigger boxes? Well, if you’ve got a small dataset that’s relatively static in size, maybe a m1.small or even a m1.micro on EC2 would work. If the search load changes, you could just add or remove nodes. However, most applications datasets should be expected to grow — and resizing servers means at least some downtime. There’s really no way around it without a tedious custom script to swap out nodes — and probably a massive headache. So choose a box size with room to grow. Then you can add nodes for scale with little or no downtime.

Why more than one node? At first glance, it might seem economical to choose one large node instead of several smaller ones. It seems logical: big dataset, low search volume. Why complicate the setup with more nodes, you might ask.

While you may thinkk you don’t need multiple nodes to start, we’ve found it to be common for our customers to quickly overload a single-node cluster. Unthrottled bulk inserts come blazing in, GC duration spikes occur, queries hang in the balance waiting for IO access….. Panic! Elasticsearch was built to withstand situations like this — it’s a distributed search engine! Nodes working in parallel make a happy cluster.

Shards and Replicas

With a multi-node setup, we recommend that you give serious consideration to the index-level settings for number_of_shards and number_of_replicas. The optimal value for these settings depends on the needs of the application. A multi-node cluster with a number of indexes that vary in size could, for example, improve performance with custom document routing. Instead of every node storing a part of every index, specific nodes are made responsible for specific indices, thus avoiding a query on the entire cluster.

However, for most use cases (especially those with one main index), the optimal setup is full-replication. For n nodes, we recommend that you configure for n shards and n – 1 replicas. Each node stores an entire copy of the data set, which protects against node failure and evenly distributes any load across the cluster.

Something to keep in mind: while you can adjust replicas on the fly, you cannot change number_of_shards without reindexing. So, you may be asking, how can I scale without downtime?

Answer: By over-allocating shards. Nodes can contain more than one primary shard (although there’s no performance gain in doing so), so we recommend that you start with a higher number_of_shards than the number of nodes. For example, if you start with a 2-node cluster and expect to grow to 6 nodes, then create your index with 6 shards instead of 2. As you add nodes, Elasticsearch will automatically rebalance the load by relocating the extra shards to the new nodes. Boom, no downtime!

Monitoring, Tuning, and Knowing when to Scale

The Elasticsearch community is overflowing with plugin goodness. There are many excellent plugins for monitoring your cluster. Below we suggest additional reading on the favorite tools of our customers.



Our favorites at Qbox are Elastic-HQ, Paramedic, and Bigdesk. We can’t overstate the importance of these tools for diagnosing performance issues and watching for indications that it’s time to scale.


It may not be necessary to add a node immediately when you start to notice signs of cluster overload. There are some settings that you can tune to reconfigure system loading. One of the more important settings is the refresh_interval for an index. Maybe you’re using Elasticsearch to merely store logs and so you don’t need to have each log searchable immediately upon insertion. By default, indices refresh every second. If you can tolerate some lag between inserting and searching for a document, then you can reduce an enormous amount of overhead by raising it to 10 seconds, 60 seconds, or perhaps as much as 10 minutes.


Elasticsearch has really opened our eyes to a whole new range of possibilities in search and analytics for big data. When comparing to the vast majority of other data stores and full-text search engines, using ES at scale is truly a pleasant experience.

See these other topics:


Editor’s note: Our metrics tell us that this article has been especially helpful since it was originally written back in November 2013, so we’re republishing it after adding a few enhancements.“>