A key question in the minds of most Elasticsearch users when they create an index is “How many shards should I use?” In this article, we explain the design tradeoffs and performance consequences of choosing different values for the number of shards. Continue reading if you want to learn how to demystify and optimize your sharding strategy.
This is an important topic, and many users are apprehensive as they approach it -- and for good reason. A major mistake in shard allocation could cause scaling problems in a production environment that maintains an ever-growing dataset.
On the other hand, we know that there is little Elasticsearch documentation on this topic. Most users just want answers -- and they want specific answers, not vague number ranges and warnings for arbitrarily large numbers.
Well, we have some answers. After covering a few definitions and some clarifications, we present several common use cases and provide our recommendations for each.
If you're fairly new to Elasticsearch, it's important that you understand the basic jargon and grasp the elemental concepts.
(If you have some expertise with ES, you might want to skip to the next section.)
Consider this simple diagram of an Elasticsearch cluster:
Remember these definitions while referring to this diagram:
cluster – An Elasticsearch cluster consists of one or more nodes and is identifiable by its cluster name.
node – A single Elasticsearch instance. In most environments, each node runs on a separate box or virtual machine.
index – In Elasticsearch, an index is a collection of documents.
shard – Because Elasticsearch is a distributed search engine, an index is usually split into elements known as shards that are distributed across multiple nodes. Elasticsearch automatically manages the arrangement of these shards. It also rebalances the shards as necessary, so users need not worry about the details.
replica – By default, Elasticsearch creates five primary shards and one replica for each index. This means that each index will consist of five primary shards, and each shard will have one copy.
Allocating multiple shards and replicas is the essence of the design for distributed search capability, providing for high availability and quick access in searches against the documents within an index. The main difference between a primary and a replica shard is that only the primary shard can accept indexing requests. Both replica and primary shards can serve querying requests.
In the diagram above, we have an Elasticsearch cluster consisting of two nodes in a default shard configuration. Elasticsearch automatically arranges the five primary shards split across the two nodes. There is one replica shard that corresponds to each primary shard, but the arrangement of these replica shards is altogether different from that of the primary shards. Again, think distribution.
Allow us to clarify: Remember, the number_of_shards value pertains to indexes, not to the cluster as a whole. This value specifies the number of shards for each index (not the total primary shards in the cluster).
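To make the per-index scope concrete, here is a minimal Python sketch. The index names are hypothetical and the values are only for illustration; number_of_shards and number_of_replicas are the per-index settings discussed in this article. The sketch simply adds up what each index contributes to the cluster.

```python
# Hypothetical index names; each index carries its own shard settings.
index_settings = {
    "products": {"number_of_shards": 5, "number_of_replicas": 1},
    "logs-2015.01.01": {"number_of_shards": 1, "number_of_replicas": 1},
}


def shards_for_index(settings):
    """Return (primary shards, total shard copies) for one index."""
    primaries = settings["number_of_shards"]
    total = primaries * (1 + settings["number_of_replicas"])
    return primaries, total


def cluster_totals(all_settings):
    """Sum primary and total shard counts over every index in the cluster."""
    counts = [shards_for_index(s) for s in all_settings.values()]
    return sum(p for p, _ in counts), sum(t for _, t in counts)


print(cluster_totals(index_settings))  # (6, 12)
```

The cluster-wide total is a consequence of the per-index values, not something you configure directly.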
A Word about Replicas
We don't elaborate in this article on Elasticsearch replicas. That is an entirely separate topic that we cover elsewhere. Replicas are primarily for search performance, and a user can add or remove them at any time. As we explain in that article, additional replicas give you additional capacity, higher throughput, and stronger failover.
Allocate Shards Carefully
After you configure an Elasticsearch cluster, it's critically important to realize that you cannot change the number of primary shards for an existing index. If you find it necessary to change the number of shards, then you would need to reindex all the source documents. (Although reindexing is a long process, it can be done without downtime.)
The primary shard configuration is quite analogous to a hard disk partition, in which a repartition of raw disk space requires a user to back up, configure a new partition, and rewrite data onto the new partition.
Small Static Dataset, 2-3 GB
The key consideration as you allocate shards is your expectation for the growth of your dataset.
We quite often see a tendency to overallocate on shard count unnecessarily. Since shard count is such a hot topic within the ES community, users may assume that overallocation is a safe bet. (By overallocation, we simply mean specifying more shards per index than is necessary for the current size (document count) of a particular dataset.)
Elastic was promoting this idea in the early days, but then many users began taking it too far—such as allocating 1,000 shards. Elastic now provides a bit more cautious rationale:
"A little overallocation is good. A kagillion shards is bad. It is difficult to define what constitutes too many shards, as it depends on their size and how they are being used. A hundred shards that are seldom used may be fine, while two shards experiencing very heavy usage could be too many."
Remember that there is an additional cost for each shard that you allocate:
- Since a shard is essentially a Lucene index, it consumes file handles, memory, and CPU resources.
- Each search request will touch a copy of every shard in the index, which isn't a problem when the shards are spread across several nodes. Contention arises and performance decreases when the shards are competing for the same hardware resources.
- Elasticsearch uses term frequency statistics to calculate relevance, but these statistics correspond to individual shards. Maintaining only a small amount of data across many shards will tend to result in poor document relevance.
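The relevance point in the last item can be illustrated with a small Python sketch. It uses a simplified textbook inverse document frequency, which is an assumption made for illustration and not necessarily the exact formula Elasticsearch applies; the document counts are invented.

```python
import math


def idf(total_docs, docs_with_term):
    """Simplified textbook IDF, assumed here for illustration only."""
    return math.log(total_docs / docs_with_term)


# Invented counts: the same term is rare on shard A and common on shard B.
idf_shard_a = idf(total_docs=100, docs_with_term=1)
idf_shard_b = idf(total_docs=100, docs_with_term=50)

# What the statistic would be if all 200 documents lived in one shard.
idf_single_shard = idf(total_docs=200, docs_with_term=51)

print(round(idf_shard_a, 2), round(idf_shard_b, 2), round(idf_single_shard, 2))
```

With little data per shard, the per-shard statistics can drift far from the index-wide value, so the same term is weighted very differently depending on which shard holds the document.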
Our customers expect their businesses to grow and their datasets to expand accordingly. There is therefore always a need for contingency planning. Many users convince themselves that they'll encounter explosive growth (although most never actually see an unmanageable spike). In addition, we all want to minimize downtime and avoid resharding.
If you worry about rapid data growth, then we suggest a focus on a simple constraint: the maximum JVM heap size recommendation for Elasticsearch is approximately 30-32GB. This is a solid estimate on the limit of your absolute maximum shard size. For example, if you really think it possible that you could reach 200GB (but not much further without other infrastructure changes), then we recommend an allocation of 7 shards, or 8 shards at most.
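The arithmetic behind that recommendation can be sketched in a few lines of Python. The function name is hypothetical, and the 30GB default is taken from the guideline above.

```python
import math


def shards_for_size(expected_gb, max_shard_gb=30):
    """Smallest shard count that keeps each shard at or under max_shard_gb."""
    return math.ceil(expected_gb / max_shard_gb)


print(shards_for_size(200))  # 7
```

Assuming a slightly smaller target shard size, for example 25GB, gives 8 shards for the same 200GB, which matches the upper end of the range.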
Whatever you do, don't allocate for an inappropriately high goal of 10 terabytes that you might attain three years from now. It's likely that you'll see some performance strain sooner than you'd like.
Although we aren't explaining replicas in detail here, we do recommend that you plan for a modest number of shards and consider increasing the number of replicas. If you're configuring a new environment, then perhaps you want to have a look at our replicated clusters. With a replicated cluster, you get a three-node cluster that includes one replica with an option to easily increase the number of replicas as your requirements change.
Large and Growing Dataset
We strongly encourage you to rely on overallocation for large datasets, but only modestly. You can still use the 30GB maximum shard size guideline that we give above.
We do, however, suggest that you continue to picture the ideal scenario as being one shard per index, per node. A good launch point for capacity planning is to allocate shards with a factor of 1.5 to 3 times the number of nodes in your initial configuration. If you're starting with 3 nodes, then we recommend that you specify at most 3 x 3 = 9 shards.
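As a rough sketch of that rule of thumb in Python: the function name is hypothetical, and rounding the lower bound up to a whole shard is our assumption, since the guideline itself only gives the factors.

```python
import math


def shard_range(initial_nodes, low_factor=1.5, high_factor=3.0):
    """Return (low, high) primary shard counts for a starting node count."""
    low = math.ceil(initial_nodes * low_factor)     # round up: no fractional shards
    high = math.floor(initial_nodes * high_factor)  # never exceed the upper factor
    return low, high


print(shard_range(3))  # (5, 9)
```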
Your shard size may be getting too high if you're discovering issues through the cluster stats APIs or encountering minor performance degradations. If this is the case, simply add a node and ES will rebalance the shards accordingly.
Once again, please note that we're omitting the specification of replicas from our discussion here. The same ideal shard guideline of one shard per index per node also holds true for replica shards. So if you need only one replica, then you'll need twice as many nodes. Two replicas would require three times the number of nodes. For more details, see our article on Replicated Clusters.
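The node multiplication described above is easy to check with a short Python sketch. The function name is hypothetical; it assumes the ideal of one shard per index per node for primaries and replicas alike.

```python
def nodes_for_one_shard_per_node(base_nodes, replicas):
    """Nodes needed so every primary and replica shard gets its own node."""
    return base_nodes * (1 + replicas)


print(nodes_for_one_shard_per_node(3, replicas=1))  # 6
print(nodes_for_one_shard_per_node(3, replicas=2))  # 9
```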
Do you accumulate daily indices and yet incur only small search loads? Perhaps these indices number in the hundreds, but each index is 1GB or smaller. For these and similar problem spaces, our simple recommendation is that you choose one shard.
If you roll with the defaults for Logstash (daily indices) and ES (5 shards), you could generate up to 890 shards in 6 months. Further, your cluster will be hurting—unless you have 15 nodes or more.
Think about it: most Logstash users are infrequent searchers, performing fewer than one query per minute. Accordingly, we recommend a simple economical setup. Since search performance isn't a primary requirement for such cases, we don't need multiple replicas. A single replica is enough for basic redundancy. The data-to-memory ratio can also be quite high.
If you go with a single shard per index, then you could probably run a Logstash configuration for 6 months on a three-node cluster. Ideally, you'd use at least 4GB, but we'd recommend 8GB because that is where network speed starts to get significantly better on most cloud platforms and there is much less resource sharing.
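To compare the two daily-index setups, here is a small Python sketch. It assumes about 180 daily indices in six months, which is an approximation (the exact day count varies, so the result lands near rather than exactly on the figure quoted earlier), and the function name is hypothetical.

```python
def shard_totals(days, shards_per_index, replicas=1):
    """Return (primary shards, primaries plus replicas) for daily indices."""
    primaries = days * shards_per_index
    return primaries, primaries * (1 + replicas)


print(shard_totals(180, shards_per_index=5))  # (900, 1800)
print(shard_totals(180, shards_per_index=1))  # (180, 360)
```

Under these assumptions, dropping from five shards to one per daily index cuts the shard count by a factor of five over the same period.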
We reiterate that shards consume resources and require processing overhead.
To compile results from an index consisting of more than one shard, Elasticsearch must query each shard individually (although in parallel), and then it must perform operations on the aggregated results. Because of this, a machine with more IO headroom (SSDs) and a multi-core processor can definitely benefit from sharding, but you must consider the size, volatility, and future states of your dataset. While there is no one-size-fits-all answer with respect to shard allocation, we hope that you can benefit from this discussion.
Give It a Whirl!
It's easy to spin up a standard hosted Elasticsearch cluster on any of our 47 Rackspace, Softlayer, Amazon or Microsoft Azure data centers. And you can now provision a replicated cluster.