In this tutorial we cover a few common shard management issues in Elasticsearch, their solutions, and some best practices. Some use cases call for special tricks to get things done.

Moving a Shard from One Node to Another

This is one of the most common tasks when running a cluster of any size. In a typical scenario, too many shards end up on a single node and all of them are busy serving queries or indexing requests.

This puts the health of the node, and of the cluster, at risk, so it is good practice to move some of those shards to another node. Elasticsearch may not rebalance the shards on its own in this situation, which means we need to intervene manually. How do we achieve this?

Elasticsearch provides a cluster-level API, reroute, that lets us move a shard from one node to another. See below:

curl -XPOST 'http://localhost:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
"commands" : [
{
"move" :
{
"index" : "testindex", "shard" : 0,
"from_node" : "source_node_name", "to_node" : "destination_node_name"
}
}
]
}'

In Elasticsearch versions before 7.0, an index created with default settings gets 5 primary shards, numbered 0 to 4 (from 7.0 onward the default is 1 primary shard). Shard numbering always starts from 0. In the above request we passed 0 as the value of the "shard" parameter, which identifies the first shard of the index named "testindex".

Each node in a cluster has its own name. The name of the node that currently holds the shard ("from_node") and the name of the destination node ("to_node") must both be provided for the above API to work.
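The move request above can be wrapped in a small helper so that the index, shard number, and node names are passed as arguments. This is a minimal sketch, not an official tool: the function names `build_move_body` and `move_shard` and the `ES_URL` variable are hypothetical, and the address defaults to the one used in this tutorial.

```shell
# Hypothetical helper: prints a reroute request body that moves one shard.
# Arguments: index name, shard number, source node name, destination node name.
build_move_body() {
  printf '{"commands":[{"move":{"index":"%s","shard":%d,"from_node":"%s","to_node":"%s"}}]}\n' \
    "$1" "$2" "$3" "$4"
}

# Hypothetical wrapper: sends the body to the reroute API.
# ES_URL is an assumed variable; it defaults to the address used in this tutorial.
move_shard() {
  build_move_body "$@" |
    curl -s -XPOST "${ES_URL:-http://localhost:9200}/_cluster/reroute?pretty" \
      -H 'Content-Type: application/json' -d @-
}

# Example usage (not run here):
#   curl -s 'http://localhost:9200/_cat/shards/testindex?v'   # where each shard lives
#   curl -s 'http://localhost:9200/_cat/nodes?v&h=name,ip'    # node names
#   move_shard testindex 0 source_node_name destination_node_name
```

Keeping the body builder separate from the curl call makes it possible to inspect the JSON before anything is sent to the cluster.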

Decommissioning a Node

Another use case is decommissioning a node from an active cluster. This becomes important when the node has to be removed without causing downtime or restarting the cluster. Elasticsearch provides an option to decommission a node gracefully, without losing data or causing downtime. Here is how it can be done:

curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
"transient" :{
"cluster.routing.allocation.exclude._ip" : "IP of the node"
}
}'

The above request tells the cluster to stop allocating shards to the specified node. The shards already on that node are then moved to the remaining, non-excluded nodes. This transfer happens in the background. Once it completes, the node holds no data and can be shut down safely.

When decommissioning a node, the free disk space on the other nodes must exceed the size of the data to be moved. Otherwise some shards cannot be reallocated, and the cluster status may turn yellow or red, which could cause downtime.
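The disk space condition can be checked roughly before excluding the node. The sketch below assumes the output of `_cat/allocation` restricted to three columns (node name, index data in bytes, free disk in bytes) and node names without spaces. The function name `can_drain` is hypothetical. It is a coarse check only: it compares totals and ignores the per-node disk watermarks that Elasticsearch applies.

```shell
# Hypothetical helper: reads rows of "node disk.indices disk.avail" (in bytes)
# on stdin and succeeds if the free space on all OTHER nodes exceeds the
# index data held by the node named in $1.
can_drain() {
  awk -v node="$1" '
    $1 == node { need = $2 }                           # data that has to move
    $1 != node && $3 ~ /^[0-9]+$/ { free += $3 }       # free space elsewhere
    END { exit !(free > need) }
  '
}

# Example usage (not run here):
#   curl -s 'http://localhost:9200/_cat/allocation?h=node,disk.indices,disk.avail&bytes=b' |
#     can_drain node-1 && echo "enough room to decommission node-1"
```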

It is often helpful to have other ways to identify the node to be decommissioned. In the above example the node is identified by its IP address. We can do the same using the node ID or the node name, both of which identify a node within the cluster.

Exclude by Node ID:

curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
"transient" :{
"cluster.routing.allocation.exclude._id" : "unique id of the node"
}
}'

Exclude by Node Name:

curl -XPUT localhost:9200/_cluster/settings -H 'Content-Type: application/json' -d '{
"transient" :{
"cluster.routing.allocation.exclude._name" : "name of the node"
}
}'
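The three exclusion requests differ only in the attribute name, so one small function can generate all of them. This is a minimal sketch: the function name `build_exclude_body` is hypothetical, and it handles only the three attributes used in this tutorial. Printing `null` for an empty value is included on the assumption that you will want to clear the exclusion after the node is gone, which recent Elasticsearch versions do when a setting is set to null.

```shell
# Hypothetical helper: prints a cluster-settings body that excludes a node.
# $1 is the attribute (_ip, _id or _name); $2 is its value.
# An empty value prints null, which is meant to reset the setting.
build_exclude_body() {
  case "$1" in
    _ip|_id|_name) ;;
    *) echo "unsupported attribute: $1" >&2; return 1 ;;
  esac
  if [ -n "$2" ]; then value="\"$2\""; else value=null; fi
  printf '{"transient":{"cluster.routing.allocation.exclude.%s":%s}}\n' "$1" "$value"
}

# Example usage (not run here):
#   build_exclude_body _name node-3 |
#     curl -s -XPUT localhost:9200/_cluster/settings \
#       -H 'Content-Type: application/json' -d @-
```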

How do we know when the decommissioning of the node is complete? There are two ways to check.

Check the cluster health to see whether any relocation is still in progress.

curl -XGET 'http://localhost:9200/_cluster/health?pretty'

A sample response is shown below:

{
"cluster_name" : "elasticsearch",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 15,
"active_shards" : 15,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 15,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 50.0
}

In the above response, the "relocating_shards" value is 0, which indicates that no shards are currently being moved.
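Instead of re-running the health check by hand, the "relocating_shards" field can be polled until it reaches 0. The sketch below is one way to do that under a few assumptions: the function names, the `ES_URL` variable, and the 10-second interval are all hypothetical choices, and the field is extracted with sed rather than a JSON parser to avoid extra dependencies.

```shell
# Hypothetical helper: reads a cluster-health response on stdin and prints
# the value of "relocating_shards".
relocating_shards() {
  sed -n 's/.*"relocating_shards"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical polling loop: returns once no shards are relocating.
# ES_URL and the 10-second interval are assumptions, not values from the tutorial.
wait_for_relocation() {
  while :; do
    n=$(curl -s "${ES_URL:-http://localhost:9200}/_cluster/health" | relocating_shards)
    [ "$n" = "0" ] && return 0
    echo "still relocating: ${n:-unknown} shard(s)"
    sleep 10
  done
}

# Example usage (not run here):
#   wait_for_relocation && echo "no shards are relocating"
```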

Check the excluded node's status using the API below.

curl -XGET 'http://localhost:9200/_nodes/<NAME_OF_THE_NODE>/stats/indices?pretty'

In the response, check the field "indices.docs.count". If it is zero, the data transfer is complete.

Renaming Indices

Another common need is renaming an index. This can be done in a couple of ways, depending on the use case.

Aliasing

If we want an index to answer to a new name without moving or losing any data, the most commonly used method is aliasing.

For example, suppose we want to rename the index "testindex" to "testindex-1". We can add the alias "testindex-1" to the index "testindex", so that all requests referring to "testindex-1" are routed to "testindex". This can be done as below:

curl -XPOST 'localhost:9200/_aliases?pretty' -H 'Content-Type: application/json' -d'
{
"actions" : [
{ "add" : { "index" : "testindex", "alias" : "testindex-1" } }
]
}'

This method lets us make the index available under the new name with zero downtime. The underlying index keeps its original name.

Reindex API

Sometimes aliasing is not the best choice for renaming. In such cases, the remaining option is reindexing, which copies all the documents from a source index to a destination index. For this to work well, two things should be checked first:

  1. Whether there is enough disk space left on the machine.

  2. Whether the destination index exists with the right mapping.

If both conditions are met, we can use the reindex API as below:

curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
{
"source": {
"index": "testindex"
},
"dest": {
"index": "testindex-1"
}
}'
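After the reindex request returns, it is worth confirming that the destination index holds as many documents as the source before the old index is retired. The sketch below compares the two counts using the `_count` endpoint. The function names and the `ES_URL` variable are hypothetical, and the counts may differ briefly until the destination index has been refreshed.

```shell
# Hypothetical helper: reads a _count response on stdin and prints the count.
doc_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical check: succeeds when both indices report the same document count.
# ES_URL is an assumed variable; it defaults to the address used in this tutorial.
same_doc_count() {
  src=$(curl -s "${ES_URL:-http://localhost:9200}/$1/_count" | doc_count)
  dst=$(curl -s "${ES_URL:-http://localhost:9200}/$2/_count" | doc_count)
  [ -n "$src" ] && [ "$src" = "$dst" ]
}

# Example usage (not run here):
#   same_doc_count testindex testindex-1 && echo "document counts match"
```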

Conclusion

In this tutorial we discussed common tasks such as moving shards between nodes, decommissioning a node, and renaming indices via aliases or reindexing.
