Elasticsearch developers and administrators get nervous when an index turns “red” or some of its shards show up as “unassigned.” It is even more unsettling when APIs such as “_cat/shards” do not reveal the cause, or when an attempt to relocate shards with the “_cluster/reroute” API fails without making clear which factors left the shards unassigned.

Wouldn’t it also be nice to find out why a particular shard stays on its current node and is not rebalanced to another one? To answer these questions, Elasticsearch 5.0 introduced the cluster allocation explain API, _cluster/allocation/explain, which helps diagnose why a shard is unassigned, or why a shard remains on its current node when you might expect otherwise.

Elasticsearch performs shard allocation itself, seamlessly and behind the scenes. Two main components take care of it: allocators and deciders. Allocators try to find the best nodes to hold a shard, and deciders determine whether allocating to a given node is allowed. Shards can nevertheless fail to be allocated because of cluster or node configuration problems, a node getting disconnected from the cluster, disk corruption, or disk usage reaching the configured threshold limits.
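The split between allocators and deciders can be pictured with a small toy model. The sketch below is not Elasticsearch's implementation: the names allocate and same_shard_decider, the node name "node-1", and the "least loaded node wins" rule are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional, Set

# Toy model only: hypothetical names, not Elasticsearch internals.
# placement maps a node name to the set of shard ids it currently holds.
Placement = Dict[str, Set[str]]
# A decider answers "may a copy of this shard live on this node?".
Decider = Callable[[str, str, Placement], bool]


def same_shard_decider(shard: str, node: str, placement: Placement) -> bool:
    """Reject a node that already holds a copy of the same shard."""
    return shard not in placement.get(node, set())


def allocate(shard: str, nodes: List[str], placement: Placement,
             deciders: List[Decider]) -> Optional[str]:
    """Toy allocator: pick the least-loaded node that every decider permits."""
    permitted = [n for n in nodes if all(d(shard, n, placement) for d in deciders)]
    if not permitted:
        return None  # no node is allowed, so the copy stays unassigned
    best = min(permitted, key=lambda n: len(placement.get(n, set())))
    placement.setdefault(best, set()).add(shard)
    return best


placement: Placement = {}
# The first copy (the primary) finds a home on the only node.
primary_node = allocate("qboxindex[0]", ["node-1"], placement, [same_shard_decider])
# The second copy (the replica) is refused by the decider and stays unassigned.
replica_node = allocate("qboxindex[0]", ["node-1"], placement, [same_shard_decider])
print(primary_node, replica_node)
```

In this model the allocator only proposes nodes, while each decider holds a veto. This mirrors, in a very simplified way, the roles described above.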

Example

Let’s see the allocation API in action and how it helps us find the reason for some of the problems described above. To start, create an index with 2 primary shards and 1 replica on a single-node Elasticsearch cluster:

curl -XPUT "http://localhost:9200/qboxindex" -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 1
  }
}'

One of the key concepts of shard allocation is that a primary and its replica cannot be on the same node. Hence, the index just created will be in the yellow state, because the cluster has only a single node and the replicas have nowhere to go.
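The arithmetic behind the yellow state can be sketched as follows. This is a simplified model that assumes the only constraint is "at most one copy of a shard per node"; the function name expected_health is hypothetical, and real clusters apply many more deciders.

```python
from typing import Tuple


def expected_health(node_count: int, primaries: int, replicas: int) -> Tuple[str, int]:
    """Return (health colour, number of unassigned shard copies).

    Simplified model: one copy of a shard per node and no other constraints.
    """
    copies_per_shard = 1 + replicas
    if node_count < 1:
        # Without any node not even the primaries can be assigned.
        return "red", primaries * copies_per_shard
    placed_per_shard = min(node_count, copies_per_shard)
    unassigned = primaries * (copies_per_shard - placed_per_shard)
    return ("green" if unassigned == 0 else "yellow"), unassigned


# The index from this article on a single node: 2 primaries, 1 replica each.
print(expected_health(node_count=1, primaries=2, replicas=1))
# The same index once a second node has joined.
print(expected_health(node_count=2, primaries=2, replicas=1))
```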

Allocation API

How about finding the same thing using the Allocation API? The format is below:

curl -XGET "http://localhost:9200/_cluster/allocation/explain" -H 'Content-Type: application/json' -d '{
  "index": "qboxindex",
  "shard": 1,
  "primary": false
}'

You need to specify the index name and the number of the shard you want explained. The API also accepts the optional parameter “primary”, set to true or false, to select either the primary or a replica of that shard. Below is the response to the above command:

Response

{
  "index": "qboxindex",
  "shard": 1,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "INDEX_CREATED",
    "at": "2017-04-26T11:43:54.162Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "u3Lzfl0QRyW_eqD7J9rMLQ",
      "node_name": "u3Lzfl0",
      "transport_address": "127.0.0.1:9300",
      "node_decision": "no",
      "weight_ranking": 1,
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[qboxindex][1], node[u3Lzfl0QRyW_eqD7J9rMLQ], [P], s[STARTED], a[id=2PTYG1d8T5KeEO-XWAWkzg]]"
        }
      ]
    }
  ]
}

As the response shows, the reason is “cannot allocate because allocation is not permitted to any of the nodes”. The cause behind it appears in the explanation of the deciders section: “the shard cannot be allocated to the same node on which a copy of the shard already exists [[qboxindex][1], node[u3Lzfl0QRyW_eqD7J9rMLQ], [P], s[STARTED], a[id=2PTYG1d8T5KeEO-XWAWkzg]]”. This confirms that a primary and its replica cannot be on the same node.


A few other key parameters in the response are “current_state”, “primary”, “unassigned_info” (the original reason why the shard became unassigned), and “node_allocation_decisions” (why the shard was or was not allocated to a particular node).
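When the response lists several nodes and deciders, it can help to pull out only the deciders that said “NO”. The following is a minimal sketch that works on an already parsed response; the helper name blocking_deciders is an assumption, and the sample is a trimmed copy of the response shown above.

```python
from typing import Dict, List


def blocking_deciders(explanation: Dict) -> List[str]:
    """Return 'node_name: decider' for every decider that answered NO."""
    blocked = []
    for node in explanation.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                blocked.append(f'{node.get("node_name")}: {decider.get("decider")}')
    return blocked


# Trimmed copy of the response from this article.
sample = {
    "index": "qboxindex",
    "shard": 1,
    "primary": False,
    "current_state": "unassigned",
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_name": "u3Lzfl0",
            "node_decision": "no",
            "deciders": [{"decider": "same_shard", "decision": "NO"}],
        }
    ],
}

print(blocking_deciders(sample))
```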

The Allocation API can also be invoked without any parameters, as shown below. In this case, it provides information about the first unassigned shard it encounters in the cluster:

curl -XGET "http://localhost:9200/_cluster/allocation/explain"
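Both forms of the request, with and without a body, can also be built from a script. The sketch below uses only Python's standard library and merely constructs the request object; the function name build_explain_request is hypothetical, and actually sending the request with urllib.request.urlopen assumes a cluster is reachable at the given address.

```python
import json
import urllib.request
from typing import Optional


def build_explain_request(base_url: str, index: Optional[str] = None,
                          shard: Optional[int] = None,
                          primary: Optional[bool] = None) -> urllib.request.Request:
    """Build (but do not send) a request for _cluster/allocation/explain."""
    url = base_url.rstrip("/") + "/_cluster/allocation/explain"
    if index is None:
        # No body: the API explains the first unassigned shard it finds.
        return urllib.request.Request(url, method="GET")
    body = {"index": index, "shard": shard}
    if primary is not None:
        body["primary"] = primary
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


request = build_explain_request("http://localhost:9200", "qboxindex", 1, False)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Keeping construction separate from sending makes the request easy to inspect before it touches a live cluster.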

Add one more node to the cluster and invoke the Allocation API again. As shown below, we now have two nodes in the cluster:

curl -XGET "http://localhost:9200/_cat/nodes?v"

Response

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1           11          78  11                          mdi       -      jFRcqGT
127.0.0.1           13          78  11                          mdi       *      u3Lzfl0

As seen below, the API now reports that it cannot find any unassigned shards to explain, which is great news for an administrator.

curl -XGET "http://localhost:9200/_cluster/allocation/explain?pretty=true"

Response

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_state_exception",
        "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
      }
    ],
    "type" : "illegal_state_exception",
    "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
  },
  "status" : 500
}
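A script that polls the Allocation API has to treat this error as the healthy case rather than as a failure. The following is a minimal sketch based only on the error body shown above; the helper name nothing_to_explain is an assumption, and other Elasticsearch versions may word the error or set the status code differently.

```python
def nothing_to_explain(response: dict) -> bool:
    """True when the API answered that there is no unassigned shard to explain."""
    error = response.get("error")
    if not isinstance(error, dict):
        return False  # a normal explanation, not an error body
    return (error.get("type") == "illegal_state_exception"
            and "unable to find any unassigned shards" in error.get("reason", ""))


# Trimmed copy of the error response from this article.
error_response = {
    "error": {
        "type": "illegal_state_exception",
        "reason": "unable to find any unassigned shards to explain "
                  "[ClusterAllocationExplainRequest[useAnyUnassignedShard=true,"
                  "includeYesDecisions?=false]",
    },
    "status": 500,
}

print(nothing_to_explain(error_response))
```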

Now set the number of replicas to zero as shown below, bring down one node, and invoke the Allocation API again:

curl -XPUT "http://localhost:9200/qboxindex/_settings" -H 'Content-Type: application/json' -d '{
  "index": {
    "number_of_replicas": 0
  }
}'

As seen below, the reason for the unassigned shard is “cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster”. This happened because the node holding the only valid copy of the shard left the cluster.

curl -XGET "http://localhost:9200/_cluster/allocation/explain?pretty=true"

Response

{
  "index" : "qboxindex",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-04-26T13:55:06.207Z",
    "details" : "node_left[jFRcqGTsQgClIXI9TR9uGQ]",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions" : [
    {
      "node_id" : "u3Lzfl0QRyW_eqD7J9rMLQ",
      "node_name" : "u3Lzfl0",
      "transport_address" : "127.0.0.1:9300",
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    }
  ]
}
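For a lost primary, the interesting parts of this response are the unassigned reason and the “store” section of each node. The sketch below condenses both from a parsed response; the helper names summarize_unassigned and nodes_without_shard_copy are assumptions, and the sample is a trimmed copy of the response above.

```python
from typing import Dict, List


def summarize_unassigned(explanation: Dict) -> str:
    """One-line summary of why a shard copy is unassigned."""
    info = explanation.get("unassigned_info", {})
    kind = "primary" if explanation.get("primary") else "replica"
    return "[{}][{}] {}: reason={}, can_allocate={}".format(
        explanation.get("index"), explanation.get("shard"), kind,
        info.get("reason"), explanation.get("can_allocate"))


def nodes_without_shard_copy(explanation: Dict) -> List[str]:
    """Names of the nodes on which no on-disk copy of the shard was found."""
    return [node.get("node_name")
            for node in explanation.get("node_allocation_decisions", [])
            if node.get("store", {}).get("found") is False]


# Trimmed copy of the response from this article.
lost_primary = {
    "index": "qboxindex",
    "shard": 0,
    "primary": True,
    "current_state": "unassigned",
    "unassigned_info": {"reason": "NODE_LEFT",
                        "details": "node_left[jFRcqGTsQgClIXI9TR9uGQ]"},
    "can_allocate": "no_valid_shard_copy",
    "node_allocation_decisions": [
        {"node_name": "u3Lzfl0", "node_decision": "no", "store": {"found": False}}
    ],
}

print(summarize_unassigned(lost_primary))
print(nodes_without_shard_copy(lost_primary))
```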

In the same way, one can ask why a particular shard is assigned to a given node. Sometimes, even though allocation filtering rules are configured, shards that are supposed to be on a node are not there. The Allocation API can explain that too.

For example, it can tell why a particular shard was not allocated or moved to the desired node as dictated by the filtering rules. The reason can be found in the “can_remain_decisions”, “can_remain_on_current_node”, and “can_move_to_other_node” parameters, which appear in the response only when they have values.
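Because these parameters are only present when they have values, code that reads them should not assume they exist. The sketch below simply collects whichever of the three keys are present; the helper name and the sample values are hypothetical and are not taken from a real cluster.

```python
from typing import Dict

REMAIN_AND_MOVE_KEYS = (
    "can_remain_on_current_node",
    "can_remain_decisions",
    "can_move_to_other_node",
)


def remain_and_move_fields(explanation: Dict) -> Dict:
    """Return only those remain/move parameters that the response contains."""
    return {key: explanation[key] for key in REMAIN_AND_MOVE_KEYS if key in explanation}


# Hypothetical, hand-written response fragment for an assigned shard.
assigned_sample = {
    "index": "qboxindex",
    "shard": 0,
    "primary": True,
    "current_state": "started",
    "can_remain_on_current_node": "yes",
    "can_move_to_other_node": "no",
}

print(remain_and_move_fields(assigned_sample))
```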

One can also pass additional parameters to the Allocation API, such as “include_disk_info” set to true, to get the information gathered by the cluster info service about disk usage and shard sizes:

curl -XGET "http://localhost:9200/_cluster/allocation/explain?include_disk_info=true&pretty=true"

Response

{
  "index": "qboxindex",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "NODE_LEFT",
    "at": "2017-04-26T13:55:06.207Z",
    "details": "node_left[jFRcqGTsQgClIXI9TR9uGQ]",
    "last_allocation_status": "no_valid_shard_copy"
  },
  "cluster_info": {
    "nodes": {
      "u3Lzfl0QRyW_eqD7J9rMLQ": {
        "node_name": "u3Lzfl0",
        "least_available": {
          "path": "D:\\es\\elasticsearch-5.3.1\\data\\nodes\\0",
          "total_bytes": 280976420864,
          "used_bytes": 120315559936,
          "free_bytes": 160660860928,
          "free_disk_percent": 57.2,
          "used_disk_percent": 42.8
        },
        "most_available": {
          "path": "D:\\es\\elasticsearch-5.3.1\\data\\nodes\\0",
          "total_bytes": 280976420864,
          "used_bytes": 120315559936,
          "free_bytes": 160660860928,
          "free_disk_percent": 57.2,
          "used_disk_percent": 42.8
        }
      }
    },
    "shard_sizes": {
      "[qboxindex][1][p]_bytes": 130
    },
    "shard_paths": {
      "[qboxindex][1], node[u3Lzfl0QRyW_eqD7J9rMLQ], [P], s[STARTED], a[id=2PTYG1d8T5KeEO-XWAWkzg]": "D:\\es\\elasticsearch-5.3.1\\data\\nodes\\0"
    }
  },
  "can_allocate": "no_valid_shard_copy",
  "allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions": [
    {
      "node_id": "u3Lzfl0QRyW_eqD7J9rMLQ",
      "node_name": "u3Lzfl0",
      "transport_address": "127.0.0.1:9300",
      "node_decision": "no",
      "store": {
        "found": false
      }
    }
  ]
}
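The disk figures in “cluster_info” can be cross-checked with a little arithmetic. The following sketch recomputes the used-disk percentage per node from the byte counts of the “least_available” path; the function name disk_usage_percent is an assumption, and the sample is a trimmed copy of the response above.

```python
from typing import Dict


def disk_usage_percent(cluster_info: Dict) -> Dict[str, float]:
    """Used-disk percentage per node, computed from least_available byte counts."""
    usage = {}
    for node in cluster_info.get("nodes", {}).values():
        stats = node["least_available"]
        percent = 100.0 * stats["used_bytes"] / stats["total_bytes"]
        usage[node["node_name"]] = round(percent, 1)
    return usage


# Trimmed copy of the cluster_info section from this article.
cluster_info = {
    "nodes": {
        "u3Lzfl0QRyW_eqD7J9rMLQ": {
            "node_name": "u3Lzfl0",
            "least_available": {
                "total_bytes": 280976420864,
                "used_bytes": 120315559936,
                "free_bytes": 160660860928,
            },
        }
    }
}

print(disk_usage_percent(cluster_info))
```

The result agrees with the “used_disk_percent” value of 42.8 reported in the response.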

Conclusion

The Cluster Allocation API is a handy tool for understanding shard allocation in a cluster. The insights it provides help in troubleshooting and in bringing the cluster back to its normal state.

