Developers and administrators of Elasticsearch find it alarming when an index turns “red” or some of its shards are in the “unassigned” state. What is worse, when they try to identify the cause using APIs like “_cat/shards”, or try relocating shards with the “_cluster/reroute” API, they often cannot pin down the real reason and the factors that left those shards unassigned.

Wouldn’t it also be nice to find out why a particular shard is assigned to its current node and is not rebalanced to another node? To help answer these questions, Elasticsearch 5.0 introduced the cluster allocation explain API, _cluster/allocation/explain, which is helpful when diagnosing why a shard is unassigned, or why a shard remains on its current node when you might expect otherwise.

Shard allocation is performed by Elasticsearch and happens seamlessly behind the scenes. Two main components take care of it: allocators and deciders. Allocators try to find the best nodes to hold a shard, and deciders decide whether allocating to a given node is allowed. However, shards sometimes cannot be allocated because of cluster or node configuration problems, nodes disconnecting from the cluster, corrupted disks, or disk usage reaching the threshold limits.
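To make the allocator/decider split concrete, here is a toy model in Python. It is only a sketch: the function names, the node dictionaries, the 85% disk limit and the "fewest shards wins" rule are all illustrative assumptions, not Elasticsearch's actual classes or algorithm.

```python
from typing import Any, Callable, Dict, List, Optional

# Toy model only: names and rules below are illustrative assumptions,
# not Elasticsearch's real implementation.
Node = Dict[str, Any]
Decider = Callable[[Node, str], bool]  # True means "allocation is allowed"


def same_shard_decider(node: Node, shard: str) -> bool:
    # Refuse a node that already holds a copy of this shard.
    return shard not in node["shards"]


def disk_decider(node: Node, shard: str) -> bool:
    # Refuse a node whose disk usage reached a hypothetical 85% limit.
    return node["used_disk_percent"] < 85.0


def allocate(shard: str, nodes: List[Node], deciders: List[Decider]) -> Optional[str]:
    # Decider step: keep only the nodes that every decider allows.
    allowed = [n for n in nodes if all(d(n, shard) for d in deciders)]
    if not allowed:
        return None  # no node is permitted, so the shard stays unassigned
    # Allocator step: among allowed nodes, prefer the one holding the fewest shards.
    best = min(allowed, key=lambda n: len(n["shards"]))
    return best["name"]
```

With a single node that already holds a copy of the shard, `allocate` returns `None`, which is the toy equivalent of an unassigned replica.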

Example

Let’s see the allocation API in action and how it helps us find the reason for some of the problems described above. To start, create an index with 2 primary shards and 1 replica on an ES cluster with a single node:

curl -XPUT "http://localhost:9200/qboxindex" -d ' {
 "settings": {
 "index.number_of_shards": 2,
 "index.number_of_replicas": 1
 }
}'

One of the key concepts of shard allocation is that a primary shard and its replica cannot be on the same node. Hence, the index just created is in the yellow state, because we have a single-node ES cluster: the primaries are assigned, but the replicas have nowhere to go.
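The colour rule behind this can be sketched in a few lines of Python. The `index_health` helper and the shape of its input are assumptions made for illustration; Elasticsearch computes health internally and reports it through its own APIs.

```python
from typing import Dict, List


def index_health(shard_copies: List[Dict[str, bool]]) -> str:
    """Return 'red', 'yellow' or 'green' for a list of shard copies.

    Each copy is a dict with two hypothetical keys:
    'primary' (is this the primary copy?) and 'assigned' (is it on a node?).
    """
    # Any unassigned primary makes the index red.
    if any(c["primary"] and not c["assigned"] for c in shard_copies):
        return "red"
    # All primaries are assigned, but at least one replica is not: yellow.
    if any(not c["assigned"] for c in shard_copies):
        return "yellow"
    return "green"


# The index from this example: 2 primaries assigned, 2 replicas unassigned.
qboxindex = [
    {"primary": True, "assigned": True},
    {"primary": True, "assigned": True},
    {"primary": False, "assigned": False},
    {"primary": False, "assigned": False},
]
print(index_health(qboxindex))
```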

Allocation API

How about finding the same thing using the Allocation API? The format is below:

curl -XGET "http://localhost:9200/_cluster/allocation/explain" -d ' {
 "index": "qboxindex",
 "shard": 1,
 "primary": false
}'

You need to specify the index name and the shard number for which the explanation is needed. The API also accepts an optional parameter “primary” set to either true or false if we want to find details about either the primary or replica shard. Below is the response from the query:

{
  "index": "qboxindex",
  "shard": 1,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "INDEX_CREATED",
    "at": "2017-04-26T11:43:54.162Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "u3Lzfl0QRyW_eqD7J9rMLQ",
      "node_name": "u3Lzfl0",
      "transport_address": "127.0.0.1:9300",
      "node_decision": "no",
      "weight_ranking": 1,
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "the shard cannot be allocated to the same node on which a copy of the shard already exists [[qboxindex][1], node[u3Lzfl0QRyW_eqD7J9rMLQ], [P], s[STARTED], a[id=2PTYG1d8T5KeEO-XWAWkzg]]"
        }
      ]
    }
  ]
}

As seen from the response, the reason for the allocation failure is “cannot allocate because allocation is not permitted to any of the nodes”. A more detailed explanation appears in the deciders section: “the shard cannot be allocated to the same node on which a copy of the shard already exists [[qboxindex][1], node[u3Lzfl0QRyW_eqD7J9rMLQ], [P], s[STARTED], a[id=2PTYG1d8T5KeEO-XWAWkzg]]”, which confirms that a primary and its replica cannot be on the same node.

Other key parameters in the response are “current_state”, “primary”, “unassigned_info” (the original reason why the shard became unassigned), and “node_allocation_decisions” (why the shard was or was not allocated to a particular node).
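When a response lists many nodes, reading it by eye gets tedious. A small helper like the one below can pull out just the deciders that said NO. The function name `blocking_deciders` is made up for this sketch; it relies only on the field names visible in the response above and assumes the response has already been parsed into a Python dict.

```python
from typing import Any, Dict, List


def blocking_deciders(explain: Dict[str, Any]) -> List[str]:
    """Collect one line per decider that returned NO, across all nodes."""
    lines = []
    for node in explain.get("node_allocation_decisions", []):
        # Some responses carry no "deciders" list for a node, so default to empty.
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                lines.append(
                    "{} / {}: {}".format(
                        node.get("node_name"),
                        decider.get("decider"),
                        decider.get("explanation"),
                    )
                )
    return lines
```

Fed the response shown earlier, it would return a single line for node u3Lzfl0 naming the same_shard decider and its explanation.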

The Allocation API can also be invoked without specifying any parameters, as shown below. In this case, it provides information about the first unassigned shard it encounters in the cluster:

curl -XGET "http://localhost:9200/_cluster/allocation/explain"
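If you would rather script these calls than type curl commands, both forms of the request can be built with Python's standard library. This is a minimal sketch under a few assumptions: `build_explain_request` is a hypothetical helper name, the request that carries a body is sent as POST (assuming your Elasticsearch version accepts POST on this endpoint as well as GET), and "primary" is always included in the body in case your version requires it.

```python
import json
import urllib.request
from typing import Optional


def build_explain_request(
    base_url: str,
    index: Optional[str] = None,
    shard: Optional[int] = None,
    primary: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) an allocation explain request."""
    url = base_url.rstrip("/") + "/_cluster/allocation/explain"
    if index is None:
        # No parameters: ask about the first unassigned shard in the cluster.
        return urllib.request.Request(url, method="GET")
    if shard is None:
        raise ValueError("shard is required when index is given")
    body = {"index": index, "shard": shard, "primary": primary}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: POST is accepted for requests with a body
    )
```

Passing the result to `urllib.request.urlopen` sends it. Keep in mind that `urlopen` raises `urllib.error.HTTPError` for error status codes, so a call that finds nothing to explain needs a `try`/`except` around it.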

Now, add one more node to the cluster and invoke the Allocation API again. As shown below, we now have two nodes in the cluster:

curl -XGET "http://localhost:9200/_cat/nodes?v"

Response:

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1           11          78  11                          mdi       -      jFRcqGT
127.0.0.1           13          78  11                          mdi       *      u3Lzfl0

As seen below, there are now no unassigned shards, which is great news for the administrator:

curl -XGET "http://localhost:9200/_cluster/allocation/explain?pretty=true"

Response:

{
 "error" : {
  "root_cause" : [
   {
   "type" : "illegal_state_exception",
   "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
   }
 ],
   "type" : "illegal_state_exception",
   "reason" : "unable to find any unassigned shards to explain [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false]"
 },
 "status" : 500
}
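Note that this "good news" arrives as an error response, so a monitoring script should not treat it as a failure. One way to tell the two apart is to look at the error reason. The sketch below matches on the message text seen in the response above; the helper name is hypothetical, and other Elasticsearch versions may word the message or set the status differently.

```python
from typing import Any, Dict


def no_unassigned_shards(response: Dict[str, Any]) -> bool:
    """True when the explain API reports that there is nothing to explain."""
    error = response.get("error")
    if not isinstance(error, dict):
        return False  # a normal explanation, or an unexpected payload
    # Assumption: the wording matches the response shown in this article.
    return "unable to find any unassigned shards" in error.get("reason", "")
```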

Now, let's change the replica count to zero as shown below, bring down a node and invoke the allocation API again:

curl -XPUT "http://localhost:9200/qboxindex/_settings" -d ' {
 "index": {
   "number_of_replicas": 0
  }
}'

As you can see below, the reason for the unassigned shard now reads “cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster”. This happened because the node holding the only valid copy of the shard left the cluster.

curl -XGET "http://localhost:9200/_cluster/allocation/explain?pretty=true"

Response:

{
  "index" : "qboxindex",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2017-04-26T13:55:06.207Z",
    "details" : "node_left[jFRcqGTsQgClIXI9TR9uGQ]",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions" : [
    {
      "node_id" : "u3Lzfl0QRyW_eqD7J9rMLQ",
      "node_name" : "u3Lzfl0",
      "transport_address" : "127.0.0.1:9300",
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    }
  ]
}

Using the Allocation API, we can also find out why a particular shard is not allocated or not moved to the desired node, as per filtering rules. You can find the reason in the “can_remain_decisions”, “can_remain_on_current_node”, and “can_move_to_other_node” parameters, which are shown in the response only when they have values.
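Because these parameters are not always present, code that reads them should tolerate their absence. Here is a hedged sketch: `movement_summary` is a hypothetical helper, and it assumes that each entry in “can_remain_decisions” has the same decider/decision/explanation shape as the entries in the “deciders” list shown earlier.

```python
from typing import Any, Dict


def movement_summary(explain: Dict[str, Any]) -> Dict[str, Any]:
    """Pick out the 'can this shard stay or move' fields, if present."""
    summary: Dict[str, Any] = {}
    for key in ("can_remain_on_current_node", "can_move_to_other_node"):
        if key in explain:  # these fields appear only when they apply
            summary[key] = explain[key]
    # Assumption: entries look like {"decider": ..., "decision": ..., "explanation": ...}.
    summary["can_remain_reasons"] = [
        d.get("explanation", "")
        for d in explain.get("can_remain_decisions", [])
        if d.get("decision") == "NO"
    ]
    return summary
```

For a response that has none of these fields, such as the unassigned-shard responses in this article, the helper simply returns an empty list of reasons.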

You can also pass additional parameters to the Allocation API, such as “include_disk_info” set to true, to get information about disk usage and shard sizes collected by the cluster info service. Note that the URL must be quoted here, because an unquoted “&” would be interpreted by the shell:

curl -XGET "http://localhost:9200/_cluster/allocation/explain?include_disk_info=true&pretty=true"

Response:

{
  "index": "qboxindex",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "NODE_LEFT",
    "at": "2017-04-26T13:55:06.207Z",
    "details": "node_left[jFRcqGTsQgClIXI9TR9uGQ]",
    "last_allocation_status": "no_valid_shard_copy"
  },
  "cluster_info": {
    "nodes": {
      "u3Lzfl0QRyW_eqD7J9rMLQ": {
        "node_name": "u3Lzfl0",
        "least_available": {
          "path": "D:\\es\\elasticsearch-5.3.1\\data\\nodes\\0",
          "total_bytes": 280976420864,
          "used_bytes": 120315559936,
          "free_bytes": 160660860928,
          "free_disk_percent": 57.2,
          "used_disk_percent": 42.8
        },
        "most_available": {
          "path": "D:\\es\\elasticsearch-5.3.1\\data\\nodes\\0",
          "total_bytes": 280976420864,
          "used_bytes": 120315559936,
          "free_bytes": 160660860928,
          "free_disk_percent": 57.2,
          "used_disk_percent": 42.8
        }
      }
    },
    "shard_sizes": {
      "[qboxindex][1][p]_bytes": 130
    },
    "shard_paths": {
      "[qboxindex][1], node[u3Lzfl0QRyW_eqD7J9rMLQ], [P], s[STARTED], a[id=2PTYG1d8T5KeEO-XWAWkzg]": "D:\\es\\elasticsearch-5.3.1\\data\\nodes\\0"
    }
  },
  "can_allocate": "no_valid_shard_copy",
  "allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions": [
    {
      "node_id": "u3Lzfl0QRyW_eqD7J9rMLQ",
      "node_name": "u3Lzfl0",
      "transport_address": "127.0.0.1:9300",
      "node_decision": "no",
      "store": {
        "found": false
      }
    }
  ]
}
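The disk figures in the response are easy to sanity-check and to compare against a disk threshold. The sketch below recomputes the used percentage from the byte counts above and tests it against a limit. The helper names are made up, and the 85% default is only the commonly documented default for the low disk watermark; your cluster may be configured differently.

```python
def used_disk_percent(total_bytes: int, free_bytes: int) -> float:
    """Percentage of the disk in use, rounded to one decimal place."""
    return round(100.0 * (total_bytes - free_bytes) / total_bytes, 1)


def above_watermark(used_percent: float, watermark_percent: float = 85.0) -> bool:
    # Assumption: 85.0 mirrors the default low watermark; adjust to your settings.
    return used_percent >= watermark_percent


# Byte counts taken from the "least_available" entry in the response above.
pct = used_disk_percent(total_bytes=280976420864, free_bytes=160660860928)
print(pct, above_watermark(pct))
```

Here the node is well below the limit, so disk usage is not what keeps the shard unassigned, which is consistent with the "no_valid_shard_copy" explanation in the response.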

Conclusion

The Cluster Allocation API is a really handy tool for getting insight into shard allocation in your cluster. Using the API, you can identify why shards are not allocated without much searching or trial-and-error tweaking. The information it provides can then be used to troubleshoot issues in your cluster and eventually return it to the normal (green) state.
