In this installment of the Elasticsearch 2.3.0 API series, we discuss task management APIs such as the “tasks” and “cancel” APIs in detail with working examples.

Note: The task management APIs were introduced in Elasticsearch 2.3.0, as were the reindex and update_by_query APIs used in this post. None of them will work in earlier versions.

Setup

Elasticsearch 2.3.0

Because we are dealing with the reindex, update by query, and tasks APIs in this post, we need Elasticsearch 2.3.0, which can be downloaded from here.

Enable Inline Scripting

Next we enable dynamic scripting in Elasticsearch by setting "script.inline: true" in the elasticsearch.yml file. This is needed because we are using inline scripts in this tutorial.
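As a rough sketch of this step, the shell function below appends the setting to a config file only if the line is not already there. The helper name enable_inline_scripting and the temporary file standing in for elasticsearch.yml are assumptions made for illustration; point it at your real config path, and restart Elasticsearch afterwards so the setting is picked up.

```shell
# Append "script.inline: true" to a config file unless the line already exists.
# enable_inline_scripting is a hypothetical helper, not part of Elasticsearch.
enable_inline_scripting() {
  config_file="$1"
  if ! grep -qxF 'script.inline: true' "$config_file"; then
    printf '%s\n' 'script.inline: true' >> "$config_file"
  fi
}

# Demo on a throwaway file; replace it with the path to your elasticsearch.yml.
demo_config="$(mktemp)"
enable_inline_scripting "$demo_config"
enable_inline_scripting "$demo_config"   # second call changes nothing
cat "$demo_config"
```

Running the helper twice leaves a single copy of the setting in the file, so it is safe to call from a setup script more than once.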

Create a Test Index with Plenty of Data

For demonstration purposes, we have created an index called "test-index" with nearly 15,000 documents. This makes the reindexing and update_by_query operations run long enough for us to try out the "tasks" and "cancel" APIs against them.
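The post does not say how the test index was populated, so the following is only one possible sketch. It generates a bulk-format payload in which each document is an action line followed by a source line. The helper name gen_bulk, the type name "doc", and the fields "title" and "counter" are all made up for illustration.

```shell
# Emit N documents in bulk format: one action line plus one source line each.
# gen_bulk, the "doc" type, and the document fields are illustrative assumptions.
gen_bulk() {
  n="$1"
  i=1
  while [ "$i" -le "$n" ]; do
    printf '{"index":{"_index":"test-index","_type":"doc","_id":"%d"}}\n' "$i"
    printf '{"title":"document %d","counter":%d}\n' "$i" "$i"
    i=$((i + 1))
  done
}

# Preview the payload for two documents.
gen_bulk 2

# To load roughly 15,000 documents into a local node, pipe the output to _bulk:
# gen_bulk 15000 | curl -s -XPOST 'localhost:9200/_bulk' --data-binary @-
```

The curl line is left commented out so the preview can be run without a cluster; --data-binary is used because the bulk format depends on the newlines being preserved.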

Tasks API

Task management is an important part of operating any database. Long-running operations, such as reindexing or running update_by_query against a huge index, by default give us no way to check their current status.

To fetch the current status of such an operation, Elasticsearch 2.3.0 provides the Tasks API. There are a number of ways to use the Tasks API, so let’s go through each of them.

For General Task Status

To understand how the Tasks API is used, let’s reindex "test-index", which contains around 15,000 documents, into another index. This will take some time, so while it runs we will call the Tasks API from another terminal and check the status.

Step 1: Reindexing

curl -XPOST 'localhost:9200/_reindex' -d '{
  "source": {
    "index": "test-index"
  },
  "dest": {
    "index": "test-index-1"
  }
}'

Step 2: Checking the Task Status

Now, open another terminal window and run this query:

curl -X GET 'localhost:9200/_tasks?pretty&detailed&actions=*reindex,*byquery'

This command returns all running tasks whose actions relate to reindexing or update_by_query. The response looks like this:

Elasticsearch Task API Response Example

Now, familiarize yourself with these response parameters, which correspond to the numbered markers in the picture above:

  1. The "name" field denotes the name of the node on which the task is running.
  2. The ID of the task. We can use it to refer to the task in later Tasks API calls.
  3. The "action" field indicates what type of action is being performed.
  4. The "status" field gives the details of the task's current progress.
  5. The start time of the task, in epoch time.
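Since the task ID is what later calls need, it can help to pull it out of the response on the command line. The sketch below assumes that task IDs appear as JSON keys of the form node_id:number, which matches the ID used later in this post. The sample body is abridged and hypothetical (a real response carries more fields), and extract_task_ids is a made-up helper name.

```shell
# Abridged, hypothetical response body; a real response carries more fields.
sample_response='{
  "nodes" : {
    "FycHVqifT32E-eOJDwRXTg" : {
      "name" : "node-1-es-2.3.0",
      "tasks" : {
        "FycHVqifT32E-eOJDwRXTg:929" : {
          "action" : "indices:data/write/reindex"
        }
      }
    }
  }
}'

# Print every JSON key shaped like node_id:number, one per line.
# extract_task_ids is an illustrative helper, not an Elasticsearch command.
extract_task_ids() {
  grep -o '"[A-Za-z0-9_-][A-Za-z0-9_-]*:[0-9][0-9]*" *:' |
    sed 's/^"\([^"]*\)".*$/\1/'
}

printf '%s\n' "$sample_response" | extract_task_ids
# prints: FycHVqifT32E-eOJDwRXTg:929
```

Against a live node, the same helper could be fed from the query in Step 2, for example by piping the curl output into extract_task_ids. A proper JSON parser is more robust than a pattern match if one is available.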

To Get the Status of Individual Nodes:

curl -X GET 'localhost:9200/_tasks?nodes=node-1-es-2.3.0'

Here we pass the node name, "node-1-es-2.3.0", in the "nodes" parameter of the query. That way we get the task status of only the required node.

To Get the Status of Multiple Nodes:

  curl -X GET 'localhost:9200/_tasks?nodes=node-1-es-2.3.0,node-2-es-2.3.0'

Here we pass several comma-separated node names in the "nodes" parameter to get the task status of multiple nodes at once.

To Get all the Cluster-Related Tasks Running on Nodes:

  curl -X GET 'localhost:9200/_tasks?nodes=node-1-es-2.3.0,node-2-es-2.3.0&actions=cluster:*'

This returns all the cluster-related processes running on the nodes specified in the query.
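The node and action filters above combine in a regular way, which a small helper can make explicit. This is a minimal sketch, not an official client: the function name tasks_url and the ES_HOST variable are assumptions, and it only assembles the "nodes" and "actions" parameters shown in the queries above.

```shell
# Host and port of the node to query; adjust for your cluster.
ES_HOST="localhost:9200"

# Build a _tasks URL from optional node and action filters.
# tasks_url is a hypothetical helper; pass '' to leave a filter out.
tasks_url() {
  nodes="$1"
  actions="$2"
  url="$ES_HOST/_tasks"
  sep='?'
  if [ -n "$nodes" ]; then
    url="$url${sep}nodes=$nodes"
    sep='&'
  fi
  if [ -n "$actions" ]; then
    url="$url${sep}actions=$actions"
  fi
  printf '%s\n' "$url"
}

tasks_url 'node-1-es-2.3.0' ''
# prints: localhost:9200/_tasks?nodes=node-1-es-2.3.0
tasks_url 'node-1-es-2.3.0,node-2-es-2.3.0' 'cluster:*'
# prints: localhost:9200/_tasks?nodes=node-1-es-2.3.0,node-2-es-2.3.0&actions=cluster:*
```

The result can be handed straight to curl, for example curl -X GET "$(tasks_url '' '*reindex')". Keeping the URL in double quotes matters, because both the & and the * would otherwise be interpreted by the shell.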

Information on a Particular Task

We have seen the Task ID parameter in the response of the Tasks API. By passing that ID back to the Tasks API, we can retrieve information about that one task.

For example, the Task ID in the above response was "FycHVqifT32E-eOJDwRXTg:929". To fetch the information for that task, call the Tasks API as follows:

curl -X GET 'localhost:9200/_tasks/FycHVqifT32E-eOJDwRXTg:929'

Wait for Completion

The Tasks API can also block until a particular task finishes. The call below returns once the task completes or the 100-second timeout elapses, whichever comes first:

curl -X GET 'localhost:9200/_tasks/FycHVqifT32E-eOJDwRXTg:929?wait_for_completion=true&timeout=100s'

Cancel API

Another important API introduced in Elasticsearch 2.3.0 is "_cancel". Sometimes, partway through reindexing or updating a large volume of data, we realize that:

  1. The mapping needs to change, or
  2. There was something wrong with the scripts.

Without a way to stop the process, we lose a great deal of time. The "_cancel" API was created to solve this problem.

Suppose we are reindexing a huge index, the task has the ID "taskid1", and partway through we need to cancel it. The following is how we send the request to cancel the task:

curl -X POST 'localhost:9200/_tasks/taskid1/_cancel'
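Because cancelling is hard to undo, it can be worth wrapping the request in a helper that shows what it would send before sending it. The sketch below is one way to do that; the cancel_task function and the DRY_RUN switch are assumptions for illustration, and the task ID in the demo is the one from the earlier response.

```shell
# With DRY_RUN=1 (the default here) the request is printed instead of sent.
DRY_RUN="${DRY_RUN:-1}"

# Cancel a single task by ID. cancel_task is a hypothetical helper.
cancel_task() {
  url="localhost:9200/_tasks/$1/_cancel"
  if [ "$DRY_RUN" = "1" ]; then
    printf 'POST %s\n' "$url"
  else
    curl -X POST "$url"
  fi
}

cancel_task 'FycHVqifT32E-eOJDwRXTg:929'
```

Setting DRY_RUN=0 before calling the function sends the real request to the node on localhost:9200.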

The cancel API also supports the selective cancellation of tasks, like below:

curl -X POST 'localhost:9200/_tasks/_cancel?node_id=node1,node2&actions=*reindex'

The above command will cancel all the "reindex" actions coordinated on node1 and node2. If a "reindex" is coordinated on another node but runs its underlying actions (query and bulk) on node1 or node2, those will not be cancelled.

Conclusion

In this post we have covered the "tasks" API and the "cancel" API through several examples. Please comment below.