In our previous article on the scripting features of Elasticsearch, our readers saw how easy it is to perform basic operations through scripting—adding new fields to an index, managing arrays, and removing fields.

This is the second article in our series and is about various types of sorting that you can perform through Elasticsearch scripting. These operations are a bit more complex, and we provide a number of in-depth examples.

We cover a variety of sorting examples in this article:

  • Simple sort according to a single field
  • Sorting on fields containing numbers as substrings
  • Sorting on string values
  • Sorting on the word count for a field

Continue reading below for explanations and examples of each type of sorting.

Simple Sort According to a Single Field

Let’s begin with a simple sorting example. Into the same index (testindex), we’ll load three documents containing academic details for students in a class. We add the first document by entering this command in the terminal.

Document 1

<pre>curl -XPOST 'http://localhost:9200/testindex/testindex/1' -d '{
  "personalDetails": {
    "name": "Bob",
    "age": "13",
    "rollNumber": "02 VIIA"
  },
  "marks": {
    "physics": 48,
    "maths": 45,
    "chemistry": 44
  },
  "remarks": [
    "hardworking"
  ],
  "comments": [
    "Hard working, with a great interest in sports. Especially soccer."
  ]
}'</pre>

Now, so that we have a complete document set that we can sort, let’s add two more documents into that same index.

Document 2

<pre>curl -XPOST 'http://localhost:9200/testindex/testindex/2' -d '{
  "personalDetails": {
    "name": "Tom",
    "age": "14",
    "rollNumber": "23 VIIA"
  },
  "marks": {
    "physics": 47,
    "maths": 46,
    "chemistry": 33
  },
  "remarks": [
    "intelligent"
  ],
  "comments": [
    "a very talented boy,with a promising future"
  ]
}'</pre>

Document 3

<pre>curl -XPOST 'http://localhost:9200/testindex/testindex/3' -d '{
  "personalDetails": {
    "name": "Ron",
    "age": "15",
    "rollNumber": "17 VIIA"
  },
  "marks": {
    "physics": 34,
    "maths": 48,
    "chemistry": 43
  },
  "remarks": [
    "disciplined"
  ],
  "comments": [
    "a perfect disciple and a very calm boy"
  ]
}'</pre>

To sort the marks for all students in the physics subject, in ascending order, we can use a script like the one here:

<pre>curl -XPOST 'http://localhost:9200/testindex/testindex/_search?pretty=true&size=5' -d '{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "script": "doc[\"marks.physics\"].value",
      "lang": "groovy",
      "type": "number",
      "order": "asc"
    }
  }
}'</pre>

Executing this query returns all the documents in the index, sorted in ascending (asc) order by the value of the marks.physics field. Note the following:

  • The query uses match_all, so every document in the index is returned.
  • doc[\"marks.physics\"].value returns the value of the physics field in the document.
  • lang indicates the language in which we are writing the script. In our case, it’s Groovy.
  • type specifies the data type of the value that the script returns for sorting. Here it is number, since marks.physics holds numeric values.
  • params is an optional parameter, not used in this query, that passes named values to the script. It is helpful if we want to do conversions on the field values, such as multiplying them by a factor.
  • order can be either ascending (asc) or descending (desc), which determines the sort direction.
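
To check the result without a running cluster, the expected ordering can be reproduced locally. The following is only a Python sketch of what the sort should return, not Elasticsearch code; the marks are copied from the three sample documents, and the helper name sort_by_subject is our own invention for illustration.

```python
# Local sketch of the ordering the script sort is expected to produce.
# The marks are copied from the three sample documents in this article.
students = [
    {"name": "Bob", "marks": {"physics": 48, "maths": 45, "chemistry": 44}},
    {"name": "Tom", "marks": {"physics": 47, "maths": 46, "chemistry": 33}},
    {"name": "Ron", "marks": {"physics": 34, "maths": 48, "chemistry": 43}},
]


def sort_by_subject(docs, subject, order="asc"):
    """Return student names sorted by marks[subject]; order is 'asc' or 'desc'."""
    ranked = sorted(
        docs,
        key=lambda d: d["marks"][subject],
        reverse=(order == "desc"),
    )
    return [d["name"] for d in ranked]


ascending = sort_by_subject(students, "physics", "asc")
descending = sort_by_subject(students, "physics", "desc")
print(ascending)   # ['Ron', 'Tom', 'Bob']
print(descending)  # ['Bob', 'Tom', 'Ron']
```

If the response from Elasticsearch lists the documents in a different order, the field name in the script or the order setting is the first thing to check.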

Sorting on Fields Containing Numbers as Substrings

In the first example given above, we saw how to do a simple sort in Elasticsearch with a script. The sorting was done on the physics field, and the values in that field were given as integers—not strings. Suppose now that we want to sort the values according to the roll numbers of the students. That is, we want to get the data in the roll number order.

In the script given above, we could simply substitute doc[\"marks.physics\"].value with doc[\"personalDetails.rollNumber\"].value. But we’ll get an error in return, since the roll numbers are stored as strings (for example, "02 VIIA") and cannot be used directly in a sort of type number. Let’s revise the script to achieve a sort according to rollNumber:

<pre>curl -XGET 'http://localhost:9200/testindex/testindex/_search?pretty=true&size=5' -d '{
  "sort": [
    {
      "_script": {
        "script": "try { Integer.parseInt(doc[\"personalDetails.rollNumber\"].value); } catch(Exception e){ return Integer.MAX_VALUE;}",
        "type": "number",
        "order": "asc",
        "lang": "groovy"
      }
    }
  ]
}'</pre>

Since our rollNumber field has numbers that occupy the first position in the string, we apply the Integer.parseInt function to render the values as integers. If a value cannot be parsed, the catch block returns Integer.MAX_VALUE, so that document is placed last in an ascending sort.

When we enter the script above at the terminal, we’ll get a response listing the documents according to ascending roll numbers.
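
The parse-with-fallback idea can be sketched locally in Python. This is a simplification that assumes the script sees the leading numeric token of each roll number; the helper name roll_number_key and the entry named "Hypothetical" (a roll number with no leading digits) are invented here to show the fallback, and are not part of the sample data.

```python
# Sketch of the parse-or-fallback sort key used by the Groovy script.
INT_MAX = 2**31 - 1  # same value as Java's Integer.MAX_VALUE


def roll_number_key(roll_number):
    """Parse the leading token as an integer; fall back to INT_MAX on failure."""
    tokens = roll_number.split()
    try:
        return int(tokens[0])
    except (IndexError, ValueError):
        # Mirrors the catch block: unparsable values sort last when ascending.
        return INT_MAX


rolls = {
    "Bob": "02 VIIA",
    "Tom": "23 VIIA",
    "Ron": "17 VIIA",
    "Hypothetical": "VIIA",  # invented entry with no leading number
}

ordered = sorted(rolls, key=lambda name: roll_number_key(rolls[name]))
print(ordered)  # ['Bob', 'Ron', 'Tom', 'Hypothetical']
```

Returning the largest integer for unparsable values keeps malformed roll numbers out of the way at the end of the list instead of failing the whole query.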

Sorting on String Values

Next, let’s do a sort on specific keys that are already present in a document. Look again at the first document, for the student named “Bob”:

<pre>{
  "personalDetails": {
    "name": "Bob",
    "age": "13",
    "rollNumber": "02 VIIA"
  },
  "marks": {
    "physics": 48,
    "maths": 45,
    "chemistry": 44
  },
  "remarks": [
    "hardworking"
  ],
  "comments": [
    "Hard working, with a great interest in sports. Especially soccer."
  ]
}</pre>

The remarks for Bob include the term “hardworking.” Likewise, the other students’ remarks contain “intelligent” and “disciplined.” Suppose our supervisor asks for a sort on the remarks field, such that students having “hardworking” in their remarks get the highest ranking. How would you accomplish this task, since there are no integer values given for the remarks field?

The solution is to script the sort operation so that each remarks value is mapped to a number. Since “hardworking” has the highest priority, we assign it a value of 0 in the factor script parameter. Similarly, we assign a value of 1 to “intelligent” and a value of 2 to “disciplined,” as shown below:

<pre>curl -XGET 'http://localhost:9200/testindex/testindex/_search?pretty=true&size=5' -d '{
  "sort": {
    "_script": {
      "script": "factor.get(doc[\"remarks\"].value)",
      "type": "number",
      "params": {
        "factor": {
          "hardworking": 0,
          "intelligent": 1,
          "disciplined": 2
        }
      },
      "order": "asc"
    }
  }
}'</pre>

When you run this script at the terminal, the results will appear with Bob’s document at the top, since it is the one containing the “hardworking” value in the remarks field.
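
The lookup performed by the factor parameter can be mirrored in a few lines of Python. This is a local sketch of the ranking logic only; the default rank for a remark missing from the map is our own assumption, since the query above does not define what happens in that case.

```python
# Sketch of ranking string values through a lookup table, as the
# "factor" script parameter does in the query above.
factor = {"hardworking": 0, "intelligent": 1, "disciplined": 2}

# Remarks copied from the three sample documents.
remarks = {"Bob": "hardworking", "Tom": "intelligent", "Ron": "disciplined"}


def rank(remark):
    """Look up the numeric rank of a remark.

    Assumption: remarks absent from the table sort after all known ones.
    """
    return factor.get(remark, len(factor))


ranked = sorted(remarks, key=lambda name: rank(remarks[name]))
print(ranked)  # ['Bob', 'Tom', 'Ron']
```

Changing the numbers in the table is all it takes to change the priority, which is why the values are passed as a parameter instead of being hard-coded in the script.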

Sorting on the Word Count for a Field

Another common requirement is the need to sort according to the word count in a specific field.

For our fourth example, let’s again focus on the comments field in the documents given above. There is no direct method to accomplish this task, so we first need to tokenize the values in the comments field. Tokenization of strings can be done in a variety of ways, and the simplest one is to break the string into individual words.

We can also tokenize by omitting specific word(s). We use the appropriate analyzer for each type of tokenization. The analyzer that we intend to use is specified in the analyzer field of the mapping. Here, we’ll use the standard analyzer, which tokenizes the string into individual words. You can read more about it in the Elasticsearch documentation on analyzers.
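
To get a feel for what breaking a string into individual words means, here is a rough Python approximation. It is not the standard analyzer, which follows Unicode text segmentation rules; the regular expression below is only an assumption that happens to behave similarly on simple English sentences like ours.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into runs of word characters.

    This only approximates word-level tokenization for plain English text.
    """
    return re.findall(r"\w+", text.lower())


tokens = tokenize("Hard working, with a great interest in sports. Especially soccer.")
print(tokens)
# ['hard', 'working', 'with', 'a', 'great', 'interest', 'in', 'sports',
#  'especially', 'soccer']
print(len(tokens))  # 10
```

The number of tokens produced for a field is the figure we want to sort on in this section.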

To define a mapping on an index, we need to do so before adding documents to it. Since we already have an index, we need to delete it and create it again. Simply run this command to delete the index:

<pre>curl -XDELETE 'http://localhost:9200/testindex'</pre>

Now, create the new index by typing in the following command:

<pre>curl -XPUT 'http://localhost:9200/testindex'</pre>

Then, we define the mapping for the testindex type (the type under which our documents are indexed) such that the comments field is tokenized and its token count is stored in a word_count sub-field, like this:

<pre>curl -XPUT 'http://localhost:9200/testindex/testindex/_mapping' -d '{
  "testindex": {
    "properties": {
      "comments": {
        "type": "string",
        "fields": {
          "word_count": {
            "type": "token_count",
            "analyzer": "standard"
          }
        }
      }
    }
  }
}'</pre>

Now, we need to add the data documents to this new index, as we did for each of the documents in the first example near the top of this article.

Finally, type the following to perform a sorting of the documents according to the token count (which means the word count in our context here) in the comments field:

<pre>curl -XPOST 'http://localhost:9200/testindex/_search?pretty' -d '{
  "sort": {
    "comments.word_count": {
      "order": "desc"
    }
  }
}'</pre>

The results will list the documents in descending order of the word count in the comments field.
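
The expected ordering can be estimated locally by counting words in each comment. The Python sketch below uses a simple regular expression as a stand-in for the standard analyzer, which is an assumption on our part; the comments are copied from the three sample documents.

```python
import re

# Comments copied from the three sample documents.
comments = {
    "Bob": "Hard working, with a great interest in sports. Especially soccer.",
    "Tom": "a very talented boy,with a promising future",
    "Ron": "a perfect disciple and a very calm boy",
}


def word_count(text):
    """Count runs of word characters, a rough stand-in for token_count."""
    return len(re.findall(r"\w+", text))


counts = {name: word_count(text) for name, text in comments.items()}
# Highest word count first, as with "order": "desc".
ranked = sorted(counts, key=lambda name: counts[name], reverse=True)

print(counts)     # {'Bob': 10, 'Tom': 8, 'Ron': 8}
print(ranked[0])  # Bob
```

Tom and Ron tie at eight words each, so their relative position in the Elasticsearch response should not be relied upon.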

Conclusion

In this article, we’ve seen how to use the sort operation in a number of different scenarios. An equally important feature is the filtering operation, which we will cover in the next article in our Elasticsearch scripting series.