By default, Elasticsearch returns results sorted by relevance, with the most relevant documents first. To sort by relevance, we need to represent relevance as a value. The relevance score of each document is represented by a positive floating-point number called the _score. The higher the _score, the more relevant the document.

For this post, we will be using hosted Elasticsearch on Qbox.io. You can sign up or launch your cluster here, or click "Get Started" in the header navigation. If you need help setting up, refer to "Provisioning a Qbox Elasticsearch Cluster."

A query clause generates a _score for each document. How that score is calculated depends on the type of query clause. Different query clauses are used for different purposes: a fuzzy query might determine the _score by calculating how similar the spelling of the found word is to the original search term; a terms query would incorporate the percentage of terms that were found. However, what we usually mean by relevance is the algorithm that we use to calculate how similar the contents of a full-text field are to a full-text query string.

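The standard similarity algorithm used in Elasticsearch has traditionally been term frequency/inverse document frequency, or TF/IDF. Since Elasticsearch 5.0, the default is BM25, which builds on the same term-frequency and document-frequency ideas.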
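To make the idea concrete, here is a minimal sketch of the three classic TF/IDF factors as they are commonly described for Lucene's classic similarity. The function names are hypothetical, and real scoring combines further factors (boosts, query normalization, and so on), so treat this as an illustration rather than the exact formula Elasticsearch runs.

```python
import math


def tf(term_count: int) -> float:
    """Term frequency: grows with the square root of the occurrences in the field."""
    return math.sqrt(term_count)


def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency: terms found in fewer documents weigh more."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms: int) -> float:
    """Field-length norm: a match in a shorter field weighs more."""
    return 1.0 / math.sqrt(num_terms)


def tfidf_weight(term_count: int, num_docs: int, doc_freq: int, num_terms: int) -> float:
    """Simplified weight of one term in one field (assumption: no boosts applied)."""
    return tf(term_count) * idf(num_docs, doc_freq) * field_norm(num_terms)


# A rare term (in 9 of 1000 docs) outweighs a common one (in 499 of 1000 docs).
rare = tfidf_weight(term_count=1, num_docs=1000, doc_freq=9, num_terms=4)
common = tfidf_weight(term_count=1, num_docs=1000, doc_freq=499, num_terms=4)
```

With these inputs, `rare` is larger than `common`, which is the behavior the _score is meant to capture: unusual terms say more about a document than common ones.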

Sorting within Nested Objects

Elasticsearch also supports sorting by fields that are inside one or more nested objects. Sorting by a nested field adds the following parameters to the existing sort options:

  • nested_path - Defines on which nested object to sort. The actual sort field must be a direct field inside this nested object. When sorting by nested field, this field is mandatory.

  • nested_filter - A filter that the inner objects inside the nested path must match for their field values to be taken into account when sorting. A common case is to repeat the query or filter inside the nested filter. By default, no nested_filter is active.
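As a rough sketch of how these two parameters fit into a sort clause, the hypothetical Python helper below builds the JSON body for a nested sort. The helper name and its arguments are assumptions made for illustration; only the keys it emits (order, mode, nested_path, nested_filter) come from the options described above.

```python
import json


def nested_sort(field, nested_path, order="asc", mode=None, nested_filter=None):
    """Build one sort clause for a field that lives inside a nested object."""
    # The sort field has to sit inside the nested object named by nested_path.
    if not field.startswith(nested_path + "."):
        raise ValueError(f"{field!r} is not inside nested path {nested_path!r}")
    options = {"order": order, "nested_path": nested_path}
    if mode is not None:
        options["mode"] = mode
    if nested_filter is not None:
        options["nested_filter"] = nested_filter
    return {field: options}


clause = nested_sort(
    "comments.age",
    "comments",
    mode="avg",
    nested_filter={"term": {"comments.rating": 5}},
)
body = json.dumps({"sort": [clause]}, indent=2)
```

The resulting `body` string can be passed as the request body of a search request, for example with curl's -d option.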

Let's consider the previous index mapping:

curl -XPUT 'ES_HOST:ES_PORT/blogs?pretty' -H 'Content-Type: application/json' -d '{
 "mappings": {
   "series": {
     "properties": {
       "title": { "type": "string" },
       "comments": {
         "type": "nested",
         "properties": {
           "name":    { "type": "string"  },
           "comment": { "type": "string"  },
           "age":     { "type": "short"   },
           "rating":   { "type": "short"  },
           "date":    { "type": "date"    }
         }
       }
     }
   }
 }
}'

If we wish to search for blogs whose title contains “qbox”, sorted in ascending order by the average age of the commenters who gave the blog a rating of 5, we can query as follows:

curl -XPOST 'ES_HOST:ES_PORT/blogs/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query" : {
     "term" : { "title" : "qbox" }
  },
  "sort" : [
      {
         "comments.age" : {
            "mode" :  "avg",
            "order" : "asc",
            "nested_path" : "comments",
            "nested_filter" : {
               "term" : { "comments.rating" : 5 }
            }
         }
      }
   ]
}
'

Nested Sorting

In the example above, comments is a field of type nested. The nested_path needs to be specified; otherwise, Elasticsearch doesn’t know at which nested level the sort values need to be captured.

Sort Order

The order option can have the following values:

  • asc - Sort in ascending order

  • desc - Sort in descending order

The order defaults to desc when sorting on the _score, and defaults to asc when sorting on anything else.
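The default-order rule can be sketched in a few lines of Python. The function names here are hypothetical, and the sketch only resolves the order that each sort entry would end up with; it assumes the three common ways of writing a sort entry (a bare field name, a field mapped to an order string, and a field mapped to an options object).

```python
def default_order(field):
    """_score sorts descending by default; every other field sorts ascending."""
    return "desc" if field == "_score" else "asc"


def resolve_orders(sort):
    """Return (field, order) pairs for a sort specification."""
    entries = sort if isinstance(sort, list) else [sort]
    resolved = []
    for entry in entries:
        if isinstance(entry, str):
            # Bare field name, e.g. "_score"
            field, options = entry, {}
        else:
            # Single-key object, e.g. {"title": "desc"} or {"title": {"order": "desc"}}
            (field, options), = entry.items()
            if isinstance(options, str):
                options = {"order": options}
        resolved.append((field, options.get("order", default_order(field))))
    return resolved


orders = resolve_orders([
    "_score",
    "comments.age",
    {"title": "desc"},
    {"comments.date": {"mode": "min"}},
])
```

Here `orders` pairs _score with desc, comments.age and comments.date with asc (neither entry states an order), and title with the explicitly requested desc.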

Sort Mode Option

Elasticsearch supports sorting by array or multi-valued fields. The mode option controls which array value is picked for sorting the document it belongs to. The mode option can have the following values:

  • min - Pick the lowest value.

  • max - Pick the highest value.

  • sum - Use the sum of all values as the sort value. Only applicable to number-based array fields.

  • avg - Use the average of all values as the sort value. Only applicable to number-based array fields.

  • median - Use the median of all values as the sort value. Only applicable to number-based array fields.
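The effect of each mode is easy to see on a small array. The sketch below is plain Python, not Elasticsearch code: it reduces a hypothetical list of ages to the single value that each mode would pick for sorting.

```python
from statistics import mean, median

# Each mode reduces a multi-valued field to one sort value.
MODES = {
    "min": min,
    "max": max,
    "sum": sum,
    "avg": mean,
    "median": median,
}


def sort_value(values, mode):
    """Return the value a document would be sorted by under the given mode."""
    return MODES[mode](values)


ages = [32, 28, 45]  # hypothetical values of a multi-valued numeric field
picked = {mode: sort_value(ages, mode) for mode in MODES}
```

For these ages, min picks 28, max picks 45, sum gives 105, avg gives 35, and median gives 32.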

It is possible to sort by the value of a nested field, even though the value exists in a separate nested document. To make the result more interesting, we will add another record:

curl -XPUT 'ES_HOST:ES_PORT/blogs/series/1?pretty' -H 'Content-Type: application/json' -d '{
  "title": "Qbox: ElasticSearch Cloud Provider",
  "body": "Set up your elasticsearch cloud in a few minutes… ",
  "tags":  [ "elasticsearch", "qbox", "search"],
  "comments": [
    {
      "name":    "Adam Vanderbush",
      "comment": "Hassle free cloud solution",
      "age":     32,
      "rating":  4,
      "date":    "2017-05-12"
    },
    {
      "name":    "Brian Sage",
      "comment": "Works out of the box",
      "age":     28,
      "rating":  5,
      "date":    "2017-05-15"
    }
  ]
}'

Suppose we want to retrieve blog posts that received comments in May, ordered by the lowest rating that each blog post received. The search request would look like this:

curl -XGET 'ES_HOST:ES_PORT/blogs/_search?pretty' -H 'Content-Type: application/json' -d '{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "range": {
          "comments.date": {
            "gte": "2017-05-01",
            "lt":  "2017-06-01"
          }
        }
      }
    }
  },
  "sort": {
    "comments.rating": {
      "order": "asc",
      "mode":  "min",
      "nested_path": "comments",
      "nested_filter": {
        "range": {
          "comments.date": {
            "gte": "2017-05-01",
            "lt":  "2017-06-01"
          }
        }
      }
    }
  }
}'

The nested query limits the results to blog posts that received a comment in May.

Results are sorted in ascending (asc) order by the lowest value (min) of the comments.rating field among the matching comments.

The nested_filter in the sort clause is the same as the nested query in the main query clause. The reason is explained next.

Sorting happens after the query has been executed. The query matches blog posts that received comments in May, but it returns blog post documents as the result. If we didn’t include the nested_filter clause, we would end up sorting based on any comments that the blog post has ever received, not just those received in May. Thus, we need to repeat the query conditions in the nested_filter.
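The difference is easiest to see with a small simulation. The sketch below is plain Python over hypothetical in-memory data, not an Elasticsearch call: it first selects the posts that have a comment in May, then sorts them by their minimum rating, once over all comments and once over the May comments only.

```python
from datetime import date

# Hypothetical blog posts; post "A" has an old, low-rated comment from March.
posts = [
    {"title": "A", "comments": [
        {"rating": 1, "date": date(2017, 3, 2)},
        {"rating": 4, "date": date(2017, 5, 12)},
    ]},
    {"title": "B", "comments": [
        {"rating": 3, "date": date(2017, 5, 15)},
        {"rating": 5, "date": date(2017, 5, 20)},
    ]},
]


def in_may(comment):
    """Same range as the query: 2017-05-01 <= date < 2017-06-01."""
    return date(2017, 5, 1) <= comment["date"] < date(2017, 6, 1)


def min_rating(post, comment_filter=None):
    """Lowest rating of a post, optionally restricted to matching comments."""
    comments = post["comments"]
    if comment_filter is not None:
        comments = [c for c in comments if comment_filter(c)]
    return min(c["rating"] for c in comments)


# The query step: keep posts with at least one comment in May.
matching = [p for p in posts if any(in_may(c) for c in p["comments"])]

# Sorting without a filter looks at every comment the post ever received.
unfiltered = [p["title"] for p in sorted(matching, key=min_rating)]

# Sorting with the filter repeated looks at the May comments only.
filtered = [p["title"] for p in sorted(matching, key=lambda p: min_rating(p, in_may))]
```

Without the filter, post A comes first because of its March rating of 1. With the filter repeated in the sort, post B comes first, since its lowest May rating (3) is below post A's (4).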

Give it a Whirl!

It's easy to spin up a standard hosted Elasticsearch cluster in any of our 47 Rackspace, Softlayer, or Amazon data centers. And you can now provision your own AWS Credits on Qbox Private Hosted Elasticsearch.

Questions? Drop us a note, and we'll get you a prompt response.

Not yet enjoying the benefits of a hosted ELK-stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment in our cloud hosting service.
