Elasticsearch users employ scoring to give a higher weight to documents that meet specific criteria. As we show with several examples in our previous article on scoring, the objective is often to get a list of documents sorted by relevance to the search. Relevance is the numerical output of an algorithm that measures how textually similar a particular document is to the query. Elasticsearch employs and enhances standard scoring algorithms and exposes them through its script_score and function_score features.

This article continues our tutorial series on scripting in Elasticsearch and goes further than the scoring basics that we covered previously. Here, we explore advanced scoring techniques using decay functions. We walk through a few permutations of a geographical search example, provide example code, and show the results.

In our previous article in this series, Elasticsearch Scripting: Scoring, we step through various types of simple scoring techniques that you can perform with scripts: boosting the score of matching documents, boosting the score according to the term frequency (recurrence), and computing a score according to relationships among document field values.

Having gone through the basics of scoring through scripts, we now move on to advanced techniques. In this article, we cover operations that employ decay functions to compute scores by comparing one sliding scale with another. These functions are useful in solving a number of complex scoring problems.

Modeling the Data

To support the examples below, we provide a document set containing price fields and geographical data (latitude and longitude).

We need to map the latitude and longitude fields in our document, ensuring that these fields will parse as the geo_point type and not the default type that Elasticsearch would assign. We give our index the name parishotels. The type name is list, and the field name containing location data is coordinates.

Here's how we map the geo_point type to the coordinates field:

curl -XPOST localhost:9200/parishotels -d '{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "list": {
      "_source": {
        "enabled": true
      },
      "properties": {
        "coordinates": {
          "type": "geo_point",
          "index": "not_analyzed"
        }
      }
    }
  }
}'

Next, we index three hotel documents. Notice that each one includes the geographical coordinates of the hotel, the year it was established, its daily rate, its rating, and the availability of typical hotel services.

Document 1:

curl -XPOST 'http://localhost:9200/parishotels/list/1' -d '{
 "name": "Alma",
 "established": 1858,
 "coordinates": "48.8445,2.2945",
 "rentPerDay": 300,
 "airportService": "no",
 "baggageService": "no",
 "rating" : 3.0
}'

Document 2:

curl -XPOST 'http://localhost:9200/parishotels/list/2' -d '{
 "name": "La Petit",
 "established": 1901,
 "coordinates": "48.9102, 2.2945",
 "rentPerDay": 1000,
 "airportService": "yes",
 "baggageService": "yes",
 "rating" : 4.5
}'

Document 3:

curl -XPOST 'http://localhost:9200/parishotels/list/3' -d '{
 "name": "Eiffel Louvre",
 "established": 1910,
 "coordinates": "48.9702,2.2945",
 "rentPerDay": 500,
 "airportService": "no",
 "baggageService": "yes",
 "rating" : 4
}'

Before proceeding, you can check to ensure proper indexing of these documents by performing a simple search with the following request. The results will list all of the documents in the parishotels index:

curl -XGET 'http://localhost:9200/parishotels/_search?q=*&pretty'

Decay Functions

In the previous article, we present the basics of a function_score query. Now we'll cover other features of function_score known as decay functions, which compute a score according to distance from an origin. The score value decreases as the distance increases, say, from a city center or a start date. The distance is measured on a single-valued field, which can be a geographical point, a date, or any numeric field.

As part of the function_score feature, there are three decay functions available to us: linear, exp (exponential), and gauss. Each of these decay functions takes four parameters as input:

  • origin: the center point, or initial value, that is the basis for the calculation.
  • scale: the rate of decay, expressed as the distance beyond origin plus offset at which the computed score equals the decay value.
  • offset: set this to a non-zero value to expand the origin to a range that runs from origin - offset to origin + offset. All values falling within this range get a score of 1.0.
  • decay: the score that a document gets when it lies exactly scale away from the edge of the offset range. The default is 0.5.


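NOTE: The origin and scale values are mandatory for numeric and geographical fields. The offset is optional and defaults to 0, and decay is optional and defaults to 0.5.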

Find more about the decay functions in the Elasticsearch docs.
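To make the four parameters concrete, here is a minimal Python sketch of the three decay curves, following the formulas given in the Elasticsearch function_score documentation. The function name decay_score and its signature are our own and purely illustrative; Elasticsearch computes these values internally, and this sketch only handles plain numeric distances.

```python
import math

def decay_score(kind, value, origin, scale, offset=0.0, decay=0.5):
    """Illustrative version of the linear, exp, and gauss decay curves."""
    # Distance beyond the offset band; anything inside the band counts as 0.
    dist = max(0.0, abs(value - origin) - offset)
    if kind == "gauss":
        sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
        return math.exp(-(dist ** 2) / (2.0 * sigma_sq))
    if kind == "exp":
        lam = math.log(decay) / scale
        return math.exp(lam * dist)
    if kind == "linear":
        s = scale / (1.0 - decay)
        return max((s - dist) / s, 0.0)
    raise ValueError("kind must be 'gauss', 'exp', or 'linear'")

# Compare the three curves at a few distances (origin 0, offset 2, scale 4).
for d in (0, 2, 4, 6, 10):
    row = [round(decay_score(k, d, 0, 4, offset=2), 4)
           for k in ("linear", "exp", "gauss")]
    print(d, row)
```

Whatever the curve, a value inside the offset band scores 1.0, and a value exactly scale beyond the band scores the decay value. The curves differ only in how quickly they fall off between and beyond those two points.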

Computing Scores According to Distance and Rating

Let's consider an example of a tourist searching for hotels near the Eiffel Tower in Paris. This visitor has a very specific itinerary and prefers to stay in a hotel that is within a 2 km radius of the iconic French monument. If no hotels are available in this small region, he is willing to increase the radius to 6 km, an additional 4 km. If there are two or more hotels at the same distance from the center (origin), he would like to resolve the tie by examining the rating of the hotel.

Using this scenario, let's work out the origin, scale, decay, and offset values that we will then input into the decay function. The origin is the Eiffel Tower, whose coordinates are 48.8582 latitude and 2.2945 longitude. Since the tourist has a radius in mind for his first preference, we can specify an offset. Any hotel within a 2 km radius is a definite consideration for the tourist, so we set the offset as "offset": "2km". Any hotel located within this circle is given the maximum decay score of 1. A secondary preference for this tourist is for hotels within 6 km of the Eiffel Tower. This is an additional 4 km beyond the offset value, so we assign "scale": "4km". We leave decay at its default of 0.5, so a hotel exactly 6 km away receives a decay score of 0.5, and the score keeps falling beyond that distance.
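The mapping from the tourist's two radii to the offset and scale parameters can be sketched in a few lines of Python. The helper name build_distance_query and its arguments are hypothetical, introduced only for illustration; the function simply assembles a request body for the function_score query, and sending it to Elasticsearch is left out.

```python
import json

def build_distance_query(lat, lon, preferred_km, max_km, field="coordinates"):
    """Build a gauss decay query body from a preferred and a maximum radius."""
    scale_km = max_km - preferred_km  # distance beyond the offset band
    if scale_km <= 0:
        raise ValueError("max_km must be greater than preferred_km")
    return {
        "query": {
            "function_score": {
                "functions": [
                    {
                        "gauss": {
                            field: {
                                "origin": {"lat": lat, "lon": lon},
                                "offset": f"{preferred_km:g}km",
                                "scale": f"{scale_km:g}km",
                            }
                        }
                    }
                ]
            }
        }
    }

# Eiffel Tower, 2 km preferred radius, 6 km maximum radius.
body = build_distance_query(48.8582, 2.2945, 2, 6)
print(json.dumps(body, indent=2))
```

Deriving scale as the difference between the two radii keeps the tourist's wording (2 km, then 6 km) as the only inputs.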

If two hotels end up with the same or comparable decay scores, the tourist wants to resolve the tie using the rating field. To do this, we add a script_score function that brings in the rating. Because function_score multiplies the outputs of its functions by default, the final score is the product of the decay score and the rating. The following query gives us the required results:

curl -XGET 'http://localhost:9200/parishotels/list/_search?pretty=true&size=3' -d '{
  "query": {
    "function_score": {
      "functions": [
        {
          "gauss": {
            "coordinates": {
              "origin": {
                "lat": 48.8582,
                "lon": 2.2945
              },
              "offset": "2km",
              "scale": "4km"
            }
          }
        },
        {
          "script_score": {
            "script": "_score * doc['rating'].value"
          }
        }
      ]
    }
  }
}'


After running this query, we see the hotels sorted by the new score values. As shown below, each new _score is the product of the score computed by the decay function and the value in the rating field.


{
  "took": 825,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 3,
    "hits": [
      {
        "_index": "parishotels",
        "_type": "list",
        "_id": "1",
        "_score": 3,
        "_source": {
          "name": "Alma",
          "established": 1858,
          "coordinates": "48.8445,2.2945",
          "rentPerDay": 300,
          "airportService": "no",
          "baggageService": "no",
          "rating": 3
        }
      },
      {
        "_index": "parishotels",
        "_type": "list",
        "_id": "2",
        "_score": 2.4250412,
        "_source": {
          "name": "La Petit",
          "established": 1901,
          "coordinates": "48.9102, 2.2945",
          "rentPerDay": 1000,
          "airportService": "yes",
          "baggageService": "yes",
          "rating": 4.5
        }
      },
      {
        "_index": "parishotels",
        "_type": "list",
        "_id": "3",
        "_score": 0.035464928,
        "_source": {
          "name": "Eiffel Louvre",
          "established": 1910,
          "coordinates": "48.9702,2.2945",
          "rentPerDay": 500,
          "airportService": "no",
          "baggageService": "yes",
          "rating": 4
        }
      }
    ]
  }
}
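The scores in this response can be reproduced approximately by hand. The Python sketch below is our own illustration, not Elasticsearch code: it assumes a haversine distance with a mean Earth radius of 6371.0088 km (Elasticsearch's internal distance calculation may differ slightly) and the documented gauss formula with the default decay of 0.5.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Gauss decay: 1.0 inside the offset band, `decay` at offset + scale."""
    dist = max(0.0, distance - offset)
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(dist ** 2) / (2.0 * sigma_sq))

EIFFEL = (48.8582, 2.2945)
hotels = [
    ("Alma", 48.8445, 2.2945, 3.0),
    ("La Petit", 48.9102, 2.2945, 4.5),
    ("Eiffel Louvre", 48.9702, 2.2945, 4.0),
]

scores = {}
for name, lat, lon, rating in hotels:
    km = haversine_km(EIFFEL[0], EIFFEL[1], lat, lon)
    # Final score = decay score (offset 2 km, scale 4 km) times the rating.
    scores[name] = gauss_decay(km, scale=4.0, offset=2.0) * rating
    print(f"{name}: {km:.2f} km, score {scores[name]:.4f}")
```

Alma lies about 1.5 km from the origin, inside the 2 km offset, so its decay score is 1 and its final score equals its rating of 3. The other two hotels fall outside the offset, and their computed scores land close to the 2.4250412 and 0.035464928 reported above; the small differences come from the distance approximation.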

Weighting by Customer Preference with the Gauss Decay Function

Here's another scenario: the tourist wants to retain all of the criteria above and also include the room rates. He wants to give more importance to the room rate than to the distance factor. How would you approach this additional constraint?

Remember that Elasticsearch offers a weight parameter, which we can also attach to decay functions. Let's configure our search so that the function with the highest weight contributes the most to the computed score.

We learn that our tourist wants a room rate that falls within the range of 400 and 600 USD per day, and we find results that correspond to this additional criterion by performing the following search:

curl -XGET 'http://localhost:9200/parishotels/list/_search?pretty=true&size=3' -d '{
  "query": {
    "function_score": {
      "functions": [
        {
          "gauss": {
            "coordinates": {
              "origin": {
                "lat": 48.8582,
                "lon": 2.2945
              },
              "offset": "2km",
              "scale": "4km"
            }
          }
        },
        {
          "gauss": {
            "rentPerDay": {
              "origin": "500",
              "offset": "100",
              "scale": "20"
            }
          },
          "weight": 2
        },
        {
          "script_score": {
            "script": "_score * doc['rating'].value"
          }
        }
      ]
    }
  }
}'

In the query above, you'll notice that we add a second gauss function, which is applied to the field rentPerDay. Here, the origin is set to 500 and the offset value is 100. So, the range of rentPerDay values that gets the maximum score is:

500 - 100 <= rentPerDay <= 500 + 100, or 400 <= rentPerDay <= 600

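Note that a "weight": 2 parameter is attached to the second gauss function, so the output of the rentPerDay clause is multiplied by 2 before the function scores are combined. Under the default score_mode of multiply, this doubling applies to every document alike; a score_mode such as sum or avg lets the weight shift the relative influence of each clause. Finally, the script_score function factors in the rating.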

When we run this query and examine the results, we see that the hotel whose rentPerDay value falls within the 400-600 range is shown first, and that the score value has been multiplied by the rating of the hotel. The steep scale of 20 drives the scores of the hotels outside that range toward zero.

{
  "took": 22,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.070929855,
    "hits": [
      {
        "_index": "parishotels",
        "_type": "list",
        "_id": "3",
        "_score": 0.070929855,
        "_source": {
          "name": "Eiffel Louvre",
          "established": 1910,
          "coordinates": "48.9702,2.2945",
          "rentPerDay": 500,
          "airportService": "no",
          "baggageService": "yes",
          "rating": 4
        }
      },
      {
        "_index": "parishotels",
        "_type": "list",
        "_id": "1",
        "_score": 1.7881393e-7,
        "_source": {
          "name": "Alma",
          "established": 1858,
          "coordinates": "48.8445,2.2945",
          "rentPerDay": 300,
          "airportService": "no",
          "baggageService": "no",
          "rating": 3
        }
      },
      {
        "_index": "parishotels",
        "_type": "list",
        "_id": "2",
        "_score": 0,
        "_source": {
          "name": "La Petit",
          "established": 1901,
          "coordinates": "48.9102, 2.2945",
          "rentPerDay": 1000,
          "airportService": "yes",
          "baggageService": "yes",
          "rating": 4.5
        }
      }
    ]
  }
}
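These numbers follow directly from the scores of the first query. As a rough check, the Python sketch below multiplies each earlier score by the weight and by the gauss decay on rentPerDay (origin 500, offset 100, scale 20, default decay 0.5). The helper gauss_decay and the dictionary layout are our own illustrative choices, and the calculation assumes the default multiply behavior of function_score.

```python
import math

def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Gauss decay: 1.0 inside the offset band, `decay` at offset + scale."""
    dist = max(0.0, distance - offset)
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(dist ** 2) / (2.0 * sigma_sq))

# (score reported by the first query, rentPerDay) for each hotel.
first_query = {
    "Alma": (3.0, 300),
    "La Petit": (2.4250412, 1000),
    "Eiffel Louvre": (0.035464928, 500),
}

WEIGHT = 2  # the "weight" attached to the rentPerDay clause

new_scores = {}
for name, (first_score, rent) in first_query.items():
    price_decay = gauss_decay(abs(rent - 500), scale=20, offset=100)
    # Multiply mode: distance-and-rating score x weight x price decay.
    new_scores[name] = first_score * WEIGHT * price_decay
    print(f"{name}: price decay {price_decay:.3e}, score {new_scores[name]:.3e}")
```

Eiffel Louvre sits at the origin of the price range, so its score simply doubles. Alma is 100 beyond the offset band, or five scale lengths, which gives a price decay of 0.5 raised to the 25th power. La Petit is 400 beyond the band, so its score is vanishingly small and is reported as 0 in the response.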

Conclusion

After exhibiting a number of advanced scoring techniques using the Elasticsearch decay functions, we bring this article to a close. We trust that this information has been helpful to you, and we welcome your feedback in the comments section below.