The script fields feature in Elasticsearch gives users the ability to return a script evaluation for each hit, according to the values taken from different fields. Script fields can work on temporary fields that won’t be stored, and they can return the final evaluation of the script as a custom value. Script fields can also access a source document from the index and extract specific elements.

This article provides a short tutorial on the use of script fields, and we also look at the basics of Elasticsearch logging during script execution.

Script_fields

This Elasticsearch feature returns a script evaluation for each hit, computed from the values of one or more fields. Script fields can work on temporary fields that won’t be stored, returning the final evaluation of the script as a custom value, and they can also access the source document (_source) from the index and extract specific elements from it.

Data Set

We keep our data simple for this short tutorial, using merely three products of the same category. Each document contains a description, price, and rating.

Document 1

curl -XPOST 'http://localhost:9200/products/bed/1' -d '{
  "brand": "Francio",
  "category": "Bed",
  "description": "King size with box spring. Mahagony finish, and polished red trim; plywood base.",
  "price": 200,
  "rating": 3
}'

Document 2

curl -XPOST 'http://localhost:9200/products/bed/2' -d '{
  "brand": "Ikea",
  "category": "Bed",
  "description": "Queen size. Teak wood, black in color.",
  "price": 400,
  "rating": 4
}'

Document 3

curl -XPOST 'http://localhost:9200/products/bed/3' -d '{
  "brand": "Tonelli",
  "category": "Bed",
  "description": "Queen size. Zebrano, brown in color ",
  "price": 300,
  "rating": 4
}'

Suppose that we need to search our data set and find items having “red” or “black” in the description. We also want to compute a score from the price and rating fields for each document; we’ll divide the price by the rating to get this score. Since we only need this score for this task, we define it only for the execution of the script (my_score) and won’t bother storing it in the document.
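To sanity-check the arithmetic before writing the query, here is a quick Python sketch (plain Python, not Elasticsearch code) that applies the same filter and score to our three documents:

```python
import re

# The same three sample documents we indexed above.
docs = [
    {"brand": "Francio", "price": 200, "rating": 3,
     "description": "King size with box spring. Mahogany finish, and polished red trim; plywood base."},
    {"brand": "Ikea", "price": 400, "rating": 4,
     "description": "Queen size. Teak wood, black in color."},
    {"brand": "Tonelli", "price": 300, "rating": 4,
     "description": "Queen size. Zebrano, brown in color."},
]

# Keep documents whose description contains "red" or "black" (a rough stand-in
# for the terms filter), then compute my_score = price / rating for each hit.
hits = [d for d in docs
        if {"red", "black"} & set(re.findall(r"[a-z]+", d["description"].lower()))]
scores = {d["brand"]: d["price"] / d["rating"] for d in hits}
```

Only the Francio and Ikea beds survive the filter, scoring roughly 66.67 and exactly 100 respectively.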

Here is our query:

curl -XGET 'http://localhost:9200/products/bed/_search?pretty=true&size=3' -d '{
  "filter": {
    "terms": {
      "description": [
        "red",
        "black"
      ]
    }
  },
  "fields": [
    "_source"
  ],
  "script_fields": {
    "my_score": {
      "script": "doc['price'].value / doc['rating'].value"
    }
  }
}'

Stepping through the query above, we see that it begins by filtering the documents according to the occurrence of the terms “red” and “black” in the field “description.” The second part, under “fields,” specifies which fields to return with each hit. By specifying _source, we ask for the full source document to be returned alongside our script field.

Looking closely at the script_fields section, we see the custom field my_score, which holds the score calculation: the price divided by the rating for each document.

After running the query, we find in the results a fields entry that corresponds to each hit and contains the corresponding my_score calculated value:

...
"hits": {
  "total": 2,
  "max_score": 1,
  "hits": [
    {
      "_index": "products",
      "_type": "bed",
      "_id": "2",
      "_score": 1,
      "_source": {
        "brand": "Ikea",
        "category": "Bed",
        "description": "Queen size. Teak wood, black in color.",
        "price": 400,
        "rating": 4
      },
      "fields": {
        "my_score": [
          "100"
        ]
      }
    },
    {
      "_index": "products",
      "_type": "bed",
      "_id": "1",
      "_score": 1,
      "_source": {
        "brand": "Francio",
        "category": "Bed",
        "description": "King size with box spring. Mahogany finish, and polished red trim; plywood base.",
        "price": 200,
        "rating": 3
      },
      "fields": {
        "my_score": [
          "66.6666666667"
        ]
      }
    }
  ]
}

NOTE: When using script fields, the default response will not contain the _source field but only the custom fields we have defined. That is precisely why we specify _source in the fields section of the query. You can see this for yourself by omitting the fields section and looking at the default results.

Accessing data using script fields

We can access data in two ways using script fields. The first method is doc['fieldName'].value; the second is _source.fieldName. Let’s compare these. When we use the doc method, the terms for that field are loaded into memory (cached), which makes execution faster but consumes more memory. Moreover, with the doc notation we cannot access a field in object form (as we can with _source.field1.field2).
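The shape difference is easy to picture with plain data structures. Here is an illustrative Python sketch (not Elasticsearch internals; the specs field is hypothetical) contrasting the two views of one hit:

```python
# _source is the original JSON document, so nested objects keep their shape.
source = {
    "brand": "Ikea",
    "specs": {"width_cm": 160, "length_cm": 200},  # hypothetical nested field
}

# Doc values behave more like flat, per-field term lists keyed by full path,
# held in memory for fast access; analyzed text fields hold normalized terms.
doc = {
    "brand": ["ikea"],
    "specs.width_cm": [160],
    "specs.length_cm": [200],
}

nested = source["specs"]["width_cm"]  # object-style access works on _source
flat = doc["specs.width_cm"][0]       # doc access needs the flattened path
```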

When we use _source to access the fields, we get the parsed source document, so we can directly extract exactly the elements we need. The trade-off is that the source has to be loaded and parsed for every hit, which is slower than reading values already held in memory.

One more thing to consider: with script fields, we can retrieve fields that are not_analyzed, which gives us the ability to retrieve the original data (in cases where the analyzer has a filter that alters or removes tokens). This approach both conserves memory and enhances performance.
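To see why an analyzed field may no longer contain the original text, consider a toy analyzer. This Python sketch only mimics the general effect of token filters; real analyzers are configured per field in Elasticsearch:

```python
import re

STOPWORDS = {"and", "with", "in"}  # a tiny, made-up stopword list

def toy_analyzer(text):
    # Lowercase, split into words, and drop stopwords, mimicking token
    # filters that alter or remove tokens at index time.
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

original = "Queen size. Teak wood, black in color."
tokens = toy_analyzer(original)
# The indexed terms have lost the casing, punctuation, and stopwords of the
# stored text, so only a not_analyzed field or _source can give back the
# original value.
```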

Logging from a Script

Logging is a very important feature in Elasticsearch, especially for debugging when working with large data sets. You can insert entries into the logs at very specific points in the execution of queries or document updates. Of course, you also benefit from archiving the log data for future reference or analysis. Let’s look at how to create logs in Elasticsearch.

As you may recall from our introductory article, there are two ways to do scripting: either with the in-request method or by using scripts in the config folder. We covered the first method in that article; here we explore the second. Elasticsearch lets you save your custom scripts in the scripts folder, generally found under the path /etc/elasticsearch. If there is no folder named scripts in that path, you can simply create one.

Open your terminal, go to the scripts folder, and create a file named loggingTest.groovy (you may need admin privileges). In this file we’ll write our code for logging. Write the following code into the groovy file:

import org.elasticsearch.common.logging.*;

// Get a logger under a recognizable name, log the brand field, and return it.
ESLogger logger = ESLoggerFactory.getLogger('first log created');
logger.info(doc['brand'].value);
return doc['brand'].value;

The code in this loggingTest.groovy script does the following:

  • imports the logging module from the Elasticsearch library
  • creates a logger with an identifier message, “first log created” in this example. The identifier can be any string of your choosing; a distinctive one makes our entries easy to pick out in the log.
  • logs the field value we want to see, the brand field from our data set.
  • returns the value of the brand field.
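The pattern here (get a named logger, log a value, return that value) maps directly onto most logging APIs. For comparison only, here is a rough Python analog using the standard library; it is not how Elasticsearch itself logs:

```python
import io
import logging

# Get a logger under an identifier of our choosing, as the Groovy script does.
logger = logging.getLogger("first log created")
logger.setLevel(logging.INFO)

# Capture log output in memory for this example; Elasticsearch would write
# to elasticsearch.log instead.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("[%(levelname)s][%(name)s] %(message)s"))
logger.addHandler(handler)

def log_and_return(value):
    logger.info(value)  # write the value to the log...
    return value        # ...and return it, as the script field does

brand = log_and_return("ikea")
```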

Next, we run a simple terms aggregation that executes the loggingTest script:

curl -XGET 'http://localhost:9200/products/bed/_search?pretty=true&size=3' -d '{
  "aggs": {
    "callingLogs": {
      "terms": {
        "script": "loggingTest"
      }
    }
  }
}'

When we execute the request above, the script creates new entries in the log. To see them, navigate to the directory /var/log/elasticsearch, where you will find a file named elasticsearch.log. Open it and go to the very bottom of the file to see the new entries:

[2015-09-02 08:11:25,517][INFO ][first log created] tonelli
[2015-09-02 08:11:25,517][INFO ][first log created] francio
[2015-09-02 08:11:25,519][INFO ][first log created] ikea
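Each entry follows the same [timestamp][level][logger name] value layout, so the entries are easy to pick apart programmatically. A small Python sketch parsing one of these lines:

```python
import re

line = "[2015-09-02 08:11:25,519][INFO ][first log created] ikea"

# [timestamp][level][logger name] message -- the layout seen in elasticsearch.log
m = re.match(r"\[([^\]]+)\]\[([^\]]+)\]\[([^\]]+)\]\s*(.+)", line)
timestamp, level, logger_name, value = (
    m.group(1), m.group(2).strip(), m.group(3), m.group(4))
```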

Here in each of these entries we see the timestamp, the identifier message inserted by our loggingTest script, and the field values. And that brings this article to a close. We welcome your comments below.