In the previous article, we introduced “painless” and described its syntax and usage. We also covered some best practices, such as why to use params, when to use “doc” values versus “_source” when accessing document fields, and how to create fields on the fly.

In this article, we explore further uses of “painless” scripting: using scripts in a query or filter context, using conditionals in scripts, deleting fields and nested fields, accessing nested objects, using scripts in scoring, and more.

Tutorial

To start, let’s index a data set that we can use for the rest of this post:

curl -XPUT 'localhost:9200/tweets/tweet/1' -H 'Content-Type: application/json' -d '
{"username":"tom","posted_date":"2017/07/25" ,"message": "I brought apple stock at the best price" ,"tags": ["stock","money"] , "info":{"device":"mobile", "os": "ios"}, "likes": 10}'
curl -XPUT 'localhost:9200/tweets/tweet/2' -H 'Content-Type: application/json' -d '
{"username":"mary","posted_date":"2017/06/25" ,"message": "Machine learning is the future" ,"tags": ["ai","tech"] , "info":{"device":"desktop", "os": "ios"}, "likes": 100}'
curl -XPUT 'localhost:9200/tweets/tweet/3' -H 'Content-Type: application/json' -d '
{"username":"tom","posted_date":"2017/07/27" ,"message": "just tweeting" ,"tags": ["confused"] , "info":{"device":"mobile", "os": "win"}, "likes": 0}'
curl -XPUT 'localhost:9200/tweets/tweet/4' -H 'Content-Type: application/json' -d '
{"username":"mary","posted_date":"2017/07/28" ,"message": "exploring painless" ,"tags": ["elastic"] , "info":{"device":"mobile", "os": "linux"}, "likes": 100}'
curl -XPUT 'localhost:9200/tweets/tweet/5' -H 'Content-Type: application/json' -d '
{"username":"mary","posted_date":"2017/05/20" ,"message": "painless is fun but its a new scripting language in the town" ,"tags": ["elastic","painless","scripting"] , "info":{"device":"mobile", "os": "linux"}, "likes": 1000}'

Script Query

A script query allows us to execute a script on each document. Script queries are usually used in a filter context. When you want a script in the query or filter context, make sure you embed the script in a script object (“script”: {}). This is why the example below has a “script” object nested inside another “script” object.

Let’s try it out with an example. Let’s find all the tweets that contain the string “painless” and have a length greater than 25 characters.

curl -X POST   'http://localhost:9200/tweets/tweet/_search' -H 'Content-Type: application/json'   -d '{  
  "query":{  
     "bool":{  
        "must":[  
           {  
              "match":{  
                 "message":"painless"
              }
           }
        ],
        "filter":[  
           {  
              "script":{  
                 "script":{  
                    "inline":"doc['\''message.keyword'\''].value.length() > params.length",
                    "params":{  
                       "length":25
                    }
                 }
              }
           }
        ]
     }
  }
}'

Response

{
   "took": 13,
   "timed_out": false,
   "_shards": {
       "total": 5,
       "successful": 5,
       "failed": 0
   },
   "hits": {
       "total": 1,
       "max_score": 0.25316024,
       "hits": [
           {
               "_index": "tweets",
               "_type": "tweet",
               "_id": "5",
               "_score": 0.25316024,
               "_source": {
                   "username": "mary",
                   "posted_date": "2017/05/20",
                   "message": "painless is fun but its a new scripting language in the town",
                   "tags": [
                       "elastic",
                       "painless",
                       "scripting"
                   ],
                   "info": {
                       "device": "mobile",
                       "os": "linux"
                   },
                   "likes": 1000
               }
           }
       ]
   }
}
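
To see why only document 5 comes back, the two conditions can be replayed over the sample data in plain Python. This is a minimal local sketch, not an Elasticsearch call: the whole-word test is an assumed stand-in for the analyzed match query, and the length check mirrors the Painless filter script.

```python
# Local replay of the query above over the sample tweets.
# IDs and messages are copied from the indexing commands earlier in the post.
tweets = {
    1: "I brought apple stock at the best price",
    2: "Machine learning is the future",
    3: "just tweeting",
    4: "exploring painless",
    5: "painless is fun but its a new scripting language in the town",
}
params = {"length": 25}


def matches(message):
    # Assumed stand-in for the match query: case-insensitive whole-word test.
    has_term = "painless" in message.lower().split()
    # Same condition as the filter script: value.length() > params.length
    long_enough = len(message) > params["length"]
    return has_term and long_enough


hits = [doc_id for doc_id, message in tweets.items() if matches(message)]
print(hits)  # [5]
```

Document 4 also contains “painless”, but its message is only 18 characters long, so the script filter drops it.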

Scripts in Aggregations

Scripts can also be used in aggregations. Aggregations typically operate on the values of fields that are not analyzed. Using scripts, one can extract values from existing fields, append values from multiple fields, and then carry out aggregations on the newly derived value.

In the sample tweets above, we only have the “posted_date” information. What if we wanted to find the number of tweets per month? Below is an example that shows the use of a script in an aggregation:

curl -X POST   'http://localhost:9200/tweets/tweet/_search' -H 'Content-Type: application/json'   -d '{  
  "size":0,
  "aggs":{  
     "my_terms_agg":{  
        "terms":{  
           "script":{  
              "inline":"doc['\''posted_date'\''].date.monthOfYear"
           }
        }
     }
  }
}'

Response

{
   "took": 6,
   "timed_out": false,
   "_shards": {
       "total": 5,
       "successful": 5,
       "failed": 0
   },
   "hits": {
       "total": 5,
       "max_score": 0,
       "hits": []
   },
   "aggregations": {
       "my_terms_agg": {
           "doc_count_error_upper_bound": 0,
           "sum_other_doc_count": 0,
           "buckets": [
               {
                   "key": "7",
                   "doc_count": 3
               },
               {
                   "key": "5",
                   "doc_count": 1
               },
               {
                   "key": "6",
                   "doc_count": 1
               }
           ]
       }
   }
}
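
The same pattern covers the “append values from multiple fields” case mentioned earlier. The sketch below builds a hypothetical request body that buckets tweets by a derived “username-os” key, then computes the expected buckets locally from the sample data. The aggregation name user_os and the “.keyword” sub-fields are assumptions (the latter relies on default dynamic mapping), and the body has not been run against a cluster here.

```python
import json
from collections import Counter

# Hypothetical request body: terms aggregation on a value derived from two fields.
# "user_os" is an illustrative name; ".keyword" assumes default dynamic mapping.
body = {
    "size": 0,
    "aggs": {
        "user_os": {
            "terms": {
                "script": {
                    "inline": "doc['username.keyword'].value + '-' + doc['info.os.keyword'].value"
                }
            }
        }
    },
}
print(json.dumps(body, indent=2))

# (username, os) pairs copied from the five sample tweets.
sample = [
    ("tom", "ios"),
    ("mary", "ios"),
    ("tom", "win"),
    ("mary", "linux"),
    ("mary", "linux"),
]

# Local computation of the bucket counts one would expect for the derived key.
expected = Counter(f"{user}-{os_name}" for user, os_name in sample)
print(dict(expected))
```

The printed JSON can be sent to the same _search endpoint used above.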

Deleting a Field Using Scripts

In part 1 of this series, we showed how to update a document using scripts. Similarly, we can remove a field or nested field using a script. All you have to do is call the remove method and pass in the name of the field or nested field. For example, let’s say we want to delete the nested field “device” from the document with ID 5.

curl -X POST   'http://localhost:9200/tweets/tweet/5/_update' -H 'Content-Type: application/json'   -d '{  
  "script":{  
     "inline":"ctx._source.info.remove(params.fieldname)",
     "params":{  
        "fieldname":"device"
     }
  }
} '

Response

{
   "_index": "tweets",
   "_type": "tweet",
   "_id": "5",
   "_version": 2,
   "result": "updated",
   "_shards": {
       "total": 2,
       "successful": 1,
       "failed": 0
   }
}

As the field “device” is nested within the parent field “info”, we need to invoke the remove method on the parent field “info”. To remove the parent field itself, we can call the remove method on “_source”, as it is the parent in that case. Let’s fetch the document and verify:

curl -X GET   'http://localhost:9200/tweets/tweet/5'

Response

{
   "_index": "tweets",
   "_type": "tweet",
   "_id": "5",
   "_version": 2,
   "found": true,
   "_source": {
       "username": "mary",
       "posted_date": "2017/05/20",
       "message": "painless is fun but its a new scripting language in the town",
       "tags": [
           "elastic",
           "painless",
           "scripting"
       ],
       "info": {
           "os": "linux"
       },
       "likes": 1000
   }
}
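
Removing the parent field works the same way, except that remove is called on “_source” directly. The sketch below builds such an update body and mimics its effect on a local copy of document 5. The dict manipulation is only an illustration of what the script does to the source map; it is not an Elasticsearch call.

```python
import json

# Update body that removes the whole "info" object by calling remove on _source.
body = {
    "script": {
        "inline": "ctx._source.remove(params.fieldname)",
        "params": {"fieldname": "info"},
    }
}
print(json.dumps(body, indent=2))

# Local copy of document 5 as returned above (after "device" was removed).
source = {
    "username": "mary",
    "posted_date": "2017/05/20",
    "message": "painless is fun but its a new scripting language in the town",
    "tags": ["elastic", "painless", "scripting"],
    "info": {"os": "linux"},
    "likes": 1000,
}

# Mimic the script locally: drop the key named in params from the top-level map.
source.pop(body["script"]["params"]["fieldname"], None)
print(sorted(source))
```

Sending the printed body to the same _update endpoint should leave the document without the “info” object.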

Scripts in Custom Scoring

When we execute a match query, Elasticsearch returns the matching results and also calculates a score for each matching document, indicating how well the document matches the given query. While the default algorithm, BM25, handles this scoring/relevancy well, there are times when the question of relevance must be answered via other algorithms, or augmented with other scoring heuristics. It is here that Elasticsearch’s script_score and function_score features become useful.

Let’s say we want to search for the text “painless” but show tweets with more likes at the top of the search results, so that popular or trending tweets come first. Let’s see it in action.

curl -X POST   'http://localhost:9200/tweets/tweet/_search' -H 'Content-Type: application/json'   -d ' {
   "query": {
       "function_score": {
           "query": {
               "match": { "message": "painless" }
           },
           "script_score" : {
               "script" : {
                 "inline": "Math.log(1 + doc['\''likes'\''].value)"
               }
           }
       }
   }
}'

Response

{
   "took": 10,
   "timed_out": false,
   "_shards": {
       "total": 5,
       "successful": 5,
       "failed": 0
   },
   "hits": {
       "total": 2,
       "max_score": 6.908755,
       "hits": [
           {
               "_index": "tweets",
               "_type": "tweet",
               "_id": "5",
               "_score": 6.908755,
               "_source": {
                   "username": "mary",
                   "posted_date": "2017/05/20",
                   "message": "painless is fun but its a new scripting language in the town",
                   "tags": [
                       "elastic",
                       "painless",
                       "scripting"
                   ],
                   "info": {
                       "os": "linux"
                   },
                   "likes": 1000
               }
           },
           {
               "_index": "tweets",
               "_type": "tweet",
               "_id": "4",
               "_score": 4.6151204,
               "_source": {
                   "username": "mary",
                   "posted_date": "2017/07/28",
                   "message": "exploring painless",
                   "tags": [
                       "elastic"
                   ],
                   "info": {
                       "device": "mobile",
                       "os": "linux"
                   },
                   "likes": 100
               }
           }
       ]
   }
}

In the above example, if we had run a normal query without a custom score, document 4 would have been at the top because of the default relevance scoring; that is, its score would have been higher than that of document 5.
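
The scores in the response can be checked by hand, since Math.log is the natural logarithm. The short sketch below recomputes the script value for the two matching documents. Note that function_score combines the script value with the query score according to boost_mode, whose default is multiply; the scores listed above equal the raw script value, so treat the exact combination used to produce that response as an assumption.

```python
import math

# "likes" values of the two documents that match "painless".
likes = {5: 1000, 4: 100}

# Same formula as the script: Math.log(1 + doc['likes'].value)
scores = {doc_id: math.log(1 + n) for doc_id, n in likes.items()}

# Highest score first, as in the search response.
for doc_id, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(doc_id, round(score, 4))
# prints "5 6.9088" then "4 4.6151"
```

Taking the logarithm keeps a tweet with ten times the likes from dominating the ranking by a factor of ten.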

Scripts can consist of a single statement or of multiple statements separated by semicolons (;). If a script has multiple statements, rather than placing them all on a single line, you can write a multi-line script with different statements on different lines. Make sure you embed a multi-line script in triple quotes (""" <your multiline script here> """), otherwise query parsing fails. Note that the triple-quote form is a convenience of the Kibana Console rather than standard JSON.
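
When the request is sent as raw JSON (for example with curl) instead of through the Kibana Console, one option is to let a JSON library escape the newlines. The sketch below is an illustration under assumptions: the multi-line Painless script and the min_likes parameter are hypothetical and have not been run against a cluster; the point is only how the script string is serialized.

```python
import json

# Hypothetical multi-line Painless script (illustrative, not from the original post).
script = """
long likes = doc['likes'].value;
if (likes > params.min_likes) {
  return Math.log(1 + likes);
}
return 0.0;
"""

body = {
    "query": {
        "function_score": {
            "query": {"match": {"message": "painless"}},
            "script_score": {
                "script": {
                    "inline": script,
                    "params": {"min_likes": 10},  # hypothetical parameter
                }
            },
        }
    }
}

# json.dumps escapes each newline as \n, so the payload is valid single-line JSON.
payload = json.dumps(body)
print(payload)
```

The resulting payload can be passed to curl with -d; parsing it back with json.loads returns the original multi-line script unchanged.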

Conclusion

In this article, we have gone deeper into “painless” and explored more advanced uses of “painless” scripting. In the final part of the series, we will explore how to use painless in Kibana. Stay tuned.
