As some of our readers know, we’ve been working through a tutorial series on Elasticsearch scripting. We’ve seen the power and critical importance of scripting, a number of basic examples, and various types of sorting that you can perform in ES. Here in this article, we concentrate on applying filters through query scripts because you’ll get results more quickly when you use filters instead of queries.

In Elasticsearch, filters are as important as queries. You can use a filter in nearly every operation that accepts a query, and in most cases a filter returns results faster, because it only decides whether a document matches and does not compute a relevance score. You can also use a filter to narrow the set of documents that a query then has to score. In the sections below, we cover a number of basic filtering operations in Elasticsearch.

Modeling the Data

In the previous articles of this series, we’ve been using examples of student academic records. Filters have other features that we want to examine, so here we’ll use example data with an entirely different structure. To start this tutorial, we present three sample documents containing details of music festivals from different parts of the world. Each one is ready for indexing.

Document 1

curl -XPOST 'http://localhost:9200/festivals/music/1' -d '{
       "festival": "Tomorrowland",
       "venue": "Belgium",
       "charges": {
         "entryCharge": 2000,
         "accomodationCharge": 1000
       },
       "dates": {
         "startDate": "2014-05-01T00:00:00.000Z",
         "endDate": "2014-12-03T00:00:00.000Z"
       },
       "reviews": 
        [
         "excellent",
         "good",
         "excellent",
         "awesome",
         "excellent"
        ]
}'

Document 2

curl -XPOST 'http://localhost:9200/festivals/music/2' -d '{ 
    "festival":"Airwaves", 
    "venue":"Iceland", 
    "charges":{ 
        "entryCharge":1000, 
        "accomodationCharge":1000 
    }, 
    "dates":{ 
        "startDate":"2014-11-04T00:00:00.000Z", 
        "endDate":"2014-11-08T00:00:00.000Z" 
    }, 
    "reviews":[ 
        "good", 
        "good", 
        "excellent", 
        "good", 
        "excellent" 
    ], 
    "firstDay":[ 
        "avici", 
        "armin", 
        "dash" 
    ] 
}'

Document 3

curl -XPOST 'http://localhost:9200/festivals/music/3' -d '{ 
    "festival":"Glastonbury", 
    "venue":"England", 
    "charges":{ 
        "entryCharge":3000, 
        "accomodationCharge":1000 
    }, 
    "dates":{ 
        "startDate":"2014-06-25T00:00:00.000Z", 
        "endDate":"2014-06-29T00:00:00.000Z" 
    }, 
    "reviews":[ 
        "average", 
        "good", 
        "average", 
        "good", 
        "excellent" 
    ] 
}'

You can see that each document contains a festival name, a venue (location), charges for the event, start and end dates, and reviews from five users. Document 2 also has a firstDay array. Next, we’ll run scripts to apply various filters as we search these documents.

Filter against a Threshold Value

In the previous article, we gave some examples of ascending or descending sorts on field values and got results in the order we expect. Another common task is to find documents in which values in a specific field are greater than, less than, or equal to a threshold value. You can do this quickly with filters.

In the documents above, we have the field entryCharge. Let’s say that we need to find a festival whose entryCharge is less than 1,500 (could be dollars, euros, or another currency). We can accomplish this by running the following script:

curl -XGET 'http://localhost:9200/festivals/music/_search?&pretty=true&size=3' -d '{
"query": {
   "filtered": {
     "filter": {
       "script": {
         "script": "doc[\"entryCharge\"].value < cutoff",
         "params" : {
           "cutoff" : 1500
         }
       }
     },
     "query": {
       "match_all": {}
     }
   }
}
}'

After running this at the terminal, you’ll see that the results consist of only those festivals whose entry charge is less than a value of 1,500.
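
To make the expected outcome concrete, here is a minimal Python sketch that applies the same comparison to the three sample documents in memory. It does not talk to Elasticsearch; the festivals list and the below_cutoff helper are illustrative names chosen for this sketch, not part of any API.

```python
# Sample data reduced to the one field this filter reads (values copied from the documents above).
festivals = [
    {"festival": "Tomorrowland", "charges": {"entryCharge": 2000}},
    {"festival": "Airwaves", "charges": {"entryCharge": 1000}},
    {"festival": "Glastonbury", "charges": {"entryCharge": 3000}},
]


def below_cutoff(docs, cutoff):
    """Return the names of festivals whose entryCharge is strictly less than cutoff."""
    return [d["festival"] for d in docs if d["charges"]["entryCharge"] < cutoff]


matches = below_cutoff(festivals, 1500)
print(matches)  # ['Airwaves']
```

Only Airwaves, with an entry charge of 1,000, falls below the cutoff, so that is the single hit we would expect from the filter.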

Finding Fields that Have Equal Values

As we examine the start and end dates, we see that festivals vary in duration. Some extend over several days; others are one-day events. Suppose now that we need to pick out the one-day festivals. How do we accomplish this? A one-day festival has the same start date and end date, so we can filter for the festivals that have the same value in both the startDate and endDate fields. We could run a script such as this one:

curl -XGET 'http://localhost:9200/festivals/music/_search?&pretty=true&size=3' -d '{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "doc[\"startDate\"].value == doc[\"endDate\"].value"
        }
      }
    }
  }
}'
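
As a rough illustration of what this comparison selects, the Python sketch below runs the same equality test on the sample dates in memory. It is a simulation rather than an Elasticsearch call, and the extra one-day document is hypothetical, added only to show a positive match.

```python
from datetime import datetime

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Dates copied from the three sample documents.
festivals = [
    {"festival": "Tomorrowland",
     "dates": {"startDate": "2014-05-01T00:00:00.000Z", "endDate": "2014-12-03T00:00:00.000Z"}},
    {"festival": "Airwaves",
     "dates": {"startDate": "2014-11-04T00:00:00.000Z", "endDate": "2014-11-08T00:00:00.000Z"}},
    {"festival": "Glastonbury",
     "dates": {"startDate": "2014-06-25T00:00:00.000Z", "endDate": "2014-06-29T00:00:00.000Z"}},
]

# Hypothetical document, NOT part of the sample data, with identical start and end dates.
hypothetical_one_day = {
    "festival": "Example One-Day Fest",
    "dates": {"startDate": "2014-07-01T00:00:00.000Z", "endDate": "2014-07-01T00:00:00.000Z"},
}


def same_start_and_end(doc):
    """True when startDate and endDate parse to the same instant."""
    start = datetime.strptime(doc["dates"]["startDate"], ISO_FORMAT)
    end = datetime.strptime(doc["dates"]["endDate"], ISO_FORMAT)
    return start == end


one_day = [d["festival"] for d in festivals if same_start_and_end(d)]
print(one_day)  # []
print(same_start_and_end(hypothetical_one_day))  # True
```

None of the three sample festivals starts and ends on the same date, so with only those documents indexed we would expect the filter to return no hits.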

Filter According to a Count of Matching Values

In each of the documents you’ll find the reviews array, which contains participant review information. Let’s say that we need to find which festivals have the highest rating of “excellent.” Maybe our boss tells us that we should classify the festival into our highest rating category if the term “excellent” occurs more than twice in the reviews array. We can implement this filtering logic by running the following script:

curl -XGET 'http://localhost:9200/festivals/music/_search?&pretty=true&size=3' -d '{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "_index[\"reviews\"][\"excellent\"].tf() > 2"
        }
      }
    }
  }
}'
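
The counting rule can be checked by hand with a short Python sketch over the sample reviews. This is a local simulation under the assumption that each review is indexed as a single lowercase term, so the term frequency equals the number of times the word appears in the array; excellent_count is an illustrative helper name.

```python
# Reviews copied from the three sample documents.
festivals = [
    {"festival": "Tomorrowland",
     "reviews": ["excellent", "good", "excellent", "awesome", "excellent"]},
    {"festival": "Airwaves",
     "reviews": ["good", "good", "excellent", "good", "excellent"]},
    {"festival": "Glastonbury",
     "reviews": ["average", "good", "average", "good", "excellent"]},
]


def excellent_count(doc):
    """Count how many times 'excellent' occurs in the reviews array (stand-in for tf())."""
    return doc["reviews"].count("excellent")


counts = {d["festival"]: excellent_count(d) for d in festivals}
top_rated = [d["festival"] for d in festivals if excellent_count(d) > 2]
print(counts)     # {'Tomorrowland': 3, 'Airwaves': 2, 'Glastonbury': 1}
print(top_rated)  # ['Tomorrowland']
```

Tomorrowland is the only festival with more than two “excellent” reviews, so it is the one document we would expect the filter to keep.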


Filter According to an Array Index

Let’s go further to see how to apply filters to array operations. We’ll again look at the reviews array that we mentioned in the previous section. We could look at all reviews given by a specific reviewer. Let’s focus on the reviewer in the first element of the array. In our example, this would be the element reviews[0]. Now, if we need to know which festivals this reviewer has rated “excellent,” what would we do? One approach is to examine the first element in the reviews array for a match on the string “excellent,” which we could do with the following script:

curl -XGET 'http://localhost:9200/festivals/music/_search?&pretty=true&size=3' -d '{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "for(pos :  _index[\"reviews\"].get(\"excellent\",_POSITIONS)){ if(pos.position == 0) { return true}};return false;"
        }
      }
    }
  }
}'
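
The script loops over every position at which the term “excellent” occurs and returns true as soon as it finds position 0. The Python sketch below mimics that loop on the sample reviews. It is a simulation only, and it assumes that array elements are indexed at consecutive positions starting from 0; term_positions and first_review_is are illustrative helper names.

```python
# Reviews copied from the three sample documents.
festivals = [
    {"festival": "Tomorrowland",
     "reviews": ["excellent", "good", "excellent", "awesome", "excellent"]},
    {"festival": "Airwaves",
     "reviews": ["good", "good", "excellent", "good", "excellent"]},
    {"festival": "Glastonbury",
     "reviews": ["average", "good", "average", "good", "excellent"]},
]


def term_positions(values, term):
    """Positions at which term occurs (assumed to match the array indexes)."""
    return [pos for pos, value in enumerate(values) if value == term]


def first_review_is(doc, term):
    """Mirror the script: true if any occurrence of term sits at position 0."""
    return any(pos == 0 for pos in term_positions(doc["reviews"], term))


first_reviewer_excellent = [d["festival"] for d in festivals if first_review_is(d, "excellent")]
print(term_positions(festivals[0]["reviews"], "excellent"))  # [0, 2, 4]
print(first_reviewer_excellent)  # ['Tomorrowland']
```

Of the three sample festivals, only Tomorrowland has “excellent” as its first review, so that is the single match we would expect.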

Since we know you’ve got many demands on your time, we’ll stop here. In this article, we’ve seen how to implement filters using scripts for various operations, including filtering according to threshold value, comparing two fields, filtering according to counts on specific values, and filtering by an array index. In the next article, we’ll look at scripts for various scoring methods in Elasticsearch.

Please let us know how this article has been helpful to you. We welcome comments and any questions.