Welcome to Episode #2 of our Elasticsearch tutorial series. Today we’ll explore some of the Query DSL of Elasticsearch, including an example of how to implement features of Elasticsearch into your application.

We will begin this episode by replaying a bit of Episode #1 of this tutorial series. Let’s first install Elasticsearch and start a single node cluster, and then we’ll change some default settings and index some new documents.

Install Elasticsearch

http://elasticsearch.org/download/

If you don’t yet have Elasticsearch on your machine, start by downloading it. We’re using 1.1.0, since it is the current stable release. Breaking changes between Elasticsearch versions are well documented, so check the release notes first if you’re on a different version. If you’re still having trouble with this tutorial, feel free to add a comment down below. Or, if you have a new cluster on Qbox, feel free to contact support with your questions.

After you have Elasticsearch on your machine and have made the changes mentioned in Episode #1, we will run some queries in the terminal to get you acquainted with the responses they return.

Github Repo

https://github.com/StackSearchInc/qbox-elasticsearch-tutorial/tree/episode-2

Download the GitHub repository linked above and open up the sports data file. You’ll notice we’ve included a completely new set of documents. Each document in the sports index is an athlete. Every athlete has a name, birthdate, sport, rating, and location.

mapping-visual.png#asset:262

In the readme.md file you’ll find the mapping for these documents and the bulk index command to get the bulk sports data file into the sports index.

curl -XPUT 'localhost:9200/sports/athlete/_mapping' -d '{
  "athlete": {
    "properties": {
      "birthdate": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "location": {
        "type": "geo_point"
      },
      "name": {
        "type": "string"
      },
      "rating": {
        "type": "integer"
      },
      "sport": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}'
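
To make the shape of a bulk request concrete, here is a minimal sketch that builds a Bulk API payload for documents matching the mapping above. The two athletes are made up for illustration (the real data is in the repository's sports data file), and the helper name to_bulk_payload is hypothetical.

```python
import json

# Hypothetical athletes, for illustration only. Field names follow the mapping above.
athletes = [
    {
        "name": "Jane Example",
        "birthdate": "1990-05-17",                  # ISO 8601 date
        "sport": "Baseball",                        # not_analyzed: kept as one exact term
        "rating": 7,                                # integer
        "location": {"lat": 46.88, "lon": -71.34},  # geo_point as a lat/lon object
    },
    {
        "name": "John Sample",
        "birthdate": "1985-11-02",
        "sport": "Golf",
        "rating": 4,
        "location": {"lat": 40.71, "lon": -74.01},
    },
]


def to_bulk_payload(docs, index="sports", doc_type="athlete"):
    """Return newline-delimited JSON: an action line, then a source line, per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


payload = to_bulk_payload(athletes)
print(payload)
```

If you save the printed payload to a file, you can send it to the _bulk endpoint with curl's --data-binary option, which preserves the newlines the Bulk API depends on.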

 

After you’ve indexed these documents using the bulk index command mentioned in the sports-mapping file, we will set up a match query. The beginning of our match query is similar to most searches in Elasticsearch. You start by specifying the indices and types you’re querying. In this case, that is the sports index and the athlete document type.

curl -XGET 'localhost:9200/sports/athlete/_search?pretty'

The request above has the _search endpoint specified. You may have noticed that in the sports-mapping file we used _mapping. Elasticsearch features several APIs for granular control of your clusters, nodes, indices, document mappings, and requests. With so many APIs available, it can be difficult to identify the right one for your use case.

Query structure is comprehensive and intuitive in Elasticsearch, and we’ll use the simple-to-execute and powerful Query DSL and Search APIs.

The full Query DSL is based on JSON to define queries. A great description for the Query DSL structure is “think of the Query DSL as an AST of queries.” (DSL and AST stand for Domain Specific Language and Abstract Syntax Tree, respectively.)
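
To see the tree shape behind the “AST of queries” description, here is a small sketch that walks a query body and prints each key at its nesting depth. The walk helper is hypothetical and is not part of Elasticsearch; it only treats the JSON body as a nested Python dictionary.

```python
def walk(node, depth=0):
    """Yield (depth, key) for every key in a nested query body, parents before children."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield depth, key
            yield from walk(value, depth + 1)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, depth)


# A match query on the name field, written as a Python dict instead of JSON.
body = {"query": {"match": {"name": "michael"}}}

# Print each key indented by its depth in the tree.
for depth, key in walk(body):
    print("  " * depth + key)
```

Each level of indentation is one node deeper in the tree: the search request holds a query, the query is a match, and the match targets the name field.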

One query structure compared to another can seem abstract. You may end up saying “Why not script all the search?” or “Why not match_all searches and use a top level filter?” at some point. However, once you’ve used Elasticsearch, you’ll quickly realize they’ve carefully tuned their queries to be intuitive and insightful about what your response should be.

curl -XGET 'localhost:9200/sports/athlete/_search?pretty' -d '{
  "query":{
    "match_all":{}
  }
}'

In the request body above we’re searching with a query that will match_all documents in sports/athlete. Getting all of the documents is useful, but we want to match a specific field term in a document.

curl -XGET 'localhost:9200/sports/athlete/_search?pretty' -d '{
  "query":{
    "match":{
      "name": "michael"
    }
  }
}'

Great! We get exactly two documents, but I don’t care about “Michael Whatshisname” because I want to match “Michael Lussier” (the great and powerful 🙂 ), so let’s match on “michael lussier”.

{
"name": "michael lussier"
}

What is this!?! I said match name “michael lussier” because I don’t want “Michael Whatshisname.” I only want “Michael Lussier.”

Here is what is happening. The match query defaults to a boolean “or” operation, so it returns documents that contain “michael” or “lussier.” To require both terms, we need to specify the “and” operator in our match query. To do that, we turn the field (in this case, name) into an object and place the search text inside its query property.

"query":{
  "match":{
    "name":{  
      "query": "michael lussier",
      "operator": "and"
    }
  }  
}
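
The difference between the two operators can be modeled in a few lines. This is a deliberately simplified sketch, not how Elasticsearch implements matching: the analyze function is a hypothetical stand-in that only lowercases and splits on whitespace, and no scoring is modeled.

```python
def analyze(text):
    """Rough stand-in for an analyzer: lowercase the text and split on whitespace."""
    return text.lower().split()


def match(field_value, query, operator="or"):
    """Return True if the field satisfies the query terms under the given operator."""
    doc_terms = set(analyze(field_value))
    hits = [term in doc_terms for term in analyze(query)]
    # "and" requires every query term to be present; "or" requires at least one.
    return all(hits) if operator == "and" else any(hits)


names = ["Michael Lussier", "Michael Whatshisname"]

or_hits = [n for n in names if match(n, "michael lussier")]
and_hits = [n for n in names if match(n, "michael lussier", operator="and")]

print(or_hits)   # both names share the term "michael"
print(and_hits)  # only the name containing both terms
```

With the default operator, one shared term is enough for a document to match, which is why “Michael Whatshisname” appeared in the earlier results.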

Now we have “Michael Lussier” who plays the sport “Baseball,” and we have “Michael Lussier” who plays the sport “Golf.” Both have an equal _score because we requested documents with the name “Michael Lussier.” But in our case, we only want the “Baseball” player, and we don’t need a score at all. Filters are exactly what we need in this situation since no scoring is needed, just simple inclusion/exclusion to find our document.

curl -XGET 'localhost:9200/sports/athlete/_search?pretty' -d '{
  "query":{
    "filtered" : {
      "filter" : {
        "bool" : {
          "must" : {
            "term" : { "sport" : "Baseball" }
          }
        }
      },
      "query" : {
        "match" : {
          "name":{
            "query" : "michael lussier",
            "operator" : "and"
          }
        }
      }
    }
  }
}'
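
As a rough mental model of the filtered query above, the sketch below narrows a list of documents with an exact term check and then applies the “and” match to what is left. The documents and helper names are hypothetical, and the real engine works on an inverted index rather than a Python list. Because sport is mapped as not_analyzed, the term must match exactly, including case.

```python
# Hypothetical in-memory documents; only the fields used here are included.
docs = [
    {"name": "Michael Lussier", "sport": "Baseball"},
    {"name": "Michael Lussier", "sport": "Golf"},
    {"name": "Michael Whatshisname", "sport": "Baseball"},
]


def term_filter(doc, field, value):
    """Exact, unscored comparison, like a term filter on a not_analyzed field."""
    return doc.get(field) == value


def match_and(doc, field, query):
    """Every lowercase query term must appear in the lowercased field."""
    doc_terms = set(doc.get(field, "").lower().split())
    return all(term in doc_terms for term in query.lower().split())


def filtered_search(documents, sport, name_query):
    """Filter first to shrink the candidate set, then run the match on the remainder."""
    candidates = [d for d in documents if term_filter(d, "sport", sport)]
    return [d for d in candidates if match_and(d, "name", name_query)]


hits = filtered_search(docs, "Baseball", "michael lussier")
no_hits = filtered_search(docs, "baseball", "michael lussier")

print(hits)
print(no_hits)  # empty: "baseball" is not the exact term "Baseball"
```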

Just as we added the “match” > “name” > “query” object to specify what we want to query, the “filtered” object specifies which query we want filtered. As implied earlier, you could instead apply a top-level filter to your query.

query-filter.png#asset:515

But a top-level filter (shown above), which runs the query and then filters the results, is much slower than a filtered query (shown below), which filters the documents first and then runs the query against what remains.

filtered-query2.png#asset:449
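
For comparison in text form, this sketch builds both request bodies from the same query and filter. It assumes the top-level filter key is post_filter, which is the name used in the 1.x releases (earlier releases called it filter); the variable names are hypothetical.

```python
import json

# The same query and filter are reused in both request bodies.
name_query = {"match": {"name": {"query": "michael lussier", "operator": "and"}}}
sport_filter = {"bool": {"must": {"term": {"sport": "Baseball"}}}}

# Filtered query: the filter sits inside the query, next to the match it restricts.
filtered_body = {"query": {"filtered": {"query": name_query, "filter": sport_filter}}}

# Top-level filter: the query runs on its own and the filter is applied to its results.
# Assumption: "post_filter" is the 1.x key name for the top-level filter.
top_level_body = {"query": name_query, "post_filter": sport_filter}

print(json.dumps(filtered_body, indent=2))
print(json.dumps(top_level_body, indent=2))
```

Either printed body can be passed to curl with -d, in the same way as the requests earlier in this episode.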

As with Episode #1, I’ve included a simple AngularJS application in which to try these queries. Check out Episode 3 of our Qbox: Elasticsearch Tutorial series featuring unstructured search in Elasticsearch using analyzers.