Following on from the previous article in this series, we will familiarize ourselves with more queries: the prefix query, the term query, the multi-match query, and the bool query. We will see how each query works and when to apply it.

Sample documents: For demonstration purposes, we will use the documents indexed in the previous article.

Prefix query

Examine the employee-id field of our documents. Assume that the first two letters of the employee-id are the two-letter state code of the office where the employee works. The next letter denotes the employee’s gender, M for male and F for female. The rest is the employee’s unique id number.

Therefore, NY M-2389 denotes a male employee with the unique id 2389 who works in the company’s New York office, denoted by NY.

We want to know how many employees in our database work in New York, and how many of them are male. The prefix query can help with such a scenario. Use the following prefix query, which matches the male employees of the New York office, and examine the results:

{
  "query": {
    "prefix": {
      "employee-id": "NY M"
    }
  }
}

In the result we see two documents that match our query. By examining the employee-id field we can confirm that we have the right information. Note that the field used in a prefix query should be mapped as not_analyzed; otherwise its value is split into tokens at index time and the query produces erroneous results.

Examine how the prefix query works by taking into consideration the inverted index of the documents (referring to only employee-id field):

Term            Document
NY M-2389         1
IL F-2213         2
NY M-3456         3

Our prefix query string was NY M. The query starts at the first term: if the term begins with the prefix, it records the document id, and if not, it skips to the next term. After this pass, the query holds the document ids of all the matching terms, in this case documents 1 and 3.
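The scan described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Elasticsearch or Lucene implement the query: the names inverted_index and prefix_query are illustrative, and the index holds only the three employee-id terms from the table.

```python
# Toy inverted index for the employee-id field. The field is not_analyzed,
# so each whole value is stored as a single term: term -> document ids.
inverted_index = {
    "NY M-2389": [1],
    "IL F-2213": [2],
    "NY M-3456": [3],
}


def prefix_query(index, prefix):
    """Return the sorted ids of documents whose term starts with the prefix."""
    matched = set()
    for term, doc_ids in index.items():
        # Keep the document ids if the term begins with the prefix,
        # otherwise skip to the next term.
        if term.startswith(prefix):
            matched.update(doc_ids)
    return sorted(matched)


print(prefix_query(inverted_index, "NY M"))  # [1, 3]
print(prefix_query(inverted_index, "IL"))    # [2]
```

Every matching document is simply collected here; no relevance score is computed for it.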

The prefix query does not calculate a relevance score for the documents. Instead, it returns a score of 1 for every matching document. Also, the shorter the prefix, the more terms match it, so the query has to collect more terms and documents, which puts more load on the cluster. It is not advisable to use this query on an index with a large number of documents. Using a longer prefix and restricting the query to moderately sized indices helps performance.

Term query

The term query lets us query the exact tokens generated for a field. The given query string is compared against the terms in the inverted index for a match. The term query string itself does not undergo any analysis. Look at a simple example:

{
  "query": {
    "term": {
      "name": "turner"
    }
  }
}

This term query will match two documents, documents 1 and 3, since they contain the term turner in their name fields. Refer to the inverted index mentioned in the match query section. What happens if the query string is capitalized? Run the query with the query string Turner instead of turner. Since the query string is not analyzed, there will not be any results. The term in the inverted index is turner and the query string is Turner, so no match occurs.
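To make the case-sensitivity point concrete, here is a minimal Python sketch. It rests on several assumptions: the name values are hypothetical stand-ins for the documents of the previous article, analyze is a rough approximation of an analyzer (lowercase, split on whitespace), and term_query and match_query are illustrative helpers, not Elasticsearch APIs.

```python
# Hypothetical name values; the real documents come from the previous article.
docs = {1: "Andrew Turner", 2: "Bob Smith", 3: "Carol Turner"}


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def build_index(documents):
    """Build an inverted index: token -> set of document ids."""
    index = {}
    for doc_id, text in documents.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, term):
    """Look the term up exactly as given; the query string is not analyzed."""
    return sorted(index.get(term, set()))


def match_query(index, text):
    """Analyze the query string first, then look up each resulting token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return sorted(hits)


index = build_index(docs)
print(term_query(index, "turner"))   # [1, 3]
print(term_query(index, "Turner"))   # []
print(match_query(index, "Turner"))  # [1, 3]
```

The last call shows why a match query still finds the documents: its query string is lowercased before the lookup, while the term query compares Turner against the index as is.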

Multi-match query

What if we need to run the same query in multiple fields? In such cases we can use the multi-match query. Let us demonstrate it with a query:

{
  "query": {
    "multi_match": {
      "query": "porche",
      "fields": [
        "status",
        "favourite_car"
      ]
    }
  }
}

By running this query we can see that documents 1 and 2 are listed in the results. Each of them contains the query string in at least one of the fields mentioned in the query.
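The idea of "the same query in several fields" can be sketched as follows. This is a simplified model that ignores scoring; the field values are hypothetical (only the field names and the matching document ids follow the article), and multi_match here is an illustrative Python function, not the Elasticsearch implementation.

```python
# Hypothetical documents; the real field values come from the previous article.
docs = {
    1: {"status": "feeling happy today", "favourite_car": "porche"},
    2: {"status": "dreaming of a porche", "favourite_car": "bmw"},
    3: {"status": "feeling tired", "favourite_car": "audi"},
}


def field_matches(doc, field, term):
    """True if the lowercased, whitespace-split field value contains the term."""
    return term in doc.get(field, "").lower().split()


def multi_match(documents, term, fields):
    """Return ids of documents where at least one listed field has the term."""
    return sorted(
        doc_id
        for doc_id, doc in documents.items()
        if any(field_matches(doc, field, term) for field in fields)
    )


print(multi_match(docs, "porche", ["status", "favourite_car"]))  # [1, 2]
print(multi_match(docs, "porche", ["favourite_car"]))            # [1]
```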

Bool query

The bool query is used when we need to combine multiple query clauses. It accepts the following parameters:

  1. must – The query clauses in this parameter must match in a document; documents that do not match them are excluded from the search results.
  2. must_not – Documents matching the query clauses in this parameter are excluded from the search results.
  3. should – If a document matches the query clauses in this parameter, its score increases. In effect, every matching clause in this parameter helps improve the score of the document.

The total score of each search result is calculated by combining the scores of the must and should clauses that matched. In our example, suppose we apply the following conditions:

  1. Include all documents with the term feeling in the field status
  2. Exclude all the documents with audi in the field favourite_car
  3. Make more relevant if the field dessert contains the term milkshakes

The above clauses can be translated to the following bool query:

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "status": "feeling"
        }
      },
      "must_not": {
        "match": {
          "favourite_car": "audi"
        }
      },
      "should": [
        {
          "match": {
            "dessert": "milkshakes"
          }
        }
      ]
    }
  }
}

Only two documents match the above criteria: documents 1 and 2. Of these, document 2 also matches the should clause, so it is assigned the higher relevance score.
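The interplay of the three parameters can be sketched in Python. This is a toy model under loud assumptions: the documents are hypothetical stand-ins for those of the previous article, each matching clause contributes a flat score of 1.0 (real scores come from the relevance computation of each clause), and bool_query is an illustrative function, not an Elasticsearch API.

```python
# Hypothetical documents; the real field values come from the previous article.
docs = {
    1: {"status": "feeling happy", "favourite_car": "porche", "dessert": "icecream"},
    2: {"status": "feeling great", "favourite_car": "bmw", "dessert": "milkshakes"},
    3: {"status": "feeling tired", "favourite_car": "audi", "dessert": "milkshakes"},
}


def contains(doc, field, term):
    """True if the lowercased, whitespace-split field value contains the term."""
    return term in doc.get(field, "").lower().split()


def bool_query(documents, must=(), must_not=(), should=()):
    """Return (doc_id, score) pairs, highest score first.

    Each clause is a (field, term) pair. Assumed toy scoring: every matching
    must or should clause adds 1.0 to the document's score.
    """
    results = []
    for doc_id, doc in documents.items():
        if not all(contains(doc, f, t) for f, t in must):
            continue  # a required clause is missing
        if any(contains(doc, f, t) for f, t in must_not):
            continue  # an excluded clause matched
        score = len(must) + sum(contains(doc, f, t) for f, t in should)
        results.append((doc_id, float(score)))
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


hits = bool_query(
    docs,
    must=[("status", "feeling")],
    must_not=[("favourite_car", "audi")],
    should=[("dessert", "milkshakes")],
)
print(hits)  # [(2, 2.0), (1, 1.0)]
```

Document 3 satisfies the must and should clauses but is dropped by must_not, while document 2 outranks document 1 because of the extra should match.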

Conclusion

In this post, we have seen the application of the prefix, term, multi-match, and bool queries in detail. In the next post, we will look at an advanced case of the bool query, in which we name each query so that we can tell which queries matched and which did not.