In this blog post, we show how the Suggest API in Elasticsearch can handle misspelled words using the term suggester. We also explore several ways of using the term suggester in Elasticsearch versions later than 6.0.

Term Suggestion for Spell Checking

Users often make spelling mistakes in their search queries. A viable search solution must still be able to offer suggestions when the query contains spelling mistakes. The Suggest API provides two spell-check APIs for this purpose: term suggest and phrase suggest. The term suggest API corrects the spelling of a single mistyped word. The phrase suggest API, in turn, is a more advanced version of the term suggest API that accounts for multiple terms. Let us go over the term suggest API and its functionality in more detail.

Test Documents

To illustrate how the term suggester works, we first have to index some sample documents. Below are five documents for indexing:

curl -XPOST 'localhost:9200/term-suggest/test/1' -d '{"name": "bald"}' -H "Content-Type: application/json"
curl -XPOST 'localhost:9200/term-suggest/test/2' -d '{"name": "bold"}' -H "Content-Type: application/json"
curl -XPOST 'localhost:9200/term-suggest/test/3' -d '{"name": "blend"}' -H "Content-Type: application/json"
curl -XPOST 'localhost:9200/term-suggest/test/4' -d '{"name": "bend"}' -H "Content-Type: application/json"
curl -XPOST 'localhost:9200/term-suggest/test/5' -d '{"name": "blood"}' -H "Content-Type: application/json"

Case 1 - Simple Use Case

Now that the index contains the necessary documents, let's run a suggest query to see how a misspelled word is handled and how the corrections are returned. Send the following suggest query to the index:

curl -XPOST 'localhost:9200/term-suggest/_search?pretty' \
-H "Content-Type: application/json" -d '{
  "suggest": {
    "my-suggestion": {
      "text": "bleed",
      "term": { "field": "name" }
    }
  }
}'

Note: The _suggest endpoint was deprecated in favour of using suggest via the _search endpoint. In Elasticsearch 5.0, the _search endpoint was optimized for suggest-only search requests.

The above query is a basic example of the term suggest API. The "suggest" key accepts a query object with an arbitrary identifier, here named "my-suggestion". The "text" field, in turn, expects a search keyword (note that we use the misspelled word "bleed", which does not exist in our index). The spelling suggestions are checked against the "name" field of the indexed documents, which is specified under the "term" object. The results of the above query are given below:

{
  "took" : 93,
  "timed_out" : false,
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "suggest" : {
    "my-suggestion" : [
      {
        "text" : "bleed",
        "offset" : 0,
        "length" : 5,
        "options" : [
          {
            "text" : "blend",
            "score" : 0.8,
            "freq" : 1
          },
          {
            "text" : "blood",
            "score" : 0.6,
            "freq" : 1
          },
          {
            "text" : "bend",
            "score" : 0.5,
            "freq" : 1
          }
        ]
      }
    ]
  }
}

In the above response, the spelling suggestions are listed in the "options" array inside the "my-suggestion" object. The closest match for the search term "bleed" is "blend", with a score of 0.8 out of a possible 1 (a full match). The other terms in the array have lower scores, which indicates that they are less close matches.
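The scores in the responses in this post are consistent with a normalized edit distance: one minus the number of edits, divided by the length of the shorter word. The Python sketch below illustrates that reading. It is an assumption inferred from the numbers shown here, not a statement of how Elasticsearch computes scores internally, and the function names are made up for illustration. The distance used counts a swap of two adjacent letters as a single edit.

```python
def edit_distance(a, b):
    """Optimal string alignment distance: insert, delete, substitute,
    or swap two adjacent characters, each costing one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def similarity(query, candidate):
    """Assumed scoring: 1 - edits / length of the shorter word."""
    return 1 - edit_distance(query, candidate) / min(len(query), len(candidate))


# Compare the misspelled query with each suggested term.
for candidate in ("blend", "blood", "bend"):
    print(candidate, round(similarity("bleed", candidate), 2))
```

For "bleed" this gives 0.8 for "blend" (one edit over five letters), 0.6 for "blood" (two edits over five letters), and 0.5 for "bend" (two edits over four letters), matching the scores in the response.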

Case 2 - Multi-Term Suggest Request

Another useful feature is the multi-term suggest request. Here we can pack several term suggest requests in the same suggest query. See how it works in the example below:

curl -XPOST 'localhost:9200/term-suggest/_search?pretty' \
-H "Content-Type: application/json" -d '{
  "suggest": {
    "my-suggestion": {
      "text": "bleed",
      "term": { "field": "name" }
    },
    "my-suggestion-2": {
      "text": "blod",
      "term": { "field": "name" }
    }
  }
}'

In the above example, both suggestions run the spell check against the same field ("name"). To check against a different field, simply replace the "field" value in the corresponding suggestion.

The above request returned the following response:

{
  "took" : 232,
  "timed_out" : false,
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "suggest" : {
    "my-suggestion" : [
      {
        "text" : "bleed",
        "offset" : 0,
        "length" : 5,
        "options" : [
          {
            "text" : "blend",
            "score" : 0.8,
            "freq" : 1
          }
          ...............
        ]
      }
    ],
    "my-suggestion-2" : [
      {
        "text" : "blod",
        "offset" : 0,
        "length" : 4,
        "options" : [
          {
            "text" : "blood",
            "score" : 0.75,
            "freq" : 1
          },
          {
            "text" : "bold",
            "score" : 0.75,
            "freq" : 1
          },
          ................
        ]
      }
    ]
  }
}

As you can see, the results are now returned in two separate arrays, "my-suggestion" and "my-suggestion-2".
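Reading the corrections out of such a response takes only a few lines of client code. The following Python sketch assumes the response has already been parsed into a dictionary (for example with json.loads). The helper name best_suggestions is made up for illustration, and the sample data is trimmed from the response above.

```python
def best_suggestions(response):
    """Map each suggestion name to (input text, best option) pairs.

    The best option is None when no options were returned.
    """
    result = {}
    for name, entries in response.get("suggest", {}).items():
        picks = []
        for entry in entries:
            options = entry.get("options", [])
            # Pick by score explicitly rather than relying on option order.
            best = max(options, key=lambda o: o["score"], default=None)
            picks.append((entry["text"], best["text"] if best else None))
        result[name] = picks
    return result


# Sample trimmed from the response shown above.
sample = {
    "suggest": {
        "my-suggestion": [
            {"text": "bleed", "offset": 0, "length": 5,
             "options": [{"text": "blend", "score": 0.8, "freq": 1}]}
        ],
        "my-suggestion-2": [
            {"text": "blod", "offset": 0, "length": 4,
             "options": [{"text": "blood", "score": 0.75, "freq": 1},
                         {"text": "bold", "score": 0.75, "freq": 1}]}
        ],
    }
}

print(best_suggestions(sample))
```

When two options share the same score, as "blood" and "bold" do here, this sketch keeps the one listed first. An application might prefer to show both to the user instead.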

Case 3 - Global Suggest Text

To avoid repetition of the suggest text, we can define a global text. In the example below, the suggest text is defined globally and it applies to both suggest queries we've specified. This functionality is useful when you send a suggest query against two different fields inside your index. Assuming we had another field "substance" in our index, we could create the following suggest query using a global text. 

curl -XPOST 'localhost:9200/term-suggest/_search?pretty' \
-H "Content-Type: application/json" -d '{
  "suggest": {
    "text": "blod",
    "my-suggestion": {
      "term": { "field": "name" }
    },
    "my-suggestion-2": {
      "term": { "field": "substance" }
    }
  }
}'

In the above request, we can see that the global suggest text "blod" is defined for the suggest query and two sub-queries use this text against two fields in the index. This way we can define a global text once and automatically reuse it for many fields.
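When requests like these are generated by an application, the body can be assembled programmatically. The sketch below shows one possible way to do that in Python. The names build_suggest_body and send are hypothetical helpers, not part of any Elasticsearch client, and the URL assumes the local node and the term-suggest index used throughout this post.

```python
import json
import urllib.request


def build_suggest_body(fields_by_name, global_text=None, texts_by_name=None):
    """Build a suggest request body.

    fields_by_name: suggestion name -> field to check against.
    global_text: text shared by all suggestions (optional).
    texts_by_name: suggestion name -> its own text (optional).
    """
    suggest = {}
    if global_text is not None:
        suggest["text"] = global_text
    for name, field in fields_by_name.items():
        entry = {}
        if texts_by_name and name in texts_by_name:
            entry["text"] = texts_by_name[name]
        entry["term"] = {"field": field}
        suggest[name] = entry
    return {"suggest": suggest}


def send(body, url="http://localhost:9200/term-suggest/_search"):
    """POST the body to Elasticsearch (assumes a node is running locally)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Same body as the global-text request above.
body = build_suggest_body(
    {"my-suggestion": "name", "my-suggestion-2": "substance"},
    global_text="blod",
)
print(json.dumps(body, indent=2))
```

Passing texts_by_name instead of global_text produces the per-suggestion form used in Case 2, so the same helper covers both request styles. Calling send(body) would submit the request, provided a node is reachable at the assumed address.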

Conclusion

In this tutorial, we explored the term suggestion feature of the Suggest API. We also went over use cases showing how to pass multiple suggestions in one request and how to use a global text to search against multiple fields.
