After answering a question about autocomplete on Stack Overflow, we thought it best to come over to the Qbox blog and write more extensively about the different ways of approaching autocomplete.

In this article, we include an example of how to get autocomplete up and running quickly in Elasticsearch with the Completion Suggest feature. We don't intend for this to be a complete treatment of the topic, but we do aim to give you enough information to get going as painlessly as possible.

For this post, we will be using hosted Elasticsearch on Qbox.io. You can sign up or launch your cluster here, or click "Get Started" in the header navigation. If you need help setting up, refer to "Provisioning a Qbox Elasticsearch Cluster."

Suppose that you have an index with documents that look like this:

{
    "title": "Product1",
    "description": "Product1 Description",
    "tags": [
        "blog",
        "magazine",
        "responsive",
        "two columns",
        "wordpress"
    ]
}
{
    "title": "Product2",
    "description": "Product2 Description",
    "tags": [
        "blog",
        "paypal",
        "responsive",
        "skrill",
        "wordland"
    ]
}

Suppose that you want to be able to retrieve search suggestions based on the tags field. The Completion Suggest feature is built precisely for this purpose. It's also built for extreme speed (at query time), which is especially important since autocomplete is a function that involves a rapid succession of many distinct requests.

To properly set this up, you need to define a field of type completion in your mapping. So, using the document structure above, we could define an index with the following mapping:

curl -XPUT "http://localhost:9200/test_index/" -d'
{
   "mappings": {
      "product": {
         "properties": {
            "description": {
               "type": "string"
            },
            "tags": {
               "type": "string"
            },
            "title": {
               "type": "string"
            },
            "tag_suggest": {
               "type": "completion",
               "index_analyzer": "simple",
               "search_analyzer": "simple",
               "payloads": false
            }
         }
      }
   }
}'


Next, we add the documents:

curl -XPUT "http://localhost:9200/test_index/product/1" -d'
{
   "title": "Product1",
   "description": "Product1 Description",
   "tags": [
      "blog",
      "magazine",
      "responsive",
      "two columns",
      "wordpress"
   ],
   "tag_suggest": {
      "input": [
         "blog",
         "magazine",
         "responsive",
         "two columns",
         "wordpress"
      ]
   }
}'
curl -XPUT "http://localhost:9200/test_index/product/2" -d'
{
   "title": "Product2",
   "description": "Product2 Description",
   "tags": [
      "blog",
      "paypal",
      "responsive",
      "skrill",
      "wordland"
   ],
   "tag_suggest": {
      "input": [
         "blog",
         "paypal",
         "responsive",
         "skrill",
         "wordland"
      ]
   }
}'

As you can see, the only difference is the addition of the values from the tags field into the input section of the tag_suggest field.
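
If you would rather not maintain the two lists by hand, the copy can be scripted before indexing. The following is only a minimal sketch in Python, not part of the original setup: the helper name with_tag_suggest is hypothetical, and it assumes every document keeps its tags in a list under the tags key.

```python
import json


def with_tag_suggest(doc):
    """Return a copy of doc with a tag_suggest field built from its tags.

    Hypothetical helper for illustration; it is not an Elasticsearch API.
    """
    enriched = dict(doc)  # shallow copy so the caller's dict is left untouched
    # Copy the list so later changes to tags do not silently alter the suggest input.
    enriched["tag_suggest"] = {"input": list(doc.get("tags", []))}
    return enriched


product = {
    "title": "Product1",
    "description": "Product1 Description",
    "tags": ["blog", "magazine", "responsive", "two columns", "wordpress"],
}

# The printed JSON can be used as the body of the PUT request shown above.
print(json.dumps(with_tag_suggest(product), indent=3))
```

Building the field from tags at index time keeps the two lists in sync, although the values are still stored twice in the index.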

Now we're ready to run a _suggest query against this index, asking for suggestions for the text "word":

curl -XPOST "http://localhost:9200/test_index/_suggest" -d'
{
    "product_suggest":{
        "text":"word",
        "completion": {
            "field" : "tag_suggest"
        }
    }
}'

The results are as we would expect:

{
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "product_suggest": [
      {
         "text": "word",
         "offset": 0,
         "length": 4,
         "options": [
            {
               "text": "wordland",
               "score": 1
            },
            {
               "text": "wordpress",
               "score": 1
            }
         ]
      }
   ]
}
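
In an autocomplete widget you usually need only the suggested strings, not the whole response. As a rough sketch in Python (the function name suggestion_texts is hypothetical, and the response is assumed to have exactly the shape shown above), the options can be pulled out like this:

```python
import json

# Response body copied from the example above.
response_body = """
{
   "_shards": {"total": 1, "successful": 1, "failed": 0},
   "product_suggest": [
      {
         "text": "word",
         "offset": 0,
         "length": 4,
         "options": [
            {"text": "wordland", "score": 1},
            {"text": "wordpress", "score": 1}
         ]
      }
   ]
}
"""


def suggestion_texts(response, name):
    """Collect the option texts for the named suggestion, in response order."""
    texts = []
    for entry in response.get(name, []):
        for option in entry.get("options", []):
            texts.append(option["text"])
    return texts


response = json.loads(response_body)
print(suggestion_texts(response, "product_suggest"))  # ['wordland', 'wordpress']
```

The name passed in must match the key used in the request body (product_suggest here). An unknown name simply yields an empty list.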

We realize, of course, that you might have concerns about this approach, such as data duplication: every tag is stored twice, once in the tags field and once in the tag_suggest field. Remember, this is a quick and dirty method for implementing autocomplete. We plan to describe other methods in future posts. Keep in touch by subscribing to our blog.


Editor's note: Our metrics tell us that this article has been especially helpful since it was originally written back in January 2014, so we're republishing it after adding a few enhancements.

