Mappings really help Elasticsearch to index data, particularly when dates or timestamps are involved. That is why Elasticsearch is a very good tool for indexing logs. As you progress on your journey with Elasticsearch, Logstash, and Kibana, you will sometimes find that you want to change the mapping of data you have already indexed. This can be done, although you will have to reindex the data.

Most client APIs actually have a reindex function, and reindexing data is easier than you would think. Let's look at an example of reindexing our data after changing the mapping, using the Python client API for Elasticsearch to do the reindexing for us.

Changing mappings can be a big headache if it causes downtime. The question is: does it have to cause downtime? When you decide to make mapping changes, you will have to reindex your data. 

Several strategies exist for reindexing data and preventing potential downtime:

  1. Use Logstash to reindex the data. I recently came across an article on this very interesting and creative idea while doing research. It works very well at scale. The idea is to use Elasticsearch as both the input and the output of Logstash, while using a template to define the mapping for the output index.
  2. Use a client API. Most client APIs provide reindex functionality.
  3. Do it natively in Elasticsearch. Playing around with this in Elasticsearch directly is a good way to understand how it can be done and how it works conceptually. To change a mapping you usually need to reindex the data into a new index with a different name from the original index. If you are using Elasticsearch in an application, for example in the search engine of a web application, you will need to change the index name that the application uses once you have changed the mapping and reindexed the data into the new index.
Old index -> Change mapping -> New index

Workflow for changing a mapping by using aliases. There seems to be a lot of talk about using aliases when changing a mapping in order to reduce downtime. This is how the workflow is supposed to work:

  1. Create an alias that points to the index in use, which is the index whose mapping we plan to change.
  2. Define the new mapping on a new index and reindex the data from the old index into the new index.
  3. Check that we are happy with the mapping in the new index created by reindexing the old index.
  4. Change the alias to point to the new index instead of the old index.
  5. Remember to remove the alias from the old index, because an alias can point to multiple indices at once.
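
Steps 4 and 5 of the workflow above can be combined into a single request. The following is a minimal sketch, assuming an older Python client that accepts the body keyword (as the examples in this article do). The helper names build_alias_swap and swap_alias and the index name someindex_v2 are hypothetical and only serve as illustration.

```python
def build_alias_swap(alias, old_index, new_index):
    """Build the request body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            # Both actions travel in one request to the aliases API,
            # so the removal and the addition are applied together.
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def swap_alias(es, alias, old_index, new_index):
    """Send the swap to the cluster. `es` is an elasticsearch.Elasticsearch client."""
    body = build_alias_swap(alias, old_index, new_index)
    return es.indices.update_aliases(body=body)


if __name__ == "__main__":
    # Only prints the body; nothing is sent to a cluster here.
    print(build_alias_swap("somealias", "someindex", "someindex_v2"))
```

Because the application only ever talks to the alias, it does not notice the switch, which is the whole point of the workflow.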

Important: Make use of aliases instead of index names in your configuration files in your applications. 

Important considerations when working with aliases:

  1. An alias cannot be created without it pointing to something. 
  2. Although an alias can be created to something, it is then told to point to nothing for a brief period of time.
  3. Think of it as a symlink that is created on Unix.
  4. You cannot create a symlink to nothing. 

First, create an index with:

PUT /someindex

sense1.png#asset:1000

Then create an alias to the index with:

PUT /someindex/_alias/somealias

sense2.png#asset:1001

Here someindex is the name of the index we are pointing to, and somealias is the name of the alias we are using or have just created. Remember that an alias can point to multiple indices.

Which aliases point to a certain index?

sense3.png#asset:1002

GET /someindex/_alias/*
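
The same lookup can be done from Python. With a live client the response would come from es.indices.get_alias(index="someindex"). The sketch below only shows how to read the alias names out of that response, using a hand-written sample with the same shape as the JSON that Sense displays. The helper name aliases_for_index is hypothetical.

```python
def aliases_for_index(response, index):
    """Return the sorted alias names for `index` from a get-alias response."""
    return sorted(response.get(index, {}).get("aliases", {}))


# Hand-written sample with the shape returned by GET /someindex/_alias/*
sample_response = {"someindex": {"aliases": {"somealias": {}}}}

print(aliases_for_index(sample_response, "someindex"))  # prints ['somealias']
```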

Reindexing Data with a Client API

Make sure to install the client, or if you have it installed, keep your version up to date:

$ sudo pip install --upgrade elasticsearch

This is the code we are using to reindex. We aren’t doing anything else, and we are not setting a mapping before reindexing either. Have a look at the code and the comments for the code:

#!/usr/bin/env python
import elasticsearch
import elasticsearch.helpers

# Set up the source and destination connections to Elasticsearch.
# These could point to different clusters.
elasticSource = elasticsearch.Elasticsearch([{"host": "localhost", "port": 9200}])
elasticDestination = elasticsearch.Elasticsearch([{"host": "localhost", "port": 9200}])

# Delete the destination index so we know it doesn't exist.
elasticDestination.indices.delete(index="index_destination", ignore=[400, 404])
# Create the destination index with nothing in it.
elasticDestination.indices.create(index="index_destination", ignore=[400, 404])

# Copy every document from the source index into the destination index.
elasticsearch.helpers.reindex(client=elasticSource, source_index="index_source", target_index="index_destination", target_client=elasticDestination)

Put the code in a file named reindex.py and run it. Afterwards you will have to check whether it worked. Run the script with:

$ python reindex.py
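
One simple way to check whether the reindex worked is to compare document counts. This is a minimal sketch, not a complete verification: it assumes both indices are no longer being written to, and the helper name doc_counts_match is hypothetical. It uses the client's count and indices.refresh calls.

```python
def doc_counts_match(source_client, source_index, target_client, target_index):
    """Compare document counts of two indices after a reindex.

    Returns a tuple (match, source_count, target_count).
    """
    # Refresh first so that recently indexed documents are visible to count.
    source_client.indices.refresh(index=source_index)
    target_client.indices.refresh(index=target_index)
    source_count = source_client.count(index=source_index)["count"]
    target_count = target_client.count(index=target_index)["count"]
    return source_count == target_count, source_count, target_count
```

With the clients from reindex.py, the call would be doc_counts_match(elasticSource, "index_source", elasticDestination, "index_destination").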

You can also use the Python client API to create and delete aliases. In this example, we create an alias to our index “candidate_index” and call the alias “presidential_candidate”.

#!/usr/bin/env python
import elasticsearch
import elasticsearch.helpers
es = elasticsearch.Elasticsearch([{"host": "localhost", "port": 9200}])
# make sure that this alias doesn't conflict with any existing index name
alias = 'presidential_candidate'
# CAUTION: if you have an index already, you should create an alias for it first
# es.indices.put_alias(index='indexSource', name=alias)
es.indices.put_alias(index='candidate_index', name=alias)

Save the example as alias_to_candidate_index.py and run it with:

$ python alias_to_candidate_index.py

Now we can open up Sense and see if the alias has been created:

sense4.png#asset:1003

We looked at how to create an alias and how to reindex our data, but let’s look at how to set a mapping in code. The following example will demonstrate to you how to create an index and set a mapping for the index using the Python client API. We are using the index name “reverse-it-site”. Make sure that you delete the index if you already have an index by that name, before running this code. You can delete the index we created in one of our previous articles with:

DELETE /reverse-it-site

sense5-2.png#asset:1011

This is our code. Save it into a file and run it.

#!/usr/bin/env python
import elasticsearch
import elasticsearch.helpers
es = elasticsearch.Elasticsearch([{"host": "localhost", "port": 9200}])
mapping = '''
{
   "mappings": {
      "feed": {
         "properties": {
            "data": {
               "type": "nested",
               "include_in_parent": true
            }
         }
      }
   }
}'''
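# Create the index with the mapping; ignore=400 skips the error if the index already exists.
es.indices.create(index='reverse-it-site', ignore=400, body=mapping)
# Search the (still empty) index to confirm that it responds.
res = es.search(index='reverse-it-site', body={"query": {"match_all": {}}})
print(res)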

After running this piece of code, you will need to check whether the mapping was actually set. You can also do this in Sense.

GET /reverse-it-site/_mappings

sense6-2.png#asset:1012
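
The same check can be scripted. With a live client the response would come from es.indices.get_mapping(index="reverse-it-site"). The sketch below only reads the field type out of such a response and assumes the response shape of Elasticsearch versions that still use mapping types such as “feed”, as this article does. The helper name mapped_field_type and the sample response are hypothetical.

```python
def mapped_field_type(response, index, doc_type, field):
    """Return the mapped type of `field`, or None if it is not mapped."""
    try:
        properties = response[index]["mappings"][doc_type]["properties"]
        return properties[field]["type"]
    except KeyError:
        return None


# Hand-written sample shaped like the output of GET /reverse-it-site/_mappings
sample_mapping = {
    "reverse-it-site": {
        "mappings": {
            "feed": {
                "properties": {
                    "data": {"type": "nested", "include_in_parent": True}
                }
            }
        }
    }
}

print(mapped_field_type(sample_mapping, "reverse-it-site", "feed", "data"))  # prints nested
```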

Regarding the reindex function that we previously considered, it is important to mention that it does not transfer the mappings, but only the data. Therefore, it is essential to set the mapping for a new index before reindexing data into it. Just for the sake of being thorough, let’s look at how to set a mapping for a new index and reindex using only Elasticsearch.

I deleted all the indices we've used so far and created an index reverse-it-site with a mapping using the Python client API. Now we need to create an index with the data in it. Imagine that this was the index for which we wanted to change the mapping. By reindexing it to the new index with the mapping already set, we will have an index with the data in it but using the correct mapping. In a real world scenario our data would already be in an index. For the purpose of this article, I've loaded our data into an index again with: 

$ curl -XPOST 'http://127.0.0.1:9200/reverse-it-site-old/feed?pretty=true' -d @clean.json

sense7-2_160627_152414.png#asset:1019

We are now going to reindex this data to the index with the correct mapping:

$ curl -XPOST 'localhost:9200/_reindex' -d '{
 "source" : {
 "index" : "reverse-it-site-old"
 },
 "dest" : {
 "index" : "reverse-it-site",
 "version_type": "external"
 }
}'

sense8-2.png#asset:1014
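
For comparison, the same request can be sent from Python. This is a minimal sketch that assumes a client version exposing the reindex API with a body keyword. The helper names build_reindex_body and run_reindex are hypothetical.

```python
def build_reindex_body(source_index, dest_index, version_type="external"):
    """Build the same JSON body that the curl command above sends to _reindex."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "version_type": version_type},
    }


def run_reindex(es, source_index, dest_index):
    """Send the reindex request. `es` is an elasticsearch.Elasticsearch client."""
    return es.reindex(body=build_reindex_body(source_index, dest_index))


if __name__ == "__main__":
    # Only prints the body; nothing is sent to a cluster here.
    print(build_reindex_body("reverse-it-site-old", "reverse-it-site"))
```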

Now it is important to check if the newly created index has the correct mapping after we’ve added data to that index. We can do that by checking the mapping.

GET /reverse-it-site/_mappings

sense9_160627_152333.png#asset:1015

Now that we know the index has the correct mapping, we need to verify that it also contains all the data we expect. A quick search over the data will do.

GET /reverse-it-site/_search

sense10_160627_152334.png#asset:1016

Conclusion

It is very important that you familiarize yourself with the concept of mappings and that you master it. 

Working with mappings in real life is not always an easy task, especially if you want to keep capturing data into Elasticsearch while you are busy changing the mapping. An interesting tool that you can look at is called "elasticsearch-reindex". You can read more about it on the GitHub page: elasticsearch-reindex