In this tutorial we explain how to set default analyzers and datatypes for mappings. First, you need to understand what mapping is. Mapping is the process of defining how a document and its fields are stored and indexed. To be able to treat date fields as dates, numeric fields as numbers, and string fields as full-text or exact value strings, Elasticsearch needs to know what type of data each field contains. This information is included in the mapping.

For example, we can use mappings to define:

  • Which string fields should be treated as full text fields.
  • Which fields contain numbers, dates, or geolocations.
  • Whether the values of all fields in the document should be indexed into the catch-all _all field.
  • The format of date values.
  • Custom rules to control the mapping for dynamically added fields.

Mapping Types

There are several mapping types, and each index has one or more of them, which are used to divide the documents in an index into logical groups.

This means that user documents might be stored in a “user” type, and blog posts in a “blogpost” type, like in this picture:

[Image: elasticsearch mapping]

Each mapping type has:

  • Meta-fields.
  • Fields or properties.

Meta-fields are used to customize how a document’s associated metadata is treated. Examples of meta-fields include the document’s _index, _type, _id, and _source fields.

What about fields or properties? Each mapping type contains a list of fields or properties pertinent to that type. For example, a “user” type might contain “title”, “name”, and “age” fields, while a “blogpost” type might contain “title”, “body”, “user_id”, and “created” fields.

Fields with the same name in different mapping types in the same index must have the same mapping.

Field Datatypes

Each field has a datatype which can be:

  • A simple type like string, date, long, double, boolean or ip.
  • A type which supports the hierarchical nature of JSON such as object or nested.
  • A specialized type, like geo_point, geo_shape, or completion.

It is often useful to index the same field in different ways for different purposes. For example, a “string” field could be indexed as an “analyzed” field for full-text search, and as a “not_analyzed” field for sorting or aggregations. Alternatively, you could index a string field with the standard analyzer, the English analyzer, and the French analyzer.

This is the purpose of multi-fields. Most datatypes support multi-fields via the fields parameter.
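As a rough illustration, the Python sketch below builds a mapping fragment for a string field that is analyzed for full-text search and also carries a “not_analyzed” sub-field. The helper name `multi_field_string` and the sub-field name `raw` are assumptions made for this example; the function only builds a dictionary and never contacts Elasticsearch.

```python
import json


def multi_field_string(analyzer="standard", sub_field="raw"):
    """Return a mapping fragment for an analyzed string with an exact-value sub-field.

    Hypothetical helper for illustration; it only builds a dict and
    does not talk to Elasticsearch.
    """
    return {
        "type": "string",
        "analyzer": analyzer,
        # The "fields" parameter holds the extra ways of indexing the same value.
        "fields": {
            sub_field: {"type": "string", "index": "not_analyzed"}
        },
    }


title_mapping = multi_field_string(analyzer="english")
print(json.dumps(title_mapping, indent=2))
```

If this fragment were used for a field called “title”, the main field would serve full-text search while “title.raw” would be the one to use for sorting or aggregations.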

Dynamic Mapping

Fields and mapping types don’t need to be defined before they are used. Thanks to dynamic mapping, new mapping types and new field names are added automatically just by indexing a document. New fields can be added both to the top-level mapping type and to inner object and nested fields. The dynamic mapping rules can also be configured to customize the mapping that is used for new types and new fields.
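To make the idea of type detection more concrete, here is a simplified Python sketch of how a datatype might be guessed from a JSON value. The function `guess_datatype` and the date pattern are assumptions made for this example; the real detection rules are implemented inside Elasticsearch and cover more cases (for instance, additional date formats and optional numeric detection).

```python
import re

# Simplified stand-in for date detection: ISO-like dates only (an assumption).
_ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?)?$"
)


def guess_datatype(value):
    """Guess the datatype dynamic mapping might pick for a JSON value.

    Hypothetical approximation for illustration only.
    """
    if value is None:
        return None  # null values do not add a field
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays take the type of their first non-null element.
        for item in value:
            if item is not None:
                return guess_datatype(item)
        return None
    if isinstance(value, str):
        return "date" if _ISO_DATE.match(value) else "string"
    raise TypeError("not a JSON value: %r" % (value,))


doc = {"name": "Ann", "age": 31, "score": 4.5, "active": True,
       "created": "2016-08-20", "address": {"city": "New York"}}
guessed = {field: guess_datatype(v) for field, v in doc.items()}
print(guessed)
```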

Explicit Mappings

While dynamic mapping can be useful to get started, at some point you will want to specify your own explicit mappings, because you know more about your data than Elasticsearch can guess. You can create mapping types and field mappings when you create an index, and you can add mapping types and fields to an existing index with the PUT mapping API.

Updating Existing Mappings

As a rule, existing type and field mappings cannot be updated, because changing the mapping would invalidate documents that are already indexed. Instead, you should create a new index with the correct mappings and reindex your data into that index.

Fields Shared Across Mapping Types

Mapping types are used to group fields, but the fields in each mapping type are not independent of each other. Fields that have:

  • The same name,
  • In the same index,
  • In different mapping types,

map to the same field internally and must have the same mapping.

If a “title” field exists in both the “user” and “blogpost” mapping types, the “title” fields must have exactly the same mapping in each type. The only exceptions to this rule are the “copy_to”, “dynamic”, “enabled”, “ignore_above”, “include_in_all”, and “properties” parameters, which may have different settings per field.

Usually, fields with the same name also contain the same type of data, so having the same mapping is not a problem. When conflicts do arise, these can be solved by choosing more descriptive names, such as “user_title” and “blog_title”.
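The shared-field rule can be checked before an index is created. The following Python sketch is a minimal, local check, assuming a mappings dictionary shaped like the “mappings” section of an index request. The helper name `find_conflicts` is hypothetical, and only top-level fields are compared.

```python
# Parameters that the text above lists as allowed to differ per field.
EXEMPT = {"copy_to", "dynamic", "enabled", "ignore_above",
          "include_in_all", "properties"}


def find_conflicts(mappings):
    """Return names of fields whose mapping differs between mapping types.

    `mappings` has the shape {type_name: {"properties": {field: mapping}}}.
    Hypothetical helper; nested properties are not inspected.
    """
    seen = {}          # field name -> first mapping seen (exempt params removed)
    conflicts = set()
    for type_mapping in mappings.values():
        for field, field_mapping in type_mapping.get("properties", {}).items():
            comparable = {k: v for k, v in field_mapping.items()
                          if k not in EXEMPT}
            if field in seen and seen[field] != comparable:
                conflicts.add(field)
            seen.setdefault(field, comparable)
    return sorted(conflicts)


mappings = {
    "user": {"properties": {
        "title": {"type": "string", "index": "not_analyzed"},
        "name": {"type": "string"},
    }},
    "blogpost": {"properties": {
        "title": {"type": "string"},
        "body": {"type": "string", "ignore_above": 256},
    }},
}
print(find_conflicts(mappings))
```

In this example the two “title” fields disagree on the “index” parameter, so “title” is reported. Renaming one of them, for instance to “user_title”, would make the list empty.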

You’ve read a significant amount of theory. Now it is time to practice. Let’s create a new mapping:

curl -XPUT 'localhost:9200/test?pretty' -d '
{
  "mappings": {
    "user": {   
      "_all":       { "enabled": false  }, 
      "properties": { 
        "title":    { "type": "string"  }, 
        "name":     { "type": "string"  }, 
        "age":      { "type": "integer" }  
      }
    },
    "blogpost": { 
      "properties": { 
        "title":    { "type": "string"  }, 
        "body":     { "type": "string"  }, 
        "user_id":  {
          "type":   "string", 
          "index":  "not_analyzed"
        },
        "created":  {
          "type":   "date", 
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}'

Comments:

  1. Create mappings
  2. Add mapping types called “user” and “blogpost”.
  3. Disable the _all meta field for the user mapping type.
  4. Specify fields or properties in each mapping type.
  5. Specify the data type and mapping for each field.

 

We get the response:

[Image: elasticsearch mapping true response]

The result of the last command tells us that a new index with our mapping was successfully created.

Set Analyzers & Datatypes by Default to Mappings

Use “dynamic_templates”

One way to set default analyzers and datatypes for mappings is to use “dynamic_templates”.

With dynamic_templates, you can take complete control over the mapping that is generated for newly detected fields. You can even apply a different mapping depending on the field name or datatype.

Each template has a name, which you can use to describe what the template does, a “mapping” that specifies the mapping to apply, and at least one parameter (such as “match”) that defines which fields the template should apply to.

Templates are checked in order, so the first template that matches is applied. For example, we could specify two templates for string fields:

  • es: Field names ending in “_es” should use the Spanish analyzer.
  • en: All others should use the English analyzer.

We put the es template first, because it is more specific than the catchall en template, which matches all string fields:

curl -XPUT 'localhost:9200/my_index?pretty' -d '
{
    "mappings": {
        "my_type": {
            "dynamic_templates": [
                { "es": {
                      "match":              "*_es", 
                      "match_mapping_type": "string",
                      "mapping": {
                          "type":           "string",
                          "analyzer":       "spanish"
                      }
                }},
                { "en": {
                      "match":              "*", 
                      "match_mapping_type": "string",
                      "mapping": {
                          "type":           "string",
                          "analyzer":       "english"
                      }
                }}
            ]
}}}'

Comments:

  1. Match string fields whose name ends in _es.
  2. Match all other string fields.

The “match_mapping_type” allows you to apply the template only to fields of the specified type, as detected by the standard dynamic mapping rules. Examples include “string” or “long”.
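The first-match-wins behaviour can be modelled locally. The Python sketch below is an approximation, assuming that “*” in a pattern stands for any run of characters. The helpers `simple_match` and `pick_template` are hypothetical names, and only the “match” and “match_mapping_type” parameters are considered.

```python
import re


def simple_match(pattern, text):
    """Match `text` against a pattern where "*" stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, text) is not None


def pick_template(dynamic_templates, field_name, detected_type):
    """Return the name of the first template that matches, or None.

    Hypothetical helper: it only looks at "match" and "match_mapping_type".
    """
    for entry in dynamic_templates:
        for name, template in entry.items():
            wanted_type = template.get("match_mapping_type")
            if wanted_type is not None and wanted_type != detected_type:
                continue
            if not simple_match(template.get("match", "*"), field_name):
                continue
            return name
    return None


templates = [
    {"es": {"match": "*_es", "match_mapping_type": "string",
            "mapping": {"type": "string", "analyzer": "spanish"}}},
    {"en": {"match": "*", "match_mapping_type": "string",
            "mapping": {"type": "string", "analyzer": "english"}}},
]

print(pick_template(templates, "title_es", "string"))  # es
print(pick_template(templates, "title", "string"))     # en
print(pick_template(templates, "age", "long"))         # None
```

Reversing the list would make the catch-all “en” template win for every string field, including “title_es”, which is why the more specific template goes first.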

The “match” parameter matches just the field name, and the “path_match” parameter matches the full path to a field in an object, so the pattern “address.*.name” would match a field like this:

{
    "address": {
        "city": {
            "name": "New York"
        }
    }
}

The “unmatch” and “path_unmatch” patterns can be used to exclude fields that would otherwise match.
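As a hedged sketch of how the four pattern parameters might combine, the Python code below tests a dotted field path against a template. The helper names `simple_match` and `template_applies`, the example patterns, and the assumption that “*” stands for any run of characters are all illustrative; this is not Elasticsearch’s own matching code.

```python
import re


def simple_match(pattern, text):
    """Match `text` against a pattern where "*" stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, text) is not None


def template_applies(template, full_path):
    """Decide whether a template's name and path patterns accept a field.

    `full_path` is the dotted path, for example "address.city.name".
    Hypothetical helper covering only the four pattern parameters.
    """
    field_name = full_path.rsplit(".", 1)[-1]
    checks = [
        ("match", field_name, True),          # field name must match
        ("unmatch", field_name, False),       # field name must not match
        ("path_match", full_path, True),      # full path must match
        ("path_unmatch", full_path, False),   # full path must not match
    ]
    for key, text, must_match in checks:
        if key in template and simple_match(template[key], text) != must_match:
            return False
    return True


template = {"path_match": "address.*.name", "path_unmatch": "*.street.name"}
print(template_applies(template, "address.city.name"))    # True
print(template_applies(template, "address.street.name"))  # False
print(template_applies(template, "user.name"))            # False
```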

Consequently, with dynamic_templates we can set default analyzers and datatypes for newly detected fields, and we can define conditions that apply different defaults depending on the field name or datatype.

_default_ Mapping

Alternatively, we can use the “_default_” mapping to set default analyzers and other settings. The default mapping, which is used as the base mapping for any new mapping types, can be customized by adding a mapping type with the name “_default_” to an index, either when creating the index or later on with the PUT mapping API.

For example:

curl -XPUT 'localhost:9200/my_index2?pretty' -d '
{
  "mappings": {
    "_default_": { 
      "_all": {
        "enabled": false
      }
    },
    "user": {}, 
    "blogpost": { 
      "_all": {
        "enabled": true
      }
    }
  }
}'

Comments:

  1. The “_default_” mapping defaults the “_all” field to disabled.
  2. The “user” type inherits the settings from “_default_”.
  3. The “blogpost” type overrides the defaults and enables the “_all” field.

While the “_default_” mapping can be updated after an index has been created, the new defaults will only affect mapping types that are created afterward.
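The inheritance described in the comments above can be pictured as a merge in which each type’s own settings are laid over the “_default_” settings. The Python sketch below models that idea. The helpers `deep_merge` and `effective_mappings` are hypothetical, and the merge is an assumption about the observable result, not a copy of Elasticsearch’s internal logic.

```python
import copy


def deep_merge(base, override):
    """Return a copy of `base` with `override` applied on top, recursing into dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def effective_mappings(mappings):
    """Apply "_default_" as the base of every other mapping type.

    Hypothetical model of the behaviour described above.
    """
    default = mappings.get("_default_", {})
    return {name: deep_merge(default, body)
            for name, body in mappings.items() if name != "_default_"}


mappings = {
    "_default_": {"_all": {"enabled": False}},
    "user": {},
    "blogpost": {"_all": {"enabled": True}},
}
result = effective_mappings(mappings)
print(result["user"])      # {'_all': {'enabled': False}}
print(result["blogpost"])  # {'_all': {'enabled': True}}
```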

The “_default_” mapping can be used in conjunction with index templates to control dynamically created types within automatically created indices.

Let’s try this method:

curl -XPUT 'localhost:9200/_template/logging?pretty' -d '
{
  "template":   "logs-*", 
  "settings": { "number_of_shards": 1 }, 
  "mappings": {
    "_default_": {
      "_all": { 
        "enabled": false
      },
      "dynamic_templates": [
        {
          "strings": { 
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "fields": {
                "raw": {
                  "type":  "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ]
    }
  }
}'

Comments:

  1. The “logging” template will match any indices beginning with “logs-”.
  2. Matching indices will be created with a single primary shard.
  3. The “_all” field will be disabled by default for new type mappings.
  4. String fields will be created with an “analyzed” main field, and a “not_analyzed” “.raw” field.

The previous command creates a new template, which will be applied to every new index whose name matches the pattern. Try to PUT some new data:

curl -XPUT 'localhost:9200/logs-2016.08.20/event/1?pretty' -d '{ "message": "error:16" }'

The response is:

[Image: elasticsearch default mapping]

Send a GET request:

curl -XGET 'localhost:9200/logs-2016.08.20/event/1?pretty'

And we get a response like this:

[Image: mapping4.png]

As you can see, the template that we set previously is applied to this new data. This method also lets you set default analyzers or datatypes for new mappings that satisfy the conditions you defined.
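To see why the “logs-2016.08.20” index picks up the “logging” template, the following Python sketch models the name matching locally. The helpers `simple_match` and `templates_for_index` are hypothetical, and the sketch assumes that “*” in the “template” pattern stands for any run of characters; it does not query a cluster.

```python
import re


def simple_match(pattern, text):
    """Match `text` against a pattern where "*" stands for any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, text) is not None


def templates_for_index(index_templates, index_name):
    """Return the names of index templates whose pattern matches `index_name`.

    `index_templates` maps a template name to its body, which carries the
    pattern under the "template" key as in the request shown earlier.
    """
    return sorted(name for name, body in index_templates.items()
                  if simple_match(body["template"], index_name))


index_templates = {
    "logging": {"template": "logs-*", "settings": {"number_of_shards": 1}},
}
print(templates_for_index(index_templates, "logs-2016.08.20"))  # ['logging']
print(templates_for_index(index_templates, "my_index"))         # []
```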

Conclusion

In this post we explained how to set default analyzers and datatypes for mappings. We also covered the different datatypes and mapping types, and walked through practical examples. Questions or comments? Drop us a line below.