Elasticsearch is document oriented, meaning that it stores entire objects or documents. Aside from storing them, it indexes the contents of each document in order to make them searchable. In Elasticsearch you can index, search, sort, and filter documents—not rows of column data. This is a fundamentally different way of thinking about data and it is one of the reasons Elasticsearch can perform complex full-text search.

The objects in the application are rarely simple lists of keys and values. More often objects are complex data structures that may contain dates, geo locations, other objects, or arrays of values.

Elasticsearch supports a number of different datatypes for the fields in a document. They fall into four groups: core, complex, geo, and specialized.

Core datatypes

1. String datatype for text values

2. Numeric datatypes

  • Long
  • Integer
  • Short
  • Byte
  • Double
  • Float

3. Date datatypes in Elasticsearch can either be:

  • Strings containing formatted dates, e.g. 2015-01-01 or 2015/01/01 12:10:30.
  • A long number representing milliseconds-since-the-epoch.
  • An integer representing seconds-since-the-epoch.

4. Boolean datatype accepts JSON true and false values, but can also accept strings and numbers which are interpreted as either true or false:

  • False values: false, "false", "off", "no", "0", "" (empty string), 0, 0.0
  • True values: anything that isn’t false.

5. Binary datatype accepts a binary value as a Base64 encoded string. The field is not stored by default and is not searchable.
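
The boolean rule from item 4 can be sketched in a few lines of Python. This is only an illustration of the rule as listed, not Elasticsearch code; the helper name interpret_boolean is made up for this example, and it assumes exact string matches with no case folding.

```python
# Strings that the list above treats as false (assumption: exact match,
# no case folding or whitespace trimming).
FALSE_STRINGS = {"false", "off", "no", "0", ""}


def interpret_boolean(value):
    """Hypothetical helper: apply the true/false rule from the list above."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0  # 0 and 0.0 are false
    return value not in FALSE_STRINGS  # any other string is true


for raw in [True, "off", "no", "", 0, 0.0, "yes", 1]:
    print(repr(raw), "->", interpret_boolean(raw))
```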

Note: All examples in this article have been tested on the ES 1.7.4 version. There may be small differences in later versions.

The datatype for each field in a document (strings, numbers, objects, etc.) can be controlled via type mapping. Let’s create an index and put a mapping:

curl -XPUT 'http://localhost:9200/myindex/' -d '{
  "mappings": {
    "article": {
      "properties": {
        "author": { "type": "string" },
        "date_of_publication": { "type": "date" },
        "likes": { "type": "integer" },
        "rating": { "type": "float" },
        "is_published": { "type": "boolean" }
      }
    }
  }
}'

Note: incorrect mappings, such as a likes field mapped as type string instead of integer, can produce confusing query results.

Alternatively, index the data directly and let Elasticsearch infer the field types itself through dynamic mapping. Index a document into another_index:

curl -XPUT 'http://localhost:9200/another_index/article/1' -d '
{
  "author": "One guy",
  "date_of_publication": "2015-12-21",
  "likes": 30,
  "rating": 4.2,
  "is_published": true
}'

And get mapping:

curl -XGET 'http://localhost:9200/another_index/_mapping/?pretty'
Response:
{
  "another_index" : {
    "mappings" : {
      "article" : {
        "properties" : {
          "author" : {
            "type" : "string"
          },
          "date_of_publication" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "is_published" : {
            "type" : "boolean"
          },
          "likes" : {
            "type" : "long"
          },
          "rating" : {
            "type" : "double"
          }
        }
      }
    }
  }
}

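We see that each field type was inferred from the transmitted value, including the date. Note that dynamic mapping chose long for likes and double for rating, rather than the integer and float we set explicitly in the first mapping. Dynamic mapping is configurable; see the Elasticsearch documentation on dynamic mapping for details.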
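
As a rough illustration of the inference shown in the response above, the following Python sketch maps a JSON value to the type that dynamic mapping picked. It is a simplification written for this article, not Elasticsearch's actual detection logic: infer_type is a hypothetical helper, and it only tries the yyyy-MM-dd date form.

```python
from datetime import datetime


def infer_type(value):
    """Hypothetical sketch of the type dynamic mapping picked above."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        try:
            # Assumption: only the yyyy-MM-dd form is tried here; the real
            # date detection accepts more formats.
            datetime.strptime(value, "%Y-%m-%d")
            return "date"
        except ValueError:
            return "string"
    raise TypeError(f"unsupported value: {value!r}")


article = {
    "author": "One guy",
    "date_of_publication": "2015-12-21",
    "likes": 30,
    "rating": 4.2,
    "is_published": True,
}
mapping = {field: infer_type(value) for field, value in article.items()}
print(mapping)
```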

Complex datatypes

1. Array datatype

Array support does not require a dedicated type. For instance:

  • an array of strings: [ "one", "two" ]
  • an array of integers: [ 1, 2 ]
  • an array of arrays: [ 1, [ 2, 3 ]] which is the equivalent of [ 1, 2, 3 ]
  • an array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 } ]
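
The equivalence of [ 1, [ 2, 3 ]] and [ 1, 2, 3 ] can be pictured with a small Python sketch. The function flatten_values is a hypothetical name used only here; it models the outcome described in the list, not how Elasticsearch implements it.

```python
def flatten_values(values):
    """Recursively flatten nested lists into one flat list of values."""
    flat = []
    for item in values:
        if isinstance(item, list):
            flat.extend(flatten_values(item))  # descend into the inner array
        else:
            flat.append(item)
    return flat


print(flatten_values([1, [2, 3]]))  # [1, 2, 3]
```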

However, arrays of objects do not work as you would expect. You cannot query each object independently of the other objects in the array. For example, put some data in our index:

curl -XPUT 'http://localhost:9200/myindex/user/1' -d '
{
  "user": "Darth Vader",
  "followers": [
    { "age": 21, "name": "Mary" },
    { "age": 22, "name": "Alex" },
    { "age": 23, "name": "Lisa" }
  ]
}'
curl -XPUT 'http://localhost:9200/myindex/user/2' -d '
{
  "user": "Master Yoda",
  "followers": [
    { "age": 24, "name": "Julia" },
    { "age": 23, "name": "John" },
    { "age": 26, "name": "Alex" }
  ]
}'

Now query for users who have a follower named Alex who is 23 years old:

curl -XGET 'http://localhost:9200/myindex/_search/?pretty' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "followers.age": 23
          }
        },
        {
          "match": {
            "followers.name": "Alex"
          }
        }
      ]
    }
  }
}'


Elasticsearch returns 2 hits. But how? None of our users has a follower named Alex who is 23 years old.


This happens because Elasticsearch flattens an array of objects into multivalue fields. The first user is indexed like this:

{
    "followers.age":  [21, 22, 23],
    "followers.name": [mary, alex, lisa]
}

And the second user like this:

{
    "followers.age":  [24, 23, 26],
    "followers.name": [julia, john, alex]
}

The correlation between {age: 22} and {name: Alex} has been lost, because each multivalue field is just a bag of values, not an ordered array. This is sufficient to answer: "Is there a follower who is 23 years old?" However, we cannot get an accurate answer to: "Is there a follower who is 23 years old and whose name is Alex?" That is why Elasticsearch returns 2 hits: in both documents the followers.name values contain alex and the followers.age values contain 23.
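
The loss of correlation can be reproduced with a short Python sketch. It only models the flattening described above; flatten and matches_flat are hypothetical helpers, and lowercasing stands in for the analysis applied to string fields.

```python
users = [
    {"user": "Darth Vader", "followers": [
        {"age": 21, "name": "Mary"},
        {"age": 22, "name": "Alex"},
        {"age": 23, "name": "Lisa"}]},
    {"user": "Master Yoda", "followers": [
        {"age": 24, "name": "Julia"},
        {"age": 23, "name": "John"},
        {"age": 26, "name": "Alex"}]},
]


def flatten(doc, field):
    """Turn an array of objects into one bag of values per inner field."""
    flat = {}
    for obj in doc[field]:
        for key, value in obj.items():
            token = value.lower() if isinstance(value, str) else value
            flat.setdefault(f"{field}.{key}", []).append(token)
    return flat


def matches_flat(doc, field, **criteria):
    """Each criterion is checked against its own bag, independently."""
    flat = flatten(doc, field)
    return all(value in flat.get(f"{field}.{key}", [])
               for key, value in criteria.items())


hits = [u["user"] for u in users
        if matches_flat(u, "followers", age=23, name="alex")]
print(hits)  # both users match, although no follower is Alex aged 23
```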

Correlated inner objects, which are able to answer queries like these, are called nested objects. If you need to be able to do this then you should use the nested datatype instead of the object datatype.

2. Nested datatype

The nested type is a specialized version of the object datatype that allows arrays of objects to be indexed and queried independently of each other.

Let's repeat the example from the previous paragraph, but this time we will make it work as we expect. First of all, we have to define mapping:

curl -XDELETE 'http://localhost:9200/myindex/'
curl -XPUT 'http://localhost:9200/myindex/' -d '{
  "mappings": {
    "user": {
      "properties": {
        "user": {
          "type": "string"
        },
        "followers": {
          "type": "nested"
        }
      }
    }
  }
}'

Pay attention to the followers type: we defined it as nested. Now index the same documents as in the array datatype example and run a query:

curl -XGET 'http://localhost:9200/myindex/_search/?pretty' -d '{
  "query": {
    "nested": {
      "path": "followers",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "followers.age": 21
              }
            },
            {
              "match": {
                "followers.name": "Alex"
              }
            }
          ]
        }
      }
    }
  }
}'

We get no hits because no user has a follower named Alex who is 21 years old:

"hits": 
{ 
"total": 0, 
"max_score": null, 
"hits": [ ] 
}

One more query:

curl -XGET 'http://localhost:9200/myindex/_search/?pretty' -d '{
  "query": {
    "nested": {
      "path": "followers",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "followers.age": 22
              }
            },
            {
              "match": {
                "followers.name": "Alex"
              }
            }
          ]
        }
      }
    }
  }
}'

We get one hit because only one of our users (Darth Vader) has a follower named Alex who is 22 years old.
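
The difference from the flattened case is that every criterion must hold within the same inner object. A minimal Python sketch of that idea follows; matches_nested and nested_hits are hypothetical helpers, and exact string comparison stands in for the match query.

```python
users = [
    {"user": "Darth Vader", "followers": [
        {"age": 21, "name": "Mary"},
        {"age": 22, "name": "Alex"},
        {"age": 23, "name": "Lisa"}]},
    {"user": "Master Yoda", "followers": [
        {"age": 24, "name": "Julia"},
        {"age": 23, "name": "John"},
        {"age": 26, "name": "Alex"}]},
]


def matches_nested(doc, path, **criteria):
    """True if a single inner object satisfies all criteria at once."""
    return any(all(obj.get(key) == value for key, value in criteria.items())
               for obj in doc[path])


def nested_hits(**criteria):
    return [u["user"] for u in users
            if matches_nested(u, "followers", **criteria)]


print(nested_hits(age=21, name="Alex"))  # []
print(nested_hits(age=22, name="Alex"))  # ['Darth Vader']
```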

3. Object datatype

The object datatype is used for single JSON objects. JSON documents are hierarchical in nature: a document may contain inner objects which, in turn, may contain inner objects themselves. Because documents can introduce objects with new fields at any time, such mappings are known as "dynamic." Dynamic mapping is enabled by default. Let’s index some data and get the mapping of our index:

curl -XDELETE 'http://localhost:9200/myindex/'
curl -XPUT 'http://localhost:9200/myindex/user/1' -d '
{
 "name" : {
     "first_name" : "Shay",
     "last_name" : "Banon"
 },
 "age": 25
}'
curl -XGET 'http://localhost:9200/myindex/_mapping/?pretty'


Response:

{
  "myindex" : {
    "mappings" : {
      "user" : {
        "properties" : {
          "age" : {
            "type" : "long"
          },
          "name" : {
            "properties" : {
              "first_name" : {
                "type" : "string"
              },
              "last_name" : {
                "type" : "string"
              }
            }
          }
        }
      }
    }
  }
}

We see that both user and name have properties, which shows that name is mapped as an object. We can also make the mapping of the name object non-dynamic by setting "dynamic": "strict", as in the example below. If we then try to index JSON with a middle_name field inside the name object, the document is rejected with an error:

curl -XDELETE 'http://localhost:9200/myindex/'
curl -XPUT 'http://localhost:9200/myindex/' -d '{
 "mappings": {
   "user": {
     "properties": {
       "age": {
         "type": "long"
       },
       "name": {
         "dynamic": "strict",
         "properties": {
           "first_name": {
             "type": "string"
           },
           "last_name": {
             "type": "string"
           }
         }
       }
     }
   }
 }
}'
curl -XPUT 'http://localhost:9200/myindex/user/1' -d '
{
 "name" : {
     "first_name" : "Shay",
     "last_name" : "Banon"
 },
 "age": 25
}'

The index creation is acknowledged with {"acknowledged":true}, and the record is created.

curl -XPUT 'http://localhost:9200/myindex/user/2' -d '
{
 "name" : {
     "first_name" : "Shay",
     "last_name" : "Banon",
     "middle_name" : "Ruby"
 },
 "age": 25
}'

But in this case the response is:

{"error":"StrictDynamicMappingException[mapping set to strict, dynamic introduction of [middle_name] within [name] is not allowed]","status":400}
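
The dynamic setting accepts true, false, and "strict". The following Python sketch contrasts the three outcomes for a document that carries an unmapped field. It is a toy model under that assumption: apply_dynamic is a hypothetical helper, and the error is an ordinary ValueError rather than Elasticsearch's StrictDynamicMappingException.

```python
def apply_dynamic(known_fields, incoming, dynamic=True):
    """Return the mapped field names after indexing `incoming`."""
    known = set(known_fields)
    new_fields = set(incoming) - known
    if not new_fields:
        return known
    if dynamic == "strict":
        # strict: the whole document is rejected
        raise ValueError(
            f"mapping set to strict, new fields not allowed: {sorted(new_fields)}")
    if dynamic is False:
        return known  # false: new fields are ignored, mapping unchanged
    return known | new_fields  # true: new fields are added to the mapping


name_fields = ["first_name", "last_name"]
doc = {"first_name": "Shay", "last_name": "Banon", "middle_name": "Ruby"}

print(sorted(apply_dynamic(name_fields, doc, dynamic=True)))
print(sorted(apply_dynamic(name_fields, doc, dynamic=False)))
try:
    apply_dynamic(name_fields, doc, dynamic="strict")
except ValueError as error:
    print("rejected:", error)
```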

Geo datatypes

1. Geo-point datatype

Fields of type geo_point accept latitude-longitude pairs, which can be used, for example, to find documents within a certain distance of a central point.

A geo_point value can be expressed in four ways:

  • object - "location" : { "lat" : 41.12, "lon" : -71.34 }
  • string - "location" : "41.12,-71.34" (lat,lon)
  • geohash - "location" : "drm3btev3e86"
  • array - "location" : [-71.34, 41.12] (note the order: lon, lat)

For example, we can put cities mapping:

curl -XPUT 'http://localhost:9200/cities' -d '
{
    "mappings": {
        "city": {
            "properties": {
                "city": {"type": "string"},
                "location": {"type": "geo_point"}
            }
        }
    }
}'

Now index London and two nearby towns:

curl -XPOST 'http://localhost:9200/cities/city/' -d '{
"city": "Radlett", "location": {"lat": 51.419, "lon": 0.197}
}'
curl -XPOST 'http://localhost:9200/cities/city/' -d '{
"city": "Chigwell", "location": {"lat": 51.371, "lon": 0.433}
}'
curl -XPOST 'http://localhost:9200/cities/city/' -d '{
"city": "London", "location": {"lat": 51.303, "lon": 0.732}
}'

Let’s look for the nearest towns to London:

curl -XGET 'http://localhost:9200/cities/city/_search?pretty=true' -d '
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "30km",
          "location": {
            "lat": 51.303,
            "lon": 0.732
          }
        }
      }
    }
  }
}'

With a distance of 30km, the response contains 2 hits.




If we increase the search distance to 40km, we get all three of our records.


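
The hit counts for the 30km and 40km searches can be sanity-checked with a great-circle calculation in Python. This sketch uses the haversine formula with a mean Earth radius of 6371 km, which is an assumption of the example; Elasticsearch's own distance calculation may differ slightly, and the helper names are made up here.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption for this sketch)

# (lat, lon) pairs exactly as indexed in the example above
cities = {
    "Radlett": (51.419, 0.197),
    "Chigwell": (51.371, 0.433),
    "London": (51.303, 0.732),
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def within(center, limit_km):
    """Names of the cities no farther than limit_km from center."""
    return sorted(name for name, point in cities.items()
                  if haversine_km(center, point) <= limit_km)


center = cities["London"]
print(within(center, 30))
print(within(center, 40))
```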


2. Geo-Shape datatype

The geo_shape datatype facilitates the indexing of and searching with arbitrary geo shapes such as rectangles and polygons. It should be used when either the data being indexed or the queries being executed contain shapes other than just points.

You can query documents of this type using the geo_shape query.

Specialized datatypes

1. IPv4 datatype

An ip field is really a long field which accepts IPv4 addresses and indexes them as long values. For example:

curl -XDELETE 'http://localhost:9200/myindex/'
curl -XPUT 'http://localhost:9200/myindex/' -d '{
  "mappings": {
    "subscribers": {
      "properties": {
        "ip_addr": { "type": "ip" }
      }
    }
  }
}'
curl -XPUT 'http://localhost:9200/myindex/subscribers/1' -d '{
  "ip_addr": "192.168.1.1"
}'
curl -XGET 'http://localhost:9200/myindex/_search' -d '{
  "query": {
    "range": {
      "ip_addr": {
        "gte": "192.168.1.0",
        "lt": "192.168.2.0"
      }
    }
  }
}'
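
To see why a range query works on an ip field, it helps to look at the numeric value behind an address. The Python sketch below uses the standard ipaddress module to do the conversion; it illustrates the idea of indexing IPv4 addresses as long values and is not Elasticsearch code. The helper names are hypothetical.

```python
import ipaddress


def ip_to_long(address):
    """Convert a dotted IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(address))


def in_range(address, gte, lt):
    """Numeric equivalent of the range query above."""
    return ip_to_long(gte) <= ip_to_long(address) < ip_to_long(lt)


print(ip_to_long("192.168.1.1"))  # 3232235777
print(in_range("192.168.1.1", "192.168.1.0", "192.168.2.0"))  # True
print(in_range("192.168.2.5", "192.168.1.0", "192.168.2.0"))  # False
```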

2. Completion datatype

The completion datatype provides auto-complete suggestions.

3. Token count datatype

A field of type token_count is really an integer field that accepts string values, analyzes them, and then indexes the number of tokens in the string.
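
As a rough picture of what ends up in a token_count field, the Python sketch below counts word tokens in a string. Splitting on runs of word characters is an assumption that only approximates a real analyzer, and count_tokens is a hypothetical helper.

```python
import re


def count_tokens(text):
    """Count word tokens; a crude stand-in for an Elasticsearch analyzer."""
    return len(re.findall(r"\w+", text))


print(count_tokens("John Smith"))            # 2
print(count_tokens("Rachel Alice Williams")) # 3
```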

Conclusion

Elasticsearch supports a number of different datatypes for the fields in a document. In this article, we have discussed Core, Complex, Geo, and Specialized datatypes with their corresponding examples.