A nested type is a specialized version of the object datatype that allows arrays of objects to be indexed and queried independently of each other. If you need to index arrays of objects and maintain the independence of each object in the array, you should use the nested datatype instead of the object datatype. Internally, Elasticsearch indexes each object in the array as a separate hidden document, so each nested object can be queried independently of the others with the nested query.
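To see why that independence matters, here is a small pure-Python sketch that models the two behaviours rather than calling Elasticsearch. The sample post, the `name` and `age` fields, and the helpers `flatten_object`, `matches_object`, and `matches_nested` are all hypothetical and exist only for illustration; the flattening is a simplified picture of what the object datatype does to arrays.

```python
# Simplified model of object vs. nested indexing; this is not Elasticsearch code.
# The sample data and all helper names are hypothetical.

post = {
    "title": "Nested aggregations",
    "comments": [
        {"name": "alice", "age": 28},
        {"name": "bob", "age": 35},
    ],
}


def flatten_object(doc, field):
    """Model the object datatype: each sub-field becomes one flat list of values."""
    flat = {}
    for item in doc[field]:
        for key, value in item.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_object(doc, field, **criteria):
    """Check each criterion against the flattened values, ignoring which object it came from."""
    flat = flatten_object(doc, field)
    return all(
        value in flat.get(f"{field}.{key}", [])
        for key, value in criteria.items()
    )


def matches_nested(doc, field, **criteria):
    """Model the nested datatype: all criteria must hold within a single object."""
    return any(
        all(item.get(key) == value for key, value in criteria.items())
        for item in doc[field]
    )


# No single comment is by alice AND aged 35, yet the flattened model still matches.
print(matches_object(post, "comments", name="alice", age=35))  # True
print(matches_nested(post, "comments", name="alice", age=35))  # False
```

The flattened version matches because "alice" and 35 both appear somewhere in the array, even though they belong to different comments. The nested version keeps each comment's fields together, which is the property the rest of this post relies on.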

For this post, we will be using hosted Elasticsearch on Qbox.io. You can sign up or launch your cluster here, or click "Get Started" in the header navigation. If you need help setting up, refer to "Provisioning a Qbox Elasticsearch Cluster."

Just as we need the special nested query to gain access to nested objects at search time, the dedicated nested aggregation allows us to aggregate fields in nested objects.

Here, the nested aggregation “steps down” into the nested comments object. Comments are bucketed into months based on the comments.date field, and the average rating is calculated for each bucket.

curl -XGET 'ES_HOST:ES_PORT/blogs/series/_search' -d '{
  "size" : 0,
  "aggs": {
    "post_comments": {
      "nested": {
        "path": "comments"
      },
      "aggs": {
        "comments_by_month": {
          "date_histogram": {
            "field":    "comments.date",
            "interval": "month",
            "format":   "yyyy-MM"
          },
          "aggs": {
            "avg_rating": {
              "avg": {
                "field": "comments.rating"
              }
            }
          }
        }
      }
    }
  }
}'

The results show that aggregation has happened at the nested document level. There are a total of twenty comments: eight in May and twelve in June.

...
"aggregations": {
  "post_comments": {
     "doc_count": 20,
     "comments_by_month": {
        "buckets": [
           {
              "key_as_string": "2014-05",
              "key": 434234232343,
              "doc_count": 8,
              "avg_rating": {
                 "value": 3
              }
           },
           {
              "key_as_string": "2014-06",
              "key": 432312439854,
              "doc_count": 12,
              "avg_rating": {
                 "value": 4
              }
           }
        ]
     }
  }
}
...
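For readers who want to check the semantics without a cluster, the following pure-Python sketch mimics what the request above computes. It is only a model: the two sample posts are hypothetical (the real index contents are not shown in this post), and `comments_by_month` is an illustrative helper, not an Elasticsearch API.

```python
from collections import defaultdict

# Hypothetical sample data; the real index contents are not shown in this post.
posts = [
    {"tags": ["cloud"], "comments": [
        {"date": "2014-05-03", "rating": 2},
        {"date": "2014-06-10", "rating": 5},
    ]},
    {"tags": ["container"], "comments": [
        {"date": "2014-05-21", "rating": 4},
        {"date": "2014-06-02", "rating": 3},
    ]},
]


def comments_by_month(docs):
    """Step down into every comment, bucket by yyyy-MM, and average the ratings."""
    ratings = defaultdict(list)
    for doc in docs:
        for comment in doc["comments"]:   # "nested" step: one unit per comment
            month = comment["date"][:7]   # "date_histogram" step, format yyyy-MM
            ratings[month].append(comment["rating"])
    return [
        {
            "key_as_string": month,
            "doc_count": len(values),              # counts comments, not posts
            "avg_rating": sum(values) / len(values),
        }
        for month, values in sorted(ratings.items())
    ]


for bucket in comments_by_month(posts):
    print(bucket)
```

Note that `doc_count` in each bucket counts comments, not blog posts, which is the same thing the real response reports at the nested level.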

The reverse_nested Aggregation

A nested aggregation can access only the fields within the nested document. It can't see fields in the root document or in a different nested document. However, we can step out of the nested scope back into the parent with a reverse_nested aggregation.

For instance, we can find out which tags our commenters are interested in, based on the age of the commenter. The comments.age is a nested field, while the tags are in the root document.


Here, the nested agg steps down into the comments object. The histogram agg groups on the comments.age field in buckets of 10 years. The reverse_nested agg then steps back up to the root document, and the terms agg counts the popular tags for each commenter age group.

curl -XGET 'ES_HOST:ES_PORT/blogs/series/_search' -d '{
  "size" : 0,
  "aggs": {
    "blog_comments": {
      "nested": {
        "path": "comments"
      },
      "aggs": {
        "age_group": {
          "histogram": {
            "field":    "comments.age",
            "interval": 10
          },
          "aggs": {
            "blog_posts": {
              "reverse_nested": {},
              "aggs": {
                "tags": {
                  "terms": {
                    "field": "tags"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}'

The abbreviated results show us the following. There are twenty comments. There are twelve comments by commenters between the ages of 30 and 40. Ten blog posts are associated with those comments. The popular tags in those blog posts are kubernetes, cloud, and container.

...
"aggregations": {
  "blog_comments": {
     "doc_count": 20,
     "age_group": {
        "buckets": [
           {
              "key": 30,
              "doc_count": 12,
              "blog_posts": {
                 "doc_count": 10,
                 "tags": {
                    "doc_count_error_upper_bound": 0,
                    "buckets": [
                       { "key": "kubernetes",   "doc_count": 4 },
                       { "key": "cloud",     "doc_count": 6 },
                       { "key": "container", "doc_count": 2 }
                    ]
                 }
              }
           },
…
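The difference between the comment count and the blog post count in each bucket is easier to see in a small model. The pure-Python sketch below imitates the nested, histogram, reverse_nested, and terms steps on hypothetical data; the sample posts, the `id` field, and the `tags_by_age_group` helper are assumptions made for illustration and are not part of Elasticsearch.

```python
from collections import Counter, defaultdict

# Hypothetical sample data; "id" is only used here to tell parent posts apart.
posts = [
    {"id": 1, "tags": ["kubernetes", "cloud"],
     "comments": [{"age": 31}, {"age": 38}, {"age": 24}]},
    {"id": 2, "tags": ["cloud"], "comments": [{"age": 35}]},
    {"id": 3, "tags": ["container"], "comments": [{"age": 27}]},
]


def tags_by_age_group(docs, interval=10):
    """Bucket comments by age, then join back to the parent posts and count tags."""
    comment_counts = Counter()
    parents = defaultdict(dict)  # bucket key -> {post id: post}
    for doc in docs:
        for comment in doc["comments"]:                     # "nested" step
            key = (comment["age"] // interval) * interval   # "histogram" step
            comment_counts[key] += 1
            parents[key][doc["id"]] = doc   # "reverse_nested": each parent once
    result = {}
    for key in sorted(comment_counts):
        tag_counts = Counter(                               # "terms" step
            tag for doc in parents[key].values() for tag in doc["tags"]
        )
        result[key] = {
            "doc_count": comment_counts[key],
            "blog_posts": {
                "doc_count": len(parents[key]),
                "tags": dict(tag_counts),
            },
        }
    return result


for key, bucket in tags_by_age_group(posts).items():
    print(key, bucket)
```

In this model the 30 to 39 bucket holds three comments but only two blog posts, because two of the comments belong to the same post. The real response above shows the same effect with twelve comments and ten posts.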

Performance Considerations

Nested objects are useful when there is one main entity, like our blog post, with a limited number of closely related but less important entities, such as comments. It is useful to be able to find blog posts based on the content of the comments, and the nested query and filter provide for fast query-time joins.

The disadvantages of the nested model are as follows:

  • To add, change, or delete a nested document, the whole document must be reindexed. This becomes more costly the more nested documents there are.

  • Search requests return the whole document, not just the matching nested documents. Although there are plans afoot to support returning the best-matching nested documents with the root document, this is not yet supported.

Limiting the Number of Nested Fields

Indexing a document with 100 nested fields actually indexes 101 documents, as each nested document is indexed as a separate document. To safeguard against ill-defined mappings, the number of nested fields that can be defined per index has been limited to 50.

The following setting allows you to limit the number of nested fields that can be created manually or dynamically, in order to prevent bad documents from causing a mapping explosion.

index.mapping.nested_fields.limit - The maximum number of nested fields in an index, defaults to 50.
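As a rough way to reason about this limit before creating an index, the sketch below counts the nested fields in a mapping's properties and compares the total with the default of 50. The mapping shown, including the `replies` field and the field types, and the `count_nested_fields` helper are hypothetical. The settings body at the end is an assumption about how a lower limit might be supplied at index creation, so check it against the documentation for your Elasticsearch version.

```python
import json

NESTED_FIELDS_LIMIT = 50  # default of index.mapping.nested_fields.limit, per the text


def count_nested_fields(properties):
    """Recursively count fields mapped with "type": "nested"."""
    count = 0
    for field in properties.values():
        if field.get("type") == "nested":
            count += 1
        # Descend into sub-fields of object and nested fields alike.
        count += count_nested_fields(field.get("properties", {}))
    return count


# Hypothetical mapping properties for the blog index.
mapping_properties = {
    "tags": {"type": "keyword"},
    "comments": {
        "type": "nested",
        "properties": {
            "age": {"type": "short"},
            "replies": {
                "type": "nested",
                "properties": {"date": {"type": "date"}},
            },
        },
    },
}

nested = count_nested_fields(mapping_properties)
print(nested, nested <= NESTED_FIELDS_LIMIT)

# Assumed shape of a request body that lowers the limit when creating an index.
settings_body = json.dumps({"settings": {"index.mapping.nested_fields.limit": 10}})
print(settings_body)
```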

Give it a Whirl!

It's easy to spin up a standard hosted Elasticsearch cluster in any of our 47 Rackspace, Softlayer, or Amazon data centers. And you can now provision your own AWS Credits on Qbox Private Hosted Elasticsearch.

Questions? Drop us a note, and we'll get you a prompt response.

Not yet enjoying the benefits of a hosted ELK-stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment in our cloud hosting service.
