Editor note: This is a guest post by Brad Simmons, a friend of the company with deep experience in building faceted navigation in the e-commerce context.

The aggregations module was introduced in the 1.0.0 version of Elasticsearch and serves to break the “barriers the current facet implementation put in place.”

Aggregations were born of the limitations users found within the facets module, and they support complex aggregate queries that the facet API cannot express. Indeed, aggregations are meant to replace facets: with the release of Elasticsearch 1.3 in June, facets were marked as deprecated and users were encouraged to migrate to the aggregation framework. If you are new to aggregations, take a look at “An Introduction to Elasticsearch Aggregations” by Michael Lussier on this blog, The Definitive Guide, and the official docs.

Many facets have a directly equivalent aggregation, and migration is as straightforward as replacing the keyword “facets” with “aggregations” or “aggs” in your query. For facets that do not have an equivalent aggregation, the Elasticsearch reference provides us with basic examples for migrating these facets to their aggregation counterparts. Two of these, query facets and facet filters, will be referenced in this post.

Recently, the Elasticsearch team began an effort to make community-sourced demos of the ELK stack available to the public. The first published demo shares insights gleaned from analyzing traffic data provided by the City of New York. This public data set offers a wealth of information about traffic-related incidents and is a perfect foundation for exploring the ELK stack. I’ll be referencing queries from that demo in this article. Setting up the project locally isn’t necessary in order to follow along, but I encourage you to check it out.

Although facets are deprecated, the current version of the Kibana dashboard supported by Elasticsearch v1.3 generates facets internally to gather data for queries or filters entered by the user. These generated queries provide us with some great example material for migrating a faceted query to an aggregation.

Let’s break down the “Top Streets with Most Accidents” query:

{
 "facets": {
   "terms": {
     "terms": {
       "field": "on_street_name",
       "size": 13,
       "order": "count",
       "exclude": []
     },
     "facet_filter": {
       "fquery": {
         "query": {
           "filtered": {
             "query": {
               "bool": {
                 "should": [
                   {
                     "query_string": {
                       "query": "-unspecified AND -\"%{contributing_factor_vehicle_1}\""
                     }
                   }
                 ]
               }
             },
             "filter": {
               "bool": {
                 "must": [
                   {
                     "range": {
                       "@timestamp": {
                         "from": 1341471410960,
                         "to": 1399908069679
                       }
                     }
                   }
                 ]
               }
             }
           }
         }
       }
     }
   }
 },
 "size": 0
}

Migrating the terms facet at the top level simply requires replacing “facets” with “aggs.” By default, buckets are ordered by their doc_count in descending order, so we can remove the “order” parameter, and the empty “exclude” list can be dropped as well.

{
 "aggs": {
   "foo": {
     "terms": {
       "field": "on_street_name",
       "size": 13
     }
   }
 }
}
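The mechanical part of this step can be sketched in Python. The helper name terms_facet_to_agg is hypothetical and not part of any Elasticsearch client; it assumes a facet body shaped like the Kibana-generated one above and only handles the parameters discussed here.

```python
import json


def terms_facet_to_agg(facet_body):
    """Convert a simple terms facet body into a terms aggregation body.

    Hypothetical helper for illustration; it only handles the parameters
    that appear in the example query.
    """
    # Copy so the caller's facet definition is left untouched.
    terms = dict(facet_body["terms"])
    # Terms aggregation buckets are ordered by doc_count descending by
    # default, so the facet's "order": "count" is redundant.
    if terms.get("order") == "count":
        del terms["order"]
    # An empty exclude list has no effect, so drop it as well.
    if terms.get("exclude") == []:
        del terms["exclude"]
    return {"terms": terms}


facet = {
    "terms": {
        "field": "on_street_name",
        "size": 13,
        "order": "count",
        "exclude": [],
    }
}

# "foo" is the arbitrary aggregation name used in the example above.
query = {"aggs": {"foo": terms_facet_to_agg(facet)}}
print(json.dumps(query, indent=1))
```

The printed body has the same content as the hand-written aggregation: only the field and size survive the conversion.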

As shown in the docs, facet filters can be replaced with a filter aggregation. In our example query, the facet filter contains multiple individual filters that can be combined in the aggregation taking its place. In the filter section of the superseding aggregation, we can put the query and range filters under a single bool filter. Both clauses need to be satisfied, so we place the filters within a “must” section. For more information, The Definitive Guide has an informative section on combining filters.

Thus our fully migrated query follows:

{
 "aggs": {
   "foo": {
     "filter": {
       "bool": {
         "must": [
           { "query": { "query_string": { "query": "-unspecified AND -\"%{contributing_factor_vehicle_1}\"" }}},
           { "range": { "@timestamp": { "from": 1341471410960, "to": 1399908069679 }}}
         ]
       } 
     },
     "aggs": {
       "bar": {
         "terms": {
           "field": "on_street_name",
           "size": 13
         }
       }
     }
   }
 }
}
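The same rewrite can be expressed as a small Python sketch. Both function names are hypothetical, and the code assumes the facet_filter has the fquery, query, filtered shape that Kibana generated in the original query. It unwraps the bool query only when it holds a single “should” clause, because that is the only case where moving the clause into “must” keeps the same meaning.

```python
import json


def unwrap_single_should(query):
    """Return the lone clause of a bool/should query, else the query itself."""
    bool_query = query.get("bool", {})
    should = bool_query.get("should", [])
    if list(bool_query) == ["should"] and len(should) == 1:
        return should[0]
    return query


def facet_filter_to_filter_agg(facet_filter, inner_name, inner_agg):
    """Wrap inner_agg in a filter aggregation built from a facet_filter."""
    filtered = facet_filter["fquery"]["query"]["filtered"]
    # A query must be wrapped in a "query" filter to sit inside a bool filter.
    must = [{"query": unwrap_single_should(filtered["query"])}]
    # The existing filters (here, the range filter) are carried over as-is.
    must.extend(filtered["filter"]["bool"]["must"])
    return {
        "filter": {"bool": {"must": must}},
        "aggs": {inner_name: inner_agg},
    }


query_string = '-unspecified AND -"%{contributing_factor_vehicle_1}"'
time_range = {"range": {"@timestamp": {"from": 1341471410960, "to": 1399908069679}}}
facet_filter = {
    "fquery": {
        "query": {
            "filtered": {
                "query": {"bool": {"should": [{"query_string": {"query": query_string}}]}},
                "filter": {"bool": {"must": [time_range]}},
            }
        }
    }
}
terms_agg = {"terms": {"field": "on_street_name", "size": 13}}

migrated = {"aggs": {"foo": facet_filter_to_filter_agg(facet_filter, "bar", terms_agg)}}
print(json.dumps(migrated, indent=1))
```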

This migration, while necessary moving forward, is purely cosmetic; the facets in the example serve their purpose without much complexity, and the change is simple.

In order to demonstrate the ease with which aggregations can generate complex queries, we need another example. Through its query builder, Kibana allows the user to create complex facet queries that would be, at best, tedious to write manually. For instance, the next query is over 1,100 lines long. Queries like these are great examples of how the power of aggregations can make writing complex queries much simpler and more efficient.

Let’s break down the “All Accident Types and their Distribution” query, abridged for brevity:

{
 "facets": {
   "77": {
     "query": {
       "filtered": {
         "query": {
           "query_string": {
             "query": "contributing_factor_vehicle:\"Driver Inattention/Distraction\" AND (-unspecified AND -\"%{contributing_factor_vehicle_1}\" AND -\"Other Vehicular\")"
           }
         },
         "filter": {
           "bool": {
             "must": [
               {
                 "range": {
                   "@timestamp": {
                     "from": 1341471410960,
                     "to": 1399908069679
                   }
                 }
               }
             ]
           }
         }
       }
     }
   },
   "78": {
     "query": {
       "filtered": {
         "query": {
           "query_string": {
             "query": "contributing_factor_vehicle:\"Failure to Yield Right-of-Way\" AND (-unspecified AND -\"%{contributing_factor_vehicle_1}\" AND -\"Other Vehicular\")"
...

This query consists of an individual facet for each of the more than 40 unique terms within the “contributing_factor_vehicle” field. Each facet contains a query filter for a unique term and the same range filter we saw in the previous example. The limitations of facets are painfully obvious in this query, and migrating to aggregations will greatly reduce code repetition, improve readability, and decrease computing cost.
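To make the repetition concrete, here is a rough Python sketch that rebuilds the facet section from a list of terms. The function name query_facet is hypothetical, only the two terms visible in the excerpt are used, and numbering the facets from 77 simply mirrors the excerpt; it is an assumption about how Kibana labels them.

```python
EXCLUSIONS = '(-unspecified AND -"%{contributing_factor_vehicle_1}" AND -"Other Vehicular")'


def query_facet(term, start_ms, end_ms):
    """Build one query facet; only the term differs between facets."""
    query = 'contributing_factor_vehicle:"' + term + '" AND ' + EXCLUSIONS
    return {
        "query": {
            "filtered": {
                "query": {"query_string": {"query": query}},
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"@timestamp": {"from": start_ms, "to": end_ms}}}
                        ]
                    }
                },
            }
        }
    }


# Only the two terms shown in the excerpt; the real query has one facet per term.
terms = ["Driver Inattention/Distraction", "Failure to Yield Right-of-Way"]
facets = {
    str(77 + offset): query_facet(term, 1341471410960, 1399908069679)
    for offset, term in enumerate(terms)
}
print(sorted(facets))
```

Every entry repeats the same exclusions and the same range filter, which is the duplication the aggregation version removes.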

To begin migrating this to aggregations, let’s start from scratch and simply return the buckets for the field we want to operate on.

{
 "aggs": {
   "foo": {
     "terms": {
       "field": "contributing_factor_vehicle",
       "size": 0
     }
   }
 }
}

By default, the terms aggregation returns “the buckets for the top ten terms ordered by the doc_count.” In order to provide the same data as the original query, we need the buckets for every term. This behavior can be changed by modifying the size parameter and, as of Elasticsearch v1.1.0, it “is possible to not limit the number of terms that are returned by setting size to 0.” This will give us all the buckets we need, but invoking such behavior should be done carefully. The docs warn that because the terms are sorted, it should not be used on “high-cardinality fields,” as it would put a heavy burden on your CPU and network.

Without the filters, our doc_counts are way off from the original query. Let’s change the top-level aggregation to a filter aggregation and add our range and query filters.

{
 "aggs": {
   "foo": {
     "filter": {
       "bool": {
         "must": [
           { "query": { "query_string": { "query": "-unspecified AND -\"%{contributing_factor_vehicle_1}\" AND -\"Other Vehicular\"" }}},
           { "range": { "@timestamp": { "from": 1341471410960, "to": 1399908069679 }}}
         ]
       }
     },
     "aggs": {
       "bar": {
         "terms": {
           "field": "contributing_factor_vehicle",
           "size": 0
         }
       }
     }
   }
 }
}

The old query repeated a query filter for each unique term in the contributing_factor_vehicle field. These terms are now available to us as bucket keys, and we can take the remaining logic from the initial query_string and apply it within its own filter. Thus, our migrated query returns the same results as the original in a more efficient and much easier-to-read manner.
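On the client side, the per-term counts that used to come from separate facets can be read from the bucket list. The following is a minimal Python sketch, assuming the response nests the “bar” terms aggregation under the “foo” filter aggregation; the function name is hypothetical and the counts in the sample response are placeholders, not real results from the data set.

```python
def bucket_counts(response, filter_name="foo", terms_name="bar"):
    """Map each bucket key to its doc_count for a terms agg nested in a filter agg."""
    buckets = response["aggregations"][filter_name][terms_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Placeholder response: the counts are invented purely to show the shape.
sample_response = {
    "aggregations": {
        "foo": {
            "doc_count": 5,
            "bar": {
                "buckets": [
                    {"key": "Driver Inattention/Distraction", "doc_count": 3},
                    {"key": "Failure to Yield Right-of-Way", "doc_count": 2},
                ]
            },
        }
    }
}

counts = bucket_counts(sample_response)
print(counts)
```

Each key in the resulting dictionary corresponds to one of the numbered facets in the original query.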

As you can see, the aggregations API has the potential to greatly reduce the complexity and physical size of your query logic. Regardless of the benefit, migrating from facets is a necessity if you plan to remain on new versions of Elasticsearch; as the docs state, facets “will be removed in a future release.”

Aggregations represent a fundamental shift in how we view the modern database. Elasticsearch’s facet API was a great first step in enhancing search experience, but its successor is a far more powerful tool for building data-rich applications.