The ever-growing competition in e-commerce analytics shows how important business intelligence has become, and the rising popularity of Elasticsearch is no coincidence. But did you know that Elasticsearch can also help you manage your business intelligence requirements?

Editor note: We asked Neil Alex to write about his experience using Elasticsearch in the context of typical e-commerce solutions. Neil is a freelance data consultant with expertise in Lucene and Elasticsearch. This is the second post in this series. — Mark Brandon.

In the first post in this series, we explained how to create a guided search with Elasticsearch. In this article, we’ll explore Elasticsearch from an analytics point of view. Creating an easy, well-guided search strategy is important, but it only takes you part of the way. Like most businesses, you want an easy way to monitor and analyze the performance of your e-commerce systems, including metrics such as total sales and top products. In the previous post, we built a simple e-commerce platform on Elasticsearch; now we want to go further and gather basic business intelligence.

In Part 1, we spent some time looking at faceting. As a matter of terminology, facets have been deprecated in Elasticsearch in favor of aggregations, but because the term is still in wide use, we continue to use it here. In practice, we encourage using aggregations.

Beginning with version 1.3, you can configure Elasticsearch as an analytics platform using aggregations, which allow you to generate elaborate analytics that cover all of your data. The aggregations feature is similar to a GROUP BY clause in SQL, but much more powerful. There are a number of ready-made aggregation types: all you need to do is choose the ones that match your needs and combine them properly. Read on to see how it’s done.
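
To make the GROUP BY comparison concrete, here is a minimal Python sketch that uses the standard-library sqlite3 module rather than Elasticsearch. The table name gadgets and the three sample rows are made up for illustration; only the column names brand and piecesSold mirror fields used in this series.

```python
import sqlite3

# Hypothetical sample rows (brand, piecesSold); not taken from the real index.
ROWS = [("apple", 100), ("apple", 50), ("sony", 70)]


def total_sold_by_brand(rows):
    """Return {brand: total piecesSold} using a plain SQL GROUP BY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE gadgets (brand TEXT, piecesSold INTEGER)")
        conn.executemany("INSERT INTO gadgets VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT brand, SUM(piecesSold) FROM gadgets GROUP BY brand"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


totals = total_sold_by_brand(ROWS)
print(totals)
```

A terms aggregation with a sum aggregation nested inside it yields the same kind of per-group total. The difference is that further aggregations can be attached to each group, which a single flat GROUP BY does not express as directly.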

Although facets provide a great way to aggregate data within the context of a single document set, this context is defined (and limited) by the query and the various levels of filters (filtered queries, top-level filters, and facet-level filters). Facets are sufficient in many simple scenarios, but they don’t support complex aggregations. Aggregations replaced facets in Elasticsearch in response to extensive user experience with real-time data analytics. They are the next generation of faceting, because they remove these facet limitations.

An aggregation is a unit of work that builds analytic information. The context of the execution defines the document set. For a facet, the context is the document, but for a top-level aggregation it is the query and filters of the entire search request.

There are many types of aggregations, each with a specific purpose and type of output. To better understand these types, we can break them into two main categories:

  • Bucketing – these aggregations build buckets, in which each bucket corresponds to a key and a document criterion. When the aggregation executes, it evaluates all of the bucket criteria for each document in the context. When there is a match, the document “falls into” the matching bucket. When the aggregation completes, you’ll have a list of buckets, each with a corresponding set of documents.
  • Metric – these aggregations compute metrics across a set of documents.

Since each bucket effectively defines a document set (all documents belonging to the bucket), you can potentially associate aggregations at the bucket level, and each of those will execute within the context of that bucket. This is where we see the full power of aggregations: nested aggregations.
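
The two categories and the nesting idea can be imitated in a few lines of plain Python. This is only a conceptual sketch, not how Elasticsearch implements aggregations: the documents are invented, and the helper names terms_buckets, sum_metric, and nested_sum_by_term are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents, invented for this sketch.
DOCS = [
    {"brand": "apple", "name": "phone-a", "piecesSold": 100},
    {"brand": "apple", "name": "tablet-a", "piecesSold": 50},
    {"brand": "sony", "name": "phone-s", "piecesSold": 70},
]


def terms_buckets(docs, field):
    """Bucketing: group documents by the value of one field."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def sum_metric(docs, field):
    """Metric: add up one numeric field across a document set."""
    return sum(doc[field] for doc in docs)


def nested_sum_by_term(docs, bucket_field, metric_field):
    """Nesting: run the metric inside the context of each bucket."""
    return {
        key: {
            "doc_count": len(members),
            "sum": sum_metric(members, metric_field),
        }
        for key, members in terms_buckets(docs, bucket_field).items()
    }


result = nested_sum_by_term(DOCS, "brand", "piecesSold")
print(result)
```

Each bucket is just a smaller document set, so the metric function needs no changes to run inside it. That is the property that makes nesting possible.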

Let’s consider some analytic use cases for our e-commerce setup. In our example store, we offer many products from different brands. Now, we want to focus on the performance metrics of each product with respect to its brand.

In the examples given in Part 1, we indexed the piecesSold field, which carries the total sales figure for each distinct product. Now we’ll nest a sum aggregation inside a terms aggregation on the brand field, and the results will contain the total sales per brand. The sum aggregation is a metric aggregation that adds up the values of a specific field. Here’s the code:

POST /ecommerce/gadgets/_search
{
  "size": 0,
  "aggregations": {
    "brands_agg": {
      "terms": {
        "field": "brand"
      },
      "aggregations": {
        "total_prodsales_agg": {
          "sum": {
            "field": "piecesSold"
          }
        }
      }
    }
  }
}

The results of this aggregation would be:

{
  "took": 14,
  "timed_out": false,
  "_shards": {
     "total": 1,
     "successful": 1,
     "failed": 0
  },
  "hits": {
     "total": 9,
     "max_score": 0,
     "hits": []
  },
  "aggregations": {
     "brands_agg": {
        "buckets": [
           {
              "key": "apple",
              "doc_count": 4,
              "total_prodsales_agg": {
                 "value": 55700
              }
           },
           {
              "key": "samsung",
              "doc_count": 2,
              "total_prodsales_agg": {
                 "value": 12000
              }
           },
           {
              "key": "dell",
              "doc_count": 1,
              "total_prodsales_agg": {
                 "value": 4600
              }
           },
           {
              "key": "nokia",
              "doc_count": 1,
              "total_prodsales_agg": {
                 "value": 12000
              }
           },
           {
              "key": "sony",
              "doc_count": 1,
              "total_prodsales_agg": {
                 "value": 24000
              }
           }
        ]
     }
  }
}
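
On the client side, reading these numbers is a matter of walking the buckets list. The following Python sketch works on a trimmed copy of the response above (only the aggregations part is kept); the function names brand_totals and rank_brands are hypothetical helpers, not part of any Elasticsearch client.

```python
# Trimmed copy of the response shown above: only "aggregations" is kept.
RESPONSE = {
    "aggregations": {
        "brands_agg": {
            "buckets": [
                {"key": "apple", "doc_count": 4,
                 "total_prodsales_agg": {"value": 55700}},
                {"key": "samsung", "doc_count": 2,
                 "total_prodsales_agg": {"value": 12000}},
                {"key": "dell", "doc_count": 1,
                 "total_prodsales_agg": {"value": 4600}},
                {"key": "nokia", "doc_count": 1,
                 "total_prodsales_agg": {"value": 12000}},
                {"key": "sony", "doc_count": 1,
                 "total_prodsales_agg": {"value": 24000}},
            ]
        }
    }
}


def brand_totals(response):
    """Map each brand key to the value of its nested sum aggregation."""
    buckets = response["aggregations"]["brands_agg"]["buckets"]
    return {b["key"]: b["total_prodsales_agg"]["value"] for b in buckets}


def rank_brands(response):
    """Return (brand, total) pairs sorted by total, highest first."""
    totals = brand_totals(response)
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


ranked = rank_brands(RESPONSE)
for brand, total in ranked:
    print(f"{brand}: {total}")
```

The aggregation names used as dictionary keys (brands_agg, total_prodsales_agg) are the ones chosen in the request, so renaming them there changes the keys to look up here.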

Even if you don’t understand the syntax yet, you can see how easy it is to perform complex aggregations and groupings. Because aggregations can be nested and combined, you can extract a very wide range of data this way.

Let’s consider another example for a different type of user: one who needs yearly statistics on product releases along with the average sales of each product. Below, we nest the avg aggregation inside a terms aggregation, which is in turn nested within a date_histogram aggregation. A date_histogram aggregation is a bucket aggregation that creates date buckets at a time interval that we specify. The avg aggregation is a metric aggregation that calculates the average of a field over the matching documents.

POST /ecomercedata/gadgets/_search
{
  "size": 0,
  "aggregations": {
    "by_year": {
      "date_histogram": {
        "field": "dateOfRelease",
        "interval": "year"
      },
      "aggregations": {
        "product_releases": {
          "terms": {
            "field": "name"
          },
          "aggs": {
            "average_agg": {
              "avg": {
                "field": "piecesSold"
              }
            }
          }
        }
      }
    }
  }
}

The results of this aggregation would be:

{
"took": 571,
"timed_out": false,
"_shards": {
   "total": 1,
   "successful": 1,
   "failed": 0
},
"hits": {
   "total": 9,
   "max_score": 0,
   "hits": []
},
"aggregations": {
   "by_year": {
      "buckets": [
         {
            "key_as_string": "2002-00-01",
            "key": 1009843200000,
            "doc_count": 1,
            "product_releases": {
               "buckets": [
                  {
                     "key": "iphone",
                     "doc_count": 1,
                     "average_agg": {
                        "value": 28000
                     }
                  }
               ]
            }
         },
         {
            "key_as_string": "2004-00-01",
            "key": 1072915200000,
            "doc_count": 1,
            "product_releases": {
               "buckets": [
                  {
                     "key": "xperia",
                     "doc_count": 1,
                     "average_agg": {
                        "value": 24000
                     }
                  }
               ]
            }
         },
         {
            "key_as_string": "2005-00-01",
            "key": 1104537600000,
            "doc_count": 2,
            "product_releases": {
               "buckets": [
                  {
                     "key": "ipad",
                     "doc_count": 1,
                     "average_agg": {
                        "value": 9500
                     }
                  },
                  {
                     "key": "macbookpro",
                     "doc_count": 1,
                     "average_agg": {
                        "value": 9500
                     }
                  }
               ]
            }
         },
         {
            "key_as_string": "2006-00-01",
            "key": 1136073600000,
            "doc_count": 1,
            "product_releases": {
               "buckets": [
                  {
                     "key": "macbookair",
                     "doc_count": 1,
                     "average_agg": {
                        "value": 8700
                     }
                  }
               ]
            }
         },
         {
            "key_as_string": "2007-00-01",
            "key": 1167609600000,
            "doc_count": 1,
            "product_releases": {
               "buckets": [
                  {
                     "key": "galaxytab",
                     "doc_count": 1,
                     "average_agg": {
                        "value": 8500
                     }
                  }
               ]
            }
         },
         {
            "key_as_string": "2008-00-01",
            "key": 1199145600000,
            "doc_count": 1,
            "product_releases": {
               "buckets": [
                  {
                     "key": "inspiron",
                     "doc_count": 1,
                     "average_agg": {
                        "value": 4600
                     }
                  }
               ]
            }
         },
         {
            "key_as_string": "2009-00-01",
            "key": 1230768000000,
            "doc_count": 1,
            "product_releases": {
               "buckets": [
                  {
                     "key": "lumia",
                     "doc_count": 1,
                     "average_agg": {
                        "value": 12000
                     }
                  }
               ]
            }
         },
         {
            "key_as_string": "2014-00-01",
            "key": 1388534400000,
            "doc_count": 1,
            "product_releases": {
               "buckets": [
                  {
                     "key": "ativbook",
                     "doc_count": 1,
                     "average_agg": {
                        "value": 3500
                     }
                  }
               ]
            }
         }
      ]
   }
}
}
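
A nested response like this is often easier to report on once it is flattened into rows. The Python sketch below does that for two of the year buckets shown above (the others are omitted for brevity). It derives the year from the numeric key, which is in epoch milliseconds, rather than from key_as_string. The function name flatten_yearly_averages is a hypothetical helper.

```python
from datetime import datetime, timezone

# Two year buckets copied from the response above; the rest are omitted.
RESPONSE = {
    "aggregations": {
        "by_year": {
            "buckets": [
                {
                    "key": 1009843200000,
                    "doc_count": 1,
                    "product_releases": {"buckets": [
                        {"key": "iphone", "doc_count": 1,
                         "average_agg": {"value": 28000}},
                    ]},
                },
                {
                    "key": 1104537600000,
                    "doc_count": 2,
                    "product_releases": {"buckets": [
                        {"key": "ipad", "doc_count": 1,
                         "average_agg": {"value": 9500}},
                        {"key": "macbookpro", "doc_count": 1,
                         "average_agg": {"value": 9500}},
                    ]},
                },
            ]
        }
    }
}


def flatten_yearly_averages(response):
    """Turn year -> product -> avg buckets into (year, product, avg) rows."""
    rows = []
    for year_bucket in response["aggregations"]["by_year"]["buckets"]:
        # The bucket key is the start of the interval in epoch milliseconds.
        start = datetime.fromtimestamp(year_bucket["key"] / 1000, tz=timezone.utc)
        for product in year_bucket["product_releases"]["buckets"]:
            rows.append(
                (start.year, product["key"], product["average_agg"]["value"])
            )
    return rows


rows = flatten_yearly_averages(RESPONSE)
for row in rows:
    print(row)
```

Each level of nesting in the request becomes one loop here, so a third nested bucket aggregation would simply add a third loop.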

Our focus here has been on two important aspects of e-commerce platforms: aggregations and analytics. After you index your data in Elasticsearch, it’s no longer merely a search platform; you also get the power and flexibility of running ad-hoc queries on your data. We’ve deliberately kept these examples simple, so that you can use them as springboards for implementing your own ideas and those of your users. The possibilities are huge, and we hope to cover more Elasticsearch features in the future.