We are glad to see that the steady innovation of Elasticsearch continues. Release 2.0 is coming soon, built on Lucene 5.2.1 and bringing many new features. Among them, the aggregations functionality will expand to include pipeline aggregations, an impressive enhancement to a key feature.
Continue reading below for an overview of pipeline aggregations and links to more information. If you are unfamiliar with aggregations in Elasticsearch, you can gain proficiency with this important feature by reading our collection of blog articles on the topic.
Pipeline Aggregations Coming in ES Release 2.0
A prominent feature set in the upcoming release 2.0 of Elasticsearch is pipeline aggregations. They extend the existing aggregations framework, adding a number of computations that users can run on top of the results of standard aggregations.
With pipeline aggregations, ES users can pose questions such as "What is the maximum average weekly price?" directly against a simple date histogram. Similarly, given a date histogram of the total user count at weekly intervals, a user can ask directly: "How many new users are signing up each week?"
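As a sketch of what such requests might look like, the Python snippet below builds two search bodies in the 2.0 aggregation syntax: one answers the maximum weekly average price with a `max_bucket` sibling of a `date_histogram`, and one derives weekly sign-ups as the `derivative` of a running user total. The field names `date`, `price`, and `total_users`, and every aggregation name, are hypothetical; verify the request shape against the Elasticsearch 2.0 reference before relying on it.

```python
import json

# Hypothetical index fields: "date", "price", "total_users".

# Q1: "What is the maximum average weekly price?"
# A weekly date_histogram with an avg metric per bucket, plus a sibling
# max_bucket that reads each bucket's avg through buckets_path.
max_avg_price_body = {
    "size": 0,
    "aggs": {
        "sales_per_week": {
            "date_histogram": {"field": "date", "interval": "week"},
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        },
        "max_weekly_avg_price": {
            "max_bucket": {"buckets_path": "sales_per_week>avg_price"}
        },
    },
}

# Q2: "How many new users are signing up each week?"
# Assuming each document carries a running total of users, the weekly max
# of that total, differenced bucket to bucket, approximates new sign-ups.
# The derivative sits *inside* the histogram (a parent pipeline).
new_users_body = {
    "size": 0,
    "aggs": {
        "users_per_week": {
            "date_histogram": {
                "field": "date",
                "interval": "week",
                "min_doc_count": 0,  # keep empty weeks so the series has no holes
            },
            "aggs": {
                "total_users": {"max": {"field": "total_users"}},
                "new_users": {"derivative": {"buckets_path": "total_users"}},
            },
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(max_avg_price_body, indent=2))
    print(json.dumps(new_users_body, indent=2))
```

Note the structural difference: the sibling `max_bucket` sits next to the histogram and points into it with `>`, while the parent `derivative` is nested inside the histogram it works on.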
Elasticsearch processes these aggregations on the coordinating node, and only after the other aggregations have completed and their shard results have been merged. The new pipeline aggregators can therefore use the final results of other aggregations in the request, but they cannot go back to the shards to query the index.
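Why the ordering matters can be shown with a toy model. This is a hypothetical simplification, not Elasticsearch internals: two shards return partial weekly counts, the coordinating node merges them, and only then does a pipeline step run over the merged buckets. The function names and data are illustrative.

```python
from collections import defaultdict

# Toy model: each shard returns partial {week: doc_count} results.
def reduce_shard_buckets(shard_results):
    """Merge per-shard counts into one sorted list of (key, count) buckets."""
    merged = defaultdict(int)
    for shard in shard_results:
        for key, count in shard.items():
            merged[key] += count
    return sorted(merged.items())

def run_pipeline(reduced_buckets, pipeline):
    """A pipeline step sees only the merged buckets, never the shards."""
    return pipeline(reduced_buckets)

shard_a = {"2015-W01": 3, "2015-W02": 5}
shard_b = {"2015-W01": 4, "2015-W03": 2}

buckets = reduce_shard_buckets([shard_a, shard_b])
# Per shard, the busiest week would look different (W02 on shard_a);
# only the merged view gives the correct answer.
busiest = run_pipeline(buckets, lambda bs: max(bs, key=lambda kv: kv[1]))
print(busiest)  # ('2015-W01', 7)
```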
There are two types of pipeline aggregations: parent and sibling. We give an overview of each below and then direct you to more information.
Parent pipeline aggregations
- Derivative Aggregation – calculates the derivative of a specified metric in a parent histogram (or date_histogram) aggregation.
- Bucket Script Aggregation – executes a script that can perform per-bucket computations on specified metrics in the parent multi-bucket aggregation.
- Cumulative Sum Aggregation – calculates the cumulative sum of a specified metric in a parent histogram (or date_histogram) aggregation.
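The parent aggregations all add a new value to each bucket of the histogram that contains them. The Python sketch below imitates that on a hypothetical monthly series; the helper names are illustrative, the lambda stands in for what would be a script with `buckets_path` variables in Elasticsearch, and gap handling for empty buckets is not modeled.

```python
# Parent-pipeline semantics over an ordered list of histogram buckets.
# Each helper writes its result back into the buckets under `out`.

def add_derivative(buckets, metric, out):
    prev = None
    for b in buckets:
        # The first bucket has no predecessor, so it gets no derivative.
        b[out] = None if prev is None else b[metric] - prev
        prev = b[metric]

def add_cumulative_sum(buckets, metric, out):
    running = 0
    for b in buckets:
        running += b[metric]
        b[out] = running

def add_bucket_script(buckets, script, out):
    for b in buckets:
        b[out] = script(b)  # computed from this bucket's own metrics only

monthly = [
    {"key": "2015-01", "sales": 100.0, "returns": 10.0},
    {"key": "2015-02", "sales": 150.0, "returns": 30.0},
    {"key": "2015-03", "sales": 120.0, "returns": 12.0},
]
add_derivative(monthly, "sales", "sales_deriv")
add_cumulative_sum(monthly, "sales", "sales_cumsum")
add_bucket_script(monthly, lambda b: b["returns"] / b["sales"], "return_rate")

for b in monthly:
    print(b["key"], b["sales_deriv"], b["sales_cumsum"], b["return_rate"])
```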
Sibling pipeline aggregations
- Max Bucket Aggregation – identifies the bucket(s) with the maximum value of a specified metric in a sibling aggregation and outputs both the value and the key(s) of the bucket(s).
- Min Bucket Aggregation – identifies the bucket(s) with the minimum value of a specified metric in a sibling aggregation and outputs both the value and the key(s) of the bucket(s).
- Average Bucket Aggregation – calculates the mean value of a specified metric across all buckets in a sibling aggregation.
- Sum Bucket Aggregation – calculates the sum of a specified metric across all buckets in a sibling aggregation.
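Each sibling aggregation collapses a whole set of buckets into a single result. The Python sketch below imitates that on a hypothetical list of weekly buckets. The function names and bucket layout are illustrative assumptions rather than the Elasticsearch API, the result dictionaries only loosely mirror the value-and-keys output described above, and Elasticsearch's handling of empty or missing buckets is not modeled.

```python
# Sibling-pipeline semantics: read one metric from every bucket of a
# sibling aggregation and return a single summary.

def max_bucket(buckets, metric):
    best = max(b[metric] for b in buckets)
    # Ties are possible, so the matching keys come back as a list.
    keys = [b["key"] for b in buckets if b[metric] == best]
    return {"value": best, "keys": keys}

def min_bucket(buckets, metric):
    worst = min(b[metric] for b in buckets)
    keys = [b["key"] for b in buckets if b[metric] == worst]
    return {"value": worst, "keys": keys}

def avg_bucket(buckets, metric):
    return sum(b[metric] for b in buckets) / len(buckets)

def sum_bucket(buckets, metric):
    return sum(b[metric] for b in buckets)

weekly = [
    {"key": "W1", "avg_price": 20.0},
    {"key": "W2", "avg_price": 35.0},
    {"key": "W3", "avg_price": 35.0},
    {"key": "W4", "avg_price": 10.0},
]
print(max_bucket(weekly, "avg_price"))  # {'value': 35.0, 'keys': ['W2', 'W3']}
print(min_bucket(weekly, "avg_price"))  # {'value': 10.0, 'keys': ['W4']}
print(avg_bucket(weekly, "avg_price"))  # 25.0
print(sum_bucket(weekly, "avg_price"))  # 100.0
```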
Release 2.0 will also include the Moving Average Aggregation. Given an ordered series of data, this aggregation will slide a window across the data and emit the average value of that window.
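The sliding-window idea can be sketched as a plain trailing simple moving average in Python. This is a generic illustration under stated assumptions: the function name is hypothetical, early points average over however many values exist so far, and Elasticsearch's own aggregation offers several weighting models and may align the window differently.

```python
from collections import deque

def simple_moving_average(series, window):
    """Slide a trailing window over an ordered series and emit each window's mean."""
    if window < 1:
        raise ValueError("window must be >= 1")
    buf = deque(maxlen=window)  # the oldest value drops out automatically
    out = []
    for x in series:
        buf.append(x)
        out.append(sum(buf) / len(buf))
    return out

print(simple_moving_average([10, 20, 30, 40, 50], window=3))
# [10.0, 15.0, 20.0, 30.0, 40.0]
```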
In his aggregations article on the ES blog, Colin Goodheart-Smith of Elastic walks through a helpful tutorial that demonstrates how to use most of these new aggregations. His data set is the spacecraft trajectory data for Voyager 1 and Voyager 2 from NASA's Helioweb site. Each document in his index represents the solar ecliptic position of one of the two spacecraft on a particular day.
We’re always ready to help you achieve maximum success with your ES environment, and we hope that you found this article helpful. Stay tuned for more updates and technical bulletins.