We have seen numerous pipeline aggregations in previous posts. Here we discuss another pipeline aggregation called the moving average aggregation and its significance, as well as its application in real-life scenarios.

Moving Average Explained

A simple moving average is calculated by taking the average of a metric over a fixed number of periods. The average "moves": each time a new period arrives, a new average is calculated over the most recent periods, hence the name moving average. As the window moves forward, the oldest value is dropped and the newest value is taken into account.
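To make the "drop the old value, take in the new one" idea concrete, here is a minimal Python sketch of a textbook simple moving average. The function name `simple_moving_average` and the sample prices are made up for illustration, and the sketch is not meant to mirror how Elasticsearch computes the value internally. In particular, it includes the current value in the window and averages whatever is available until the window fills up.

```python
from collections import deque


def simple_moving_average(values, window):
    """Return the mean of the most recent `window` values at each position.

    Until `window` values have been seen, the mean of the values seen so far
    is used instead (an assumption of this sketch).
    """
    recent = deque(maxlen=window)  # the oldest value falls out automatically
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical prices, chosen only to keep the arithmetic easy to follow.
prices = [10, 20, 30, 40, 50, 60]
print(simple_moving_average(prices, window=3))
# [10.0, 15.0, 20.0, 30.0, 40.0, 50.0]
```

From the fourth value onward the window is full, so each new average drops one old price and picks up one new price.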

The moving average moves along a time scale as new values are added, which is why it is widely used in trend calculation. Because it is calculated over a specific window, it smooths out short-term fluctuations and is a good indicator of the underlying trend. This calculation is extremely useful when we are dealing with data that does not appear to follow a trend. Let us explore the possibilities of moving averages in the following examples. To see an example of how to calculate the moving averages you can refer here.

Data Set

For this example, we need a continuous data set on which to evaluate trends. I have created a sample data set consisting of hourly stock values over a period of 10 days for a firm whose stock is highly volatile. The data set has a total of 240 documents, each with the structure below:

{
  "stockValue": 99,
  "time": "2016-04-24T19:19:00+00:00"
}

Simple Moving Average

Let us explore the volatile stock data we have indexed in detail by plotting the values. The query to be used is given below:

{
  "size": 0,
  "aggs": {
    "hourly_data": {
      "date_histogram": {
        "field": "time",
        "interval": "hour"
      },
      "aggs": {
        "stock_value": {
          "sum": {
            "field": "stockValue"
          }
        }
      }
    }
  }
}

The above query returns the stockValue per hour. Plotting these values against the time field yields the following graph:

[Figure: agg1.png, hourly stock values plotted against time]

As you can see, plotting the raw values against time gives no clear indication of where the stock values are heading. This is where moving average aggregations come into play. Let us create a query which calculates the moving averages:

{
  "size": 0,
  "aggs": {
    "hourly_data": {
      "date_histogram": {
        "field": "time",
        "interval": "hour"
      },
      "aggs": {
        "stock_value": {
          "sum": {
            "field": "stockValue"
          }
        },
        "mva_demo": {
          "moving_avg": {
            "buckets_path": "stock_value",
            "window": 5,
            "model": "simple"
          }
        }
      }
    }
  }
}

In the above query, the aggregation which calculates the moving average is named "mva_demo". Under it, apart from the "buckets_path", we have two parameters named "window" and "model". The "window" parameter sets the window size for calculating the averages. Here we have set the value to 5, which means it takes 5 values at a time, calculates their average, and then slides forward by one bucket, dropping the oldest value and taking in the next one.
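Since the examples in this post differ only in the "window" value, it can be handy to generate the request body from code. The following is a small sketch, assuming the field and aggregation names used in this post; `build_moving_avg_query` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def build_moving_avg_query(window, model="simple"):
    """Build the request body from this post for a given window size.

    Hypothetical helper: it only assembles a dict, it does not talk to a cluster.
    """
    return {
        "size": 0,
        "aggs": {
            "hourly_data": {
                "date_histogram": {"field": "time", "interval": "hour"},
                "aggs": {
                    "stock_value": {"sum": {"field": "stockValue"}},
                    "mva_demo": {
                        "moving_avg": {
                            "buckets_path": "stock_value",
                            "window": window,
                            "model": model,
                        }
                    },
                },
            }
        },
    }


# Print the body for the window size used above.
print(json.dumps(build_moving_avg_query(window=5), indent=2))
```

The printed JSON can be sent as the search request body with whichever client or tool you already use.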

The "model" parameter indicates which moving average model to use. We are currently using the simple moving average, hence the value "simple". A bucket from the response is shown below, with both the original value and the moving average value included in the same bucket.

{
  "key_as_string": "2016-04-01T01:00:00.000Z",
  "key": 1459472400000,
  "doc_count": 1,
  "stock_value": {
    "value": 34
  },
  "mva_demo": {
    "value": 96
  }
}
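To plot both lines, the two values have to be pulled out of every bucket. Here is a minimal Python sketch, assuming the buckets sit under `aggregations.hourly_data.buckets` in the search response. The function name `extract_series` and the two-bucket sample response are made up for illustration; the first sample bucket deliberately has no "mva_demo" entry, in which case the sketch records `None`.

```python
def extract_series(response, histogram="hourly_data",
                   raw="stock_value", smoothed="mva_demo"):
    """Split the histogram buckets into timestamps, raw values and moving averages."""
    buckets = response["aggregations"][histogram]["buckets"]
    times, values, averages = [], [], []
    for bucket in buckets:
        times.append(bucket["key_as_string"])
        values.append(bucket[raw]["value"])
        # A bucket without a moving average yet is recorded as None.
        averages.append(bucket.get(smoothed, {}).get("value"))
    return times, values, averages


# Hand-written sample response (illustrative values, not real query output).
sample_response = {
    "aggregations": {
        "hourly_data": {
            "buckets": [
                {
                    "key_as_string": "2016-04-01T00:00:00.000Z",
                    "key": 1459468800000,
                    "doc_count": 1,
                    "stock_value": {"value": 96},
                },
                {
                    "key_as_string": "2016-04-01T01:00:00.000Z",
                    "key": 1459472400000,
                    "doc_count": 1,
                    "stock_value": {"value": 34},
                    "mva_demo": {"value": 96},
                },
            ]
        }
    }
}

times, values, averages = extract_series(sample_response)
print(times, values, averages)
```

The three lists can then be handed to any plotting tool, with `None` entries skipped or drawn as gaps.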

Now we plot the original values and the moving average values on a single graph to see how they differ. It looks like the one below:

[Figure: agg2.png, stock values (orange) and moving average with window 5 (yellow) plotted against time]

The orange line depicts the stockValue series and the yellow line shows the moving average values. Notice that the yellow line starts at zero. This is because there is no earlier data from which to calculate a moving average for the very first point.

Looking at the graph closely, we can see that the peaks of the yellow line (the moving average values) have been smoothed. However, it still follows the orange line closely. A quality trend analysis cannot be based on this, because we have chosen a small window size. Modify the window size to a greater value, say 50, and plot the graph again. The resulting graph looks like the one below:

[Figure: agg3.png, stock values (orange) and moving average with window 50 (yellow) plotted against time]

From the above graph, the yellow line indicating the moving average values is generally on the rise towards the end of the graph, which indicates that there is an increasing trend in the stock value of the company.
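The effect of the window size can also be checked numerically. The sketch below uses a synthetic series (a slow upward drift plus a zigzag of ±20, invented purely for illustration and unrelated to the data set above) and compares the largest jump between neighbouring points for the raw series, a window of 5 and a window of 50. The helpers `trailing_mean` and `largest_step` are hypothetical, and only full windows are averaged.

```python
def trailing_mean(values, window):
    """Mean of each full run of `window` consecutive values."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def largest_step(series):
    """Largest absolute change between two neighbouring points."""
    return max(abs(b - a) for a, b in zip(series, series[1:]))


# Synthetic, volatile series: 240 points drifting upward by 0.5 per step,
# with +20 on even positions and -20 on odd positions.
values = [100 + 0.5 * i + (20 if i % 2 == 0 else -20) for i in range(240)]

small = trailing_mean(values, 5)
large = trailing_mean(values, 50)

# The jumps shrink as the window grows, leaving mostly the drift visible.
print(largest_step(values), largest_step(small), largest_step(large))
```

With the larger window the zigzag cancels out almost entirely, so what remains is the slow upward drift, which is the same behaviour seen in the graph with a window of 50.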

Predictive Moving Average

Moving average aggregations also support prediction of future values. This can be done by adding a "predict" parameter and modifying the query as below:

"mva_demo": {
  "moving_avg": {
    "buckets_path": "stock_value",
    "window": 5,
    "model": "simple",
    "predict": 5
  }
}

In the response to this query, five extra buckets appear at the end containing the predicted values of the moving average aggregation, even though there are no original stock values in those buckets. The predicted values usually converge to the final mean value calculated.
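One way to picture the prediction for the "simple" model is sketched below. It rests on an assumption that you should verify against the Elasticsearch documentation for your version: that the simple model extends the series by repeating the mean of the last window. The function `predict_simple` and its input values are hypothetical.

```python
def predict_simple(values, window, predict):
    """Extend a series with `predict` copies of the mean of the last window.

    Assumption of this sketch: the simple model has no trend component,
    so every predicted point equals the final window mean.
    """
    last_window = values[-window:]
    final_mean = sum(last_window) / len(last_window)
    return [final_mean] * predict


# Hypothetical hourly values; the last five are 10, 20, 30, 40, 50.
history = [70, 10, 20, 30, 40, 50]
print(predict_simple(history, window=5, predict=5))
# [30.0, 30.0, 30.0, 30.0, 30.0]
```

Under that assumption the predicted line is flat at the final mean. Models that track a trend would be expected to behave differently.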

Conclusion

In this article we have introduced the basic moving average aggregation and shown how it can be used to understand trends in a company's stock data. The next aggregation-related post will discuss the types of moving average aggregations and the differences between them in detail.