In the last article we explained how the simple moving average pipeline aggregation works and how to use it to analyze trends in a firm's stock values. In this post, we discuss more moving average aggregation models and their differences in detail. We use the same data set as in the previous post.

Moving averages can be computed with two kinds of models: simple and weighted. As we move to more complex data sets, trend analysis becomes more accurate if we give more weight to recent values. Weighted moving average models assign a weight to each value in the data set according to its age. There are several types of weighted moving averages:

Linear Moving Average

In the linear moving average model, data points are assigned linearly decreasing weights. This means that, within the window of data points considered, older data points carry linearly less weight than recent ones.
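
To make the weighting concrete, here is a minimal Python sketch of both averages over a single window. It assumes the common convention that the oldest point gets weight 1 and the newest gets weight n; the function names are ours and are not part of Elasticsearch.

```python
def simple_average(window):
    """Plain mean: every point in the window counts equally."""
    return sum(window) / len(window)


def linear_weighted_average(window):
    """Linearly weighted mean; `window` is ordered oldest to newest.

    Assumed convention: weights 1, 2, ..., n, so the newest point
    counts n times as much as the oldest one.
    """
    weights = range(1, len(window) + 1)
    weighted_sum = sum(w * v for w, v in zip(weights, window))
    return weighted_sum / sum(weights)


falling = [30, 20, 10]  # recent values are lower than older ones
print(simple_average(falling))
print(linear_weighted_average(falling))
```

For a window whose recent values are falling, the linear average comes out below the simple one, because the lower, newer values dominate.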


In this example we reuse the query that calculated the simple moving average, extend it to calculate the linear moving average as well, and plot the difference between the two:

{
  "size": 0,
  "aggs": {
    "hourly_data": {
      "date_histogram": {
        "field": "time",
        "interval": "hour"
      },
      "aggs": {
        "stock_value": {
          "sum": {
            "field": "stockValue"
          }
        },
        "mva_simple": {
          "moving_avg": {
            "buckets_path": "stock_value",
            "window": 10,
            "model": "simple"
          }
        },
        "mva_linear": {
          "moving_avg": {
            "buckets_path": "stock_value",
            "window": 10,
            "model": "linear"
          }
        }
      }
    }
  }
}

Each bucket in the aggregations section of the response contains three values, as shown below:

{
  "key_as_string": "2016-04-01T01:00:00.000Z",
  "key": 1459472400000,
  "doc_count": 1,
  "stock_value": {
    "value": 34
  },
  "mva_simple": {
    "value": 96
  },
  "mva_linear": {
    "value": 48
  }
}
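
To get from buckets like this one to series that can be plotted, a small helper along the following lines could be used. This is only a sketch: the function name is hypothetical, and it assumes the buckets have already been read from the `aggregations.hourly_data.buckets` part of the response into Python dictionaries.

```python
def extract_series(buckets, names):
    """Collect one list of values per aggregation name, in bucket order.

    A bucket that carries no value for a name is recorded as None,
    so every series stays aligned with the timestamps.
    """
    timestamps = []
    series = {name: [] for name in names}
    for bucket in buckets:
        timestamps.append(bucket["key_as_string"])
        for name in names:
            series[name].append(bucket.get(name, {}).get("value"))
    return timestamps, series


# The sample bucket shown above, as a Python dictionary.
buckets = [
    {
        "key_as_string": "2016-04-01T01:00:00.000Z",
        "key": 1459472400000,
        "doc_count": 1,
        "stock_value": {"value": 34},
        "mva_simple": {"value": 96},
        "mva_linear": {"value": 48},
    }
]
timestamps, series = extract_series(
    buckets, ["stock_value", "mva_simple", "mva_linear"]
)
```

The resulting lists can then be handed to whichever plotting tool you prefer.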

Plotting these values on a single graph yields the following:

mvgavg1.png#asset:1065

As you can see, the linear moving average curve follows a shape similar to that of the simple moving average but is shifted slightly downwards. This is due to the greater weight given to the recent values.

If we increase the window size for the linear aggregation, the low-frequency fluctuations in the original series are better reflected, while the high-frequency ones are smoothed out.

Exponentially Weighted Moving Average

The exponentially weighted moving average (EWMA) follows the same approach as the linear moving average. However, the dependency on old data points decreases exponentially rather than linearly.


The rate at which the dependency on old data points decays is controlled by the "alpha" parameter given to the aggregation. Smaller values of "alpha" make the decay slower, while larger values make it faster.

Here is the query part for the exponentially weighted moving average aggregation, with "alpha" set to a moderate value of 0.3:
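
The effect of "alpha" can be seen in a few lines of Python. This is a sketch of the textbook recurrence, not Elasticsearch's own code; it assumes the average is seeded with the first value in the window, and the function name is ours.

```python
def ewma(window, alpha):
    """Exponentially weighted average of `window` (ordered oldest to newest).

    Textbook recurrence: avg = alpha * value + (1 - alpha) * avg.
    Assumption: the average is seeded with the oldest value.
    """
    if not window:
        raise ValueError("window must not be empty")
    avg = window[0]
    for value in window[1:]:
        avg = alpha * value + (1 - alpha) * avg
    return avg


window = [10, 20, 40]
slow = ewma(window, 0.1)  # old values fade slowly, result stays near 10
fast = ewma(window, 0.9)  # old values fade quickly, result approaches 40
```

With a small "alpha" the result stays close to the older values; with a large one it tracks the most recent value.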

"mva_ewma": {
  "moving_avg": {
    "buckets_path": "stock_value",
    "window": 10,
    "model": "ewma",
    "settings": {
      "alpha": 0.3
    }
  }
}

If we plot the result of this query against the actual values, as we did above, it looks like this:

mvgage2.png#asset:1064

As we increase the value of "alpha", the "ewma" curve moves closer to the original curve and the lag decreases, because the dependency on older values falls off more quickly.

Holt Moving Average

In the previous moving average models, we assumed that there were no trends in the given data points. We applied the simple, linear, and exponentially weighted models to the data and generated linear trend data from them.

What happens if the given data contains short-term trends? Short-term trends are cyclic patterns that occur periodically and distinctly in the data. With the previous models, these cyclic patterns may not be properly identified, since those models compute linear trends, especially if the trends need to be forecast more than a single period ahead.


In such cases we use the Holt model of moving averages. It has two parameters. "Alpha", as in the EWMA model, controls the smoothing of the level. "Beta" controls the smoothing of the local trend estimate.
As "beta" increases, the model assumes that the local trends are changing more rapidly. As "beta" decreases, the local trends are assumed to change more slowly.
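
The interplay of level and trend can be sketched in Python as follows. This is the textbook form of Holt's double exponential smoothing, not necessarily what Elasticsearch does internally; the initialisation (level from the first value, trend from the first difference) is an assumption, and the function name is ours.

```python
def holt_smooth(values, alpha, beta):
    """Holt (double exponential) smoothing of `values`, oldest to newest.

    Returns the smoothed level at every point.
    Assumed initialisation: level = first value,
    trend = difference between the first two values.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a trend")
    level = values[0]
    trend = values[1] - values[0]
    smoothed = [level]
    for value in values[1:]:
        last_level = level
        # Level: blend the new value with the previous level projected forward.
        level = alpha * value + (1 - alpha) * (last_level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - last_level) + (1 - beta) * trend
        smoothed.append(level)
    return smoothed


rising = [1, 2, 3, 4, 5]
smoothed = holt_smooth(rising, alpha=0.5, beta=0.5)
```

On a steadily rising series the smoothed level keeps up with the data, because the trend term projects each level forward before the next value is blended in. An EWMA over the same series would lag behind it.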

The query for the "Holt" model with "alpha" and "beta" values of 0.5 is given below:

"mva_holt": {
  "moving_avg": {
    "buckets_path": "stock_value",
    "window": 10,
    "model": "holt",
    "settings": {
      "alpha": 0.5,
      "beta": 0.5
    }
  }
}

When the output of this aggregation is plotted against the original stock values and the EWMA model, it looks like this:

movavg3.png#asset:1063

You can see that the Holt curve moves much closer to the lower values of the original stock values than the EWMA curve does, reflecting its attention to the local trends involved.

Holt-Winters Moving Average

While the plain Holt model covers local trends, it does not address seasonal trends that might be hidden in the data. The Holt-Winters model accommodates seasonal trends, which essentially means that it follows the trends in the data more closely.

To incorporate seasonal trends we add two parameters: "gamma", the smoothing parameter for the seasonal component, and "period", which specifies the periodicity of the data. In the following example, we use a "period" of 7 with a window size of 14 and a "gamma" value of 0.5:

"mva_holtWinters": {
  "moving_avg": {
    "buckets_path": "stock_value",
    "window": 14,
    "model": "holt_winters",
    "settings": {
      "type": "add",
      "alpha": 0.5,
      "beta": 0.5,
      "gamma": 0.5,
      "period": 7
    }
  }
}

The output should look like this:

mvgavg4.png#asset:1066

You should notice that the Holt-Winters curve starts only after a certain time. This is because the Holt-Winters model requires at least two periods of data before it can be computed. The Holt-Winters curve closely follows the original stock values, as intended. This closeness can be used to predict trends, but in Elasticsearch the prediction feature for Holt-Winters is currently not available.
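
To illustrate why two periods of data are needed, here is a minimal Python sketch of textbook additive Holt-Winters smoothing. It is not Elasticsearch's implementation. The function name and the initialisation scheme are assumptions: the level and seasonal offsets come from the first period, and the trend comes from the change between the means of the first two periods.

```python
def holt_winters_additive(values, period, alpha, beta, gamma):
    """Textbook additive Holt-Winters smoothing (illustrative only).

    Returns smoothed values for every point from the second period onward.
    """
    if len(values) < 2 * period:
        raise ValueError("need at least two full periods of data")

    first = values[:period]
    second = values[period:2 * period]
    # Assumed initialisation: level and seasonal offsets from the first
    # period, trend from the change between the first two period means.
    level = sum(first) / period
    trend = (sum(second) / period - level) / period
    seasonal = [v - level for v in first]

    smoothed = []
    for t in range(period, len(values)):
        idx = t % period
        last_level = level
        # Level: de-seasonalised value blended with the projected level.
        level = alpha * (values[t] - seasonal[idx]) + (1 - alpha) * (last_level + trend)
        # Trend: latest change in level blended with the previous trend.
        trend = beta * (level - last_level) + (1 - beta) * trend
        # Seasonal offset for this position in the cycle.
        seasonal[idx] = gamma * (values[t] - level) + (1 - gamma) * seasonal[idx]
        smoothed.append(level + seasonal[idx])
    return smoothed


cyclic = [1, 2, 3] * 3  # a perfectly repeating pattern with period 3
fitted = holt_winters_additive(cyclic, period=3, alpha=0.5, beta=0.5, gamma=0.5)
```

In this sketch the first period is consumed by the initialisation, so the smoothed series begins with the second period. With fewer than two periods there is nothing from which to estimate the initial trend.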

Conclusion

In this last installment of our pipeline aggregations series, we looked at the different types of moving average aggregations and plotted each of them for better understanding. Questions / Comments? Drop us a line below.