Editors Note: This post is part 2 of a 3-part series on tuning Elasticsearch performance. Part 1 can be found here.

If we are using Elasticsearch mainly for search, or if search is a customer-facing feature that is key to our organization, we should monitor query latency and take action if it surpasses a threshold. 

It’s important to monitor relevant metrics about queries and fetches that can help us determine how our searches perform over time. For example, we may want to track cluster's health to provide high availability or track spikes and long-term increases in query requests, so that we can be prepared to tweak our configuration to optimize for better performance and reliability.

In this tutorial, we continue focusing on performance tuning strategies to keep our cluster's health green and improve overall performance.

For this post, we will be using hosted Elasticsearch on Qbox.io. You can sign up or launch your cluster here, or click "Get Started" in the header navigation. If you need help setting up, refer to "Provisioning a Qbox Elasticsearch Cluster."

Zen Discovery

Zen discovery is algorithm used by Elasticsearch to discover nodes when their state changes in the cluster. It does master election, fault detection, cluster state maintenance and publishing. It is the default mechanism used by Elasticsearch to discover and communicate between the nodes in the cluster. There are other discovery mechanisms for Azure, EC2 and GCE.

Elasticsearch is a peer to peer based system, nodes communicate with one another directly if operations are delegated / broadcast. All the main APIs (index, delete, search) do not communicate with the master node. The responsibility of the master node is to maintain the global cluster state, and act if nodes join or leave the cluster by reassigning shards. Each time a cluster state is changed, the state is made known to the other nodes in the cluster.

The cluster.name allows to create separated clusters from one another. The default value for the cluster name is elasticsearch, though it is recommended to change this to reflect the logical group name of the cluster running.

In 0.x and 1.x releases both unicast and multicast are available, and multicast is the default. To use unicast with these versions or disable multicast, we can set discovery.zen.ping.multicast.enabled to false. From 2.0 onwards unicast is the only option available for Zen discovery.

Data and master nodes detect each other in two different ways:

  1. Ping - It is the process where a node uses the discovery mechanisms to find other nodes. The master node may ping all other nodes in the cluster to verify they are up and running
  2. Unicast - The unicast discovery requires a list of hosts to use that will act as gossip routers. It provides the setting discovery.zen.ping.unicast.hosts which is either an array setting or a comma delimited setting. Nodes may ping the master nodes to verify if they are up and running or if an election process needs to be initiated.

Zen Discovery configuration settings can be updated using Cluster Update Settings API and is responsible for:

Election of Master Node

The discovery.zen.ping_timeout (which defaults to 3s) allows for the tweaking of election time to handle cases of slow or congested networks (higher values assure less chance of failure). Once a node joins, it will send a join request to the master (discovery.zen.join_timeout) with a timeout defaulting at 20 times the ping timeout. If we are on slow network, set the value higher. The higher the value, the smaller the chance of discovery failure.

If discovery.zen.master_election.filter_client is true, pings from client nodes are ignored during master election; the default value is true. If discovery.zen.master_election.filter_data is true, pings from non-master-eligible data nodes (nodes where node.data is true and node.master is false) are ignored during master election; the default value is false.

The discovery.zen.minimum_master_nodes control the minimum number of eligible master nodes that a node should “see” in order to operate within the cluster. It’s recommended that we set it to a higher value than 1 when running more than 2 nodes in the cluster.  One way to calculate value for this will be N/2 + 1 where N is number of master nodes. This setting must be set to a quorum of our master eligible nodes. It is recommended to avoid having only two master eligible nodes, since a quorum of two is two. Therefore, a loss of either master eligible node will result in an inoperable cluster.

Fault Detection

There are two fault detection processes running. The first is by the master, to ping all the other nodes in the cluster and verify that they are alive. And on the other end, each node pings to master to verify if its still alive or an election process needs to be initiated.

The following settings control the fault detection process using the discovery.zen.fd prefix:

  • ping_interval - How often a node gets pinged, defaults to 1s.

  • ping_timeout - How long to wait for a ping response, defaults to 30s.

  • ping_retries - How many ping failures / timeouts cause a node to be considered failed, defaults to 3.

Publishing Cluster Updates to all Nodes

The master node is the only node in a cluster that can make changes to the cluster state. The master node processes one cluster state update at a time, applies the required changes and publishes the updated cluster state to all the other nodes in the cluster. Each node receives the published message, updates its own cluster state and replies to the master node, which waits for all nodes to respond, up to a timeout, before going ahead processing the next updates in the queue. The discovery.zen.publish_timeout is set by default to 30 seconds and can be changed dynamically through the cluster update settings api.

No Active Master Node

The discovery.zen.no_master_block settings controls what operations should be rejected when there is no active master. The discovery.zen.no_master_block setting has two valid options:

  1. all - All operations on the node—i.e. both read & writes—will be rejected.

  2. write - Write operations will be rejected (default). Read operations will succeed based on the last known cluster configuration.

The basic ES discovery.zen properties can be configured as follows:

discovery.zen.fd.ping_timeout: 10s
discovery.zen.minimum_master_nodes: 2
Discovery.zen.ping.unicast.hosts: ["master_node_01″,"master_node_02″,"master_node_03″]

The above setting discovery.zen.fd.ping_timeout says that node detection should happen within 10 seconds. The unicast hosts are “master_node_01″, “master_node_02″, “master_node_03″. In addition, two minimum master nodes need to join a newly elected master in order for an election to complete and for the elected node to accept its mastership. We have 3 master nodes.

Enable Doc Values or The Column-Store Compression

When we sort on a field, Elasticsearch needs access to the value of that field for every document that matches the query. The inverted index, which performs very well when searching, is not the ideal structure for sorting on field values:

When searching, we need to be able to map a term to a list of documents but when Sorting, aggregating, and accessing field values in scripts, we need to map a document to its terms. In other words, we need to “uninvert” the inverted index. This “uninverted” structure is often called a “column-store” and stores all the values for a single field together in a single column of data, which makes it very efficient for operations like sorting.

Doc values are the on-disk data structure, built at document index time, which makes this data access pattern possible. They store the same values as the _source but in a column-oriented fashion that is way more efficient for sorting and aggregations. 

Blog Post - Kubernetes Series: Understanding Why Container Architecture is Important to the Future of Your Business

Doc values are supported on almost all field types, with the notable exception of analyzed string fields. Doc values are generated at index-time, alongside the creation of the inverted index. That means doc values are generated on a per-segment basis and are immutable, just like the inverted index used for search. And, like the inverted index, doc values are serialized to disk. This is important to performance and scalability.

By serializing a persistent data structure to disk, we can rely on the OS’s file system cache to manage memory instead of retaining structures on the JVM heap. In situations where the "working set" of data is smaller than the available memory, the OS will naturally keep the doc values resident in memory. This gives the same performance profile as on-heap data structures. 

But when our working set is much larger than available memory, the OS will begin paging the doc values on/off disk as required. This will obviously be slower than an entirely memory-resident data structure, but it has the advantage of scaling well beyond the server’s memory capacity. If these data structures were purely on-heap, the only option is to crash with an OutOfMemory exception (or implement a paging scheme just like the OS).

All fields which support doc values have them enabled by default. If we are sure that we don’t need to sort or aggregate on a field, or access the field value from a script, we can disable doc values in order to save disk space:

To disable doc values, set doc_values: false in the field’s mapping. For example, here we create a new index where doc values are disabled for the "user_id" field. By setting doc_values: false, this field will not be usable in aggregations, sorts or scripts.

curl -XPUT localhost:9200/my_index -d '{
  "mappings": {
    "my_type": {
      "properties": {
        "user_id": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": false 
        }
      }
    }
  }
}'

It is possible to configure the inverse relationship too: make a field available for aggregations via doc values, but make it unavailable for normal search by disabling the inverted index. For example, here doc values are enabled to allow aggregations but Indexing is disabled, which makes the field unavailable to queries/searches.

curl -XPUT localhost:9200/my_index -d'{
  "mappings": {
    "my_type": {
      "properties": {
        "user_id": {
          "type": "string",
          "index": "not_analyzed",
          "doc_values": true, 
          "index": "no" 
        }
      }
    }
  }
}'

Doc values are used in several places in Elasticsearch:

  • Sorting on a field;

  • Aggregations on a field;

  • Certain filters (for example, geolocation filters);

  • Scripts that refer to fields.

Disable or Beware of DELETE_all

The delete index API allows you to delete an existing index.

curl -XDELETE 'localhost:9200/index_one/'

The above example deletes an index called index_one. Specifying an index, alias or wildcard expression is required. The delete index API can also be applied to more than one index, by either using a comma separated list, or on all indices (be careful!) by using _all or * as index.

Blog Post: Why Is the Supergiant Packing Algorithm Unique? How Does It Save Me Money?

In order to disable allowing to delete indices via wildcards or _all, set action.destructive_requires_name setting in the config to true. This setting can also be changed via the cluster update settings API. Settings updated can either be persistent (applied across restarts) or transient (will not survive a full cluster restart). Here is an example:

Updating persistent settings:

curl -XPUT localhost:9200/_cluster/settings -d '{
    "persistent" : {
        "action.destructive_requires_name" : true
    }
}'

Updating transient settings:

curl -XPUT localhost:9200/_cluster/settings -d '{
    "transient" : {
        "action.destructive_requires_name" : true
    }
}'

The cluster responds with the updated settings. The response for the last curl request will be:

{
    "persistent" : {},
    "transient" : {
        "action.destructive_requires_name" : "true"
    }
}'

Cluster wide settings can be returned using:

curl -XGET localhost:9200/_cluster/settings

Transient cluster settings take precedence over persistent cluster settings, which take precedence over settings configured in the elasticsearch.yml config file. For this reason it is preferable to use the elasticsearch.yml file only for local configurations, and set all cluster-wider settings with the settings API.

Disable or Minimize Refresh Mapping Requests to Master Node

The elasticsearch cluster might be flooded with a large number of refresh_mappings requests in its pending tasks queue if there are frequent changes in field mappings or addition of new fields to indices. This is not so crucial, but it can impact the performance of cluster. This problem is more commonly seen in pre-2.0 versions.

The configuration parameter can be used like this in elasticsearch config file (/etc/elasticsearch.yml):

  indices.cluster.send_refresh_mapping: false

What is this and how does it work?

When a node detects a new field while processing an index request, it updates its own mapping and sends that mapping to the master node. The new mapping might still be present in the master's pending task queue when the master sends out its next cluster state. In such a case, the data nodes will be receiving  an “old” version of the mapping. 

Though there is a conflict, it is not bad: i.e. the cluster state will eventually have the correct mapping. As far as the data node is concerned, the master has the older version of mapping and thus sends a refresh mapping request to the master.

A system that has extreme cases of updates and frequent mapping changes, there is a stampeding horde effect and it makes sense to disable this feature. The indices.cluster.send_refresh_mapping parameter allows us to disable the default sensible behavior, thus eliminating these refresh_mapping requests going from data node to master. Even if the send_refresh_mapping parameter is not configured, the master will eventually see the original mapping change, and send out an updated cluster state including that change.

Note, sending refresh mapping is more important when the reverse happens, and for some reason, the mapping in the master is ahead, or in conflict, with the actual parsing of it in the actual node the index exists on. In this case, the refresh mapping will result in warning being logged on the master node.

Give It a Whirl!

It's easy to spin up a standard hosted Elasticsearch cluster on any of our 47 Rackspace, Softlayer, Amazon or Microsoft Azure data centers. And you can now provision your own AWS Credits on Qbox Private Hosted Elasticsearch

Questions? Drop us a note, and we'll get you a prompt response.

Not yet enjoying the benefits of a hosted ELK-stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment in our cloud hosting service.

comments powered by Disqus