Search and Analytics are key features of modern software applications. Scalability and the capability to handle large volumes of data in near real-time is demanded by many applications such as mobile apps, web and data analytics applications. Autocomplete in text fields, Search suggestions, Location or Geospatial search and Faceted Navigation are standards in usability to meet business requirements nowadays.

Tuning is essential, necessary and crucial! Any system tuning must be supported by performance measurements; that’s why a clear understanding of monitoring and the implications of changed metrics is essential for anyone using Elasticsearch.

This three part tutorial series introduces some tips and methods for performance tuning, explaining at each step the most relevant system configuration settings and metrics.

Cluster State Size (Capacity Planning for Index and Shard)

If 1 shard is too few and 1,000 shards are too many, how do I know how many shards I need? This is a question that is impossible to answer in the general case. There are just too many variables: the hardware that we use, the size and complexity of our documents, how we index and analyze those documents, the types of queries that we run, the aggregations that we perform, how we model our data, and more.

It's very easy to create a lot of indices and lots and lots of shards in elasticsearch, but creation of each index and shard comes at a cost. The management load alone can degrade our ES cluster performance, potentially to the point of making it red, if we have too many indices or shards. Overloading elasticsearch with too many indices/shards can also have pretty significant impact on the performance of indexing and search.

Elasticsearch provides the flexibility of being schemaless. We can add a JSON document with any number of fields to an index without first telling ES anything about what those fields are. These new fields—what they’re named, what type they are, and what index they live in—are automatically added to Elasticsearch’s index mapping and cluster state. The biggest bang to management overhead is the size of the Cluster State, which contains all of the mappings for every index in the cluster.

The Cluster state API allows to get a comprehensive state information of the whole cluster.

$ curl -XGET 'http://localhost:9200/_cluster/state'

The API response for a single node cluster with 3 templates and 5 indices will be something like this:

{
  "cluster_name": "elasticsearch",
  "version": 6,
  "state_uuid": "skxF0gCYTAGQAUU-ZW4_GQ",
  "master_node": "VyKDGurkQiygV-of4B1ZAQ",
  "blocks": {},
  "nodes": {
    "VyKDGurkQiygV-of4B1ZAQ": {
      "name": "Siege",
      "transport_address": "127.0.0.1:9300",
      "attributes": {}
    }
  },
  "metadata": {
    "cluster_uuid": "QMSq7EOfToS-v5dKc0GdUA",
    "templates": {
      "template_one": {
        "template": "template_one_*",
        "order": 0,
        "settings": { ... },
        "mappings": { ... }
      },
      "template_two": { ... },
      "template_three": { ... }
    },
    "indices": {
      "index_one": {
        "state": "open",
        "settings": { ... },
        "mappings": { ... },
        "aliases": [ ... ]
      },
      "index_2": { ... },
      "index_3": { ... },
      "index_4": { ... },
      "index_5": { ... }
    }
  },
  "routing_table": {
    "indices": {
      "index_one": {
        "shards": {
          "0": [
            {
              "state": "STARTED",
              "primary": true,
              "node": "VyKDGurkQiygV-of4B1ZAQ",
              "relocating_node": null,
              "shard": 0,
              "index": "index_one",
              "version": 18,
              "allocation_id": {
                "id": "hAZf59wDSNS7im1wOdToHA"
              }
            }
          ]
        }
      },
      "index_two": { ... },
      "index_three": { ... },
      "index_four": { ... },
      "index_five": { ... } 
    }
  },
  "routing_nodes": {
    "unassigned": [
      {
        "state": "UNASSIGNED",
        "primary": true,
        "node": null,
        "relocating_node": null,
        "shard": 0,
        "index": "index_three",
        "version": 0,
        "unassigned_info": {
          "reason": "CLUSTER_RECOVERED",
          "at": "2017-02-08T18:46:11.027Z"
        }
      },
      { ... }
    ],
    "nodes": {
      "VyKDGurkQiygV-of4B1ZAQ": [
        {
          "state": "STARTED",
          "primary": true,
          "node": "VyKDGurkQiygV-of4B1ZAQ",
          "relocating_node": null,
          "shard": 0,
          "index": "index_one",
          "version": 18,
          "allocation_id": {
            "id": "hAZf59wDSNS7im1wOdToHA"
          }
        },
        { ... },
        { ... }
      ]
    }
  }
}

By default, the cluster state request is routed to the master node, to ensure that the latest cluster state is returned. For debugging purposes, we can retrieve the cluster state local to a particular node by adding local=true to the query string.

As the cluster state can grow (depending on the number of shards and indices, our mapping, templates), it is possible to filter the cluster state response specifying the parts in the URL.

$ curl -XGET 'http://localhost:9200/_cluster/state/{metrics}/{indices}'

Metrics can be a comma-separated list of

  • version- Shows the cluster state version.

  • master_node - Shows the elected master_node part of the response

  • nodes - Shows the nodes part of the response

  • routing_table - Shows the routing_table part of the response. If we supply a comma separated list of indices, the returned output will only contain the indices listed.

  • metadata - Shows the metadata part of the response. If we supply a comma separated list of indices, the returned output will only contain the indices listed.

  • blocks - Shows the blocks part of the response.

We can plan for cluster state, size or capacity in order to decide the number of primary shards as follows:

  • Create a cluster consisting of a single server, with the hardware that we are considering using in production.

  • Create an index with the same settings and analyzers that we plan to use in production, but with only one primary shard and no replicas.

  • Fill it with real documents (or as close to real as we can get).

  • Run real queries and aggregations (or as close to real as we can get).

Once we define the capacity of a single shard, it is easy to extrapolate that number to our whole index. Take the total amount of data that we need to index, plus some extra for future growth, and divide by the capacity of a single shard. The result is the number of primary shards that we will need.

Life Inside an Elasticsearch Cluster (Topology)

Elasticsearch provides a pretty large toolbox for composing complex cluster topologies. We can make heterogeneous clusters with beefy nodes hosting our hot indices, and have less expensive nodes host historical data, e.g. using node attributes and shard allocation filtering. Different nodes in a cluster can have different roles (data and/or master – or none as a client) as well as properties (such as zone). An advantage of splitting the master and data roles between dedicated nodes is that we can have just three master-eligible nodes and set minimum_master_nodes to 2. We never have to change this setting, no matter how many dedicated data nodes we add to the cluster. Setting up a proper cluster topology helps avoid problems like split brain.

Master Eligible Node

The master node is responsible for lightweight cluster-wide actions such as creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. It is important for cluster health to have a stable master node. Master nodes must have access to the data/ directory (just like data nodes) as this is where the cluster state is persisted between node restarts.

To create a standalone master-eligible node, set:

node.master: true 
node.data: false 
node.ingest: false
  • The node.master role is enabled by default.

  • Disable the node.data role (enabled by default).

  • Disable the node.ingest role (enabled by default).

Data Node

Data nodes hold the shards that contain the documents we have indexed. Data nodes handle data related operations like CRUD, search, and aggregations. These operations are I/O-, memory-, and CPU-intensive. It is important to monitor these resources and to add more data nodes if they are overloaded. The main benefit of having dedicated data nodes is the separation of the master and data roles.

To create a dedicated data node, set:

node.master: false 
node.data: true 
node.ingest: false
  • Disable the node.master role (enabled by default).

  • The node.data role is enabled by default.

  • Disable the node.ingest role (enabled by default).

Ingest Node

Ingest nodes can execute pre-processing pipelines, composed of one or more ingest processors. Depending on the type of operations performed by the ingest processors and the required resources, it may make sense to have dedicated ingest nodes, that will only perform this specific task.

To create a dedicated ingest node, set:

node.master: false 
node.data: false 
node.ingest: true
  • Disable the node.master role (enabled by default).

  • Disable the node.data role (enabled by default).

  • The node.ingest role is enabled by default.

Coordinating Only Node

If we take away the ability to be able to handle master duties, to hold data, and pre-process documents, then we are left with a coordinating node that can only route requests, handle the search reduce phase, and distribute bulk indexing. Essentially, coordinating only nodes behave as smart load balancers.

To create a coordinating only node, set:

node.master: false 
node.data: false 
node.ingest: false
  • Disable the node.master role (enabled by default).

  • Disable the node.data role (enabled by default).

  • Disable the node.ingest role (enabled by default).

Enable Memory Lock Check to Disable Swapping

Swapping is the process whereby a page of memory is copied to the preconfigured space on the hard disk, called swap space, to free up that page of memory. The combined sizes of the physical memory and the swap space is the amount of virtual memory available.

Swapping is very bad for performance and for node stability and should be avoided at all costs. It can cause garbage collections to last for minutes instead of milliseconds and can cause nodes to respond slowly or even to disconnect from the cluster. Compared to memory, disks are very slow. Memory speeds can be measured in nanoseconds, while disks are measured in milliseconds; so accessing the disk can be tens of thousands times slower than accessing physical memory. The more swapping that occurs, the slower our process will be, so we should avoid swapping at all cost.

When the JVM does a major garbage collection it touches every page of the heap. If any of those pages are swapped out to disk they will have to be swapped back into memory. That causes lots of disk thrashing that Elasticsearch would much rather use to service requests. One of the several ways to disallow swapping is by requesting the JVM to lock the heap in memory through mlockall via Elasticsearch setting bootstrap.mlockall. However, there are cases where Elasticsearch fails to lock the heap (e.g., if the elasticsearch user does not have memlock unlimited). The memory lock check can be used to verify if the bootstrap.mlockall setting is enabled or to confirm if the JVM was is successfully able to lock the heap.

There are three approaches to disabling swapping:

Enable bootstrap.mlockall

The mlockall property in ES allows the ES node not to swap its memory (Note that it is available only for Linux/Unix systems). This property can be set in the config/elasticsearch.yml file by adding the following line.

bootstrap.mlockall: true

In the 5.x releases, this has changed to bootstrap.memory_lock: true.

mlockall is set to false by default, meaning that the ES node will allow swapping. Once we add this value to the property file, we need to restart our ES node. After restarting Elasticsearch, we can see whether this setting was applied successfully by checking the value of mlockall in the output from this request:

curl -XGET localhost:9200/_nodes?filter_path=**.mlockall

The response should be like

{"nodes":{"VyKDGurkQiygV-of4B1ZAQ":{"process":{"mlockall":true}}}}

If we see that mlockall is false, then it means that the mlockall request has failed. We will also see a line with more information in the logs with the words Unable to lock JVM Memory. The most probable reason, on Linux/Unix systems, is that the user running Elasticsearch doesn’t have permission to lock memory. This can be granted as follows:

  • Setting ulimit -l unlimited as root before starting Elasticsearch, or set memlock to unlimited in /etc/security/limits.conf.

  • Setting MAX_LOCKED_MEMORY to unlimited in the system configuration file (or see below for systems using systemd), if using RPM or Debian systems.

  • Setting LimitMEMLOCK to infinity in the systemd configuration, if using systemd Systems.

If we are setting mlockall property, we must ensure that we are giving enough memory to the ES node using the -DXmx option or ES_HEAP_SIZE.

The default installation of Elasticsearch is configured with a 1 GB heap. If we are using the default heap values, our cluster is probably configured incorrectly. The easiest way to change heap size is to set an environment variable called ES_HEAP_SIZE. When the server process starts, it will read this environment variable and set the heap accordingly. As an example, we can set it via the preferred command line as follows:

export ES_HEAP_SIZE=10g

Alternatively, we can pass in the heap size via JVM flags when starting the process :

ES_JAVA_OPTS="-Xms10g -Xmx10g" ./bin/elasticsearch 

Ensure that the min (Xms) and max (Xmx) sizes are the same to prevent the heap from resizing at runtime, a very costly process. Generally, setting the ES_HEAP_SIZE environment variable is preferred over setting explicit -Xmx and -Xms values.

Disable All Swap Files

Alternately, we can completely disable swap. Usually Elasticsearch is the only service running on a box, and its memory usage is controlled by the JVM options. There should be no need to have swap enabled. On Linux systems, we can disable swap temporarily by running: sudo swapoff -a. To disable it permanently, we will need to edit the /etc/fstab file and comment out any lines that contain the word swap.

Configure Swappiness

Another option available on Linux systems is to ensure that the sysctl value vm.swappiness is set to 1. This reduces the kernel’s tendency to swap and should not lead to swapping under normal circumstances, while still allowing the whole system to swap in emergency conditions.

Give It a Whirl!

It's easy to spin up a standard hosted Elasticsearch cluster on any of our 47 Rackspace, Softlayer, Amazon or Microsoft Azure data centers. And you can now provision a replicated cluster.

Questions? Drop us a note, and we'll get you a prompt response.

Not yet enjoying the benefits of a hosted ELK-stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment in our cloud hosting service.

comments powered by Disqus