In this blog post, we explain Elasticsearch memory settings in detail. Tuning these settings can improve Elasticsearch performance, especially under heavy load and when scaling your application. We also go over the issues caused by poor memory settings and the ways to fix them.

Mlockall

Under heavy load, the operating system may move process memory to swap space, which resides on the hard disk. This makes the affected processes and their fetch operations quite slow, and Elasticsearch is no exception. Hence, to improve performance and stability, it is recommended to keep Elasticsearch out of swap. This can be achieved by tweaking two settings:

  1. Changing the "vm.swappiness" variable in "/etc/sysctl.conf". The swappiness variable controls the tendency of the kernal to move processes out of RAM to swap memory that resides in the hard disk. Since a hard disk is much slower than RAM, requests to swap memory will have longer response times, which will negatively affect Elasticsearch performance. The "vm.swappiness" variable, when set to 0, won't take swap memory into consideration even when the RAM is full. If you want swap to be disabled globally, this setting should be enabled on all Elasticsearch nodes.

  2. Changing the value of "bootstrap.mlockall" variable in "elasticsearch.yml" file to "false". This will lock RAM memory for Elasticsearch and prevent the swap usage. To take effect, you should edit this setting on all nodes of the Elasticsearch cluster and restart Elasticsearch. To check the value of mlockall, we can use the following command:

curl http://localhost:9200/_nodes/process?pretty

If the value of "mlockall" field is "true" in the response, the changes has been reflected in configuration.

File Descriptors

Unix-based operating systems limit the number of files a process can have open during its execution. The default limits are lower than what Elasticsearch requires, so there is a good chance of hitting errors because of this. Setting a sufficiently high value for the file descriptor limit helps us avoid that problem.

First, to view the current file descriptor limit, use the following command:

sysctl fs.file-max

Sometimes this value is as low as 16,000 or less, depending on the Unix OS you use for your Elasticsearch installation. Low values for this limit can cause problems for Elasticsearch when accessing files or saving data; if the limit is hit, we are likely to receive a "too many open files" error. To avoid this scenario, Elasticsearch recommends a file descriptor limit of at least 64,000.

The change can be made by setting "fs.file-max=64000" in the "/etc/sysctl.conf" file.
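As a quick sketch, assuming a standard Linux host where "/etc/sysctl.conf" is read at boot, the change can be made and verified like this:

# /etc/sysctl.conf -- raise the system-wide file descriptor limit
fs.file-max=64000

# apply the change without rebooting, then verify it
sudo sysctl -p
sysctl fs.file-max

You can also check the limits that Elasticsearch itself reports through the "_nodes/stats/process" API, which in most versions includes the open and maximum file descriptor counts for each node.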

Memory Map Count

Operating systems also impose a limit on the number of memory map areas a single process can use. As with file descriptors, if this value is too low, the OS may return errors to Elasticsearch requests. This parameter is controlled by the "vm.max_map_count" variable in the "/etc/sysctl.conf" configuration file. The recommended value for "vm.max_map_count" is 262,144; anything below this can cause errors and performance degradation.
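A minimal sketch of the change, assuming a typical Linux host:

# /etc/sysctl.conf -- raise the memory map count limit persistently
vm.max_map_count=262144

# apply the same value immediately, without a reboot
sudo sysctl -w vm.max_map_count=262144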

Heap Memory

Efficient heap memory management is a crucial prerequisite for the successful deployment of Elasticsearch. Two major things to keep in mind when configuring heap memory are the following:

  1. Heap memory should not be more than 50% of the total available RAM.
  2. The maximum memory that can be allocated for heap is 32GB.

On many occasions, such as when indexing a very large number of documents or handling a very large number of requests, Elasticsearch gets overloaded, which can cause serious performance issues. The root cause of these issues is often an incorrect heap memory setting.

In order to have a good understanding of heap memory management, you need to be familiar with the mechanism of garbage collection in Java, the language Elasticsearch is written in. Java objects reside in an area called the heap. When the Java Virtual Machine (JVM) starts, it creates the heap space and manages its size, growing or shrinking it as the application runs. The JVM uses an automated process called garbage collection to reclaim heap memory occupied by objects that are no longer in use.

When the load on the heap is low, garbage collection runs very quickly and the pauses it introduces are negligible. This is not the case when the load increases: as the occupied heap grows, the garbage collection process takes significantly longer to clean up objects residing in memory.

If, while garbage collection is running, new data enters the heap at a significantly higher rate than the collector can clean it up, the chances of running out of heap memory are high. When that happens, the process may become very slow or fail completely.

The maximum heap size can be set using the ES_HEAP_SIZE environment variable. There are a few general guidelines for setting the heap size of Elasticsearch (see the example after this list):

  1. It should not be more than 50% of the total available RAM. Lucene makes extensive use of the filesystem cache, so leaving the OS short of memory adversely affects Elasticsearch performance.
  2. The maximum memory that can be allocated to the heap is 32GB. Above roughly 32GB the JVM can no longer use compressed object pointers, so pointers occupy double the space, less memory is available for actual work, and performance eventually degrades.
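As an illustrative sketch, a node with 16 GB of RAM satisfies both rules with an 8 GB heap. With the ES_HEAP_SIZE mechanism described above (used by older Elasticsearch releases; newer ones set -Xms and -Xmx in the jvm.options file instead), that looks like:

# give Elasticsearch an 8 GB heap on a 16 GB machine: half of RAM, well under 32 GB
export ES_HEAP_SIZE=8g

The remaining 8 GB stays with the operating system, where Lucene can use it as filesystem cache.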

Fine Tuning the Memory

The heap is divided into several segments, and fine-tuning these segments can also bring dramatic performance improvements. For example, allocating more space to the "young" generation and keeping the "young" to "old" generation memory ratio at 1:4 is a recommended practice for Elasticsearch deployments.

To understand how this works, let's take a quick look at the heap memory structure:

[Figure: JVM heap memory structure, showing the "young" and "old" generations]

As the picture above shows, heap memory consists of two major segments: the "young" generation and the "old" generation. The "young" generation is the part of the heap where newly created objects are allocated. When this segment fills up, the garbage collection process comes into play and promotes objects that survive collection into the "old" generation space (a small "survivor" space sits between the two and holds objects that have survived at least one collection). The young/old ratio tells the JVM how to divide the available heap between the "young" and "old" generations; the JVM default ratio is 2. For better performance, it is often preferable to have a larger "young" generation, which prevents overly frequent garbage collection.

Node Memory Statistics and Tuning

We can employ the "_nodes/stats" API to find out how memory is allocated on a node. Based on this data, we can then make the necessary changes, like setting the young to old generation ratio to 1:4.

For example, to check the young/old ratio, we can run the following command:

curl <IP:PORT>/_nodes/stats?pretty

It should return something like this:

"jvm" : {
        "timestamp" : 1533120408135,
        "uptime_in_millis" : 519101411,
        "mem" : {
          "heap_used_in_bytes" : 182285008,
          "heap_used_percent" : 35,
          "heap_committed_in_bytes" : 518979584,
          "heap_max_in_bytes" : 518979584,
          "non_heap_used_in_bytes" : 114021160,
          "non_heap_committed_in_bytes" : 120602624,
          "pools" : {
            "young" : {
              "used_in_bytes" : 39617792,
              "max_in_bytes" : 143130624,
              "peak_used_in_bytes" : 143130624,
              "peak_max_in_bytes" : 143130624
            },
            "survivor" : {
              "used_in_bytes" : 180104,
              "max_in_bytes" : 17891328,
              "peak_used_in_bytes" : 17891320,
              "peak_max_in_bytes" : 17891328
            },
            "old" : {
              "used_in_bytes" : 142487112,
              "max_in_bytes" : 357957632,
              "peak_used_in_bytes" : 142487112,
              "peak_max_in_bytes" : 357957632
            }
          }
        }
      }

In this response, we can see how much space the "young" and "old" generations occupy. If the young generation is too small, we can increase its size in the Elasticsearch startup script by setting the ES_HEAP_NEWSIZE variable (e.g., ES_HEAP_NEWSIZE=1g). Alternatively, we can control the ratio through the JVM options using the -XX:NewRatio flag. For example, setting -XX:NewRatio=4 makes the "old" generation four times the size of the "young" generation, i.e., a young to old ratio of 1:4.
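As a sketch, with the example values being purely illustrative, either of the following achieves that; ES_HEAP_NEWSIZE applies to the older startup scripts mentioned above, while the JVM flag can usually be passed through the ES_JAVA_OPTS environment variable:

# option 1: set the young generation size explicitly (older startup scripts)
export ES_HEAP_NEWSIZE=1g

# option 2: let the JVM derive it from a ratio (old generation = 4 x young generation)
export ES_JAVA_OPTS="-XX:NewRatio=4"

Whichever route you choose, apply it consistently on every node and restart Elasticsearch for the new JVM settings to take effect.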

Garbage Collection Issues

Inefficient garbage collection on a node can lead to several problems, such as the node being removed from the cluster or becoming unavailable. The log output below illustrates this:

[2016-10-28 06:41:59,043][INFO ][cluster.service ] [qbox-node5] removed {{qbox-node1}{I9CzLoWAQcGA-H6woQ593Q}{10.8.4.XXX}{10.8.4.XXX:9300},}, reason: zen-disco-receive(from master [{qbox-node2}{5egKJURrT4ewmbTv9hTZZw}{10.3.212.141}{10.3.XXX.XXX:9300}

It tells us that a node has been removed from the cluster. One common reason for this is that the node keeps failing to answer ping requests from the master. This, in turn, can happen for a variety of reasons, one of which is a long "stop-the-world" (STW) pause in the JVM.

The most common cause of STW pauses is a long-running garbage collection. To confirm this hypothesis, let's have a quick look at the garbage collection logs on "qbox-node5". Here they are:

[2016-10-28 00:09:15,506][INFO ][monitor.jvm ] [qbox-node5] [gc][young][580391][1438] duration [747ms], collections [1]/[1.2s], total [747ms]/[2.1m], memory [18.3gb]->[17.9gb]/[29.8gb], all_pools {[young] [807.7mb]->[18.5mb]/[865.3mb]}{[survivor] [108.1mb]->[108.1mb]/[108.1mb]}{[old] [17.4gb]->[17.8gb]/[28.9gb]}

From the log above, we can see that the total heap is 29.8 GB. We can also see that the young generation was collected effectively, shrinking from 807.7 MB to 18.5 MB, so a significant amount of cleanup took place there.

However, for the survivor space and the old generation there was little to no memory reclaimed even after the garbage collection run (the old generation actually grew from 17.4 GB to 17.8 GB), and the resulting pauses kept the application threads stopped. These pauses are why the node fails to respond to the master's pings. If this goes on long enough, the node leaves the cluster and an error message is logged.
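To spot this before nodes start dropping out, the same node stats API shown earlier can be used to watch the cumulative garbage collection counts and times; a sketch, with the field names as exposed in the "gc" section of recent Elasticsearch versions:

curl <IP:PORT>/_nodes/stats/jvm?pretty

# in the response, watch the "gc" section, for example:
#   "gc" : {
#     "collectors" : {
#       "young" : { "collection_count" : ..., "collection_time_in_millis" : ... },
#       "old"   : { "collection_count" : ..., "collection_time_in_millis" : ... }
#     }
#   }

A steadily climbing old-generation collection time, with little heap being freed, is the same pattern we saw in the logs above.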

Give it a Whirl!

It's easy to spin up a standard hosted Elasticsearch cluster on any of our 47 Rackspace, Softlayer, or Amazon data centers. And you can now provision your own AWS Credits on Qbox Private Hosted Elasticsearch.

Questions? Drop us a note, and we'll get you a prompt response.

Not yet enjoying the benefits of a hosted ELK-stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment in our cloud hosting service.