In this blog post, we explain memory-related settings in detail, which can be used to give Elasticsearch better performance, especially when scaling. We also go over issues caused by poor memory settings, and the ways to overcome them.

Mlockall

Under heavy load, processes may be moved to swap memory, which resides on the hard disk. This makes those processes and the related fetch operations slow. Hence, it is recommended to disable swap to gain performance and stability in Elasticsearch.

Sometimes under load, processes tend to use swap memory. For Elasticsearch processes, this can be detrimental to performance and stability. To avoid such a scenario, the best practice is to avoid the use of swap memory entirely. This can be achieved by tweaking two configurations:

  1. Changing the "vm.swappiness" variable in "/etc/sysctl.conf". The swappiness variable controls the tendency of the kernel to move processes out of RAM into swap memory, which resides on the hard disk. Since the hard disk is much slower than RAM, requests served from swap have longer response times, and this affects Elasticsearch performance. Setting "vm.swappiness" to 0 tells the kernel not to swap process memory except in the most extreme circumstances, effectively disabling it. This should be done on every Elasticsearch node.

  2. Changing the value of the "bootstrap.mlockall" variable in the "elasticsearch.yml" file to "true". This locks the Elasticsearch memory into RAM and prevents it from being swapped out. This change has to be made on all the nodes in the cluster, and each node must be restarted for it to take effect (a combined sketch of both changes follows below). The value of mlockall on the nodes can be checked using the command:

curl http://localhost:9200/_nodes/process?pretty

In the response to the above command, if the value of the "mlockall" field is "true", the change has taken effect.
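Here is a minimal sketch of both changes, assuming a typical Linux installation of Elasticsearch 2.x (where the setting is still named bootstrap.mlockall); file paths and service commands may differ on your system:

# /etc/sysctl.conf -- discourage the kernel from swapping (apply with: sudo sysctl -p)
vm.swappiness = 0

# /etc/elasticsearch/elasticsearch.yml -- lock the Elasticsearch memory in RAM
bootstrap.mlockall: true

# Restart the node, then verify that mlockall is reported as true
sudo service elasticsearch restart
curl http://localhost:9200/_nodes/process?pretty | grep mlockall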

File Descriptors

Unix-based operating systems have a limit on the number of files a process can access during execution. The default values of this limit are lower than what Elasticsearch requires, and there is a chance of hitting errors because of it. Setting a sufficiently high file descriptor limit helps us overcome this difficulty.

There is a limit to the number of files a process can access or open in Unix-based operating systems. To view this limit we can use the following command:

"sysctl fs.file-max".

Sometimes this value will be as low as 16k or below, depending on the Unix OS you use for your Elasticsearch installation. This limits file access when we try to read data from the Elasticsearch nodes, and results in a "too many open files" error. To avoid such a scenario, the Elasticsearch recommended value for this setting is a minimum of 64k.

The change can be made by setting "fs.file-max=64000" in the "/etc/sysctl.conf" file.
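A minimal sketch of this change, assuming root access and a standard /etc/sysctl.conf (the 64000 value simply matches the minimum mentioned above):

# Raise the system-wide file descriptor limit and apply it without a reboot
echo "fs.file-max = 64000" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Verify the new limit
sysctl fs.file-max

# Depending on your distribution, the per-process "nofile" limit
# (ulimit -n, or /etc/security/limits.conf) may also need to be raised
# for the user running Elasticsearch.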

Memory Map Count

Operating systems also impose a limit on the number of memory map areas a process can use. As with file descriptors, this can cause errors if the corresponding settings are not tweaked.


Just like the file descriptors above, operating systems also impose a limit on the number of memory map areas each process can have. The setting for this is the "vm.max_map_count" parameter in the "/etc/sysctl.conf" file. The recommended value for "vm.max_map_count" is 262,144. Any value below this might cause undesired errors and performance degradation.
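The change follows the same pattern as the file descriptor setting; a sketch:

# Raise the memory map area limit to the recommended 262,144
echo "vm.max_map_count = 262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Verify the new value
sysctl vm.max_map_count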

Heap Memory

Heap memory management is an important area for a successful deployment of Elasticsearch. The two major considerations are:

  1. Heap memory should not be more than 50% of the total available RAM.
  2. The maximum memory that can be allocated for heap is 32GB.

On many occasions, such as when indexing a very large number of files or handling a very large number of requests, Elasticsearch gets overloaded, causing many performance-related issues. Tracing the root cause of these issues usually leads us to heap memory allocation and the way it is done in Elasticsearch.

In order to have a good picture of heap memory management, you need to be familiar with the garbage collection mechanism in Java, the language in which Elasticsearch is written. Java objects reside in an area called the heap memory. When the JVM starts, it creates the heap space and manages its size by increasing or decreasing it as the application runs. The Java Virtual Machine employs an automated process called garbage collection for the allocation and deallocation of objects in the heap.

When the heap is small, the cleanup performed by the garbage collection process takes only a small amount of time, and the resulting delay is negligible. This is not the case when the load increases. As the size of the heap grows, the garbage collection process takes significantly longer to clean up the objects residing in memory.

While this cleanup is happening, if other data is entering the heap at a significantly higher rate than the cleanup can keep up with, the chances of heap memory overflow are high. When that happens, the heap floods, and the resulting issues can range from slow responses to process failure.

The maximum heap memory can be set using the ES_HEAP_SIZE environment variable (a sketch follows the list below). There are a few general guidelines for setting the heap size for Elasticsearch:

  1. It should not be more than 50% of the total available RAM. Lucene makes extensive use of the filesystem caches, and if they do not have enough memory left, performance suffers.
  2. The maximum memory that should be allocated for the heap is 32GB. Above roughly 32GB the JVM can no longer use compressed object pointers, so pointers occupy double the space, less memory is available for actual operations, and performance degrades.
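A minimal sketch of setting the heap size, assuming the ES_HEAP_SIZE convention used by the Elasticsearch 2.x startup scripts and, for illustration, a machine with 64GB of RAM (newer versions set -Xms/-Xmx in jvm.options instead):

# Give Elasticsearch roughly half the RAM, staying under the 32GB ceiling
export ES_HEAP_SIZE=31g

# Equivalent explicit JVM options (minimum and maximum heap set to the same value)
# export ES_JAVA_OPTS="-Xms31g -Xmx31g"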

Fine Tuning the Memory

The heap is divided into several segments, and fine-tuning these segments also results in better heap performance. Sizing the young generation deliberately, for example by keeping the young-to-old generation memory ratio at 1:4, is a recommended practice during Elasticsearch deployment.

To manage the heap well, we need to pay attention not only to the settings above, but also to some finer-grained Elasticsearch heap memory settings. To understand this better, let us look at the heap memory structure in detail:

[Figure: the JVM heap divided into the young and old generations]

There are two major divisions in the heap memory, the "old" and "young" generations, as shown in the figure above. The young generation is the part of the heap where memory for newly created objects is allocated. When its occupancy crosses a threshold, the garbage collection process comes into play and promotes the surviving objects to the old generation space. Since most young-generation objects are short lived, collections there are quick and reclaim most of the space.

The default ratio of old to young generation memory in the JVM is 2, meaning the old generation gets twice the space of the young generation. A larger young generation means minor garbage collections happen less often there, which can improve performance.

Node Memory Statistics and Tuning

We can use the "_nodes/stats" API to see the exact memory allocation in a node. By inspecting it, we can make the necessary changes, such as adjusting the young-to-old generation ratio (for example, to 1:4).

We can check the ratio of the young and old generations in heap memory by running the following command:

curl <IP:PORT>/_nodes/stats?pretty

This yields a response in which we can see the sizes occupied by the young and old generations and, if needed, change the ratio accordingly using the JVM's NewRatio option.
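For example, a quick way to pull out just the JVM memory figures is to request only the jvm metric from the stats API (the <IP:PORT> placeholder stands for one of your nodes):

# Fetch only the JVM section of the node stats; the jvm.mem.pools object
# lists the young, survivor and old pool sizes
curl -s '<IP:PORT>/_nodes/stats/jvm?pretty'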

The bigger the young generation, the less often minor collections occur. We can set the size of the young generation in the Elasticsearch startup script by setting an environment variable, e.g. ES_HEAP_NEWSIZE=1g, or we can control it through the JVM options using the NewRatio parameter. For example, setting -XX:NewRatio=4 means the old generation will be four times the size of the young generation, i.e. a young-to-old ratio of 1:4.
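A sketch of both approaches, assuming the Elasticsearch 2.x startup scripts, which honor the ES_HEAP_NEWSIZE and ES_JAVA_OPTS environment variables:

# Option 1: fix the young generation size directly
export ES_HEAP_NEWSIZE=1g

# Option 2: control it through the old-to-young ratio instead
# (NewRatio=4 keeps the old generation four times the size of the young one)
export ES_JAVA_OPTS="-XX:NewRatio=4"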

Garbage Collection Issues

Improper garbage collection in a node can lead to several problems, such as the node leaving the cluster. One such example is given below with log outputs. Let us look at a sample error log concerning a memory issue:

[2016-10-28 06:41:59,043][INFO ][cluster.service ] [qbox-node5] removed {{qbox-node1}{I9CzLoWAQcGA-H6woQ593Q}{10.8.4.XXX}{10.8.4.XXX:9300},}, reason: zen-disco-receive(from master [{qbox-node2}{5egKJURrT4ewmbTv9hTZZw}{10.3.212.141}{10.3.XXX.XXX:9300}

The above log says that one node has been ousted from the cluster. One reason this happens is when a node continuously fails to respond to ping requests from the master. This is also called a "node leaving the cluster" issue. It can happen for a variety of reasons, the most common of which is a high "stop-the-world" (STW) pause time in Java.

The most common cause of STW pauses is garbage collection. To confirm our findings, let us look at the logs of the node that left the cluster (in this case the node named "qbox-node5"). The following log was found on the node:

[2016-10-28 00:09:15,506][INFO ][monitor.jvm ] [qbox-node5] [gc][young][580391][1438] duration [747ms], collections [1]/[1.2s], total [747ms]/[2.1m], memory [18.3gb]->[17.9gb]/[29.8gb], all_pools {[young] [807.7mb]->[18.5mb]/[865.3mb]}{[survivor] [108.1mb]->[108.1mb]/[108.1mb]}{[old] [17.4gb]->[17.8gb]/[28.9gb]}

The above log gives the details of a garbage collection run.

We can deduce that there is a total of 29.8gb allocated to the heap, split across the young, survivor and old divisions. You can also see that the young generation garbage collection was effective, reducing usage from 807.7mb to 18.5mb, so significant cleaning happened there.

For the survivor and old generations, however, not much memory was reclaimed even after the garbage collection process, which caused the application threads to pause. That pause is the reason no response was sent to the master node's ping, and if this goes on for a long enough interval, the node leaves the cluster and an error message like the first one is generated.

Give it a Whirl!

It's easy to spin up a standard hosted Elasticsearch cluster on any of our 47 Rackspace, Softlayer, or Amazon data centers. And you can now provision your own AWS Credits on Qbox Private Hosted Elasticsearch.

Questions? Drop us a note, and we'll get you a prompt response.

Not yet enjoying the benefits of a hosted ELK-stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment in our cloud hosting service.
