How many nodes should the cluster have? It's a difficult question. Ultimately, it will boil down to questions like the following: 

  1. How much data are you working with?

  2. How many searches will you be processing?

  3. How complex are your searches?

  4. How much resources will each node have to work with?

  5. How many indexes/applications will you be working with?

The answer to that question depends on a lot of factors, like expected load, data size, hardware, etc. In this tutorial post we discuss how to avoid the split brain problem.

For this post, we will be using hosted Elasticsearch on Qbox.io. You can sign up or launch your cluster here, or click "Get Started" in the header navigation. If you need help setting up, refer to "Provisioning a Qbox Elasticsearch Cluster."

Our Goal

The goal of the tutorial is to demonstrate split brain problem in Elasticsearch. Qbox provides a turnkey solution for Elasticsearch, Kibana, and many Elasticsearch analysis and monitoring plugins. One must be very careful while assigning number of replica nodes to Qbox Elasticsearch cluster configuration. Although Qbox provides the functionality of upgrading or changing the cluster configuration anytime, configuring the cluster the right way at initialization helps us to avoid pitfalls.

What is Split Brain?

The problem comes in when a node falls down or there's simply a lapse in communication between nodes for some reason. If one of the slave nodes cannot communicate with the master node, it initiates the election of a new master node from those it's still connected with. That new master node then will take over the duties of the previous master node. If the older master node rejoins the cluster or communication is restored, the new master node will demote it to a slave so there's no conflict. For the most part, this process is seamless and "just works."

However, consider a scenario where you have just two nodes: one master and one slave. If communication between the two is disrupted, the slave will be promoted to a master, but once communication is restored, you end up with two master nodes. The original master node thinks the slave dropped and should rejoin as a slave, while the new master thinks the original master dropped and should rejoin as a slave. Your cluster, therefore, is said to have a split brain.

Let’s consider a simple Elasticsearch cluster with two nodes. 

d1.png

The cluster holds a single index with one shard and one replica. Node A was elected as master at cluster startup and holds the primary shard (marked as 0P in the schema below), while Node B holds the replica shard (0R).

d2.png

What if, for any reason, communication between the two nodes would fail due to network failures or simply because one of the nodes becomes unresponsive (as in a case of a stop-the-world garbage collection)?

d3.png

Both nodes believe that the other one has failed. Node A will do nothing because it’s already elected as master. But Node B will automatically elect itself as master because it believes it’s part of a cluster that no longer has master.

d4.png

In an Elasticsearch cluster, it’s the responsibility of the master node to allocate the shards equally among the nodes. Node B holds a replica shard, but it believes that the primary shard is no longer available, and it will therefore automatically promote the replica shard to a primary.

d5.png
 

Our cluster is now in an inconsistent state or the red state. In this situation the two copies of the shard have diverged, and it would be really difficult to realign them without a full reindexing. Indexing requests that will hit Node A will index data in its copy of the primary shard, while the requests that go to Node B will fill the second copy of the shard. 

d6.png

Even worse, for a non-cluster aware indexing client (e.g. REST interface) this cluster state will be totally transparent because indexing requests will be successfully completed every time, regardless of which node is hit. The split brain problem would only be slightly noticeable when searching for data: depending on the node the search request hits, results will differ. 

Election of Master Node

The discovery.zen.minimum_master_nodes control the minimum number of eligible master nodes that a node should “see” in order to operate within the cluster. It’s recommended that we set it to a higher value than 1 when running more than 2 nodes in the cluster.  One way to calculate value for this will be N/2 + 1 where N is number of master nodes. This setting must be set to a quorum of our master eligible nodes. We recommend having only two master eligible nodes, since a quorum of two is two. Therefore, a loss of either master eligible node will result in an inoperable cluster.

It’s not intuitive to set the minimum_master_nodes parameter to 2 in the case of a two node cluster, but in this case if a node fails, the whole cluster fails. Although this removes the possibility of the split brain occurring, it also negates the built-in high-availability mechanism through the usage of replica shards.

Tutorial: Searching and Fetching Large Datasets in Elasticsearch Efficiently

If you’re just starting out with Elasticsearch, the recommendation is to plan for a 3-node cluster and set the minimum_master_nodes to 2, limiting the chance of the split brain problem, but still keeping the high availability advantage. We can afford to lose a configured replica node but still keep the cluster up-and-running.

A 2-node cluster provides the possibility of either choosing to live with the possibility of the split brain while keeping the high availability, or choosing to avoid the split brain but lose the high availability. If you really want to only use 2 nodes, you can still prevent split brain by using another Elasticsearch config setting, node.data. By simply setting this to false on one of the two nodes in your cluster, you effectively prevent that node from ever being a master node. However, by doing that, you also lose any failover ability because if the master node goes down, the entire cluster is down.

The best option however in this case is to add a node to the cluster and avoid compromise. This sounds very drastic, but it doesn’t have to be. For each Elasticsearch node you can chose if that node will hold data or not by setting the node.data parameter. The default value is “true”, meaning that by default every Elasticsearch node will also be a data node.

A third node with node.data parameter set to “false” will never hold any shards, but it can be elected as master. This third node can also be initialized on a quite less expensive hardware as it will be a data-less node. The minimum_master_nodes can be finally set to 2 because we have a scaled to a 3-node cluster from 2-node cluster, thus, avoiding the split brain and still afford to lose a node without losing data.

Configuring Zen Discovery

It is vital to configure the discovery.zen.minimum_master_nodes setting (which defaults to 1) so that each master-eligible node knows the minimum number of master-eligible nodes that must be visible in order to form a cluster.

Lets consider that you have a cluster consisting of two master-eligible nodes. A network failure breaks communication between these two nodes. Each node sees one master-eligible node itself. With minimum_master_nodes set to the default of 1, this is sufficient to form a cluster. Each node elects itself as the new master (thinking that the other master-eligible node has died) and the result is two clusters, or a split brain. These two nodes will never rejoin until one node is restarted. Any data that has been written to the restarted node will be lost.

Now imagine that you have a cluster with three master-eligible nodes, and minimum_master_nodes set to 2. If a network split separates one node from the other two nodes, the side with one node cannot see enough master-eligible nodes and will realise that it cannot elect itself as master. The side with two nodes will elect a new master and continue functioning correctly. As soon as the network split is resolved, the single node will rejoin the cluster and start serving requests again.

This setting should be set to a quorum of master-eligible nodes:

(master_eligible_nodes / 2) + 1

In other words, if there are 3 master-eligible nodes, then minimum master nodes should be set to (3 / 2) + 1 or 2: discovery.zen.minimum_master_nodes: 2 // defaults to 1 

This setting can also be changed dynamically on a live cluster with the cluster update settings API

curl -XPUT 'ES_HOST:ES_PORT/_cluster/settings?pretty' -H 'Content-Type: application/json' -d '{
 "transient": {
   "discovery.zen.minimum_master_nodes": 2
 }
}'

An advantage of splitting the master and data roles between dedicated nodes is that you can have just 3 master-eligible nodes and set minimum_master_nodes to 2. You never have to change this setting, no matter how many dedicated data nodes you add to the cluster.

Minimum Master Nodes

The minimum_master_nodes setting is extremely important to the stability of your cluster. This setting helps prevent split brains. 

When you have a split brain, your cluster is at danger of losing data. Because the master is considered the supreme ruler of the cluster, it decides when new indices can be created, how shards are moved, and so forth. If you have 2  masters, data integrity becomes perilous because you have 2 nodes that think they are in charge.

This setting tells Elasticsearch to not elect a master unless there are enough master-eligible nodes available. Only then will an election take place.

This setting should always be configured to a quorum (majority) of your master-eligible nodes. A quorum is (number of master-eligible nodes / 2) + 1. Here are some examples:

  • If you have 10 regular nodes (can hold data, can become master), a quorum is 6.

  • If you have 3 dedicated master nodes and a hundred data nodes, the quorum is 2, since you need to count only nodes that are master-eligible.

  • If you have 2 regular nodes, you are in a conundrum. A quorum would be 2, but this means loss of a node will make your cluster inoperable. A setting of 1 will allow your cluster to function, but that doesn’t protect against split brain. It is best to have a minimum of 3 nodes in situations like this.

This setting can be configured in your elasticsearch.yml file:

discovery.zen.minimum_master_nodes: 2

But because Elasticsearch clusters are dynamic, you could easily add or remove nodes that will change the quorum. It would be extremely irritating if you had to push new configurations to each node and restart your whole cluster just to change the setting.

For this reason, minimum_master_nodes (and other settings) can be configured via a dynamic API call as stated previously. It can be updated while the cluster is online:

curl -XGET 'ES_HOST:ES_PORT/_cluster/settings’ -d ‘{
    "persistent" : {
        "discovery.zen.minimum_master_nodes" : 2
    }
}’

This will become a persistent setting that takes precedence over whatever is in the static configuration. You should modify this setting whenever you add or remove master-eligible nodes.

No Active Master Node

The discovery.zen.no_master_block setting controls what operations should be rejected when there is no active master. The discovery.zen.no_master_block setting has 2 valid options:

  1. all - All operations on the node, that is, both read and writes will be rejected.

  2. write - Write operations will be rejected (default). Read operations will succeed based on the last known cluster configuration.

The basic ES discovery.zen properties can be configured as follows:

discovery.zen.fd.ping_timeout: 10s
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.unicast.hosts: ["master_node_01″,"master_node_02″,"master_node_03″]

The above setting discovery.zen.fd.ping_timeout says that node detection should happen within 10 seconds. The unicast hosts are “master_node_01″, “master_node_02″, “master_node_03″. In addition, 2 minimum master nodes need to join a newly elected master in order for an election to complete and for the elected node to accept its mastership. We have 3 master nodes.

Give it a Whirl!

It's easy to spin up a standard hosted Elasticsearch cluster on any of our 47 Rackspace, Softlayer, or Amazon data centers. And you can now provision your own AWS Credits on Qbox Private Hosted Elasticsearch

Questions? Drop us a note, and we'll get you a prompt response.

Not yet enjoying the benefits of a hosted ELK-stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment in our cloud hosting service.