Qbox Joins Instaclustr, Plans March 31, 2023, Sunset Date. Read about it in our blog post here.

Note: Qbox Inc. has deprecated the architecture used in this article. Supergiant has replaced this deprecated architecture and isn’t relevant for this article’s specifications.

Load balancing an Elasticsearch cluster is a good practice because a primary goal in ES configuration is a fault-tolerant system that’s resilient to single node failures. All of us here at Qbox know that the load-balancing features of Elasticsearch are a big part of what makes it such a great distributed computing platform. We also know that you’ll get the best performance when you take a bit more time to properly size your Elasticsearch cluster and optimize your computing resources to correspond well with your expected load.

In this article, we bring to mind some considerations that can help you optimize your cluster by improving distribution, performance, and reliability. We’re also aware that various notions about “load balancing” are heard throughout the community, so we also explore various meanings and applications of load balancing with respect to ES clusters.

It’s good practice to load balance your Elasticsearch clusters to achieve strong resilency against single node failures. But before you configure your cluster for load-balancing, you need background on the two primary functions of Elasticsearch, writing/indexing documents and querying documents.

Writing / Indexing Documents in Elasticsearch

When a new document comes into Elasticsearch for indexing, Elasticsearch passes the document to a primary shard assignment using a routing algorithm. A shard is actually a Lucene instance underneath. Elasticsearch passes the document to this Lucene instance for search and storage.

That same process adds the document to the inverted index for that shard. The replica shard(s) receive the document, map the document, and add the document to the inverted index for those replicas.

Querying Documents in Elasticsearch

By default, when a query is sent to Elasticsearch, it goes to a particular node, which becomes the gateway query node for that query. That node broadcasts the query to every primary shard and any replicas throughout the index.

Each shard also executes the query on its local inverted index and, by default, returns the top 10 results to the gateway query node. The gateway query node then performs a merge-sort on the combined results from the other shards.

After the merge-sort is complete, the gateway query node returns results to the client. Note that a merge-sort is both CPU- and memory-intensive.

Load Balancing for Data Writes

Elasticsearch self-manages the location of all shards across all nodes. The master node maintains the shard routing table, and it also distributes a copy of the shard routing table to other nodes within the cluster. Since it performs these management functions, we recommend that you don’t task the master node with much more than cluster health checks, updating routing tables, and managing shards. It’s best to configure one node as a dedicated master and the remaining nodes as dedicated data nodes.

We also recommend configuring the cluster to push any data writes to the data nodes. Data nodes are nodes that contain shards (that contain the data). With the master node managing the cluster, the data nodes use their shard routing tables to move each write to the appropriate shard.

Here’s an important consideration: For Java clients only, you can start the client as node client. A node client is aware of the cluster and maintains a copy of cluster state (the routing table). This means that the client can communicate directly with the nodes in which the shard is present to avoid unnecessary “double hops” on indexing. So, for a Java client, you can configure it as a node client and easily implement an inexpensive, highly updated load balancer.

Configuring for Queries

Elasticsearch provides for a special node type known as a client node, which contains no data and cannot become a master node. The function of the client node is to perform the final, intensive merge-sort at the end of the query. We consider it a best practice to configure the load balancer to direct queries to client nodes.

Qbox Clusters

Qbox clusters consist of 1 or more nodes, with each node running on an isolated VM (or server) in your chosen data center. Each node gets a unique public IP address and a private IP address—if available. Hostnames resolve to the public IP and are also assigned to each of the nodes using a shared prefix, like so:

  • 522c0fae……..000.qbox.io (node 1)
  • 522c0fae……..001.qbox.io (node 2)
  • 522c0fae……..002.qbox.io (node 3)
  • 522c0fae……..019.qbox.io (node 20)

You can use any endpoint to communicate with the cluster, which will return responses for the cluster as a whole. For example, if indexing requests are sent to node 2, it will not necessarily be the case that node 2 will be the destination for the data. Elasticsearch will route the data to an appropriate node. Similarly, search requests that are sent specifically to node 4 will return results for the entire cluster—not just to the matching data on node 4.

We realize that it’s possible to use a load balancer to distribute requests to endpoints, but we don’t recommend it. Even if the load balancer is on the local (data center) network, a remote load balancer is another point of network indirection for Elasticsearch requests.

Most Elasticsearch clients will configure for client-side load balancing and accept an array of hosts when initializing (that is, listing the endpoint for each node in your application code). An array of hosts is generally much more efficient than a hosted load balancer because it prevents additional network transit time on each request.

For some, the need for a remote load balancer may be the ability to detect downed nodes and attempt retries. Qbox implements a similar solution internally, and you can read more about failover with multiple nodes.

Note: Many hosted datastore solutions will employ a “shared” architecture in which users share computational resources that are found in larger host machines. This was the original Elasticsearch design: abstracting nodes from the user and providing a single endpoint. Many services still use this approach. For many reasons and with extensive experience, we at Qbox have learned that a single-tenant approach is far more appropriate for Elasticsearch.

Since most cloud infrastructure is built on hypervisors, some may astutely note that we’re technically maintaining a “shared” architecture. While this may be true, it is still true that many hosted datastore solutions will add another layer of (potentially performance-impeding) virtualization (i.e., OS containers or shared processes).

Like this article? You’ll love our hosted solution with free support. You can sign up or launch your cluster here, or click “Get Started” in the header navigation. If you need help setting up, refer to “Provisioning a Qbox Elasticsearch Cluster.