Auto-scaling has been a frequently requested feature since the inception of the Qbox service because auto-scaling with Elasticsearch isn't as easy as is commonly thought.
Scaling up horizontally is trivial, of course, and is one of the primary benefits of this technology. Automatic scaling down is typically more troublesome: if not done carefully, rebalancing and reindexing carry an intrinsic computational overhead that dramatically affects performance. Meanwhile, our nodes-by-the-compute-hour model makes vertical scaling a potentially expensive prospect.
Taking all of this into consideration, today we announce a vertical resizing feature addition that is available on your dashboard. Now you can easily resize vertically by migrating to bigger VMs that have more resources. This gives you more flexibility beyond the horizontal scaling that is already available (adding nodes to an existing cluster).
Since it isn't feasible to simply resize a typical cloud server, it's necessary to replace it. In comparison with other alternatives, Qbox gives you the speed and flexibility to perform quick migrations to a larger cluster. With a maximum of 90 seconds of downtime, we help you minimize the impact to both your operations and your users. Read on to see how easy it can be with Qbox hosted Elasticsearch services.
The most common reason for migrating a Qbox cluster is to change the node size and reconfigure the computational resources dedicated to your cluster. Migrations are also useful for changing regions, replacing deprecated hardware, or similar operations in which minimal downtime is the highest priority. In this article, we explain why size migrations matter and how to perform them on Qbox. If you already have a Qbox account, you can skip down to the migration instructions toward the end of the article.
If you're wondering why you would want to increase cluster node size instead of simply adding nodes, please read our Choosing a node size article in the Qbox Help Center.
How Resizing Migrations Work
Since it isn't feasible to simply resize a typical cloud server, you've got to replace it. To maximize uptime, Qbox migrations will link the nodes on the existing cluster to other nodes that have a different size. This approach is efficient for transferring data (shards) from the source nodes to the destination nodes.
During the data transfer phase of the migration, the original nodes will continue to manage requests from dependent applications, while simultaneously transferring all of the original shards to the new nodes. As you might expect, the duration of the data transfer depends on the size of the dataset.
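The article does not say which mechanism Qbox uses to move shards. As a rough illustration only, a self-managed Elasticsearch cluster can drain nodes with shard allocation filtering (the `cluster.routing.allocation.exclude._ip` cluster setting) and watch the `_cluster/health` response until nothing is still moving. The sketch below only builds the request body and interprets a health response; the IP addresses are hypothetical, and the function names are made up for this example.

```python
import json


def build_drain_settings(source_ips):
    """Build a PUT /_cluster/settings body that excludes the given
    node IPs from shard allocation, so shards move to other nodes."""
    if not source_ips:
        raise ValueError("at least one source node IP is required")
    return {
        "transient": {
            "cluster.routing.allocation.exclude._ip": ",".join(source_ips)
        }
    }


def transfer_complete(health):
    """Return True when a _cluster/health response shows a green
    cluster with no relocating, initializing, or unassigned shards."""
    return (
        health.get("status") == "green"
        and health.get("relocating_shards", 0) == 0
        and health.get("initializing_shards", 0) == 0
        and health.get("unassigned_shards", 0) == 0
    )


if __name__ == "__main__":
    # Hypothetical private IPs of the original (source) nodes.
    body = build_drain_settings(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
    # This body would be sent as: PUT /_cluster/settings
    print(json.dumps(body, indent=2))
```

The original nodes keep serving requests while excluded nodes are drained, which matches the behavior described above, though the Qbox implementation may differ.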
Consider a migration in which a 3-node cluster is to retain the same number of nodes. Such a cluster will temporarily require 6 nodes (the 3 original nodes plus 3 new ones) during the migration process. Following completion of the data transfer, but before we unlink and destroy the original nodes, we must redirect the flow of user requests to the new nodes as seamlessly as possible. This can be done by either of two methods:
- A DNS update for users connecting to the hostnames of the original nodes.
- An endpoint change for users connecting to the private IP addresses of the original nodes.
NOTE: Each node in a Qbox cluster is a server with both public and private IP addresses. Publicly-addressable hostnames, such as xxxxxxxxxxxxxxxxnnn.qbox.io, are assigned to the public IP for each node using a DNS record.
When the data transfer is done, Qbox will update the DNS record for the hostname of each node, irrespective of whether hostnames or private IPs are in use. After this DNS update is complete, the migration process will wait for up to 24 hours, because DNS propagation is often unpredictable. However, the process can resume sooner if the user confirms they've either:
- Updated the /etc/hosts file on their application servers to force hostnames to resolve to the new nodes, or temporarily pointed to the public IPs of the new nodes.
- Updated their application code to point to the private IPs of the new nodes.
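For the /etc/hosts option, the entries follow the standard "IP address, whitespace, hostname" layout. The following is a minimal sketch that formats such lines from a mapping of node hostnames to the public IPs of the new nodes. The hostnames and IPs shown are placeholders (the real values come from your Qbox dashboard), and `hosts_lines` is a hypothetical helper name.

```python
import ipaddress


def hosts_lines(new_public_ips):
    """Format /etc/hosts entries from a {hostname: new_public_ip} mapping.

    Raises ValueError if any value is not a valid IP address.
    """
    lines = []
    for hostname, ip in sorted(new_public_ips.items()):
        ipaddress.ip_address(ip)  # validates; raises ValueError if malformed
        lines.append(f"{ip}\t{hostname}")
    return lines


if __name__ == "__main__":
    # Placeholder hostnames and documentation-range IPs, not real nodes.
    mapping = {
        "examplenode001.qbox.io": "203.0.113.10",
        "examplenode002.qbox.io": "203.0.113.11",
    }
    # Review the output, then append it to /etc/hosts on each application server.
    print("\n".join(hosts_lines(mapping)))
```

Remember to remove these entries once DNS has propagated, so that future hostname changes are not masked by the local override.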
Estimate for Total Runtime
The maximum total runtime of any migration is the time necessary to accomplish the data transfer (which varies with the size of the dataset) plus the maximum DNS propagation wait time of 24 hours (which is optional and can be skipped once you confirm your endpoints are updated).
It's quite possible to complete a small migration that skips the DNS wait period within an hour.
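The estimate above is simple arithmetic, sketched here for illustration. The 24-hour figure comes from this article; the dataset size and the transfer rate are assumptions you would supply yourself, since no transfer speed is stated here.

```python
DNS_WAIT_HOURS = 24.0  # maximum DNS propagation wait described in this article


def max_runtime_hours(data_gb, transfer_gb_per_hour, skip_dns_wait=False):
    """Upper-bound migration runtime: data transfer time plus the
    optional DNS wait. The transfer rate is a caller-supplied assumption."""
    if data_gb < 0:
        raise ValueError("data_gb must not be negative")
    if transfer_gb_per_hour <= 0:
        raise ValueError("transfer_gb_per_hour must be positive")
    transfer_hours = data_gb / transfer_gb_per_hour
    dns_hours = 0.0 if skip_dns_wait else DNS_WAIT_HOURS
    return transfer_hours + dns_hours


if __name__ == "__main__":
    # Hypothetical 50 GB dataset at an assumed 100 GB/hour transfer rate.
    print(max_runtime_hours(50, 100, skip_dns_wait=True))   # 0.5
    print(max_runtime_hours(50, 100, skip_dns_wait=False))  # 24.5
```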
The primary benefit of the Qbox migration process is minimal downtime, although a small amount of downtime is currently unavoidable.
Clusters with 3 or more nodes will experience around 10-20 seconds of downtime. Clusters having fewer than 3 nodes may experience up to 90 seconds of downtime. This cumulative downtime is due to a series of process restarts that are necessary at some points during the migration. Read our Qbox Help Center: High availability / failover article to learn more about failover and rolling restarts.
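Because requests can fail briefly during those restarts, applications may want to retry rather than surface an error immediately. The following is a minimal sketch of a retry wrapper sized to the 90-second worst case mentioned above. It assumes your client raises `ConnectionError` when a node is unreachable, which depends on the library you use; `call_with_retry` and its parameters are hypothetical names for this example.

```python
import time


def call_with_retry(request, max_wait_seconds=90.0, delay_seconds=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call request() and retry on ConnectionError until it succeeds
    or another retry would exceed max_wait_seconds."""
    deadline = clock() + max_wait_seconds
    while True:
        try:
            return request()
        except ConnectionError:
            # Give up if waiting again would take us past the deadline.
            if clock() + delay_seconds > deadline:
                raise
            sleep(delay_seconds)
```

A search call would then be wrapped as `call_with_retry(lambda: run_search(query))`, where `run_search` stands in for whatever function your application already uses.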
How to Perform a Resizing Migration
Follow these simple steps to perform a resizing migration on your Qbox cluster:
- Log in to your Qbox account, navigate to your Clusters page and then choose the cluster that you want to resize.
- In the drop-down menu, choose Manage > Migrate/Change Node Size, as shown in the figure below.
- You will see a listing of all nodes on this cluster, as shown in the next figure.
- To choose the new sizing for this cluster, select one of the option buttons along the left side of the Hardware panel.
- After confirming that all information is correct, click the Begin Migration button.
- After a moment, you'll see a migration status page. As the migration proceeds, you can watch the Status indicator bar to get a rough estimate of the relative progress.
- As the migration nears completion, you'll see a warning prompt similar to the figure below. After ensuring that you are ready to destroy the original cluster, click the Yes button to finalize the migration.
- The finalizing process will unlink and destroy the old nodes, and the Clusters page will display the new sizing configuration.
Easy, Low-impact Resizing of your Clusters
That's it! You can see how easy it is to migrate to a new, larger cluster on Qbox. With a maximum of 90 seconds of downtime, we help you minimize the impact to both your operations and your users.