Note: Qbox hosted Elasticsearch automatically creates backups for your clusters. If you are interested in a hosted solution with top-notch (free!) 24/7 support, sign up and spin up a cluster in 5 minutes here: https://qbox.io/signup.

As your cluster and your indices grow, you of course feel the increasing need to retain the data that you have accumulated. Many of us have experienced the complete panic that comes when you realize that you cannot actually restore your backup — a painful lesson that backups are worth nothing if you do not test and confirm that they can actually restore.

The snapshot / restore module allows you to create snapshots of your Elasticsearch indices — or a snapshot of the cluster as a whole — which can then be stored in a remote repository.

There are different types of supported repositories. If you have a shared file system — for example, an NFS filesystem — that is accessible by all nodes at the same mounting point, then you can use that for storing your indices or entire cluster snapshot.

Elasticsearch was designed to be run in different environments, and it works very well in a cloud environment. The snapshot / restore module also supports various cloud repositories such as:

  • AWS (you can store the backups in an S3 bucket)
  • HDFS for Hadoop
  • Azure Cloud
  • Google Cloud Storage (GCS)
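
As an illustration, registering an S3 repository (after installing the repository-s3 plugin) looks much like the filesystem repository we'll create later in this post. The bucket name here is hypothetical, and depending on your Elasticsearch version, credentials are configured via the keystore or client settings:

PUT /_snapshot/my_s3_backup
{
  "type": "s3",
  "settings": {
    "bucket": "my-es-snapshots"
  }
}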

If you don’t have access to the cloud, you can make use of an NFS share to back up a multi-node cluster. Just make the share accessible at the same mount point on all nodes. To create a snapshot, you will first need to register a repository in which the snapshot will be stored, and Elasticsearch must be able to write to this location.
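
For example, assuming a hypothetical NFS server at nfs.example.com exporting /exports/es-backup, each node could mount it at the same point with an /etc/fstab entry like:

nfs.example.com:/exports/es-backup  /mnt/es-backup  nfs  defaults  0 0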

If you are on a Windows network, then you could back up using Windows file shares. The snapshot / restore documentation mentions that you can use Microsoft UNC paths in your config. You could even back up to a normal directory if you are on a single-node cluster, and I’ll show you how to do this shortly.

To summarize, if you are not in a cloud environment, your options for creating a repository are:

  • Create repository on Windows shares using Microsoft UNC path
  • Create repository using NFS on Linux
  • Create repository using directory on single-node cluster
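
As a sketch, a repository on a hypothetical Windows share \\WIN_SERVER\es-backup could be registered like this (note that the backslashes must be escaped in JSON):

PUT /_snapshot/my_windows_backup
{
  "type": "fs",
  "settings": {
    "location": "\\\\WIN_SERVER\\es-backup"
  }
}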

You can check if you have any repositories already set up with:

$ curl -XGET 'localhost:9200/_snapshot/_all?pretty=true'
{ }

The blank response that we got indicates that we don’t have any repositories set up yet.

We can create a repository on our single-node cluster to which we can back up and then copy our backup files to another server and try restoring it. First, we create a directory to which we will save our snapshot.

$ mkdir ~/elasticsearch-backup/

Now, we need to give Elasticsearch permission to write to this directory.

$ sudo chown -R elasticsearch:elasticsearch /Users/kir/elasticsearch-backup/

Next, we need to specify the path.repo in the Elasticsearch configuration file:

$ sudo vim /etc/elasticsearch/elasticsearch.yml

# Require explicit names when deleting indices:
# action.destructive_requires_name: true
# ***********************************************
# Shared repo
path.repo: ["/Users/kir/elasticsearch-backup"]

Now, we can restart the Elasticsearch service so we can create the repository that we are going to use to store our snapshots.

$ sudo service elasticsearch restart

We can create our repository via Elasticsearch’s REST API. There are several options that we can add when we create the repository, but for now I am only going to enable compression. You can look over some of the configuration options in your own time, and note that the chunk_size configuration option might be handy if you are compressing very large indices.

We create our repository in Kibana with:

PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "compress": true,
    "location": "/Users/kir/elasticsearch-backup"
  }
}
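
If you expect very large segment files, you could also split them into smaller pieces by adding the chunk_size setting mentioned above; the value here is just an illustration:

PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "compress": true,
    "chunk_size": "1g",
    "location": "/Users/kir/elasticsearch-backup"
  }
}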


We can check if the repository was successfully created by listing all the repositories with:

GET /_snapshot/_all
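
We can also ask Elasticsearch to confirm that every node can actually write to the repository location with the verify API:

POST /_snapshot/my_backup/_verify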


Let’s make use of the repository that we’ve just created. By default, a snapshot request takes a snapshot of all indices, including Kibana’s indices and the indices of other supporting applications such as APM. However, Elasticsearch also allows us to take snapshots of individual indices using multi-index syntax. This is how we do it:

PUT /_snapshot/my_backup/snapshot_07_07_2020?wait_for_completion=true&pretty
{
  "indices": "sports,orders,twitter",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "kirill",
    "taken_because": "backup before upgrading"
  }
}

Note the “indices” parameter. It’s where we list all the indices we want to snapshot. You can also include snapshot metadata and other useful parameters described in the Elasticsearch snapshot documentation. Upon success, you’ll see the following response in Kibana.


It is interesting to see what is actually created in the repo directory after you take a snapshot:

$ cd ~/elasticsearch-backup
$ ls -alt
total 32
drwxrwxr-x 3 elasticsearch elasticsearch 4096 Sep 11 21:21 .
drwxr-xr-x 9 timo          timo          4096 Sep 11 21:10 ..
-rw-r--r-- 1 elasticsearch elasticsearch   49 Sep 11 21:21 index
-rw-r--r-- 1 elasticsearch elasticsearch  200 Sep 11 21:21 snap-snapshot2.dat
-rw-r--r-- 1 elasticsearch elasticsearch  446 Sep 11 21:21 meta-snapshot2.dat
-rw-r--r-- 1 elasticsearch elasticsearch  208 Sep 11 21:11 snap-snapshot-number-one.dat
drwxr-xr-x 4 elasticsearch elasticsearch 4096 Sep 11 21:10 indices
-rw-r--r-- 1 elasticsearch elasticsearch  446 Sep 11 21:10 meta-snapshot-number-one.dat

Or we can check the file type:

$ file *
index: ASCII text, with no line terminators 
indices: directory
meta-snapshot2.dat: data 
meta-snapshot-number-one.dat: data 
snap-snapshot2.dat: data 
snap-snapshot-number-one.dat: data

Snapshots are stored in repositories. You can have multiple repositories and thus save your snapshots to any of your repositories.

To delete the snapshot, you will have to delete it from the repository. If a snapshot is stored in more than one repository, then you will have to delete it from each repository in order to delete it fully.
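
In our case, deleting the snapshot we took earlier from the repository would look like this:

DELETE /_snapshot/my_backup/snapshot_07_07_2020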

Remember that a repository is just a storage location that you register with Elasticsearch. You can delete a repository, which in effect is de-registering a storage location as registered in Elasticsearch. The actual storage location with snapshots will remain untouched. This is something important to remember when you are trying to free up space on your server.
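
Deregistering a repository is just as simple; the snapshot files in /Users/kir/elasticsearch-backup stay on disk afterwards:

DELETE /_snapshot/my_backup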

Restoring from a Snapshot

Now, let’s go over the restore process, step by step. We are going to restore data from our first single-node Elasticsearch cluster (say, cluster1) to another single-node Elasticsearch cluster (say, cluster2). 

On cluster1 we need to create a tar.gz file of the directory that is used as the location for the repository to store its data. We can then decompress this directory on cluster2 and register it as a repository.

First, let’s compress the directory on cluster1:

$ tar -zcvf elasticsearch-backup.tar.gz ~/elasticsearch-backup/

Now we can scp the archive to cluster2 if it’s remote:

$ scp elasticsearch-backup.tar.gz kir@172.20.0.182:/Users/kir/

We are now logged in to cluster2, and we can decompress the backup directory with:

$ tar -zxvf elasticsearch-backup.tar.gz

Move the directory to the same place where it was located on cluster1.

$ mv Users/kir/elasticsearch-backup/ ~/elasticsearch-backup/

Now it is time to start Elasticsearch and Kibana on cluster2 so we can go ahead and register our repository. Then we can restore our snapshot from the repository.

$ service elasticsearch start
$ service kibana start

We are repeating many of the steps that we did on cluster1 on cluster2 manually for the sake of explanation. Feel free to automate these processes.

Open your Elasticsearch config on cluster2 and add the path.repo to the file:

# ---------------------------- Various ----------------------------
# Disable starting multiple nodes on a single system:
# node.max_local_storage_nodes: 1
# Require explicit names when deleting indices:
# action.destructive_requires_name: true
path.repo: ["/Users/kir/elasticsearch-backup"]

Now restart the Elasticsearch service:

$ sudo service elasticsearch restart

Next, we need to make sure that the Elasticsearch user has the necessary permissions to access the directory:

$ sudo chown -R elasticsearch:elasticsearch /Users/kir/elasticsearch-backup/

The next step is to register it as a repository. To clarify, we are on cluster2, where we’ve moved the backup of our snapshot from cluster1. The registration process is identical to what we did on cluster1.


Now everything should be set up so we can begin to restore our snapshot from the repository. We can list the snapshots in the repository with:

GET /_snapshot/my_backup/_all
 

Let’s now restore one of the snapshots that we just listed:

POST /_snapshot/my_backup/snapshot_07_07_2020/_restore
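
The restore API also accepts a request body if you only want some of the indices back, or want to rename them on the way in; the rename pattern here is just an illustration:

POST /_snapshot/my_backup/snapshot_07_07_2020/_restore
{
  "indices": "sports,orders",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1"
}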

We can monitor the progress of restoring a snapshot in Kibana with:

GET /_snapshot/my_backup/snapshot_07_07_2020
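
The shard-level progress of the restore itself can also be followed through the indices recovery API, for example for the sports index:

GET /sports/_recovery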


Great! As the response in Kibana suggests, our restore process was successful. You can check that the snapshotted indices are now on cluster2 by typing:

GET _cat/indices?pretty


This was the long, manual method of snapshot / restore. (“Snapshot” is often used as a verb in Elasticsearch’s documentation to refer to the process of creating a snapshot.)

Conclusion

Creating snapshots of your data is not something to take lightly, and you need to feel confident that you are doing it the correct way, so make sure to keep this post for reference and review the Elasticsearch documentation. Always remember to test your backups, even if you just move them to dummy test clusters. A backup that can’t restore is not a backup at all. It’s a false feeling of security — completely useless (and it can be a nightmare).

Qbox automatically creates snapshots for you each night, but if you haven’t switched to Qbox and you work in a cloud environment, we hope that this post has been helpful and that you can now effectively and confidently make use of snapshot / restore on your own!


Questions/Comments? Drop us a line below.