This post continues our series on Getting to Know Elasticsearch. Technical writer John Vanderzyden, a technologist for over two decades, is writing a series of guest posts documenting how easy it is to ramp up as a newbie to Qbox. He's especially keen to help individuals who are new to these technologies. Here John elaborates on topics that we introduced in the second post in the series (Getting to Know Elasticsearch, Part 2: Scripting). -- Mark Brandon

This article gives an overview of user access and security configuration for your Elasticsearch cluster. It is essential to remember that Elasticsearch itself performs no authentication or authorization; that responsibility falls to the developer or administrator. Qbox is a premier Elasticsearch provider, and one of our top priorities is to protect you, our customers, and your users from those with harmful intent, as well as to help you avoid unnecessary problems.


It’s critically important that you carefully monitor and examine all Elasticsearch requests, much as you would requests to a database management system. Unlike most databases, however, Elasticsearch does permit arbitrary code execution. That presents some challenges that we address in this article. We look at the various trust levels that you can assign to your users along with the risks that correspond to each level. We point you in the right direction so that you can manage the full range of security hazards, from arbitrary requests with full access to simple, well-defined parameter requests.

Your Responsibilities

If you think about it, the design of Elasticsearch doesn’t actually include the concept of a user. Anyone who can send arbitrary requests to your cluster is the equivalent of a superuser. You may be familiar with DB systems like PostgreSQL, in which you can limit access to databases, tables, and functions with high specificity. Elasticsearch does not offer these features, since there are many options for implementing authentication and authorization schemes that are tightly coupled to the application domain.

The suggestions in this article apply to other search engines as well, so we are not implying that Elasticsearch is inherently insecure. While we respect the design choice to leave security to the user, we want to raise awareness about the key concerns in managing Elasticsearch security.

Goals

For basic security in your Elasticsearch environment, Qbox recommends that you:

  1. Prevent usage of the script feature for execution of arbitrary code. Unless you do this, anything else is merely “security through obscurity” and unnecessarily exposes your sensitive data. However, Qbox security features give you the flexibility of allowing dynamic scripts.
  2. Limit access to both searching and indexing. To some extent, you can achieve this with a proxy layer.
  3. Prevent denial of service requests that can overwhelm your cluster.

As you read below, we’ll help you consider how to achieve these goals even when running Elasticsearch in a local development environment.

Prevent Free-form Scripting

Elasticsearch has very powerful scripting facilities, which are important in many contexts such as updates, facets, filters, and scoring. You should assume that any user who can run arbitrary scripts effectively has shell access to your host. These scripts are not run in a sandbox, so there is nothing preventing a script from sending a second request back to Elasticsearch and thereby evading any URL-based access control.

Elasticsearch disables dynamic scripting by default (since version 1.2.0). On Qbox-hosted clusters, however, dynamic scripting is enabled, because we provide a number of solid security features that prevent and block unauthorized access. As a Qbox user, you can disable dynamic scripting, but you will then need to manually upload script files to the cluster whenever a particular user needs to use scripting.

With dynamic scripting disabled, Elasticsearch does not allow you to specify scripts within a request. Instead, you place script files in the scripts directory within the configuration directory (the same directory containing elasticsearch.yml). Elasticsearch automatically recognizes any scripts in this directory and makes them available by name.
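For example, here is a minimal sketch (the index, type, field, and script names are all hypothetical). Save a one-line script in the scripts directory, then invoke it by name in an update request; depending on your 1.x release, the body parameter may be script or script_file:

    # Contents of config/scripts/increment_views.groovy:
    #   ctx._source.views += 1

    # Invoke the on-disk script by name (newer 1.x releases use "script_file")
    curl -XPOST 'localhost:9200/myindex/mytype/1/_update' -d '{
      "script": "increment_views",
      "lang": "groovy"
    }'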

Limiting Indexes and Operations

Elasticsearch gives you many options for specifying indexes to search. If you have different users on the same cluster and let them send arbitrary search requests, then you may also want to restrict their access to indexes.

Typically, the user specifies indexes in the request URL, such as ...index_pattern/type_pattern/_search. However, there are also APIs such as multi-search, multi-get, and bulk that can take the index as a parameter in the request body, which overrides which indexes are searched or which documents receive indexing. Setting the option allow_explicit_index to false blocks all such overrides.
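The full setting belongs in elasticsearch.yml:

    # elasticsearch.yml: reject request bodies that specify their own index
    rest.action.multi.allow_explicit_index: false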

Remember: the index portion of the URL is actually an index pattern, not necessarily an index name. So, if you prefix an index with something user-specific, you must consider index patterns as well. For example, setting index_name = "user123_" + user_specified_index would not work very well if the value of user_specified_index = ",*". The request would end up as a request to user123_,*/_search, and the search would run on every index.
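To make the risk concrete, compare these two requests (index names hypothetical):

    # Intended: search only this user's index
    curl 'localhost:9200/user123_logs/_search'

    # With user_specified_index = ",*", naive concatenation yields a
    # pattern that matches every index in the cluster
    curl 'localhost:9200/user123_,*/_search'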

With script.disable_dynamic = true and rest.action.multi.allow_explicit_index = false, you can be certain that requests sent to the _search, _msearch, and _mget endpoints can only affect the indexes that you explicitly allow. In this scenario, you can also configure the proxy layer to limit which indexes the forwarded requests can affect.

You can use a filtered alias to restrict which documents are available. Any search, count, more-like-this, and delete-by-query requests will have the filter applied. If you rely on filtered aliases for access control, we recommend ensuring that the underlying indexes are not directly accessible.
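Here is a minimal sketch of creating such an alias with the aliases API (the index, alias, and field names are hypothetical):

    # Create an alias that exposes only documents belonging to user123
    curl -XPOST 'localhost:9200/_aliases' -d '{
      "actions": [
        { "add": {
            "index": "logs",
            "alias": "user123_logs",
            "filter": { "term": { "user_id": "user123" } }
        }}
      ]
    }'

    # Searches through the alias have the filter applied automatically
    curl 'localhost:9200/user123_logs/_search'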

After placing restrictions on the indexes and endpoints to which your users can send requests, you must also consider which HTTP methods to allow. You’ll probably want to prohibit the ability to DELETE an index. Since it’s good practice to keep requests idempotent, it may be wise to disallow requests directly to the index entirely, and only allow POST-ing to endpoints like _search or _bulk, or PUT-ing and DELETE-ing specific documents.

Preventing Denial of Service

While not as harmful as data exposure, you’ll want to be vigilant in avoiding requests that can crash your cluster or severely degrade its performance. Unfortunately, avoiding them is not as easy as flipping a configuration variable.

Running out of memory has a disastrous impact on any server, and it should never happen on a production cluster. We recommend reading the article on Elasticsearch in production, especially the section on OutOfMemory-caused crashes. Many things can consume a lot of memory in Elasticsearch. Here are a few examples:

  • Field caches for fields to facet, sort, and script on
  • Filter caches
  • Segments pending flushing
  • Index metadata

Loading the field cache for a field that has grown too large is probably the most common cause of memory exhaustion. Two important improvements are available as of Elasticsearch version 1.0:

  • Doc values. By enabling these in your mapping, Elasticsearch writes the values to disk at index time so that the operating system page cache can serve them efficiently. This significantly reduces the amount of heap space required, although this approach may be a bit slower.
  • Circuit breaker. Its purpose is to impose a limit on how much memory field loading may consume. It is disabled by default, but with a sensible limit, requests that attempt to load too much will fail with a CircuitBreakingException, which is much safer than an OutOfMemory error. A configuration sketch covering both improvements follows this list.
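Below is a hedged sketch of both improvements; the index and field names are hypothetical, and the exact syntax varies across 1.x releases (the earliest enable doc values via "fielddata": {"format": "doc_values"}, and the breaker setting was later renamed indices.breaker.fielddata.limit):

    # Mapping snippet: store the timestamp field as on-disk doc values
    # rather than heap-resident field data
    curl -XPUT 'localhost:9200/myindex/_mapping/mytype' -d '{
      "properties": {
        "timestamp": { "type": "date", "doc_values": true }
      }
    }'

    # elasticsearch.yml: cap how much heap field data may consume
    indices.fielddata.breaker.limit: 60%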

Staying Safe while Developing with Elasticsearch

Typically, Elasticsearch is used over HTTP while bound to localhost. We commonly assume that external hosts cannot connect to something behind a firewall or listening only on localhost. But think about it: your web browser can reach your localhost, and it might be able to reach servers on your company’s internal network. So any website that you visit can send requests to your local Elasticsearch node; your browser will happily make an HTTP request to 127.0.0.1/_search. Consequently, any website can go mining in whatever data is in your locally running Elasticsearch, and then POST its findings somewhere else. Adjusting the settings for cross-origin resource sharing does help, but it’s still possible to search using JSONP requests.
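If you must run a local node anyway, here is a hedged elasticsearch.yml sketch (setting availability varies by version, and JSONP support was removed entirely in later releases):

    # elasticsearch.yml: refuse cross-origin browser requests
    http.cors.enabled: false

    # Disable JSONP, which can bypass CORS restrictions
    http.jsonp.enable: false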

We recommend running Elasticsearch in a virtual machine when you develop on a machine that you also use to surf the web. Also, don’t keep any sensitive data locally, and disable any dynamic scripts.

Recommended Solutions

Restricting access to indexes and adding authentication and SSL can be done with numerous tools, though their implementation is beyond the scope of this article. Nginx is quite popular for these purposes. Additionally, various Elasticsearch plugins attempt to add features such as basic authentication. You can configure ACLs that enforce SSL and HTTP basic auth and that restrict the available methods and paths.
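As an illustration, here is a minimal Nginx sketch, assuming a single local node and hypothetical certificate paths; it terminates SSL, requires basic auth, and forwards only GET and POST searches against a single index:

    server {
      listen 443 ssl;
      ssl_certificate     /etc/nginx/ssl/es.crt;    # hypothetical paths
      ssl_certificate_key /etc/nginx/ssl/es.key;

      auth_basic           "Elasticsearch";
      auth_basic_user_file /etc/nginx/.htpasswd;

      # Forward only searches against the "myindex" index
      location ~ ^/myindex/[^/]+/_search$ {
        if ($request_method !~ ^(GET|POST)$) { return 405; }
        proxy_pass http://127.0.0.1:9200;
      }

      # Deny everything else, including DELETE and cluster APIs
      location / {
        return 403;
      }
    }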

Even though Elasticsearch is multi-tenant and can readily serve many different users and applications on the same cluster, at some point you might want to create multiple clusters for partitioning resources and providing segmentation for additional security. We recommend this distribution to reduce the impact of a problem in a specific area. For example, if you get a sudden increase in traffic, the corresponding increase in logging throughput (probably done with Logstash and Kibana, right?) shouldn't impact the performance of the more critical applications.

Cool stuff is continuously trickling into Elasticsearch, which is being developed at a mind-blowing pace. Some of these improvements will make life easier when dealing with a few of the challenges mentioned here. Others may introduce new challenges.

You should always assume that security is left to you. Remember, security is an onion, and good strategies have multiple layers. Don’t let Elasticsearch be a weak layer in your application’s security onion!