Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected. It is a near-real-time search platform, which means there is a slight latency (normally one second) between the time you index a document and the time it becomes searchable.

Provisioning an Elasticsearch cluster in Qbox is easy. Please refer to Easy How-To: Provisioning an Elasticsearch Cluster on Qbox to walk through the initial steps to start and configure your cluster.

For this post, we will be using hosted Elasticsearch on Qbox.io. You can sign up or launch your cluster here, or click "Get Started" in the header navigation.

Qbox provides managed instances of Elasticsearch, the open source data exploration and analytics platform. Consider what Mongolab is for MongoDB, Cloudant is for CouchDB, or RedisLabs is for Redis: that is what Qbox is for Elasticsearch. It strives to relieve you of the devops headaches so that developers can focus on building their search apps.

Qbox is available in all data centers on AWS, Rackspace, and Softlayer. This matters because Elasticsearch is often used in conjunction with a primary data store and other apps, so our customers want it in the same data center as their other infrastructure. It helps with performance, stability, and bandwidth costs.

Getting Started

The amount of CPU, RAM, and storage that your Elasticsearch Server will require depends on the size of your data or volume of logs that you intend to gather. 

Note: Please pick the appropriate names, versions, regions for your needs. For this example, we used Elasticsearch version 2.3.4. The most current version is 5.3. We support all versions of Elasticsearch on Qbox. (To learn more about the major differences between 2.x and 5.x, click here.)

For this tutorial, we will be using Qbox-provisioned Elasticsearch with the following minimum specs:

Cluster Configuration and Settings

  • Provider: AWS
  • Version: 2.3.4
  • RAM: 2GB
  • CPU: 2
  • Replicas: 0

The Endpoint and Transport addresses for our Qbox-provisioned Elasticsearch cluster are as follows: [Figure: 1.png]

Once you have an instance of Elasticsearch up and running, you can talk to it using its JSON-based REST API, which resides at the REST API endpoint: https://5d53675f1e0dd8be3ada:3b193023f7@eb843037.qb0x.com:30024. You can use any HTTP client, or a graphical client such as Fiddler or RESTClient if you prefer. In the examples below, QBOX_ES_ENDPOINT stands for this endpoint URL.

Elasticsearch is primarily used for searching, but the first step is to populate an index with some data, meaning the "Create" of CRUD, or rather, "indexing". While we are at it, we'll also look at how to update, read, and delete individual documents.

Indexing

Indexing corresponds to both "Create" and "Update" in CRUD. If we index a document with a given type and ID that does not already exist, it's inserted. If a document with the same type and ID already exists, it's overwritten.

In order to index a JSON object, we make a PUT request to the REST API, using a URL made up of the index name, type name, and ID, i.e., QBOX_ES_ENDPOINT/<index>/<type>/[<id>].

The index and type are required, while the ID is optional. If we don't specify an ID, Elasticsearch will generate one for us. In that case, however, we should use POST instead of PUT.
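The PUT-versus-POST rule above can be sketched as a small helper. This is only an illustration: the function name index_request_target and the localhost endpoint are assumptions made for the example, not part of Qbox or Elasticsearch.

```python
def index_request_target(endpoint, index, doc_type, doc_id=None):
    """Return (http_method, url) for indexing a document.

    PUT is used when the caller supplies an ID; POST is used when
    Elasticsearch should generate one.
    """
    base = "{}/{}/{}".format(endpoint.rstrip("/"), index, doc_type)
    if doc_id is None:
        return "POST", base
    return "PUT", "{}/{}".format(base, doc_id)


# "http://localhost:9200" is a placeholder endpoint used for illustration only.
print(index_request_target("http://localhost:9200", "novels", "popular", 1))
print(index_request_target("http://localhost:9200", "novels", "popular"))
```

With an explicit ID the helper yields a PUT to .../novels/popular/1; without one it yields a POST to .../novels/popular.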

So let's index something! We can put just about anything into our index as long as it can be represented as a single JSON object. In this tutorial we'll be indexing and searching for books. Here's a classic:

{
    "title": "Don Quixote",
    "author": "Miguel de Cervantes",
    "year": 1605
}

In order to index this, we decide on an index name ("novels"), a type name ("popular") and an id ("1"), and make a request following the pattern described above with the JSON object in the body. Let’s run the following using curl or Sense.

curl -XPUT "QBOX_ES_ENDPOINT/novels/popular/1" -d '{
    "title": "Don Quixote",
    "author": "Miguel de Cervantes",
    "year": 1605
}'

The response object contains information about the indexing operation, such as whether the document was newly created ("created") and the document's ID, which is of interest if we didn't specify one ourselves.

{
  "_index": "novels",
  "_type": "popular",
  "_id": "1",
  "_version": 1,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
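For readers who prefer a script over curl, the same indexing request might be built with Python's standard library roughly as follows. This is a sketch under assumptions: ENDPOINT is a placeholder for your own cluster URL, build_index_request is a hypothetical helper name, and the request is only constructed here, not sent. The Content-Type header is not needed by the 2.3.4 cluster used in this post but is harmless to include.

```python
import json
import urllib.request

# Placeholder endpoint (assumption); substitute your own cluster's REST endpoint.
ENDPOINT = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) the PUT request that indexes `document`."""
    url = "{}/{}/{}/{}".format(ENDPOINT, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


novel = {"title": "Don Quixote", "author": "Miguel de Cervantes", "year": 1605}
request = build_index_request("novels", "popular", 1, novel)
print(request.get_method(), request.full_url)

# To actually send it against a reachable cluster:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```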

Now that we have a novel in our index, let's look at how we can update it, adding a list of languages to it. In order to do that we simply index it again using the same ID. In other words, we make the exact same indexing request as before but with an extended JSON object containing publication languages.

curl -XPUT "QBOX_ES_ENDPOINT/novels/popular/1" -d '{
    "title": "Don Quixote",
    "author": "Miguel de Cervantes",
    "year": 1605,
    "languages": ["English", "French"]
}'

The response from Elasticsearch is the same as before except that the _version property is now 2 instead of 1 and created is false, because the existing document was overwritten rather than newly created.

{
  "_index": "novels",
  "_type": "popular",
  "_id": "1",
  "_version": 2,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": false
}
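A client can tell an insert from an overwrite by looking at the created and _version fields of the response. The sketch below assumes the 2.x response shape shown above (abbreviated here); describe_index_response is a hypothetical helper name.

```python
import json


def describe_index_response(response):
    """Summarize an index response as 'created' or 'updated' plus its version."""
    action = "created" if response["created"] else "updated"
    return "{} {}/{}/{} (version {})".format(
        action,
        response["_index"],
        response["_type"],
        response["_id"],
        response["_version"],
    )


# Abbreviated copies of the two responses shown in this section.
first = json.loads(
    '{"_index": "novels", "_type": "popular", "_id": "1", "_version": 1, "created": true}'
)
second = json.loads(
    '{"_index": "novels", "_type": "popular", "_id": "1", "_version": 2, "created": false}'
)
print(describe_index_response(first))   # created novels/popular/1 (version 1)
print(describe_index_response(second))  # updated novels/popular/1 (version 2)
```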

In order to get a document by ID, we make a GET request to the same URL as when we indexed it, but this time the ID part of the URL is mandatory. In other words, in order to retrieve a document by ID from Elasticsearch, we make a GET request to:

QBOX_ES_ENDPOINT/<index>/<type>/<id>

The result object contains metadata similar to what we saw when indexing, such as index, type, and version information, along with the document itself in the _source property.

curl -XGET "QBOX_ES_ENDPOINT/novels/popular/1"

{
  "_index": "novels",
  "_type": "popular",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "title": "Don Quixote",
    "author": "Miguel de Cervantes",
    "year": 1605,
    "languages": [
      "English",
      "French"
    ]
  }
}
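In application code we usually want just the stored document rather than the whole envelope. A minimal sketch, assuming the response shape shown above and a hypothetical helper named extract_document:

```python
def extract_document(get_response):
    """Return the stored document from a GET response, or None if it was not found."""
    if not get_response.get("found", False):
        return None
    return get_response["_source"]


# Abbreviated version of the GET response shown above.
found_response = {
    "_index": "novels",
    "_type": "popular",
    "_id": "1",
    "_version": 2,
    "found": True,
    "_source": {"author": "Miguel de Cervantes", "year": 1605},
}
# Shape of a response for an ID that does not exist (the ID is made up).
missing_response = {"_index": "novels", "_type": "popular", "_id": "99", "found": False}

print(extract_document(found_response))
print(extract_document(missing_response))
```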

Deleting Documents

In order to remove a single document from the index by ID, we again use the same URL as for indexing and getting it, but this time we change the HTTP method to DELETE.

curl -XDELETE "QBOX_ES_ENDPOINT/novels/popular/1"

The response object contains some of the usual suspects in terms of metadata, along with a property named "found", indicating that the document was indeed found and that the operation was successful.

{
  "found": true,
  "_index": "novels",
  "_type": "popular",
  "_id": "1",
  "_version": 3,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  }
}

Searching

We've covered the basics of working with data in an Elasticsearch index, and it's time to move on to more exciting things: searching. However, considering the last thing we did was to delete the only document we had in our index, we'll first need some sample data. Let's index a few documents.

curl -XPUT "QBOX_ES_ENDPOINT/novels/popular/1" -d '{
    "title": "Don Quixote",
    "author": "Miguel de Cervantes",
    "year": 1605,
    "languages": ["English", "French"]
}'
curl -XPUT "QBOX_ES_ENDPOINT/novels/popular/2" -d '{
    "title": "In Search of Lost Time",
    "author": "Marcel Proust",
    "year": 2004,
    "languages": ["German", "Spanish", "English"]
}'
curl -XPUT "QBOX_ES_ENDPOINT/novels/popular/3" -d '{
    "title": "Ulysses",
    "author": "James Joyce",
    "year": 1986,
    "languages": ["Latin", "Korean", "Japanese"]
}'
curl -XPUT "QBOX_ES_ENDPOINT/novels/popular/4" -d '{
    "title": "The Odyssey",
    "author": "Homer",
    "year": 1999,
    "languages": ["Chinese", "English"]
}'
curl -XPUT "QBOX_ES_ENDPOINT/novels/popular/5" -d '{
    "title": "War and Peace",
    "author": "Leo Tolstoy",
    "year": 2008,
    "languages": ["Russian", "French", "English"]
}'
curl -XPUT "QBOX_ES_ENDPOINT/novels/popular/6" -d '{
    "title": "Iliad",
    "author": "Homer",
    "year": 1999,
    "languages": ["German", "Chinese", "Ukrainian"]
}'
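Instead of six separate requests, the same documents could be sent in one call to the _bulk endpoint, which expects newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. The sketch below only builds that request body; build_bulk_body is a hypothetical helper name, and sending the body (a POST to QBOX_ES_ENDPOINT/_bulk) is left out.

```python
import json

# The same six sample novels indexed above.
NOVELS = [
    {"title": "Don Quixote", "author": "Miguel de Cervantes", "year": 1605,
     "languages": ["English", "French"]},
    {"title": "In Search of Lost Time", "author": "Marcel Proust", "year": 2004,
     "languages": ["German", "Spanish", "English"]},
    {"title": "Ulysses", "author": "James Joyce", "year": 1986,
     "languages": ["Latin", "Korean", "Japanese"]},
    {"title": "The Odyssey", "author": "Homer", "year": 1999,
     "languages": ["Chinese", "English"]},
    {"title": "War and Peace", "author": "Leo Tolstoy", "year": 2008,
     "languages": ["Russian", "French", "English"]},
    {"title": "Iliad", "author": "Homer", "year": 1999,
     "languages": ["German", "Chinese", "Ukrainian"]},
]


def build_bulk_body(index, doc_type, documents):
    """Return a _bulk request body that indexes `documents` with IDs 1..n."""
    lines = []
    for doc_id, document in enumerate(documents, start=1):
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))    # action/metadata line
        lines.append(json.dumps(document))  # document source line
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body("novels", "popular", NOVELS)
print(bulk_body)
```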

The _search REST API

Search requests are sent to the _search endpoint. To search for our novels, we can make POST requests to any of the following URLs:

  • QBOX_ES_ENDPOINT/_search -- Search across all indexes and all types.

  • QBOX_ES_ENDPOINT/novels/_search -- Search across all types in the novels index.

  • QBOX_ES_ENDPOINT/novels/popular/_search -- Search explicitly for documents of type popular within the novels index.
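The three scopes listed above differ only in how many path segments precede _search. As a rough sketch (search_url is a hypothetical helper and the localhost endpoint is a placeholder), the URL could be assembled like this:

```python
def search_url(endpoint, index=None, doc_type=None):
    """Build a _search URL scoped to everything, one index, or one index and type."""
    if doc_type is not None and index is None:
        raise ValueError("a type can only be given together with an index")
    parts = [endpoint.rstrip("/")]
    if index is not None:
        parts.append(index)
    if doc_type is not None:
        parts.append(doc_type)
    parts.append("_search")
    return "/".join(parts)


# Placeholder endpoint for illustration only.
base = "http://localhost:9200"
print(search_url(base))                       # all indexes, all types
print(search_url(base, "novels"))             # all types in the novels index
print(search_url(base, "novels", "popular"))  # type popular in the novels index
```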

Search request bodies are written in the Elasticsearch query DSL, which features a long list of different types of queries that we can use. For ordinary free text search, we'll most likely want to use one called the "query string query".

Let's try a search for the word "homer" which is present in the author of two of our novels:

curl -XPOST "QBOX_ES_ENDPOINT/_search" -d '{
    "query": {
        "query_string": {
            "query": "homer"
        }
    }
}'

Let's execute the request and take a look at the result.

{
  "took": 23,
  "timed_out": false,
  "_shards": {
    "total": 38,
    "successful": 38,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.375,
    "hits": [
      {
        "_index": "novels",
        "_type": "popular",
        "_id": "4",
        "_score": 0.375,
        "_source": {
          "title": "The Odyssey",
          "author": "Homer",
          "year": 1999,
          "languages": [
            "Chinese",
            "English"
          ]
        }
      },
      {
        "_index": "novels",
        "_type": "popular",
        "_id": "6",
        "_score": 0.375,
        "_source": {
          "title": "Iliad",
          "author": "Homer",
          "year": 1999,
          "languages": [
            "German",
            "Chinese",
            "Ukrainian"
          ]
        }
      }
    ]
  }
}
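To work with a result like this in code, we typically pull the total and the individual hits out of the nested hits object. The sketch below assumes the 2.x response shape shown above, where hits.total is a plain number; summarize_hits is a hypothetical helper, and the sample response is abbreviated.

```python
def summarize_hits(search_response):
    """Return (total, [(id, title, score), ...]) from a search response."""
    hits = search_response["hits"]
    rows = [(hit["_id"], hit["_source"]["title"], hit["_score"]) for hit in hits["hits"]]
    return hits["total"], rows


# Abbreviated copy of the response shown above.
sample_response = {
    "took": 23,
    "timed_out": False,
    "hits": {
        "total": 2,
        "max_score": 0.375,
        "hits": [
            {"_index": "novels", "_type": "popular", "_id": "4", "_score": 0.375,
             "_source": {"title": "The Odyssey", "author": "Homer", "year": 1999}},
            {"_index": "novels", "_type": "popular", "_id": "6", "_score": 0.375,
             "_source": {"title": "Iliad", "author": "Homer", "year": 1999}},
        ],
    },
}

total, rows = summarize_hits(sample_response)
print(total)
for doc_id, title, score in rows:
    print(doc_id, title, score)
```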

Let's now restrict the search to the title field only. That is, if we search for "war", we want to get a hit for "War and Peace" by Leo Tolstoy because the word appears in its title, but no hits that come from other fields such as author.

curl -XPOST "QBOX_ES_ENDPOINT/novels/_search" -d '{
    "query": {
        "query_string": {
            "query": "war",
            "fields": ["title"]
        }
    }
}'

As expected we get a single hit, the novel with the word "war" in its title.

{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [
      {
        "_index": "novels",
        "_type": "popular",
        "_id": "5",
        "_score": 0.15342641,
        "_source": {
          "title": "War and Peace",
          "author": "Leo Tolstoy",
          "year": 2008,
          "languages": [
            "Russian",
            "French",
            "English"
          ]
        }
      }
    ]
  }
}
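Both query string requests shown so far share one shape, with fields as the only optional part. A small sketch of a builder for that body follows; query_string_body is a hypothetical helper name, and the output is simply the JSON that would go in the request body.

```python
import json


def query_string_body(text, fields=None):
    """Build a query_string search body, optionally limited to specific fields."""
    query = {"query": text}
    if fields:
        query["fields"] = list(fields)
    return {"query": {"query_string": query}}


# Search every field for "homer", then only the title field for "war".
print(json.dumps(query_string_body("homer"), indent=4))
print(json.dumps(query_string_body("war", fields=["title"]), indent=4))
```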

Filtering

The Elasticsearch query DSL has a wide range of filters to choose from. For a simple case, such as requiring a certain field to match a specific value, a term filter should work well. For example, to restrict results to novels with the year 1999:

"filter": {
    "term": { "year": 1999 }
}

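The complete search request now looks like this. Note that the filtered query works on the 2.3.4 cluster used here, but it was deprecated in Elasticsearch 2.0 and removed in 5.0, where a bool query with a filter clause is used instead.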

curl -XPOST "QBOX_ES_ENDPOINT/novels/_search" -d '{
    "query": {
        "filtered": {
            "query": {
                "query_string": {
                    "query": "German"
                }
            },
            "filter": {
                "term": { "year": 1999 }
            }
        }
    }
}'

When we execute it, we get a single hit, as expected: the only novel with the year 1999 that also lists "German" among its languages.

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.375,
    "hits": [
      {
        "_index": "novels",
        "_type": "popular",
        "_id": "6",
        "_score": 0.375,
        "_source": {
          "title": "Iliad",
          "author": "Homer",
          "year": 1999,
          "languages": [
            "German",
            "Chinese",
            "Ukrainian"
          ]
        }
      }
    ]
  }
}
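On clusters where the filtered query is not available (it was deprecated in Elasticsearch 2.0 and removed in 5.0), the same combination of a scored query and a non-scoring filter is usually written as a bool query with must and filter clauses. The sketch below builds that body; bool_filter_body is a hypothetical helper name, and the request itself is not sent here.

```python
import json


def bool_filter_body(query, filter_clause):
    """Combine a scoring query and a non-scoring filter in a bool query."""
    return {"query": {"bool": {"must": query, "filter": filter_clause}}}


body = bool_filter_body(
    {"query_string": {"query": "German"}},  # scored free-text part
    {"term": {"year": 1999}},               # exact-match filter part
)
print(json.dumps(body, indent=4))
```

The printed JSON can be used as the request body for the same POST to QBOX_ES_ENDPOINT/novels/_search.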

Filtering without a Query

In the above example, we limit the results of a query string query using a filter. What if all we want to do is apply a filter? That is, we want all novels matching a certain criterion.

One solution is to modify our current search request, replacing the query string query in the filtered query with a match_all query, which simply matches everything:

curl -XPOST "QBOX_ES_ENDPOINT/_search" -d '{
    "query": {
        "filtered": {
            "query": {
                "match_all": {
                }
            },
            "filter": {
                "term": { "year": 1999 }
            }
        }
    }
}'

Another, simpler option is to use a constant score query:

curl -XPOST "QBOX_ES_ENDPOINT/_search" -d '{
    "query": {
        "constant_score": {
            "filter": {
                "term": { "year": 1999 }
            }
        }
    }
}'

When we execute it, we get two hits, as expected: the two novels with the year 1999, "The Odyssey" and "Iliad".
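As a sanity check on the expected result, the effect of this term filter on our sample data can be mimicked locally. This is only a stand-in that models exact matching on a single-valued field; it does not reproduce how Elasticsearch evaluates filters internally, and term_filter is a hypothetical helper.

```python
# Titles and years of the six sample novels indexed earlier.
NOVELS = [
    {"title": "Don Quixote", "year": 1605},
    {"title": "In Search of Lost Time", "year": 2004},
    {"title": "Ulysses", "year": 1986},
    {"title": "The Odyssey", "year": 1999},
    {"title": "War and Peace", "year": 2008},
    {"title": "Iliad", "year": 1999},
]


def term_filter(documents, field, value):
    """Local stand-in for a term filter: keep documents whose field equals value."""
    return [doc for doc in documents if doc.get(field) == value]


matches = term_filter(NOVELS, "year", 1999)
print([doc["title"] for doc in matches])  # ['The Odyssey', 'Iliad']
```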

We've covered quite a lot of things in this tutorial. Still, we've barely scratched the surface of Elasticsearch goodness.

There's a lot more to searching with Elasticsearch than we've seen here. We can create search requests where we specify how many hits we want, use highlighting, get spelling or autocomplete suggestions, provide custom boosting and scoring, make use of stemming, stopping and a wide variety of language and phonetic analyzers, and much more. 

Also, the query DSL contains many interesting queries and filters that we can use. Then there's of course a whole range of aggregations (which replaced facets as of Elasticsearch 2.0) that we can use to extract statistics from our data or build navigation.

Qbox can also provision Kibana, a web interface for Elasticsearch. The Kibana user interface can be used for filtering, sorting, discovering, and visualizing logs that are stored in Elasticsearch.

Give It a Whirl!

It's easy to spin up a standard hosted Elasticsearch cluster in any of our 47 Rackspace, Softlayer, Amazon, or Microsoft Azure data centers. And you can now provision your own AWS Credits on Qbox Private Hosted Elasticsearch.

Questions? Drop us a note, and we'll get you a prompt response.

Not yet enjoying the benefits of a hosted ELK-stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment in our cloud hosting service.
