In this first episode of the tutorial we’re going to explore some of the features of Elasticsearch, and in later episodes we’ll dig into more sophisticated features using hosted Elasticsearch with qbox.io. This is the start of a multi-part series on Elasticsearch and the benefits of qbox.io’s hosted Elasticsearch service. I’ll be showing some interesting queries, how to structure your documents for interesting projects, and much more.
For this first episode we will be installing Elasticsearch, creating a cluster, indexing some documents, and changing some settings. In my previous video I discussed a few of the several dozen major advantages Elasticsearch offers. Today we will set up a local instance of Elasticsearch. Let’s start by downloading Elasticsearch. We’ll use 0.90.9, the current stable release at the time of writing.
Once that is finished, we have a couple of steps to take before we launch our server. Go ahead and hop into the Elasticsearch directory, open the config directory, and open elasticsearch.yml.
This YAML file provides a little over five dozen settings for Elasticsearch. First, let’s change the cluster name from the default “elasticsearch.” If we don’t change this, nodes run by others on the network who also kept the default name can join our cluster, and we can end up with indexes we don’t want. The same goes for your node name; make sure to change it. The data, config, and log paths are all important to change as well. Point them at a location outside the Elasticsearch directory so you can easily reuse these files with newer Elasticsearch releases. Releases are fairly frequent, usually occurring every other month or so.
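As a rough sketch of the settings just described, the snippet below writes a small elasticsearch.yml into a reusable directory. The directory location, the `ES_TUTORIAL_DIR` variable, the cluster name, and the node name are all placeholders chosen for illustration; only the setting keys (`cluster.name`, `node.name`, `path.conf`, `path.data`, `path.logs`) come from the stock config file.

```shell
# Hypothetical location for reusable config, data, and logs.
# Defaults to a folder under the current directory; override ES_TUTORIAL_DIR to move it.
ES_TUTORIAL_DIR="${ES_TUTORIAL_DIR:-$PWD/es-tutorial}"
mkdir -p "$ES_TUTORIAL_DIR/config" "$ES_TUTORIAL_DIR/data" "$ES_TUTORIAL_DIR/logs"

# Write a minimal config. The cluster and node names are made-up examples.
cat > "$ES_TUTORIAL_DIR/config/elasticsearch.yml" <<EOF
cluster.name: my-tutorial-cluster
node.name: "tutorial-node-1"
path.conf: $ES_TUTORIAL_DIR/config
path.data: $ES_TUTORIAL_DIR/data
path.logs: $ES_TUTORIAL_DIR/logs
EOF

echo "Wrote $ES_TUTORIAL_DIR/config/elasticsearch.yml"
```

Because the file lives outside the Elasticsearch directory, it should survive unpacking a newer release next to the old one.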
The default number of shards is 5, and the default number of replicas is 1. According to past advice from kimchy (the creator of Elasticsearch), this will get you quite far. Elasticsearch comes with very sensible defaults that are great for getting up and going. All of these settings have very important roles, but for now we’re going to leave the defaults. I will go over these settings in more detail in later episodes, explaining shard and replica counts and how to work out the right numbers.
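To make those defaults concrete, here is a tiny arithmetic sketch. It assumes only the two default values quoted above; the variable names are made up for illustration.

```shell
# Defaults quoted above: 5 primary shards per index, 1 replica of each primary.
primary_shards=5
replicas_per_primary=1

# Every primary is stored once, plus once more for each replica.
total_shard_copies=$(( primary_shards * (1 + replicas_per_primary) ))

echo "Shard copies for one index with default settings: $total_shard_copies"
```

On a single local node the replica copies have nowhere to go, since a replica is never placed on the same node as its primary, so they stay unassigned until a second node joins.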
Now we are ready to start our Elasticsearch server. From our Elasticsearch directory we will launch it with the bin/elasticsearch script; the -f flag keeps the server running in the foreground.
If you want to provide a path for your config file so you can reuse it with newer versions of Elasticsearch, create a folder for it and pass its path when launching:
bin/elasticsearch -f -Des.default.config=PATH-TO-FILE/elasticsearch.yml
That’s how easy it is: we’ve just launched an Elasticsearch server. First we’ll run
curl -XGET 'http://localhost:9200'
to make sure our cluster is running. 9200 is the default HTTP port; it can be changed in the elasticsearch.yml file we examined earlier.
Now we will curl the settings of our cluster and find out what kind of settings we have.
curl -XGET 'http://localhost:9200/_settings'
What we have is an empty cluster, no index, no nothing. Let’s change this by grabbing a GitHub repository I’ve provided.
This repo includes 10 JSON documents of Disney characters with some properties to have a little fun with.
One of the options for indexing documents is the bulk indexing API. All we have is 10 JSON documents with very basic strings as properties. An “action_and_meta_data” line is required before every document when bulk indexing, so Elasticsearch knows what we want to do. The bulk API supports several actions (index, create, delete, and update), but we’re just going to index our JSON documents. The name of our index is disney, and the type of our documents is character. We could use create, but that will fail if a document with the same index, type, and id already exists. To make sure we don’t run into problems if you already have a disney index with these documents, we will use index for every document, which adds the document or replaces an existing one. If you’re looking to index from an existing dataset, we will go over some of those methods in the next episode.
We used our own _id so we can easily delete or update documents. If no mapping exists yet, Elasticsearch will automatically create a dynamic type mapping based on the properties of the documents. It will also generate an _id if one is not specified, automatically changing the op_type to create instead of index. As explained earlier, create will fail if a document with the same index, type, and id already exists. If you want more information on all the great things you can do with indexing, visit the index API. For now we have a character type with animated_debut, original_voice_actor, and name.
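To show the shape of a bulk file, here is a minimal sketch that writes two documents in the format described above. The file name `disney-sample.bulk` and the sample values are assumptions for illustration and are not taken from the repo’s disney-data file; only the index name, type name, and property names come from this tutorial.

```shell
# Illustrative only: a hypothetical two-document bulk file, not the repo's disney-data.
# Each document takes two lines: an action line, then the document source.
# The bulk API expects the file to end with a newline, which the heredoc provides.
cat > disney-sample.bulk <<'EOF'
{ "index" : { "_index" : "disney", "_type" : "character", "_id" : "1" } }
{ "name" : "Mickey Mouse", "original_voice_actor" : "Walt Disney", "animated_debut" : "Steamboat Willie" }
{ "index" : { "_index" : "disney", "_type" : "character", "_id" : "2" } }
{ "name" : "Donald Duck", "original_voice_actor" : "Clarence Nash", "animated_debut" : "The Wise Little Hen" }
EOF

# Quick sanity check: two lines per document.
line_count=$(wc -l < disney-sample.bulk | tr -d ' ')
echo "disney-sample.bulk has $line_count lines"
```

Because each action line uses index with an explicit _id, sending the same file twice should simply overwrite the two documents rather than fail.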
We will get into some interesting property relationships and search queries in the next episode. For this episode we will work with very basic documents with fairly simple properties. Hop into the repo directory and index those documents on your running Elasticsearch server.
curl -s -XPOST 'localhost:9200/_bulk' --data-binary @disney-data; echo
Now we will search all the documents! We’ll add the pretty parameter to get a nicely formatted response back.
curl -XGET 'localhost:9200/disney/character/_search?pretty=1'
As you’ll quickly notice, the search isn’t as amazing as it can be. You can follow up and check out Episode 2 and Episode 3 of this series to learn more about what makes Elasticsearch queries so powerful!