Ask most folks to describe Elasticsearch, and you'll get a variety of answers. Many senior full-stack developers will struggle with the answer. They might know how to use it — but it's hard to get a clear, concise, and accurate answer. This can create no small amount of frustration in those who need to know: What is it? What does it do? How might I benefit?
We've got answers for you, right here in this article: comprehensive, yet easy for almost anyone to read. Enjoy. And you're welcome!
Elementary, said Holmes to his dear friend Watson. Some concepts are quite easy to grasp. If he were a real person, Watson might not categorize Elasticsearch as such.
How would you, dear blog reader, describe Elasticsearch?
A recent video from Elastic shows how difficult it can be to find any consistency, even within the core of the Elasticsearch community.
Our team here at Qbox offers this description:
Elasticsearch is an open-source, broadly-distributable, readily-scalable, enterprise-grade search engine. Accessible through an extensive and elaborate API, Elasticsearch can power extremely fast searches that support your data discovery applications.
It is easy to get going with Elasticsearch. It ships with sensible defaults and hides complex search and distribution mechanics from beginners. It works quite well, right out of the box. With a short learning curve for grasping the basics, you can become productive very quickly.
Fast, Incisive Search against Large Volumes of Data
Conventional SQL database management systems aren't really designed for full-text searches, and they certainly don't perform well against loosely structured raw data that resides outside the database. On the same hardware, queries that would take more than 10 seconds using SQL can return results in under 10 milliseconds in Elasticsearch.
A user expresses an ES query in a simple JSON-based language, Query DSL. A query examines one or many target values, and scores each of the elements in the results according to how closely they match the focus of the query. The query operators enable you to optimize simple or complex queries that often return results from large datasets in just a few milliseconds. The Elasticsearch design is much simpler and much leaner than a database constrained by schemas, tables, fields, rows, and columns.
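To make the idea of a Query DSL body concrete, here is a minimal sketch in Python that only builds the JSON for two queries and prints it. The field names (`message`, `level`, `@timestamp`) and the helper names are hypothetical, chosen for illustration; nothing here contacts a cluster.

```python
import json

def build_match_query(field, text, size=10):
    """Return a Query DSL body that runs a full-text `match` on one field."""
    return {
        "size": size,
        "query": {"match": {field: text}},
    }

def build_filtered_query(field, text, level):
    """Combine a scored `match` clause with a non-scoring `term` filter."""
    return {
        "query": {
            "bool": {
                # `must` clauses contribute to the relevance score
                "must": [{"match": {field: text}}],
                # `filter` clauses only include or exclude documents
                "filter": [
                    {"term": {"level": level}},
                    {"range": {"@timestamp": {"gte": "now-1d"}}},
                ],
            }
        }
    }

simple = build_match_query("message", "connection timeout", size=5)
filtered = build_filtered_query("message", "connection timeout", "error")
print(json.dumps(simple, indent=2))
print(json.dumps(filtered, indent=2))
```

The simple query scores every document by how well its `message` field matches the text; the second narrows the candidates with filters before scoring, which is one common way to keep a complex query fast.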
Indexing Documents to the Repository
During an indexing operation, Elasticsearch converts raw data such as log files or message files into internal documents and stores them in a basic data structure similar to a JSON object. Each document is a simple set of correlating keys and values: the keys are strings, and the values are one of numerous data types—strings, numbers, dates, or lists.
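As a rough illustration of what such a document looks like, the sketch below turns one log line into a flat set of keys and values. The log format (`timestamp level message`), the field names, and the `log_line_to_document` helper are all assumptions made for this example, and the parsing happens on the client side purely to show the resulting shape.

```python
import json
from datetime import datetime

def log_line_to_document(line):
    """Split a 'timestamp level message' log line into a key/value document."""
    timestamp, level, message = line.split(" ", 2)
    # Validate the timestamp; fromisoformat raises ValueError on bad input
    datetime.fromisoformat(timestamp)
    return {
        "timestamp": timestamp,          # date, kept as an ISO 8601 string
        "level": level,                  # string
        "message": message,              # string
        "length": len(message),          # number
        "tags": ["app", level.lower()],  # list
    }

doc = log_line_to_document("2015-06-01T12:30:00 ERROR disk quota exceeded")
print(json.dumps(doc, indent=2))
```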
Adding documents to Elasticsearch is easy, and it's easy to automate. Simply do an HTTP POST that transmits your document as a simple JSON object. Searches are also done with JSON: send your query in an HTTP GET with a JSON body. The RESTful API makes it easy to retrieve, submit, and verify data directly from a command line. Even if they are developing with a client library for a language such as Python or Ruby, many developers use the cURL tool for debugging and developing with Elasticsearch.
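The two HTTP calls described above can be sketched with nothing but the Python standard library. This example only builds the requests and prints them, so it runs without a cluster. The base URL and the index name `app-logs` are assumptions, and the `/_doc` path follows recent Elasticsearch versions; older releases used a type name in that position.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local single-node cluster
INDEX = "app-logs"                  # hypothetical index name

def index_request(doc):
    """Build (but do not send) a POST that indexes one JSON document."""
    return urllib.request.Request(
        f"{BASE_URL}/{INDEX}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def search_request(query):
    """Build (but do not send) a GET search that carries a JSON body."""
    return urllib.request.Request(
        f"{BASE_URL}/{INDEX}/_search",
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )

post = index_request({"level": "ERROR", "message": "disk quota exceeded"})
get = search_request({"match": {"message": "quota"}})
print(post.get_method(), post.full_url)
print(get.get_method(), get.full_url)
# To send either request to a running cluster: urllib.request.urlopen(post)
```

Keeping request construction separate from sending makes it easy to inspect exactly what would go over the wire, which is much the same reason developers reach for cURL when debugging.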
Denormalized Document Storage: Fast, Direct Access to Your Data
It's important to remember that Elasticsearch isn't a relational database, so DBMS concepts usually won't apply. The most important concept that you must set aside when coming over from conventional databases is normalization. Native Elasticsearch doesn’t permit joins or subqueries, so denormalizing your data is essential.
ES will typically store a document once for each repository in which it resides. Although this is counterintuitive from the perspective of a conventional DBMS, it is optimal for Elasticsearch. Full text searches will be extremely fast because the documents are stored in close proximity to the corresponding metadata in the index. This design greatly reduces the number of data reads, and ES limits the index growth rate by keeping it compressed.
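Here is a small sketch of what denormalizing means in practice, assuming a hypothetical blog dataset with separate `authors` and `posts` tables. Instead of joining at query time, each post document carries a copy of the author fields it needs.

```python
# Normalized, relational-style data: posts reference authors by id
authors = {
    1: {"name": "Ada", "team": "search"},
    2: {"name": "Grace", "team": "platform"},
}
posts = [
    {"id": 10, "author_id": 1, "title": "Tuning analyzers"},
    {"id": 11, "author_id": 2, "title": "Sizing a cluster"},
    {"id": 12, "author_id": 1, "title": "Scoring basics"},
]

def denormalize(posts, authors):
    """Copy the author fields into every post so no join is needed later."""
    docs = []
    for post in posts:
        author = authors[post["author_id"]]
        docs.append({
            "id": post["id"],
            "title": post["title"],
            "author_name": author["name"],  # duplicated on purpose
            "author_team": author["team"],  # duplicated on purpose
        })
    return docs

docs = denormalize(posts, authors)
for doc in docs:
    print(doc)
```

The author's name now appears in two documents. That duplication is the trade-off: a rename means updating every affected document, but each search reads one self-contained document with no lookup in a second table.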
Broadly Distributable and Highly Scalable
Elasticsearch can scale up to thousands of servers and accommodate petabytes of data. Its enormous capacity results directly from its elaborate, distributed architecture. And yet the ES user can be thankfully unaware of nearly all of the automation and complexity that supports this distributed design.
If you were to run most of the examples in any of our tutorials or those found in the Elastic documentation—either on a single node or a 50-node cluster—everything would function exactly the same.
In Elasticsearch, these delicate and often intensive operations occur automatically and imperceptibly:
- Partitioning your documents across an arrangement of distinct shards (containers)
- In a multi-node cluster, distributing the documents to shards that reside across all of the nodes
- Balancing shards across all nodes in a cluster to evenly manage the indexing and search load
- With replication, duplicating each shard to provide data redundancy and failover
- Routing requests from any node in the cluster to specific nodes containing the specific data that you need
- Seamlessly adding and integrating new nodes as you find the need to increase the size of your cluster
- Redistributing shards to automatically recover from the loss of a node
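A toy model can make the first few operations in this list easier to picture. The sketch below hashes a document id to pick a shard and places shards on nodes round-robin. It is a simplification under stated assumptions: `zlib.crc32` stands in for the hash that Elasticsearch uses internally, the node names are hypothetical, and real shard allocation weighs far more factors than a simple rotation.

```python
import zlib

def shard_for(doc_id, num_primary_shards):
    """Pick a shard by hashing the routing value (here, the document id)."""
    # zlib.crc32 is a stand-in; Elasticsearch uses a different hash internally
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

def assign_shards(num_primary_shards, nodes):
    """Place shards on nodes round-robin so each node holds a similar count."""
    return {
        shard: nodes[shard % len(nodes)]
        for shard in range(num_primary_shards)
    }

nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
placement = assign_shards(5, nodes)

for doc_id in ["doc-1", "doc-2", "doc-3"]:
    shard = shard_for(doc_id, 5)
    print(doc_id, "-> shard", shard, "on", placement[shard])
```

Because the shard is derived from the id alone, any node can work out where a document lives without consulting the others. It also hints at why the number of primary shards is normally fixed when an index is created: changing the modulus would send existing ids to different shards.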
Sharing our Battle-Hardened Expertise
We invite you to learn more in our extensive help library, which you can find in our Support Center and throughout our blog. Here is the short list that we recommend to those who are relatively new to Elasticsearch. You'll find a number of examples throughout.
We're always ready to help you achieve maximum success with your ES environment, and we hope that you find this article helpful. We welcome your comments below.
If you like this article, consider using our hosted Elasticsearch service. It's stable, more affordable, and we offer free support. You can sign up or launch your cluster here, or click "Get Started" in the header navigation. If you need help setting up, refer to "Provisioning a Qbox Elasticsearch Cluster."