Tutorial Series: Data Modeling and Relationships in Elasticsearch

Handling relationships between entities in Elasticsearch is not as straightforward as it is in a dedicated relational store. The golden rule of relational databases, normalize your data, does not apply to Elasticsearch. This tutorial series walks through Handling Relationships, Nested Objects, and Parent-Child Relationships, and discusses the pros and cons of each available approach.

Modeling and Managing Relationships in Elasticsearch

Elasticsearch is a different kind of beast, especially if you come from the world of SQL. It comes with many benefits: performance, scale, near real-time search, and analytics across massive amounts of data.

Keep reading

Handling Data Denormalization Issues in Elasticsearch

Complex relational databases can lead to tortuous SQL queries and slow responses from the web application. If you’re trying to return a long list of objects built up from five, ten, or even seventeen related tables, your response times can be unacceptably slow.

Such problems arise regularly in large, complex data modeling applications. We have found that using Elasticsearch with some conventions for denormalizing complex objects makes it easy to generate sufficiently fast responses, even when they return many rows.

Keep reading

Optimistic Concurrency Control in Elasticsearch

One of the key principles behind Elasticsearch is to let you make the most of your data. Historically, search was a read-only enterprise in which a search engine was loaded with data from a single source. As usage grows and Elasticsearch becomes more central to your application, data often needs to be updated by multiple components.

Multiple components lead to concurrency and concurrency leads to conflicts. Elasticsearch's versioning system is there to help cope with those conflicts.
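
The idea behind version-based conflict detection can be sketched without a cluster. The following is a toy in-memory simulation, not the Elasticsearch API: `TinyStore`, `VersionConflict`, and `expected_version` are hypothetical names chosen for illustration, and it assumes that every successful write bumps a per-document version number.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of a document."""


class TinyStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        """Return (version, copy of source) for a stored document."""
        version, source = self._docs[doc_id]
        return version, dict(source)

    def index(self, doc_id, source, expected_version=None):
        """Write a document; reject the write if expected_version is stale."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"{doc_id}: expected version {expected_version}, "
                f"found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version


store = TinyStore()
store.index("item-1", {"stock": 10})  # first write creates version 1

# Two components read the same version before either one writes.
version_a, doc_a = store.get("item-1")
version_b, doc_b = store.get("item-1")

doc_a["stock"] -= 1
store.index("item-1", doc_a, expected_version=version_a)  # accepted

doc_b["stock"] -= 1
try:
    store.index("item-1", doc_b, expected_version=version_b)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True  # the second writer must re-read and retry
```

The second writer is rejected instead of silently overwriting the first update, which is the behavior optimistic concurrency control is meant to provide.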

Keep reading

Data Concurrency Issues in Elasticsearch

In our previous post, Denormalization and Concurrency Issues in Elasticsearch, we discussed data denormalization and emulated a filesystem with directory trees in Elasticsearch, much like a filesystem on Linux: the root of the directory is /, and each directory can contain files and subdirectories. The problem comes when we want to allow more than one person to rename files or directories at the same time. This post discusses concurrency issues and the various kinds of locking in Elasticsearch.

Keep reading

Handling Relationships Using Nested Objects in Elasticsearch

When we index data, the task is rarely as simple as each document existing in isolation. Sometimes we are better off denormalizing all data into the child documents. For example, if we were modeling blog posts, adding an author field to each blog document could be a sensible choice, even if in the database, the authoritative data source, the data is split into separate authors and blogs tables. It’s simple, and one can easily construct queries on both the attributes of the blogs and the author’s name.

Keep reading

Aggregations with Nested Documents in Elasticsearch

The nested type is a specialized version of the object datatype that allows arrays of objects to be indexed and queried independently of each other. If you need to index arrays of objects and maintain the independence of each object in the array, you should use the nested datatype instead of the object datatype. Internally, each object in the array is indexed as a separate hidden document, meaning that each nested object can be queried independently of the others with the nested query.
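
Why independence matters can be illustrated with a small simulation. The sketch below is plain Python, not Elasticsearch code: the helper names and the `comments` data are hypothetical, and it assumes the commonly described behavior where an array of plain objects is flattened into per-field value lists, while nested objects are matched one object at a time.

```python
def flatten_object_array(field, objects):
    """Mimic the object datatype: collect each sub-field's values into one list."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_flattened(flat, field, criteria):
    """Match if every criterion appears somewhere in the flattened lists."""
    return all(
        value in flat.get(f"{field}.{key}", [])
        for key, value in criteria.items()
    )


def matches_nested(objects, criteria):
    """Match only if a single object satisfies all criteria together."""
    return any(
        all(obj.get(key) == value for key, value in criteria.items())
        for obj in objects
    )


# Hypothetical data: two comments on one blog post.
comments = [
    {"name": "john", "age": 28},
    {"name": "alice", "age": 31},
]
criteria = {"name": "alice", "age": 28}  # no single comment has both values

flat = flatten_object_array("comments", comments)
object_result = matches_flattened(flat, "comments", criteria)  # cross-object match
nested_result = matches_nested(comments, criteria)             # per-object match
```

With the flattened representation the link between "alice" and her age is lost, so the search matches even though no comment is by a 28-year-old Alice; the per-object check does not.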

Keep reading

Sorting Nested Fields in Elasticsearch

By default, Elasticsearch returns results sorted by relevance, with the most relevant documents first. In order to sort by relevance, we need to represent relevance as a value. The relevance score of each document is represented by a positive floating-point number called the _score. The higher the _score, the more relevant the document.
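
As a small illustration of ordering by score, the sketch below sorts a list of hits by `_score` in descending order. The hits and their scores are made-up values, not output from a real query.

```python
# Hypothetical hits; the scores are invented for illustration.
hits = [
    {"_id": "1", "_score": 0.4},
    {"_id": "2", "_score": 1.7},
    {"_id": "3", "_score": 0.9},
]

# Higher _score means more relevant, so sort in descending order.
ranked = sorted(hits, key=lambda hit: hit["_score"], reverse=True)
ranked_ids = [hit["_id"] for hit in ranked]
```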

Keep reading

Exploring Parent-Child Relationships in Elasticsearch

We have discussed handling relationships and data modeling extensively in this series so far. The need to bridge the gap between flat mapping and the real world has made us focus on the following techniques.

  • Application-side joins

  • Data denormalization

  • Nested objects

Keep reading

Searching Parent Child Relationships in Elasticsearch

We have already discussed indexing parent-child relationships in Elasticsearch. We have seen that the parent-child functionality allows us to associate one document type with another in a one-to-many relationship: one parent to many children.

For this post, we will be using hosted Elasticsearch on Qbox.io. If you need help setting up, refer to "Provisioning a Qbox Elasticsearch Cluster."

The advantages that parent-child has over nested objects are as follows:

  • The parent document can be updated without reindexing the children.

  • Child documents can be added, changed, or deleted without affecting either the parent or other children. This is especially useful when child documents are large in number and need to be added or changed frequently.

  • Child documents can be returned as the results of a search request.
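
The three advantages listed above can be sketched with a toy in-memory model. This is not the Elasticsearch API: `branches`, `employees`, and the helper functions are hypothetical names, and the only assumption is that parents and children are stored as separate documents linked by a parent ID.

```python
import copy

# Parents and children live in separate collections (toy model).
branches = {"london": {"name": "London Westminster"}}
employees = {}  # child_id -> {"parent": parent_id, ...fields}


def add_employee(child_id, parent_id, source):
    """Add or replace one child without touching the parent or its siblings."""
    employees[child_id] = {"parent": parent_id, **source}


def children_of(parent_id):
    """Return child documents directly as search results."""
    return [doc for doc in employees.values() if doc["parent"] == parent_id]


def update_branch(parent_id, changes):
    """Update the parent in place; no child document is rewritten."""
    branches[parent_id].update(changes)


add_employee("e1", "london", {"name": "Alice Smith"})
add_employee("e2", "london", {"name": "Mark Thomas"})
snapshot = copy.deepcopy(employees)

update_branch("london", {"name": "London Central"})
children_untouched = employees == snapshot  # children were not reindexed
child_names = [doc["name"] for doc in children_of("london")]
```

In a nested model the children would be part of the parent document, so any of these changes would mean rewriting the whole document.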

Keep reading

Elasticsearch Aggregations with Parent Child Relationships

In the past few articles, we have focused on indexing and searching parent-child relationships in Elasticsearch. The parent-child functionality allows us to associate one document type with another in a one-to-many relationship: one parent to many children. In this tutorial, we continue with parent-child aggregations in Elasticsearch.

Keep reading

Grandparents and Grandchildren Relationships in Elasticsearch

We have covered a lot on parent-child relationships in Elasticsearch: indexing, searching, aggregations, and the challenges they can face. In this post we explore parent-child relationships further. The parent-child relationship is similar in nature to the nested model: both allow us to associate one entity with another. The difference is that, with nested objects, all entities live within the same document, while with parent-child, the parent and children are completely separate documents.

Keep reading

Performance Considerations in Parent Child Relationships in Elasticsearch

We have discussed indexing, searching, and aggregations for parent-child and grandparent-grandchild relationships in Elasticsearch. The parent-child functionality allows us to associate one document type with another in a one-to-many relationship: one parent to many children.

Keep reading