We have discussed Handling Relationships and Data Modeling extensively in our series so far. The need to bridge the gap between flat mapping and the real world has led us to focus on the following techniques:

  • Application-side joins

  • Data denormalization

  • Nested objects

Keep reading

By default, Elasticsearch returns results sorted by relevance, with the most relevant docs first. In order to sort by relevance, we need to represent relevance as a value. The relevance score of each document is represented by a positive floating-point number called the _score. The higher the _score, the more relevant the document.

Keep reading

In the previous tutorial, we discussed how to use elasticsearch.js, the official Node.js client for Elasticsearch, to index and add documents and to search them using simple queries and Query DSL. In this tutorial, we're going to dive deeper into elasticsearch.js and describe more advanced methods and concepts like scrolling, aggregations, and analyzers.

As always, we will be using hosted Elasticsearch on Qbox.io. We assume that you have installed the latest version of Node.js, downloaded the elasticsearch.js module into your Node.js application and connected it to your Elasticsearch cluster as described in the previous tutorial.

Keep reading

A nested type is a specialized version of the object datatype that allows arrays of objects to be indexed and queried independently of each other. If you need to index arrays of objects and to maintain the independence of each object in the array, you should use the nested datatype instead of the object datatype. Internally, nested objects index each object in the array as a separate hidden document, meaning that each nested object can be queried independently of the others, with the nested query.
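The difference between the two datatypes can be illustrated without a cluster. The sketch below is a plain-Python simulation, not Elasticsearch code: the helper names (`flatten_object_array`, `matches_flattened`, `matches_nested`) and the sample `comments` data are hypothetical, and the flattening is a simplified model of how an object field pools the values of each sub-field.

```python
from collections import defaultdict

def flatten_object_array(field, objects):
    """Simplified model of an object field: values of each sub-field are pooled together."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)

def matches_flattened(flat, field, criteria):
    """Each condition is checked against the pooled values, so object boundaries are lost."""
    return all(
        value in flat.get(f"{field}.{key}", [])
        for key, value in criteria.items()
    )

def matches_nested(objects, criteria):
    """Each object is checked on its own, like a separate hidden document."""
    return any(
        all(obj.get(key) == value for key, value in criteria.items())
        for obj in objects
    )

# Hypothetical sample data: two comments attached to one document.
comments = [
    {"name": "john", "age": 28},
    {"name": "alice", "age": 31},
]

# No single comment has both name == "alice" and age == 28.
criteria = {"name": "alice", "age": 28}

flat = flatten_object_array("comments", comments)
object_match = matches_flattened(flat, "comments", criteria)
nested_match = matches_nested(comments, criteria)

print(object_match, nested_match)
```

In this model the flattened form reports a match even though no single comment satisfies both conditions, while the per-object check does not. That cross-object match is the behavior the nested datatype is meant to avoid.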

Keep reading

When we are indexing data, the task is rarely as simple as each document existing in isolation. Sometimes, we are better off denormalizing all data into the child documents. For example, if we were modeling blog posts, adding an author field to each blog document could be a sensible choice, even if the data is split into separate authors and blogs tables in the database that serves as the authoritative datasource. It's simple, and one can easily construct queries on both the attributes of the blogs and the author's name.
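As a rough sketch of that idea, the snippet below copies the author's name into each blog document before it would be indexed. It is plain Python with no Elasticsearch calls; the `authors` and `blogs` sample data, the `author_id` key, and the `denormalize` and `find` helpers are all assumptions made for illustration.

```python
# Hypothetical rows from the authoritative database tables.
authors = {
    1: {"name": "Jane Doe"},
    2: {"name": "Sam Lee"},
}
blogs = [
    {"id": 10, "title": "Nested objects", "author_id": 1},
    {"id": 11, "title": "Scoring basics", "author_id": 2},
]

def denormalize(blogs, authors):
    """Build one self-contained document per blog, with the author's name copied in."""
    docs = []
    for blog in blogs:
        # Drop the foreign key and replace it with the author's name.
        doc = {key: value for key, value in blog.items() if key != "author_id"}
        doc["author"] = authors[blog["author_id"]]["name"]
        docs.append(doc)
    return docs

def find(docs, **criteria):
    """Filter on blog attributes and the author name in one pass, with no join."""
    return [
        doc for doc in docs
        if all(doc.get(key) == value for key, value in criteria.items())
    ]

docs = denormalize(blogs, authors)
hits = find(docs, author="Jane Doe", title="Nested objects")
print(hits)
```

The trade-off is that the author's name is now duplicated in every blog document, so a name change in the database means reindexing each affected document.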

Keep reading

Complex relational databases can lead to tortuous SQL queries and slow responses from the web application. If you're trying to return a long list of objects that are built up from five, ten, or even seventeen related tables, your response times can be unacceptably slow.

Such problems are encountered regularly in large and complex data modeling applications. We have found that using Elasticsearch along with some conventions for denormalising complex objects can make it easy to generate sufficiently speedy responses, even when they are returning lots of rows.

Keep reading