Have you ever spent some time grappling with indexing, querying, or aggregating when dealing with an existing parent/child relationship? Maybe you've tried to cope with translating foreign key relationships or simulating database joins in Elasticsearch.

We frequently hear about these issues here at Qbox, and many of our staff have helped developers who need to update elements of a list property (or a nested property) without updating the entire document.

We've come to realize that many developers are unaware of the parent-child relationship, a capability built natively into Elasticsearch. It gives you the ability to:

  • Update the parent document without reindexing the children.
  • Get child documents in search request results.
  • Add, change, or delete child documents without affecting the parent or other children. This is especially useful for large collections of child documents that require frequent changes.

Read on to see how the parent-child relationship in Elasticsearch can make your life as a developer easier.

The parent-child relationship is similar to the nested model: with either feature, you can associate one entity with another. The key difference is where the entities live. Nested objects all live within the same document, so in applications with many writes, every change to one nested object means reindexing the whole document. With parent-child, the parent and each of its children are separate documents, which makes parent-child especially beneficial in write-intensive applications.

When you employ parent-child, you associate one document type with another in a one-to-many relationship: one parent to many children.

In our experience, updating parents and children independently like this is tedious unless you take advantage of the parent-child relationship.

Not Nearly as Good as It Could Be

Let's say that you're using Elasticsearch in a .NET project with the NEST client. You're thinking about ways to handle updates to a document that looks like this:

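using System.Collections.Generic;

// Class2 is the element type stored in propList (its definition is not shown here).
public class Class1
{
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
    public List<Class2> propList { get; set; }
}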

And you use this scripted update to append an item to propList:

client.Update<Class1>(x => x
    .Id(1)
    .Index("index_name")
    .Script("ctx._source.propList += prop")
    .Params(p => p.Add("prop", newProp)));


This works, but it's not the best it could be. The trouble starts when you need to update a property of an object that is already inside propList. Yes, you could retrieve the entire document, find the object in the list, update the property, and then index the entire document again. But that is far more tedious than it has to be, and as propList grows, reindexing the whole document for every small change can hurt indexing performance.
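The read-modify-write cycle described above can be sketched in plain Python. This is a minimal illustration that assumes the document's `_source` has already been fetched into a dict. The helper name `update_list_item` and the `id`/`value` fields of the list elements are hypothetical, and no Elasticsearch client is involved. The point is that no matter how small the change, what you send back for reindexing is the entire document.

```python
import copy


def update_list_item(source, list_field, key, key_value, field, new_value):
    """Return a full copy of `source` with one list element's field changed.

    Hypothetical helper: `source` stands in for the _source of a document you
    have already fetched; the caller would then reindex the whole result.
    """
    updated = copy.deepcopy(source)  # leave the fetched copy untouched
    for item in updated[list_field]:
        if item.get(key) == key_value:
            item[field] = new_value
            break
    else:
        # No element matched, so there is nothing to update.
        raise KeyError(f"no element with {key}={key_value!r} in {list_field}")
    # Even though one field changed, the ENTIRE document must be reindexed.
    return updated


# Hypothetical document shaped like Class1, with two list elements.
doc = {
    "prop1": "a",
    "prop2": "b",
    "propList": [{"id": 1, "value": "x"}, {"id": 2, "value": "y"}],
}
new_doc = update_list_item(doc, "propList", "id", 2, "value", "z")
print(new_doc["propList"])  # [{'id': 1, 'value': 'x'}, {'id': 2, 'value': 'z'}]
```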

There is a much more efficient approach to this design problem when you are managing various aspects of parent-child relationships.

Solve It with a Parent-Child Relationship

To establish a parent-child relationship, all you need to do is specify which document type is the parent of a child type. You must do this either when you create the index or with the update-mapping API before the child type has been created.

For cases like the one above, we recommend a parent-child relationship. Here is an example in which we set up an index with this mapping:

PUT /test_index
{
   "mappings": {
      "parent_type": {
         "properties": {
            "num_prop": {
               "type": "integer"
            },
            "str_prop": {
               "type": "string"
            }
         }
      },
      "child_type": {
         "_parent": {
            "type": "parent_type"
         },
         "properties": {
            "child_num": {
               "type": "integer"
            },
            "child_str": {
               "type": "string"
            }
         }
      }
   }
}


Then we add some data:

POST /test_index/_bulk
{"index":{"_type":"parent_type","_id":1}}
{"num_prop":1,"str_prop":"hello"}
{"index":{"_type":"child_type","_id":1,"_parent":1}}
{"child_num":11,"child_str":"foo"}
{"index":{"_type":"child_type","_id":2,"_parent":1}}
{"child_num":12,"child_str":"bar"}
{"index":{"_type":"parent_type","_id":2}}
{"num_prop":2,"str_prop":"goodbye"}
{"index":{"_type":"child_type","_id":3,"_parent":2}}
{"child_num":21,"child_str":"baz"}
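Each child's action line in the bulk body carries its own `_parent`, so for larger data sets it can be convenient to generate the body rather than write it by hand. The following is a sketch that assumes the same `_type`/`_id`/`_parent` bulk metadata used above. The function name `bulk_body` is hypothetical, and it only builds the request body; it does not send it. The bulk API expects newline-delimited JSON, including a newline after the last line.

```python
import json


def bulk_body(parents, children):
    """Build a newline-delimited bulk body (hypothetical helper).

    parents:  {parent_id: source_dict}
    children: {child_id: (parent_id, source_dict)}
    """
    lines = []
    for pid, source in parents.items():
        lines.append(json.dumps({"index": {"_type": "parent_type", "_id": pid}}))
        lines.append(json.dumps(source))
    for cid, (pid, source) in children.items():
        # Each child's action line names its parent, as in the body above.
        lines.append(json.dumps({"index": {"_type": "child_type", "_id": cid, "_parent": pid}}))
        lines.append(json.dumps(source))
    # The bulk API expects every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"


body = bulk_body(
    {1: {"num_prop": 1, "str_prop": "hello"}, 2: {"num_prop": 2, "str_prop": "goodbye"}},
    {
        1: (1, {"child_num": 11, "child_str": "foo"}),
        2: (1, {"child_num": 12, "child_str": "bar"}),
        3: (2, {"child_num": 21, "child_str": "baz"}),
    },
)
print(body)
```

This version writes all parents first and then all children, rather than interleaving them as in the hand-written body. Either order produces the same documents.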


If we want to update a child document, we can simply index a new version of it under the same ID:

POST /test_index/child_type/2?parent=1
{
   "child_num": 13,
   "child_str": "bars"
}

NOTE: You must provide the parent ID so that Elasticsearch can route the request to the shard that holds the parent and its children.
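A toy model of document routing shows why the parent ID matters. Elasticsearch picks a shard by hashing a document's routing value (by default its `_id`) modulo the number of primary shards. For a child document, the parent ID is used as the routing value instead, so the child lands on its parent's shard. The sketch below uses `zlib.crc32` as a stand-in hash. Elasticsearch uses its own hash function, so these shard numbers will not match a real cluster. The helpers `shard_for` and `route` are hypothetical.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # assumed; the search response in this article reports 5 shards


def shard_for(routing_value, num_shards=NUM_PRIMARY_SHARDS):
    """Toy routing: a stable hash of the routing value, modulo the shard count.

    zlib.crc32 is only a stand-in; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(str(routing_value).encode("utf-8")) % num_shards


def route(doc_id, parent_id=None):
    """Children are routed by their parent's ID; other documents by their own ID."""
    return shard_for(doc_id if parent_id is None else parent_id)


parent_shard = route(1)  # parent_type/1
child_shards = {route(cid, parent_id=1) for cid in (1, 2)}  # its two children
print(parent_shard, child_shards)  # the children share the parent's shard
```

Without the parent ID, a child would be routed by its own ID and could end up on a different shard from its parent.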


You can also perform a partial, scripted update:

POST /test_index/child_type/3/_update?parent=2
{
   "script": "ctx._source.child_num+=1"
}
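Each child is a standalone document, so an update like the one above rewrites only that child. The toy in-memory store below is a hypothetical model, not Elasticsearch's implementation. Documents are keyed by (type, id), a Python function stands in for the script `ctx._source.child_num+=1`, and `scripted_update` requires the parent ID just as the real request does.

```python
# Hypothetical toy store mirroring the indexed data (child 2 already re-indexed).
store = {
    ("parent_type", 1): {"parent": None, "source": {"num_prop": 1, "str_prop": "hello"}},
    ("child_type", 1): {"parent": 1, "source": {"child_num": 11, "child_str": "foo"}},
    ("child_type", 2): {"parent": 1, "source": {"child_num": 13, "child_str": "bars"}},
    ("parent_type", 2): {"parent": None, "source": {"num_prop": 2, "str_prop": "goodbye"}},
    ("child_type", 3): {"parent": 2, "source": {"child_num": 21, "child_str": "baz"}},
}


def scripted_update(store, doc_type, doc_id, parent, script):
    """Apply `script` to one document's source (hypothetical helper)."""
    doc = store[(doc_type, doc_id)]
    if doc["parent"] != parent:
        # Stand-in for a request routed with the wrong parent ID.
        raise LookupError(f"{doc_type}/{doc_id} not found under parent {parent}")
    script(doc["source"])  # only this child's source is modified


def increment_child_num(source):
    # Python stand-in for the script ctx._source.child_num+=1
    source["child_num"] += 1


scripted_update(store, "child_type", 3, 2, increment_child_num)
print(store[("child_type", 3)]["source"])  # {'child_num': 22, 'child_str': 'baz'}
```

The parent and the sibling children are untouched, which is the property that makes parent-child attractive for frequently changing child collections.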


We can confirm that both updates took effect by searching the child type:

POST /test_index/child_type/_search
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "child_type",
            "_id": "1",
            "_score": 1,
            "_source": {
               "child_num": 11,
               "child_str": "foo"
            }
         },
         {
            "_index": "test_index",
            "_type": "child_type",
            "_id": "2",
            "_score": 1,
            "_source": {
               "child_num": 13,
               "child_str": "bars"
            }
         },
         {
            "_index": "test_index",
            "_type": "child_type",
            "_id": "3",
            "_score": 1,
            "_source": {
               "child_num": 22,
               "child_str": "baz"
            }
         }
      ]
   }
}
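If you run the search from a script rather than a console, you can check both updates programmatically. This is a minimal sketch that assumes the response body has already been read into a string; here it is an abbreviated copy of the response above. The name `child_nums_by_id` is a hypothetical helper.

```python
import json

# Abbreviated copy of the search response above (only the fields used here).
response_text = """
{"hits": {"total": 3, "hits": [
  {"_id": "1", "_source": {"child_num": 11, "child_str": "foo"}},
  {"_id": "2", "_source": {"child_num": 13, "child_str": "bars"}},
  {"_id": "3", "_source": {"child_num": 22, "child_str": "baz"}}
]}}
"""


def child_nums_by_id(response):
    """Map each hit's _id to its child_num (hypothetical helper)."""
    return {hit["_id"]: hit["_source"]["child_num"] for hit in response["hits"]["hits"]}


nums = child_nums_by_id(json.loads(response_text))
assert nums["2"] == 13  # the full re-index of child 2 took effect
assert nums["3"] == 22  # the scripted increment (21 to 22) took effect
print(nums)  # {'1': 11, '2': 13, '3': 22}
```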

FINAL NOTE: Elasticsearch maintains a map of which children belong to which parent, and this map is what makes query-time joins fast. It does, however, impose a limitation on the parent-child relationship: the parent document and all of its children must live on the same shard.
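The idea behind a parent-to-children map can be illustrated in a few lines. This is a conceptual model only, which assumes all of one shard's documents fit in a Python list. Elasticsearch's real join structures are internal and different. The sketch groups child sources by parent ID and then answers a question similar to a `has_child` query: which parents have at least one child matching a condition? The names `build_join_map` and `parents_with_child` are hypothetical.

```python
from collections import defaultdict

# (child_id, parent_id, _source) for the children indexed earlier, after updates.
children = [
    ("1", "1", {"child_num": 11, "child_str": "foo"}),
    ("2", "1", {"child_num": 13, "child_str": "bars"}),
    ("3", "2", {"child_num": 22, "child_str": "baz"}),
]


def build_join_map(children):
    """Group child sources by parent ID (conceptual stand-in for the join map)."""
    join_map = defaultdict(list)
    for _child_id, parent_id, source in children:
        join_map[parent_id].append(source)
    return join_map


def parents_with_child(join_map, predicate):
    """Return sorted parent IDs with at least one child matching `predicate`."""
    return sorted(pid for pid, kids in join_map.items() if any(predicate(k) for k in kids))


join_map = build_join_map(children)
print(parents_with_child(join_map, lambda c: c["child_num"] > 20))  # ['2']
print(parents_with_child(join_map, lambda c: c["child_str"].startswith("b")))  # ['1', '2']
```

A map like this only sees the documents it was built from. For a per-shard structure to answer the join correctly, each parent must sit on the same shard as its children, which is exactly the same-shard requirement.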

The code from this article, along with a few more examples, is available at the link below. Drop us a note if it has been helpful:

http://sense.qbox.io/gist/73f6d2f347a08bfe0c254a977a4a05a68d2f3a8d

Editor's note: This article was written with help from Sloan Ahrens, a Qbox co-founder and freelance data consultant.

