We have already discussed indexing parent-child relationships in Elasticsearch. We saw that the parent-child functionality allows us to associate one document type with another in a one-to-many relationship: one parent to many children.

For this post, we will be using hosted Elasticsearch on Qbox.io. You can sign up or launch your cluster here, or click "Get Started" in the header navigation. If you need help setting up, refer to "Provisioning a Qbox Elasticsearch Cluster."

The advantages that parent-child has over nested objects are as follows:

  • The parent document can be updated without reindexing the children.

  • Child documents can be added, changed, or deleted without affecting either the parent or other children. This is especially useful when child documents are large in number and need to be added or changed frequently.

  • Child documents can be returned as the results of a search request.

Finding Parents by Children

The has_child query and filter can be used to find parent documents based on the contents of their children.

Let’s index a few parent documents similar to those in our previous post:

curl -XPOST 'ES_HOST:ES_PORT/academy/location/_bulk' -d '
{ "index": { "_id": "newyork" }}
{ "name": "Manhattan Academy", "state": "New York State", "country": "USA" }
{ "index": { "_id": "chicago" }}
{ "name": "Chicago Central", "state": "Illinois", "country": "USA" }
{ "index": { "_id": "dallas" }}
{ "name": "Dallas Academy", "state": "Texas", "country": "USA" }
'

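Let’s index a few child documents (type player), each pointing to its parent document through the parent metadata field. This requires the player type to be mapped with location as its _parent type: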

curl -XPOST 'ES_HOST:ES_PORT/academy/player/_bulk' -d '
{ "index": { "_id": 2, "parent": "chicago" }}
{ "name": "John Doe", "dob": "1998-07-18", "sport": "volleyball" }
{ "index": { "_id": 3, "parent": "newyork" }}
{ "name": "William Smith", "dob": "1996-11-07", "sport": "basketball" }
{ "index": { "_id": 4, "parent": "dallas" }}
{ "name": "John Henry", "dob": "1995-07-15", "sport": "billiards" }
'

For instance, we could find all locations that have players born on or after December 31, 1993 with a query like this:

curl -XGET 'ES_HOST:ES_PORT/academy/location/_search' -d '{
  "query": {
    "has_child": {
      "type": "player",
      "query": {
        "range": {
          "dob": {
            "gte": "1993-12-31"
          }
        }
      }
    }
  }
}'
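To make the join semantics concrete, here is a small in-memory sketch in Python of what has_child does conceptually: run the inner query against the children, then return the parents of the children that matched. It does not talk to Elasticsearch. The players list and the has_child helper are hypothetical names used only for illustration, and the data mirrors the sample documents indexed above.

```python
from datetime import date

# Sample children mirroring the player documents indexed above (illustrative).
players = [
    {"name": "John Doe", "dob": date(1998, 7, 18), "parent": "chicago"},
    {"name": "William Smith", "dob": date(1996, 11, 7), "parent": "newyork"},
    {"name": "John Henry", "dob": date(1995, 7, 15), "parent": "dallas"},
]


def has_child(children, child_matches):
    """Return the sorted IDs of parents with at least one matching child."""
    return sorted({c["parent"] for c in children if child_matches(c)})


# Rough equivalent of the range clause {"dob": {"gte": "1993-12-31"}}.
matching = has_child(players, lambda p: p["dob"] >= date(1993, 12, 31))
print(matching)
```

With the three sample players, every location matches, because each location has a player born after the cutoff date.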

The has_child query also has scoring support. Like the nested query, it can match several child documents, each with a different relevance score. How these scores are reduced to a single score for the parent document depends on the score_mode parameter, which is specified inside the has_child query. The supported modes are none, min, max, sum, and avg. The default, none, ignores the child scores and assigns a score of 1.0 to the parents, which is the same behaviour as in previous versions. With any other mode, the scores of all the matching child documents are aggregated into the associated parent document. The default score_mode of none is significantly faster than the other modes because Elasticsearch doesn’t need to calculate the score for each child document, so set it to avg, min, max, or sum only if you care about the score.

The following query will return both chicago and dallas, but chicago will get a better score because John Doe is a better match than John Henry:

curl -XGET 'ES_HOST:ES_PORT/academy/location/_search' -d '{
  "query": {
    "has_child": {
      "type": "player",
      "score_mode": "max",
      "query": {
        "match": {
          "name": "John Doe"
        }
      }
    }
  }
}'
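The way score_mode collapses several child scores into one parent score can be sketched in a few lines of Python. This is a toy model of the reduction described above, not Elasticsearch's implementation; the function name reduce_child_scores and the sample scores are made up for illustration.

```python
def reduce_child_scores(child_scores, score_mode="none"):
    """Collapse the scores of a parent's matching children into one score."""
    if not child_scores:
        raise ValueError("a parent only matches if at least one child matches")
    if score_mode == "none":
        return 1.0  # child scores are ignored
    if score_mode == "min":
        return min(child_scores)
    if score_mode == "max":
        return max(child_scores)
    if score_mode == "sum":
        return sum(child_scores)
    if score_mode == "avg":
        return sum(child_scores) / len(child_scores)
    raise ValueError(f"unsupported score_mode: {score_mode!r}")


# Made-up relevance scores for two matching children of one parent.
scores = [0.5, 1.5]
for mode in ("none", "min", "max", "sum", "avg"):
    print(mode, reduce_child_scores(scores, mode))
```

In the sample data each location has a single player, so min, max, sum, and avg all yield that one child's score; the modes only differ once a parent has several matching children.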

min_children and max_children

The has_child query and filter both accept the min_children and max_children parameters, which will return the parent document only if the number of matching children is within the specified range.

This query will match only locations that have at least two players:

curl -XGET 'ES_HOST:ES_PORT/academy/location/_search' -d '{
  "query": {
    "has_child": {
      "type": "player",
      "min_children": 2,
      "query": {
        "match_all": {}
      }
    }
  }
}'

This query will match only locations that have at most two players:

curl -XGET 'ES_HOST:ES_PORT/academy/location/_search' -d '{
  "query": {
    "has_child": {
      "type": "player",
      "max_children": 2,
      "query": {
        "match_all": {}
      }
    }
  }
}'
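The effect of these two parameters can be illustrated with a short Python sketch that counts matching children per parent and keeps only the parents whose count falls within the range. It is an in-memory approximation under the assumption that the inner query is match_all; parents_in_range is a hypothetical helper, not an Elasticsearch API.

```python
from collections import Counter


def parents_in_range(child_parent_ids, min_children=None, max_children=None):
    """Return parent IDs whose number of matching children is within range."""
    counts = Counter(child_parent_ids)
    result = []
    for parent_id, n in sorted(counts.items()):
        if min_children is not None and n < min_children:
            continue
        if max_children is not None and n > max_children:
            continue
        result.append(parent_id)
    return result


# Parent IDs of the children matched by match_all on the sample data.
matched = ["chicago", "newyork", "dallas"]
print(parents_in_range(matched, min_children=2))
print(parents_in_range(matched, max_children=2))
```

Because every sample location has exactly one player, the min_children example returns no locations and the max_children example returns all three. Indexing a second player under one location would make that location appear in the first result.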

The performance of a has_child query or filter with the min_children or max_children parameters is much the same as a has_child query with scoring enabled.

The min_children and max_children parameters can be combined with the score_mode parameter, as in the following query:

curl -XGET 'ES_HOST:ES_PORT/academy/location/_search' -d '{
  "query": {
    "has_child": {
      "type": "player",
      "score_mode": "min",
      "min_children": 2,
      "max_children": 4,
      "query": {
        "match_all": {}
      }
    }
  }
}'
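When these parameters are combined in application code, it can help to build the request body programmatically rather than by hand. The following is a minimal Python sketch; has_child_body is a hypothetical helper written for this post, and it only assembles the JSON shown above without sending it anywhere.

```python
import json


def has_child_body(child_type, query, score_mode=None,
                   min_children=None, max_children=None):
    """Build a search body containing a has_child query (hypothetical helper)."""
    has_child = {"type": child_type, "query": query}
    # Only include optional parameters that were explicitly provided.
    if score_mode is not None:
        has_child["score_mode"] = score_mode
    if min_children is not None:
        has_child["min_children"] = min_children
    if max_children is not None:
        has_child["max_children"] = max_children
    return {"query": {"has_child": has_child}}


body = has_child_body("player", {"match_all": {}},
                      score_mode="min", min_children=2, max_children=4)
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the request body of the curl command above or sent with any HTTP client.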

It is important to note that the has_child filter works in the same way as the has_child query, except that it doesn’t support the score_mode parameter. It can be used only in filter context, such as inside a filtered query, and behaves like any other filter: it includes or excludes, but doesn’t score. While the results of a has_child filter are not cached, the usual caching rules apply to the filter inside the has_child filter.

Finding Children by Their Parents

While a nested query can always return only the root document as a result, parent and child documents are independent and each can be queried independently. The has_child query allows us to return parents based on data in their children, and the has_parent query returns children based on data in their parents.

The has_parent query looks very similar to the has_child query. This example returns players whose academy is located in the state of Illinois (in our sample data, Chicago Central):

curl -XGET 'ES_HOST:ES_PORT/academy/player/_search' -d '{
  "query": {
    "has_parent": {
      "type": "location",
      "query": {
        "match": {
          "state": "Illinois"
        }
      }
    }
  }
}'
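The reverse lookup can be sketched in Python in the same spirit: evaluate the inner query against the parents, then return the children that point to a matching parent. This is an in-memory illustration only. The has_parent helper and the data structures are hypothetical, and the exact string comparison stands in for the analyzed match query as a simplification.

```python
# Sample documents mirroring those indexed earlier (illustrative only).
locations = {
    "newyork": {"name": "Manhattan Academy", "state": "New York State"},
    "chicago": {"name": "Chicago Central", "state": "Illinois"},
    "dallas": {"name": "Dallas Academy", "state": "Texas"},
}
players = [
    {"name": "John Doe", "parent": "chicago"},
    {"name": "William Smith", "parent": "newyork"},
    {"name": "John Henry", "parent": "dallas"},
]


def has_parent(children, parents, parent_matches):
    """Return the children whose parent document satisfies parent_matches."""
    return [c for c in children if parent_matches(parents[c["parent"]])]


in_illinois = has_parent(players, locations,
                         lambda loc: loc["state"] == "Illinois")
print([p["name"] for p in in_illinois])
```

On the sample data this yields only John Doe, the one player whose parent is the chicago document.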

The has_parent query also supports the score_mode parameter, but it accepts only two settings: none (the default) and score. Each child can have only one parent, so there is no need to reduce multiple scores into a single score for the child. The choice is simply between using the parent’s score (score) or not (none).

It is, however, important to note that when has_parent is used in non-scoring mode (e.g. inside a filter clause), the score_mode parameter no longer applies: the query merely includes or excludes documents and does not score them.

