In this tutorial, we discuss geolocation and explain what kinds of data notation can be used for locations, as well as how they can be aggregated and filtered.

Geolocation is a powerful tool for searching. It can, for example, help you find out where most of the comments on your site come from, or find the delivery point nearest to the user.

Geolocation Overview

A location can be stored in Elasticsearch as a pair of numbers: latitude and longitude. Together they identify a point on the map and are called a geo-point. For example, latitude 50.01 and longitude 36.23 mark the center of Kharkov, Ukraine. Geo-points can be used to determine the distance to another point or whether a point falls inside a given rectangle. Latitude and longitude can be set as a string ("lat,lon"), an array ([lon, lat], longitude first), or an object with lat and lon keys. We use all of these forms in the examples below.
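
To make the three notations concrete, here is a minimal Python sketch that normalizes each of them to a (lat, lon) tuple. The helper name to_lat_lon is ours and is not part of any Elasticsearch client. The only thing it borrows from Elasticsearch is the ordering convention: the string form is "lat,lon", while the array form lists longitude first.

```python
from collections.abc import Mapping


def to_lat_lon(value):
    """Normalize a geo-point given as string, array, or object to (lat, lon).

    Hypothetical helper for illustration only.
    """
    if isinstance(value, str):
        # String form: "lat,lon" (float() tolerates surrounding spaces)
        lat, lon = (float(part) for part in value.split(","))
        return lat, lon
    if isinstance(value, Mapping):
        # Object form: {"lat": ..., "lon": ...}
        return float(value["lat"]), float(value["lon"])
    # Array form: [lon, lat], longitude first
    lon, lat = value
    return float(lat), float(lon)


# All three notations describe the same point.
print(to_lat_lon("50.01, 36.23"))
print(to_lat_lon([36.23, 50.01]))
print(to_lat_lon({"lat": 50.01, "lon": 36.23}))
```

Mixing up the array order is an easy mistake to make, because it silently places the point somewhere else on the map instead of raising an error.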

There are several standard filters:

  • geo_bounding_box – Finds all points within a given rectangle.
  • geo_distance – Finds all points within a given distance of a central point.
  • geo_distance_range – Finds all points between a given minimum and maximum distance from a central point.

Besides plain latitude and longitude, Elasticsearch allows you to encode a location as a geohash, which expresses latitude and longitude as a single alphanumeric code. Geohashes divide the world into a grid of 32 cells (4 rows and 8 columns), each of which is labeled with a letter or digit.

A geohash consists of a sequence of characters, where each subsequent character describes the location more precisely. You can create geohashes using GeohashExplorer. Let’s use it to get a location code for the example below. The geohash for the center of Kharkov is ubcu2kpvxupv.
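
The encoding itself is simple enough to sketch in a few lines of Python. The function below (geohash_encode is a hypothetical name, not a library call) follows the standard geohash scheme: it halves the longitude and latitude ranges in turn, records one bit per halving, and emits one base-32 character for every five bits. For the approximate Kharkov coordinates used earlier (50.01, 36.23), the first four characters come out as ubcu, which agrees with the prefix of the hash above. Later characters depend on the exact coordinates and are not compared here.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=12):
    """Encode a latitude/longitude pair as a geohash of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # upper half of the range
            rng[0] = mid
        else:
            bits <<= 1              # lower half of the range
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:          # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


print(geohash_encode(50.01, 36.23, precision=4))
```

Each extra character splits the current cell into 32 smaller ones, which is why a longer hash pins down the location more precisely.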

For more complex cases, there are two approaches: geo aggregations and geoshapes. Geo aggregations group geo-points together, which is useful when a search returns too many results. You can group geo-points with the following aggregations:

  • geo_distance – Groups points into concentric circles around a central point.
  • geohash_grid – Groups points by geohash cell.
  • geo_bounds – Returns the top-left and bottom-right corners of the rectangle that covers all returned geo-points.
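
To illustrate what geo_bounds computes, here is a minimal Python sketch. The function name and the sample points are ours. It assumes the points do not straddle the 180° meridian, a case this sketch does not handle.

```python
def geo_bounds(points):
    """Return the smallest lat/lon rectangle covering all (lat, lon) points.

    Simplified illustration: ignores wrapping around the 180° meridian.
    """
    points = list(points)
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        "top_left": {"lat": max(lats), "lon": min(lons)},
        "bottom_right": {"lat": min(lats), "lon": max(lons)},
    }


# Sample (lat, lon) points, chosen only for illustration
sample = [(50.0361, 36.2313), (49.9938, 36.2342), (50.0173, 36.2279)]
print(geo_bounds(sample))
```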

Geoshapes use a different approach: complex shapes such as points, lines, polygons, multi-polygons, and polygons with holes are drawn on the geohash grid and converted to a list of geohashes for all the cells they cover. Using this information, it’s easy to determine whether one shape intersects another.

Tutorial

The choice of approach for storing and displaying points depends on the task and your capabilities. Consider this example. We have an application that helps with training (it keeps statistics, creates training plans, and provides examples of individual exercises), and we want it to find the nearest gyms, sports grounds, stadiums, swimming pools, etc., and sort them by distance, price, and the reviews of other users.

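Create an index and a mapping for gyms. The user’s location will be received as a pair of latitude and longitude coordinates. The requests below use Elasticsearch 1.x/2.x syntax (the string field type and the filtered query), both of which were removed in Elasticsearch 5.0.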

curl -X PUT "http://localhost:9200/gym" -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "flat": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  }
}'
curl -X PUT "http://localhost:9200/gym/gyms/_mapping" -d '{
  "gyms": {
    "properties": {
      "name": { "type": "string" },
      "location": { "type": "geo_point" }
    }
  }
}'
curl -XPUT 'localhost:9200/gym/user/1?pretty' -d '{
  "name": "name_user",
  "location": "50.0338, 36.2242"
}'

We have a list of gyms in the city with their coordinates: three plain gyms, one gym with a stadium, and one gym with a swimming pool. Their coordinates can be defined using an object, an array, or a string. Note that the array form lists longitude first ([lon, lat]), while the string form lists latitude first ("lat,lon").

curl -XPUT 'localhost:9200/gym/gyms/1?pretty' -d '{
  "name": "gym1",
  "location": {
    "lat": 50.0361,
    "lon": 36.2313
  }
}'
curl -XPUT 'localhost:9200/gym/gyms/2?pretty' -d '{
  "name": "gym2",
  "location": [36.2027, 49.9862]
}'
curl -XPUT 'localhost:9200/gym/gyms/3?pretty' -d '{
  "name": "gym3",
  "location": "49.9938, 36.2342"
}'
curl -XPUT 'localhost:9200/gym/gyms/4?pretty' -d '{
  "name": "gym+stadium",
  "location": "50.0173, 36.2279"
}'
curl -XPUT 'localhost:9200/gym/gyms/5?pretty' -d '{
  "name": "gym+pool",
  "location": "50.0215, 36.2354"
}'

Let’s find the gyms that are nearest to the user. We can use the Gaussian decay function (gauss) of the function_score query to achieve this.

curl -XPOST 'http://localhost:9200/gym/_search' -d '{
  "query": {
    "function_score": {
      "functions": [
        {
          "gauss": {
            "location": {
              "scale": "1km",
              "origin": "50.0338, 36.2242"
            }
          }
        }
      ]
    }
  }
}'
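
To get a feel for how the scale parameter shapes the score, here is a small Python sketch of the Gaussian decay curve as we understand it from the function_score documentation. The function name gauss_decay is ours. The offset and decay arguments mirror the optional parameters of the same names, and the defaults of 0 and 0.5 used here are assumptions, so check them against your Elasticsearch version.

```python
import math


def gauss_decay(distance_km, scale_km, offset_km=0.0, decay=0.5):
    """Score multiplier for a document at distance_km from the origin.

    sigma^2 is chosen so that the score equals `decay` at offset + scale.
    """
    sigma_sq = -(scale_km ** 2) / (2.0 * math.log(decay))
    effective = max(0.0, distance_km - offset_km)
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))


# With scale = 1 km the score halves at 1 km and falls off quickly after that.
for d in (0.0, 0.5, 1.0, 2.0):
    print(d, round(gauss_decay(d, scale_km=1.0), 4))
```

A smaller scale makes the ranking more sensitive to distance; a larger one lets gyms farther away keep a meaningful score.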

We’ve obtained a list of all the gyms. The first one is the nearest; the others are ordered by increasing distance. However, such a list can be unwieldy and unnecessary if there is a large number of objects in the database. Let’s improve our search and find only the gyms that are close to the user’s home, within 5 km. We do this by applying a filter.

curl -XPOST 'http://localhost:9200/gym/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "geo_distance": {
          "distance": "5km",
          "location": "50.0338, 36.2242"
        }
      }
    }
  }
}'
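
As a rough offline check of what this filter should return, the sketch below computes great-circle distances with the haversine formula and keeps the gyms within 5 km of the user. This is an approximation under our own assumptions (a spherical Earth with a mean radius of 6371 km) and not necessarily the exact calculation Elasticsearch performs. The helper names are ours.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


USER = (50.0338, 36.2242)  # (lat, lon) of the user
GYMS = {                   # (lat, lon) of each gym from the tutorial
    "gym1": (50.0361, 36.2313),
    "gym2": (49.9862, 36.2027),
    "gym3": (49.9938, 36.2342),
    "gym+stadium": (50.0173, 36.2279),
    "gym+pool": (50.0215, 36.2354),
}


def within(origin, places, max_km):
    """Return (distance_km, name) pairs within max_km, nearest first."""
    hits = [(haversine_km(*origin, *pos), name) for name, pos in places.items()]
    return sorted((d, n) for d, n in hits if d <= max_km)


nearby = within(USER, GYMS, 5.0)
for distance, name in nearby:
    print(f"{name}: {distance:.2f} km")
```

Only gym2 lies more than 5 km from the user in this calculation, so four gyms remain.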

Now we’ve got a list containing 4 gyms. Because a filter does not score documents, the hits are not guaranteed to come back nearest-first; add a _geo_distance sort to the request if you need that order. We can also find all the gyms within a district, which can be defined as a rectangle. To set a rectangle, it’s enough to specify its top-left and bottom-right points.

curl -XPOST 'http://localhost:9200/gym/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "geo_bounding_box": {
          "location": {
            "top_left": {
              "lat": 50.0343,
              "lon": 36.2208
            },
            "bottom_right": {
              "lat": 50.0328,
              "lon": 36.2263
            }
          }
        }
      }
    }
  }
}'

As a result, we get an empty list, because there are no gyms inside our rectangle. You can enlarge the search area to get useful results.
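
The rectangle test itself is just two range comparisons, which the Python sketch below reproduces for our data. The function name and the wider rectangle are hypothetical and chosen only for illustration. The sketch assumes the rectangle does not cross the 180° meridian.

```python
def in_bounding_box(point, top_left, bottom_right):
    """True if a (lat, lon) point lies inside the rectangle.

    top_left and bottom_right are (lat, lon) tuples.
    """
    lat, lon = point
    return (bottom_right[0] <= lat <= top_left[0]
            and top_left[1] <= lon <= bottom_right[1])


GYMS = {  # (lat, lon) of each gym from the tutorial
    "gym1": (50.0361, 36.2313),
    "gym2": (49.9862, 36.2027),
    "gym3": (49.9938, 36.2342),
    "gym+stadium": (50.0173, 36.2279),
    "gym+pool": (50.0215, 36.2354),
}

# The rectangle from the query above: no gym falls inside it.
small = [n for n, p in GYMS.items()
         if in_bounding_box(p, (50.0343, 36.2208), (50.0328, 36.2263))]

# A hypothetical wider rectangle picks up the gyms around the user.
wide = [n for n, p in GYMS.items()
        if in_bounding_box(p, (50.04, 36.22), (50.01, 36.24))]

print(small)
print(wide)
```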

Let’s consider a more interesting case. We will look for the nearest gyms and also categorize them as "near", "medium distance", and "far". This is not difficult, because Elasticsearch provides a geo_distance aggregation that groups points into distance ranges.

curl -XPOST 'http://localhost:9200/gym/_search?pretty' -d '{
  "aggs": {
    "distanceRanges": {
      "geo_distance": {
        "field": "location",
        "origin": "50.0338, 36.2242",
        "unit": "meters",
        "ranges": [
          {
            "key": "Near by Locations",
            "to": 500
          },
          {
            "key": "Medium distance Locations",
            "from": 500,
            "to": 2000
          },
          {
            "key": "Far Away Locations",
            "from": 2000
          }
        ]
      }
    }
  }
}'

As a result, we find out how many gyms there are in total and how many fall into each category. This is convenient both for the user and for displaying the results on a map.

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 5
  },
  "aggregations" : {
    "distanceRanges" : {
      "buckets" : [ {
        "key" : "Near by Locations",
        "from" : 0.0,
        "from_as_string" : "0.0",
        "to" : 500.0,
        "to_as_string" : "500.0",
        "doc_count" : 0
      }, {
        "key" : "Medium distance Locations",
        "from" : 500.0,
        "from_as_string" : "500.0",
        "to" : 2000.0,
        "to_as_string" : "2000.0",
        "doc_count" : 3
      }, {
        "key" : "Far Away Locations",
        "from" : 2000.0,
        "from_as_string" : "2000.0",
        "doc_count" : 2
      } ]
    }
  }
}
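
The bucket counts in this response can be cross-checked offline. The Python sketch below assigns each gym to a range by its haversine distance from the origin, treating "from" as inclusive and "to" as exclusive. It assumes a spherical Earth with a mean radius of 6,371 km, so distances may differ slightly from the ones Elasticsearch computes. The helper names are ours.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres (spherical Earth approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


ORIGIN = (50.0338, 36.2242)
GYMS = [  # (lat, lon) of the five gyms from the tutorial
    (50.0361, 36.2313),
    (49.9862, 36.2027),
    (49.9938, 36.2342),
    (50.0173, 36.2279),
    (50.0215, 36.2354),
]
RANGES = [  # (key, from, to) in metres, mirroring the aggregation above
    ("Near by Locations", 0.0, 500.0),
    ("Medium distance Locations", 500.0, 2000.0),
    ("Far Away Locations", 2000.0, math.inf),
]


def bucket_counts(origin, points, ranges):
    """Count how many points fall into each [from, to) distance range."""
    counts = {key: 0 for key, _, _ in ranges}
    for lat, lon in points:
        d = haversine_m(origin[0], origin[1], lat, lon)
        for key, lo, hi in ranges:
            if lo <= d < hi:
                counts[key] += 1
    return counts


print(bucket_counts(ORIGIN, GYMS, RANGES))
```

With these coordinates the nearest gym is a little over 500 m away, which is why the first bucket is empty.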

Conclusion

In this post, we discussed how Elasticsearch can work with geolocations and took a look at a few searches that use this functionality. Hopefully, you can apply the logic discussed in this tutorial to your own needs. Questions or comments? Drop us a line below.