Integrating an application with Elasticsearch can be achieved in two ways: using the REST APIs, or using native clients. In the article “REST Calls Made Easy – A New Elasticsearch Java Rest Client”, we covered the new Java REST Client API extensively and showed how easily it integrates with Elasticsearch.

When using native clients to integrate with Elasticsearch, every major ES version upgrade requires the native client to be upgraded as well, demanding significant upgrade and maintenance effort. The Elasticsearch Java REST Client, apart from easing upgrades, also provides other significant benefits:

  • Load balancing across all available nodes.

  • Failover in case of node failures and upon specific response codes.

  • Persistent connections.

  • Trace logging of requests and responses.

  • Optional automatic discovery of cluster nodes.

  • Failed connection penalization.

As shown above, one of the key benefits the Java REST client provides is automatic discovery of cluster nodes. Even though this is an optional feature, wouldn’t it be nice to make use of it so that:

  1. While developing an application, the client automatically finds out which nodes are currently up or down in the cluster.
  2. The client knows which nodes to use for load balancing requests.

This would abstract the scaling up/down of the ES cluster away from the application logic.


As covered in part 1 of this tutorial, to make any calls to Elasticsearch we need to create a RestClient. This can be done using the RestClientBuilder class, which, like most builder classes, allows us to configure parameters while building the instance. The only required argument for RestClient.builder is one or more HttpHost instances.

Example of creating a RestClient with only the required arguments:

RestClient restClient = RestClient.builder(
       new HttpHost("localhost", 9200, "http"),
       new HttpHost("localhost", 9205, "http")).build();

While creating a RestClient, as shown above, we need to pass the list of nodes that are part of the cluster. Say that after a while, due to increased traffic or increased usage of our application, we want to scale up our Elasticsearch cluster and add a few more nodes.

How do we make our RestClient aware of the new nodes? Should we stop our application and add them during RestClient creation? Or is there another alternative? The disadvantage of the first approach is clear: as we scale up, it requires significant maintenance of our application logic.

Wouldn’t it be nicer if our RestClient discovered the nodes in the cluster automatically, as long as it knows the details of at least one node instance?

Sniffing, i.e. automatic discovery of nodes in the Elasticsearch cluster, is a feature supported by most of the native clients ES provides, and it is supported in the Java REST client too. Say we were using the Transport Client or the Python client to interact with ES; we would enable sniffing as shown below:

# Python Client
from elasticsearch import Elasticsearch
es = Elasticsearch(["seed1", "seed2"],
         sniff_on_start=True,
         sniff_on_connection_fail=True,
         sniffer_timeout=60)
//Transport Client
Settings settings = Settings.builder()
        .put("cluster.name", "qboxCluster")
        .put("client.transport.sniff", true)
        .build();
TransportClient client = new PreBuiltTransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("eshost1"), 9300));

So how do we enable sniffing for the Java REST client? As we mentioned earlier, sniffing is an optional feature, so in addition to the REST client library we have to add the sniffer dependency to the application’s pom.xml file, as shown below:

<dependency>
   <groupId>org.elasticsearch.client</groupId>
   <artifactId>sniffer</artifactId>
   <version>5.3.2</version>
</dependency>
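Note that the sniffer artifact does not replace the core REST client library; the project still needs to declare that dependency as well. Assuming the same 5.3.2 version used above, it would look like this:

```xml
<dependency>
   <groupId>org.elasticsearch.client</groupId>
   <artifactId>rest</artifactId>
   <version>5.3.2</version>
</dependency>
```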

Sniffer makes use of the Nodes API (“/_nodes”) exposed by Elasticsearch to periodically fetch the list of available nodes in the cluster and update the hosts list maintained by the RestClient.
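For illustration, here is an abbreviated, hypothetical sketch of the kind of response the sniffer reads from the Nodes API. Each node entry carries the HTTP publish address that gets added to the client’s hosts list (the node IDs and addresses below are made up):

```json
{
  "cluster_name": "qboxCluster",
  "nodes": {
    "hJ4fT1kRQxuXYZ123ab": {
      "name": "node-1",
      "http": {
        "publish_address": "10.0.0.1:9200"
      }
    },
    "aB9cD2eFGhiJKL456mn": {
      "name": "node-2",
      "http": {
        "publish_address": "10.0.0.2:9200"
      }
    }
  }
}
```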

By default, Sniffer polls the cluster for its nodes every 5 minutes. After the RestClient instance is created, a Sniffer can be associated with it, and the sniffing interval can be overridden at the same time. In the example below, we override it to 3 minutes instead of the default 5 minutes:

RestClient restClient = RestClient.builder(
       new HttpHost("localhost", 9200, "http")
).build();
Sniffer sniffer = Sniffer.builder(restClient)
       .setSniffIntervalMillis(180000).build();

One can also find out about node failures as soon as they occur, rather than waiting for the next regular sniffing round. To do so, a SniffOnFailureListener needs to be created first and provided at RestClient creation.


Once the Sniffer is created, it needs to be associated with that same SniffOnFailureListener instance, which will be notified on each failure. You can use the default SniffOnFailureListener or create a custom listener, either by extending RestClient.FailureListener or by extending SniffOnFailureListener itself.

By overriding the “onFailure” method, we can run custom logic on failure. For example, we may want to trigger an email to the admin as soon as the listener is notified of a node failure. The code below shows the creation of a custom failure listener and the association of the sniffer with the RestClient.

//Custom Failure Listener
import org.apache.http.HttpHost;
import org.elasticsearch.client.sniff.SniffOnFailureListener;

public class LogMailFailureListener extends SniffOnFailureListener {

    public LogMailFailureListener() {
        super();
    }

    @Override
    public void onFailure(HttpHost host) {
        System.out.println("Node failed: " + host.getHostName() + "-" + host.getPort());
        //logic to send email goes here
        super.onFailure(host);
    }
}
//Attaching Sniffer Instance to RestClient
LogMailFailureListener sniffOnFailureListener = new LogMailFailureListener();
RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200))
       .setFailureListener(sniffOnFailureListener).build();
Sniffer sniffer = Sniffer.builder(restClient).build();
sniffOnFailureListener.setSniffer(sniffer);

When using sniffing on failure, not only do the nodes get updated after each failure, but an additional sniffing round is also scheduled sooner than usual, by default one minute after the failure. Like any other resource, the Sniffer must be closed so that its background thread gets properly shut down and all of its resources are released. The Sniffer object should have the same lifecycle as the RestClient and be closed right before the client:

sniffer.close();
restClient.close();
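Putting the pieces together, here is a minimal end-to-end sketch, assuming a single local node at localhost:9200 and the 5.3.2 client APIs shown above (the class name is just for illustration):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.sniff.SniffOnFailureListener;
import org.elasticsearch.client.sniff.Sniffer;

public class SniffingClientExample {
    public static void main(String[] args) throws Exception {
        // The listener must be registered before the client is built
        SniffOnFailureListener listener = new SniffOnFailureListener();
        RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200, "http"))
                .setFailureListener(listener)
                .build();
        // Sniff every 3 minutes, and immediately after a failure
        Sniffer sniffer = Sniffer.builder(restClient)
                .setSniffIntervalMillis(180000)
                .build();
        listener.setSniffer(sniffer);

        // ... use restClient to perform requests ...

        // Close the sniffer before the client it sniffs for
        sniffer.close();
        restClient.close();
    }
}
```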

Conclusion

This article covered an important aspect of the Java REST Client: automatic discovery of the nodes in the cluster. Together with the complete coverage of the Java REST client in the article “REST Calls Made Easy – A New Elasticsearch Java Rest Client”, it should help you get comfortably started with the Java REST client.


Give It a Whirl!

It’s easy to spin up a standard hosted Elasticsearch cluster on any of our 47 Rackspace, Softlayer, Amazon, or Microsoft Azure data centers. And you can now provision your own AWS Credits on Qbox Private Hosted Elasticsearch.

Questions? Drop us a note, and we’ll get you a prompt response.

Not yet enjoying the benefits of a hosted ELK stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment in our cloud hosting service.