Recent Posts by Adam Vanderbush

VP Marketing for Qbox and Supergiant.io. Qbox is a venture-backed company focusing on search as a service. Qbox's foundational cloud Elasticsearch product helps users discover insights through data exploration and analytics.

How many nodes should the cluster have? It's a difficult question. Ultimately, it will boil down to questions like the following: 

  1. How much data are you working with?

  2. How many searches will you be processing?

  3. How complex are your searches?

  4. How many resources will each node have to work with?

  5. How many indexes/applications will you be working with?

The answer depends on many factors, such as expected load, data size, and hardware. In this tutorial, we discuss how to avoid the split-brain problem.

Keep reading

While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database. Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration.

The results that are returned from a scroll request reflect the state of the index at the time that the initial search request was made, like a snapshot in time. Subsequent changes to documents (index, update or delete) will only affect later search requests.
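The cursor-like flow described above can be sketched in a few lines. This is a minimal illustration, not a client library: `send` is a hypothetical transport callable (method, path, body) that returns the parsed JSON response, and `scroll_all`, `keep_alive` and `page_size` are names chosen for this example. The request paths and the `_scroll_id` field follow the scroll API.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: (method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def scroll_all(send: Send, index: str, query: Dict[str, Any],
               keep_alive: str = "1m", page_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, one scroll page at a time."""
    # The initial search opens the scroll context (the snapshot of the index).
    page = send("POST", f"/{index}/_search?scroll={keep_alive}",
                {"size": page_size, "query": query})
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            # Each follow-up call returns the next page and renews the keep-alive.
            page = send("POST", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context once we are done.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Passing the transport in as a callable keeps the sketch independent of any particular HTTP library; in practice it would wrap whatever client the application already uses.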

Keep reading

Logstash is a data pipeline that helps us process logs and other event data from a variety of systems. With 200 plugins and counting, Logstash can connect to a variety of sources and stream data at scale to a central analytics system. One of the most popular central analytics systems used with Logstash is the ELK Stack (Elasticsearch, Logstash, and Kibana).

The ability to efficiently analyze and query the data being shipped into the ELK Stack depends on the information being readable. This means that as unstructured data is being ingested into the system, it must be translated into structured message lines. Regardless of the defined data source, pulling the logs and performing some magic to beautify them is necessary to ensure that they are parsed correctly before being shipped to Elasticsearch.

Keep reading

An Elasticsearch cluster may consist of a single node with a single index. Or it may have a hundred data nodes, three dedicated masters, a few dozen client nodes—all operating on a thousand indices (and tens of thousands of shards). No matter the scale of the cluster, you’ll want a quick way to assess the status of your cluster. The Cluster Health API fills that role. It can reassure you that everything is alright, or alert you to a problem somewhere in your cluster.
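As a rough illustration of how a health check might be read, the sketch below summarises an already-parsed response from `GET _cluster/health`. The function name `summarize_health` is made up for this example; `status`, `number_of_nodes` and `unassigned_shards` are fields of the Cluster Health response, and fetching the response itself is left out.

```python
from typing import Any, Dict


def summarize_health(health: Dict[str, Any]) -> str:
    """Turn a parsed Cluster Health response into a one-line summary."""
    status = health.get("status", "unknown")
    nodes = health.get("number_of_nodes", 0)
    unassigned = health.get("unassigned_shards", 0)
    # Meaning of the three status colours reported by the API.
    meaning = {
        "green": "all primary and replica shards are allocated",
        "yellow": "all primaries are allocated, some replicas are not",
        "red": "at least one primary shard is not allocated",
    }.get(status, "status not recognised")
    return f"{status.upper()}: {meaning} ({nodes} nodes, {unassigned} unassigned shards)"
```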

Keep reading

In the previous tutorial in the ElastAlert series, we implemented cardinality, percentage match and single metric aggregation rules for ElastAlert alerting via HipChat. Next, we will look at configuring and setting up ElastAlert alerting with the super-fast, simple and free messaging app Telegram.

ElastAlert is now available on Qbox-provisioned Elasticsearch clusters and is easy to configure. When you provision a cluster, there is a configuration box where you can input your alert rules. If you're unclear how to structure rules in YAML, be sure to consult the ElastAlert documentation.

Keep reading

Algorithmic stemmers apply a series of rules to a word in order to reduce it to its root form, such as stripping the final s or es from plurals. They don't have to know about individual words in order to stem them. Dictionary stemmers work differently.

Instead of applying a standard set of rules to each word, they simply look up the word in the dictionary. Theoretically, they could produce much better results than an algorithmic stemmer.

A dictionary stemmer should be able to return the correct root word for irregular forms such as feet and mice. Additionally, it must be able to recognize the distinction between words that are similar but have different word senses, for example, organ and organization.
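The difference between the two approaches can be seen in a toy sketch. Both functions below are deliberately simplified illustrations written for this example (the names `algorithmic_stem`, `dictionary_stem` and the tiny `LEMMAS` table are assumptions, not part of any library), and neither reflects how a production stemmer is implemented.

```python
def algorithmic_stem(word: str) -> str:
    """Toy rule-based stemmer: strip a plural 'es' or 's' suffix."""
    for suffix in ("sses", "ches", "shes", "xes"):
        if word.endswith(suffix):
            return word[:-2]  # drop the trailing "es"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]  # drop the trailing "s"
    return word


# Hypothetical, hand-written dictionary; a real one is far larger.
LEMMAS = {"feet": "foot", "mice": "mouse", "cats": "cat", "boxes": "box"}


def dictionary_stem(word: str) -> str:
    """Look the word up; fall back to the word itself when it is unknown."""
    return LEMMAS.get(word, word)
```

The rule-based function handles regular plurals without knowing any words, but leaves irregular forms such as feet untouched; the lookup handles them only because they are listed, which is why the quality of a dictionary stemmer depends on the quality of its dictionary.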

Elasticsearch provides dictionary-based stemming via the Hunspell token filter. Hunspell is the spell checker used by OpenOffice, LibreOffice, Chrome, Firefox, Thunderbird, and many other open and closed source projects.

Keep reading

Integrating an application with Elasticsearch can be achieved in two ways: using the REST APIs or using native clients. In the article “REST Calls Made Easy - A New Elasticsearch Java Rest Client”, we extensively covered the new Java REST Client API, which makes it easy to integrate with Elasticsearch.

Keep reading

This tutorial explains how to configure alerting using ElastAlert with the popular proprietary issue tracking product JIRA.

Keep reading

In the previous tutorial in the ElastAlert series, we implemented new_term, change and spike rules for ElastAlert alerting via Slack. Next, we will look at configuring and setting up ElastAlert alerting with the popular cloud-based team collaboration tool HipChat.

Many organisations use Elasticsearch to rapidly prototype and launch new search applications, and moving quickly at scale raises challenges. In particular, we often encounter difficulty making changes to query logic without impacting users, as well as finding client library bugs, problems with multi-tenancy, and general reliability issues. As the number of queries grows, the search infrastructure struggles to support the multitude of ways queries are sent to the Elasticsearch cluster. Infrastructure designed for a single team to communicate with a single cluster does not scale to tens of teams and tens of clusters.

Indexing in large volumes requires instantaneous alerting on anomalies, spikes, or other patterns of interest in Elasticsearch data. If you have data being written into Elasticsearch in near real time and want to be alerted when that data matches certain patterns, ElastAlert is the tool for you.

Keep reading

In the previous tutorial in the ElastAlert series, we implemented flatline, frequency and blacklist rules for ElastAlert alerting via email. Next, we will look at configuring and setting up ElastAlert alerting with the popular cloud-based team collaboration tool Slack.

ElastAlert was developed to automatically query and analyze the log data in Elasticsearch clusters and generate alerts based on easy-to-write rules. The initial goal was to create a comprehensive log management system for the data. It is easy to configure a few basic alerts such as “Send us an email if a user fails login X times in a day” or “Send a Sensu alert if the number of error messages spikes.” However, the usual requirement is a generic architecture that can suit almost any alerting scenario across any organisation using Elasticsearch. ElastAlert takes a set of “rules”, each of which has a pattern that matches data and a specific alert action it will take when triggered. For each rule, ElastAlert will query Elasticsearch periodically to grab relevant data in near real time.
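To make the idea of a rule concrete, here is a small sketch of the “user fails login X times in a day” pattern as a sliding-window count. It is an illustration of the concept only, not ElastAlert's implementation: `frequency_matches` is a hypothetical helper, the events are assumed to be (timestamp, user) pairs already pulled from Elasticsearch, and the parameter names `num_events` and `timeframe` simply mirror the options of ElastAlert's frequency rule type.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, Iterable, List, Tuple


def frequency_matches(events: Iterable[Tuple[datetime, str]],
                      num_events: int,
                      timeframe: timedelta) -> List[Tuple[str, datetime]]:
    """Return (user, time) each time a user reaches num_events within timeframe."""
    windows: Dict[str, Deque[datetime]] = defaultdict(deque)
    matches: List[Tuple[str, datetime]] = []
    for ts, user in sorted(events):
        window = windows[user]
        window.append(ts)
        # Drop events that have fallen out of the sliding window.
        while ts - window[0] > timeframe:
            window.popleft()
        if len(window) >= num_events:
            matches.append((user, ts))
            window.clear()  # start counting afresh after a match
    return matches
```

In this sketch a match is just a tuple; in a rule-based system the match would be handed to the configured alert action, such as sending an email or a chat message.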

Keep reading