How many nodes should the cluster have? It's a difficult question. Ultimately, it will boil down to questions like the following: 

  1. How much data are you working with?

  2. How many searches will you be processing?

  3. How complex are your searches?

  4. How much resources will each node have to work with?

  5. How many indexes/applications will you be working with?

The answer to that question depends on a lot of factors, like expected load, data size, hardware, etc. In this tutorial post we discuss how to avoid the split brain problem.

Keep reading

While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database. Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration.

The results that are returned from a scroll request reflect the state of the index at the time that the initial search request was made, like a snapshot in time. Subsequent changes to documents (index, update or delete) will only affect later search requests.

Keep reading

Not yet enjoying the benefits of a hosted ELK-stack enterprise search on Qbox? Discover how easy it is to manage and scale your Elasticsearch environment.

Get Started 5 minutes to get started

Logstash is a data pipeline that helps us process logs and other event data from a variety of systems. With 200 plugins and counting, Logstash can connect to a variety of sources and stream data at scale to a central analytics system. One of the most Logstash central analytics system is ELK stack (Elasticsearch, Logstash and Kibana).

The ability to efficiently analyze and query the data being shipped into the ELK Stack depends on the information being readable. This means that as unstructured data is being ingested into the system, it must be translated into structured message lines. Regardless of the defined data source, pulling the logs and performing some magic to beautify them is necessary to ensure that they are parsed correctly before being shipped to Elasticsearch

Keep reading

An Elasticsearch cluster may consist of a single node with a single index. Or it may have a hundred data nodes, three dedicated masters, a few dozen client nodes—all operating on a thousand indices (and tens of thousands of shards). No matter the scale of the cluster, you’ll want a quick way to assess the status of your cluster. The Cluster Health API fills that role. It can reassure you that everything is alright, or alert you to a problem somewhere in your cluster.

Keep reading

In the previous tutorial in ElastAlert Series, we implemented cardinality, percentage match and single metric aggregation rules for ElastAlert alerting via HipChat. We will be next looking into configuring and setting up alerting using ElastAlert on to the super-fast, simple and free messaging app Telegram.

ElastAlert is now available on Qbox provisioned Elasticsearch clusters and can be easily configured. Implementing ElastAlert is easy on Qbox. When you provision a cluster, there is a configuration box where you can input your Alert rules.  If you’re unclear how to structure rules in YAML, be sure to consult the ElastAlert Documentation.

Keep reading

Algorithmic stemmers apply a series of rules to a word in order to reduce it to its root form, such as stripping the final s or es from plurals. They don’t have to know about individual words in order to stem them. The dictionary stemmers work differently from algorithmic stemmers.

Instead of applying a standard set of rules to each word, they simply look up the word in the dictionary. Theoretically, they could produce much better results than an algorithmic stemmer.

A dictionary stemmer should be able to return the correct root word for irregular forms such as feet and mice. Additionally, it must be able to recognize the distinction between words that are similar but have different word senses, for example, organ and organization.

Elasticsearch provides dictionary-based stemming via the Hunspell token filter. Hunspell is the spell checker used by OpenOffice, LibreOffice, Chrome, Firefox, Thunderbird, and many other open and closed source projects.

Keep reading