Although Elasticsearch can scale horizontally to very large data volumes, you should store only the data you need. Doing so speeds up searches, shortens the response time for retrieving data, and can substantially reduce resource utilization.

Elasticsearch uses an inverted index to retrieve the data that you are searching for. Although this data structure is one of the best for text search, keeping only the data that you need in the index is still the best approach.
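
To make the idea concrete, here is a minimal sketch of an inverted index in Python. It is a toy model under simple assumptions (whitespace tokenization, lowercase matching), not how Elasticsearch or Lucene actually implements it; the names `build_inverted_index` and `search` are made up for this illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "payment received from customer",
    2: "payment failed for customer",
    3: "refund issued",
}
index = build_inverted_index(docs)
matches = search(index, "payment customer")
```

Every stored document adds entries to this term-to-document map, which is one reason that keeping unneeded data out of the index helps.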

In this tutorial, we discuss data retention techniques that you can use in Elasticsearch. The right technique depends on the kind of data and on your application, because some applications need longer retention policies than others.

Imagine an application that deals with finance and money transactions. Such an application will need to keep all of its records forever. But do these records always need to exist in Elasticsearch? Does all of this data need to be quickly searchable?

Logstash provides ways to segregate different events and store them in standard file storage, rather than in Elasticsearch, for long-term retention.
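
The routing decision itself can be sketched in a few lines of Python. This is not Logstash configuration, only an illustration of the logic under stated assumptions: each event carries an ISO 8601 `timestamp` field, and a hypothetical 30-day window decides what stays searchable. The function name `route_event` and the destination labels are invented for this example.

```python
import json
from datetime import datetime, timedelta, timezone

def route_event(event, now, hot_window=timedelta(days=30)):
    """Return "elasticsearch" for recent events and "archive" for older ones."""
    ts = datetime.fromisoformat(event["timestamp"])
    return "elasticsearch" if now - ts <= hot_window else "archive"

# Hypothetical transaction events.
events = [
    {"id": "tx-1", "timestamp": "2024-01-15T10:00:00+00:00", "amount": 120},
    {"id": "tx-2", "timestamp": "2023-06-01T08:30:00+00:00", "amount": 75},
]
now = datetime(2024, 2, 1, tzinfo=timezone.utc)

destinations = {e["id"]: route_event(e, now) for e in events}

# Older events become JSON lines that could be appended to a plain file.
archive_lines = [
    json.dumps(e, sort_keys=True)
    for e in events
    if route_event(e, now) == "archive"
]
```

Nothing is discarded in this sketch: older records simply go to cheaper storage instead of the search index.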

Keep reading

Filebeat is extremely lightweight compared to its predecessors when it comes to efficiently sending log events. It uses the lumberjack protocol and compression, and it is easy to configure using a YAML file. It can send events directly to Elasticsearch as well as to Logstash. It keeps track of each file and its read position, so that it can resume where it left off.
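
The resume behaviour can be illustrated with a small Python sketch. It is a simplified model, not Filebeat's implementation: the `registry` dictionary and the `read_new_lines` function are hypothetical, and the real Filebeat registry records more state than a single byte offset per path.

```python
import os
import tempfile

def read_new_lines(path, registry):
    """Read lines appended since the last call and record the new offset."""
    offset = registry.get(path, 0)
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume complete lines only; a partial trailing line waits for the next call.
    end = data.rfind(b"\n") + 1
    registry[path] = offset + end
    return data[:end].decode("utf-8").splitlines()

registry = {}
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")
    with open(log_path, "w") as f:
        f.write("first\nsecond\n")
    batch1 = read_new_lines(log_path, registry)

    with open(log_path, "a") as f:
        f.write("third\n")
    batch2 = read_new_lines(log_path, registry)
```

Because the offset is stored outside the reading loop, a restarted process that reloads the registry would continue from the same position rather than resending earlier lines.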

The goal of this tutorial is to set up a proper environment to ship Linux system logs to Elasticsearch with Filebeat. It then shows helpful tips to make good use of the environment in Kibana.

Keep reading

With the first alpha release of Elasticsearch 5.0 comes a ton of new and awesome features, and if you've been paying attention then you know that one of the more prominent of these features is the new shiny ingest node. Simply put, ingest aims to provide a lightweight solution for pre-processing and enriching documents within Elasticsearch itself before they are indexed.

We can use an ingest node to pre-process documents before the actual indexing takes place. The pre-processing is performed by an ingest node that intercepts bulk and index requests, applies the transformations, and then passes the documents back to the index or bulk APIs.
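
The intercept, transform, and hand-back flow can be modelled in plain Python. This is a conceptual sketch only and does not use the Elasticsearch API: the processor functions, `run_pipeline`, and `index_with_pipeline` are hypothetical names, and the "index" is just a list.

```python
def lowercase(field):
    """Build a processor that lowercases one field."""
    def processor(doc):
        doc[field] = doc[field].lower()
        return doc
    return processor

def rename(old, new):
    """Build a processor that moves a value from one field name to another."""
    def processor(doc):
        doc[new] = doc.pop(old)
        return doc
    return processor

def set_field(field, value):
    """Build a processor that sets a field to a fixed value."""
    def processor(doc):
        doc[field] = value
        return doc
    return processor

def run_pipeline(processors, doc):
    """Apply each processor in order to a copy of the document."""
    doc = dict(doc)  # copy so the caller's document is not mutated
    for processor in processors:
        doc = processor(doc)
    return doc

def index_with_pipeline(index, processors, doc):
    """Intercept an index request, transform the document, then store it."""
    index.append(run_pipeline(processors, doc))

pipeline = [lowercase("level"), rename("msg", "message"), set_field("env", "prod")]
index = []
original = {"level": "ERROR", "msg": "disk full"}
index_with_pipeline(index, pipeline, original)
```

The document that reaches the index has already been normalized, so no separate pre-processing service sits between the client and the cluster in this model.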

Keep reading

While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database. Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration.

The results that are returned from a scroll request reflect the state of the index at the time that the initial search request was made, like a snapshot in time. Subsequent changes to documents (index, update or delete) will only affect later search requests.
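
The snapshot behaviour can be shown with a toy in-memory model. This sketch does not call Elasticsearch; `ToyIndex`, `open_scroll`, and `next_page` are hypothetical names chosen for the illustration.

```python
class ToyIndex:
    """A list of documents standing in for a real index."""

    def __init__(self, docs):
        self.docs = list(docs)

    def open_scroll(self, size):
        """Freeze the current contents and return a cursor over them."""
        return {"snapshot": tuple(self.docs), "pos": 0, "size": size}

def next_page(cursor):
    """Return the next page from the frozen snapshot; empty when exhausted."""
    start = cursor["pos"]
    page = cursor["snapshot"][start:start + cursor["size"]]
    cursor["pos"] = start + len(page)
    return list(page)

toy = ToyIndex(["doc1", "doc2", "doc3", "doc4", "doc5"])
cursor = toy.open_scroll(size=2)

retrieved = next_page(cursor)   # first page, taken from the snapshot
toy.docs.append("doc6")         # indexed after the scroll was opened

while True:
    page = next_page(cursor)
    if not page:
        break
    retrieved.extend(page)

later_results = list(toy.docs)  # a later search sees the new document
```

In this model the scroll returns the five documents that existed when it was opened, while a later search also sees the sixth.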

Keep reading

Logstash is a data pipeline that helps us process logs and other event data from a variety of systems. With 200 plugins and counting, Logstash can connect to a variety of sources and stream data at scale to a central analytics system. One of the most common central analytics systems used with Logstash is the ELK Stack (Elasticsearch, Logstash, and Kibana).

The ability to efficiently analyze and query the data being shipped into the ELK Stack depends on the information being readable. This means that as unstructured data is being ingested into the system, it must be translated into structured message lines. Regardless of the defined data source, pulling the logs and performing some magic to beautify them is necessary to ensure that they are parsed correctly before being shipped to Elasticsearch.
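
As a rough illustration of turning an unstructured line into structured fields, here is a Python sketch that uses a regular expression. It is not a Logstash filter; the pattern assumes a classic syslog-style layout, and the field names and the `unparsed` tag are choices made for this example.

```python
import re

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> <program>[<pid>]: <message>"
SYSLOG_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)

def parse_line(line):
    """Return structured fields, or the raw line tagged as unparsed."""
    match = SYSLOG_PATTERN.match(line)
    if match is None:
        return {"message": line, "tags": ["unparsed"]}
    fields = match.groupdict()
    if fields["pid"] is not None:
        fields["pid"] = int(fields["pid"])
    return fields

parsed = parse_line("Mar  3 14:02:11 web01 sshd[1234]: Failed password for root")
fallback = parse_line("not a syslog line")
```

Once a line is split into fields such as `host` and `program`, those fields can be filtered and aggregated directly instead of being searched as free text.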

Keep reading

Redis, the popular open source in-memory data store, can also be used as a persistent on-disk database. It supports a variety of data structures such as lists, sets, sorted sets (with range queries), strings, geospatial indexes (with radius queries), bitmaps, hashes, and HyperLogLogs. The in-memory store is used to solve various problems in areas such as real-time messaging, caching, and statistics calculation.

Provisioning an Elasticsearch cluster in Qbox is easy. In this article, we walk you through the initial steps to start and configure your cluster. We then set up and configure Logstash to ship the logs to Elasticsearch in order to monitor Redis performance. The Redis performance logs shipped to Elasticsearch can then be visualized and analyzed via Kibana dashboards.

Keep reading