Scaling Elasticsearch is not an easy task. In this article, we go over different methods for building a high-availability Logstash indexing solution using Qbox Hosted Elasticsearch.

The Logstash indexer is the component that indexes events and sends them to Elasticsearch for fast searching. We will use multiple Logstash indexers with the exact same configuration. Having multiple indexers with identical configuration opens up different possibilities for making a highly available Logstash solution for your ELK stack, and such indexer nodes can easily be created using configuration management tools like Puppet or Chef.
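For example, Filebeat shippers can point at the whole pool of identical indexers and spread events across them. Here is a minimal sketch of the relevant Filebeat output section; the indexer hostnames are hypothetical:

```yaml
# filebeat.yml (sketch) -- hostnames are illustrative
output.logstash:
  # Two identically configured Logstash indexers on the default Beats port
  hosts: ["indexer1.example.com:5044", "indexer2.example.com:5044"]
  # Distribute events across all listed hosts instead of picking just one
  loadbalance: true
```

If one indexer goes down, Filebeat keeps shipping events to the remaining hosts.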

Keep reading

Effective log management involves the ability to instantly draw useful insights from millions of log entries, identify issues as they arise, and visualize/communicate patterns that emerge from your application logs. Fortunately, the ELK stack (Elasticsearch, Logstash, and Kibana) makes it easy to ship logs from your application to Elasticsearch indices for storage and analysis.

Recently, the Elastic ecosystem was extended with Beats, a set of useful tools for shipping logs. Filebeat is a part of the Beats toolset and can be configured to send log events either to Logstash (and from there to Elasticsearch), or directly to Elasticsearch. The tool turns your logs into searchable and filterable ES documents with fields and properties that can be easily visualized and analyzed.
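Switching between those two destinations is a small change in the Filebeat configuration. A minimal sketch, with hypothetical local addresses:

```yaml
# filebeat.yml (sketch): enable one output, not both
output.elasticsearch:
  hosts: ["localhost:9200"]    # index events directly into Elasticsearch

# output.logstash:
#   hosts: ["localhost:5044"]  # or route events through a Logstash pipeline first
```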

In a previous post, we discussed how to use Filebeat to ship Linux system logs. Now it's time to show how to ship logs from your MySQL database via the Filebeat transport to your Elasticsearch cluster. Making the MySQL general and slow query logs accessible via Kibana and Logstash will radically improve your database management, log analysis, and pattern discovery, leveraging the full potential of the ELK stack.
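On the shipping side, this mostly comes down to pointing Filebeat at the MySQL log files. A sketch, assuming the default Debian/Ubuntu log locations (your paths will depend on your my.cnf settings):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/mysql/mysql.log       # general query log
      - /var/log/mysql/mysql-slow.log  # slow query log
    document_type: mysql

output.logstash:
  hosts: ["localhost:5044"]  # Logstash parses the entries and forwards them to Elasticsearch
```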

Keep reading

In this tutorial, we'll use Lassie, a Python library for retrieving content from websites, to fetch information regarding a Qbox YouTube video as JSON. We'll then store that data in our Qbox Elasticsearch cluster using elasticsearch-py, Elasticsearch's official low-level Python client. We'll also use elasticsearch-py to query and return the record we indexed.
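The core of that workflow fits in a few lines. A minimal sketch, assuming a local cluster; the video URL, index name, and document ID are placeholders:

```python
import lassie
from elasticsearch import Elasticsearch

# Fetch page metadata (title, description, images, etc.) as a dict
video = lassie.fetch('https://www.youtube.com/watch?v=VIDEO_ID')  # placeholder URL

es = Elasticsearch(['http://localhost:9200'])

# Index the metadata as a document, then read it back
es.index(index='videos', doc_type='video', id=1, body=video)
doc = es.get(index='videos', doc_type='video', id=1)
print(doc['_source']['title'])
```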

Although this example is minimal and the choice of a YouTube video to index is somewhat arbitrary, the concept it demonstrates has larger practical applications. For example, a company could build a vertical search engine that collects everything written about it online. The user-friendliness of Lassie and Python means a task like this can be done in relatively few lines of code, with syntax that is easy to understand even for those new to programming.

Keep reading

Although Elasticsearch can scale out to handle very large datasets, you should store only the data you actually need. This will speed up search operations, improve the response time for retrieving data, and even reduce resource utilization substantially.

Elasticsearch uses an inverted index to retrieve the data you are searching for. Although this data structure is one of the best when it comes to text searching, keeping only the data you need in the index is still the best approach.
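To see why index size matters, consider a toy model of an inverted index (purely illustrative, not how Elasticsearch is implemented internally): every stored document adds entries under each term it contains, so the structure grows with everything you keep.

```python
# Toy inverted index: map each term to the set of documents containing it,
# so a query looks up term -> documents instead of scanning every document
docs = {
    1: "payment failed for order 42",
    2: "payment processed successfully",
}

inverted = {}
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted.setdefault(term, set()).add(doc_id)

print(inverted["payment"])  # {1, 2}
```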

In this tutorial, we discuss data retention techniques that you can use in Elasticsearch. The right technique will obviously depend on the kind of data and on your application, because some data might need a longer retention policy than other data.

Imagine an application that deals with finance and money transactions. Such an application will need to keep all of its records forever. But do these records need to always exist in Elasticsearch? Does all of this data need to be quickly searchable?
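Often only recent data needs to be searchable at speed. If events are written to time-based indices (such as the daily logstash-YYYY.MM.DD indices that Logstash's Elasticsearch output creates by default), older indices can simply be dropped on a schedule. A sketch with elasticsearch-py; the 90-day cutoff and the index pattern are assumptions:

```python
from datetime import datetime, timedelta
from elasticsearch import Elasticsearch

es = Elasticsearch(['http://localhost:9200'])
cutoff = datetime.utcnow() - timedelta(days=90)

# Delete daily indices older than the cutoff
for name in es.indices.get_alias(index='logstash-*'):
    try:
        index_day = datetime.strptime(name, 'logstash-%Y.%m.%d')
    except ValueError:
        continue  # skip indices that don't match the daily naming pattern
    if index_day < cutoff:
        es.indices.delete(index=name)
```

In practice, the Elasticsearch Curator tool automates exactly this kind of housekeeping.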

Logstash also provides ways to segregate different events and store them in standard file storage, rather than Elasticsearch, for long-term retention.
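For instance, a conditional in the Logstash output stage can archive one class of events to flat files while everything else continues on to Elasticsearch. A sketch; the event type and archive path are illustrative:

```
output {
  if [type] == "transaction" {
    # Long-term archive: cheap flat files, rotated daily by the path pattern
    file {
      path => "/var/archive/transactions-%{+YYYY-MM-dd}.log"
    }
  } else {
    # Everything else stays quickly searchable in Elasticsearch
    elasticsearch {
      hosts => ["localhost:9200"]
    }
  }
}
```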

Keep reading

Filebeat is extremely lightweight compared to its predecessors when it comes to efficiently sending log events. It uses the Lumberjack protocol, supports compression, and is easy to configure using a YAML file. It can send events directly to Elasticsearch as well as to Logstash, and it keeps track of each file and its read position so that it can resume where it left off.
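That read position is persisted in a small registry file on disk. A minimal sketch for shipping syslog files, assuming the Debian/Ubuntu package layout (all paths are illustrative):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/syslog
      - /var/log/auth.log

# Read offsets are persisted here, so a restarted Filebeat resumes
# where it left off instead of re-shipping entire files
filebeat.registry_file: /var/lib/filebeat/registry

output.elasticsearch:
  hosts: ["localhost:9200"]
```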

The goal of this tutorial is to set up a proper environment to ship Linux system logs to Elasticsearch with Filebeat. We then share helpful tips for making good use of the resulting environment in Kibana.

Keep reading

In a previous tutorial, we discussed how to use one of Rust's Elasticsearch clients, rs-es, to interact with Elasticsearch via its REST API. Now, we'll take a look at the other Rust Elasticsearch client, elastic.

Elastic, like rs-es, is idiomatic Elasticsearch, but unlike rs-es, it is not idiomatic Rust. Strong typing over document types and query responses is prioritized over providing a comprehensive mapping of the Query DSL into Rust constructs. The elastic project aims to be equally usable for developers with and without Rust experience.

Structurally, the elastic crate combines several other crates, each of which can also be used independently, depending on the user's needs. The first of these is elastic-reqwest, a synchronous implementation of the Elasticsearch REST API built on the reqwest HTTP library. Elastic-reqwest serves as the HTTP backend for the elastic crate itself.

Second is elastic-requests, a strongly-typed implementation of Elasticsearch's REST API. Third is elastic-responses, which integrates with elastic-reqwest and facilitates the handling of Elasticsearch search responses by creating iterators over search results. Finally, elastic-types allows custom definitions of Elasticsearch types as Rust structures. It uses serde, which we encountered in the prior Rust Elasticsearch tutorial, for serialization.
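Put together, a round trip through these layers looks roughly like the sketch below. It is based on the elastic crate's documented prelude (SyncClientBuilder, the search builder, and the response iterators); exact method names vary between crate versions, and the index name is a placeholder:

```rust
#[macro_use]
extern crate serde_json;
extern crate elastic;

use elastic::prelude::*;
use serde_json::Value;

fn main() {
    // SyncClientBuilder uses elastic-reqwest as its HTTP backend;
    // by default it targets http://localhost:9200
    let client = SyncClientBuilder::new()
        .build()
        .expect("failed to build client");

    // elastic-requests shapes this into a typed search request,
    // and elastic-responses parses the reply
    let res = client
        .search::<Value>()
        .index("articles") // placeholder index name
        .body(json!({ "query": { "match_all": {} } }))
        .send()
        .expect("search failed");

    // Iterate over the matched documents
    for doc in res.documents() {
        println!("{:?}", doc);
    }
}
```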

Keep reading