Qbox Joins Instaclustr, Plans March 31, 2023, Sunset Date. Read about it in our blog post here.

For the past four years or so, the term “Big Data” has been loosely thrown around marketing and tech conferences, publications, blog articles, and everywhere in between. The buzzword has since been defined and classified, but one particular distributed storage and processing ecosystem might as well be synonymous with it: Apache Hadoop.

Hadoop is composed of a very wide array of packages and tools that can bulk ingest and process data with the power of distributed clusters of commodity hardware and/or container technologies. So it comes as no surprise that organizations have been combining the power of Hadoop to perform deeper analytics and produce “actionable insights” with Elasticsearch for robust log and performance metric analysis.

In this tutorial, we shall utilize the Elastic Hadoop connector to integrate Elasticsearch with a Hadoop cluster and introduce readers to how external tables in Hive work with Elasticsearch mappings and bulk-loaded docs.

Keep reading

Scaling Elasticsearch is not an easy task. In this article, we go over different methods to make a High-Availability Logstash Indexing Solution using Qbox Hosted Elasticsearch.

The Logstash indexer is the component that indexes events and sends them to Elasticsearch for faster searches. We will use multiple Logstash indexers with the exact same configuration. Having multiple indexers with the same configuration opens up different possibilities for building a highly available Logstash solution for your ELK stack. These identically configured indexer nodes can easily be created using configuration management tools like Puppet or Chef.

Keep reading

Effective log management requires the ability to instantly draw useful insights from millions of log entries, identify issues as they arise, and visualize and communicate patterns that emerge from your application logs. Fortunately, the ELK stack (Elasticsearch, Logstash, and Kibana) makes it easy to ship logs from your application to Elasticsearch indices for storage and analysis.

Recently, the Elastic stack was extended with a set of useful log-shipping tools called Beats. Filebeat is part of the Beats tool set and can be configured to send log events either to Logstash (and from there to Elasticsearch) or directly to Elasticsearch. The tool turns your logs into searchable and filterable Elasticsearch documents with fields and properties that can be easily visualized and analyzed.

In a previous post, we discussed how to use Filebeat to ship Linux system logs. Now, it’s time to show how to ship logs from your MySQL database to your Elasticsearch cluster via Filebeat. Making the MySQL general and slow logs accessible via Kibana and Logstash will radically improve your database management, log analysis, and pattern discovery, leveraging the full potential of the ELK stack.

Keep reading

In this tutorial, we’ll use Lassie, a Python library for retrieving content from websites, to fetch information regarding a Qbox YouTube video as JSON. We’ll then store that data in our Qbox Elasticsearch cluster using elasticsearch-py, Elasticsearch’s official low-level Python client. We’ll also use elasticsearch-py to query and return the record we indexed.

Although this example is minimal and the choice of a YouTube video to index is somewhat arbitrary, the concept it demonstrates has larger practical applications. For example, a company could build a vertical search engine that collects all of the information about it found online. The user-friendliness of Lassie and Python makes it possible to do a task like this in relatively few lines of code, with syntax that is easily understood even by those new to programming.
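The fetch, index, and query flow described above can be sketched roughly as follows. This is a minimal sketch, not the tutorial's own code: the function names, the `videos` index name, and the choice of fields are illustrative assumptions, and it assumes that `lassie.fetch` returns a dictionary with `title`, `description`, `url`, and an `images` list whose entries carry a `src` key. The `document=` keyword assumes an 8.x elasticsearch-py client; 7.x clients use `body=` instead.

```python
from typing import Any


def build_document(metadata: dict[str, Any]) -> dict[str, Any]:
    """Keep only the fields we want to index from fetched page metadata."""
    return {
        "title": metadata.get("title"),
        "description": metadata.get("description"),
        "url": metadata.get("url"),
        # Store only the image URLs rather than the full image dictionaries.
        "thumbnails": [img.get("src") for img in metadata.get("images", [])],
    }


def fetch_and_index(page_url: str, es_url: str, index: str = "videos") -> dict[str, Any]:
    """Fetch page metadata, index it, then read the stored record back."""
    # Imported here so build_document works without these packages installed.
    import lassie
    from elasticsearch import Elasticsearch

    doc = build_document(lassie.fetch(page_url))
    es = Elasticsearch(es_url)  # es_url is your cluster endpoint (assumption)
    # `document=` is the 8.x keyword; 7.x clients take `body=` instead.
    result = es.index(index=index, document=doc)
    # Retrieve the record by the ID that Elasticsearch assigned to it.
    return es.get(index=index, id=result["_id"])["_source"]
```

Keeping `build_document` separate from the network calls makes the field selection easy to check without a running cluster.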

Keep reading

Although Elasticsearch can scale indefinitely, you should store only the data you need. Doing so speeds up searches, shortens the response time for retrieving data, and can substantially reduce resource utilization.

Elasticsearch uses an “inverted index” to retrieve the data you are searching for. Although this data structure is one of the best for text search, keeping only the data that you need in the index is still the best approach.
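To make the idea of an inverted index concrete, here is a toy sketch in Python. It is only an illustration of the concept, not how Elasticsearch or Lucene implement it: real indices also apply analyzers, store term positions, and score results. The function names and sample documents are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase term to the set of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Look up each term's posting set and intersect them.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


docs = {1: "slow query log", 2: "general query log", 3: "error log"}
index = build_inverted_index(docs)
matches = search(index, "query log")  # documents 1 and 2 contain both terms
```

Because a lookup touches only the posting sets for the query terms, every extra document stored adds entries that each search may have to intersect, which is one reason a leaner index tends to respond faster.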

In this tutorial, we discuss data retention techniques that you can use in Elasticsearch. The right choice depends on the kind of data and your application, because some applications need longer retention policies than others.
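As one illustration of a retention technique, the sketch below picks out daily indices that have aged past a retention window. It assumes indices are named with a `logstash-YYYY.MM.DD` pattern and that retention is measured in whole days; both the naming scheme and the `expired_indices` helper are assumptions for this example, and the function only selects names, leaving the actual deletion or archiving to you.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, keep_days, prefix="logstash-"):
    """Return the daily indices whose date is older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # the suffix is not a date, so leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


names = ["logstash-2017.01.01", "logstash-2017.01.20", "logstash-2017.01.31", ".kibana"]
old = expired_indices(names, today=date(2017, 1, 31), keep_days=14)
```

With a 14-day window on 31 January 2017, only `logstash-2017.01.01` falls before the cutoff, so it is the only name returned.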

Imagine an application that deals with finance and money transactions. Such applications will need all of their records forever. But do these records always need to exist in Elasticsearch? Does all of this data need to be quickly searchable?

Logstash provides ways to separate different kinds of events and store them in standard file storage, rather than in Elasticsearch, for long-term retention.

Keep reading

Filebeat is extremely lightweight compared to its predecessors when it comes to efficiently sending log events. It uses the lumberjack protocol and compression, and is easy to configure using a YAML file. It can send events directly to Elasticsearch as well as to Logstash. It keeps track of the files it reads and its position in each, so that it can resume where it left off.
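The "resume where it left off" behavior can be illustrated with a small Python sketch. This is not Filebeat's implementation or its registry format; it is a simplified model that assumes a single log file, a hypothetical JSON state file holding a byte offset, and UTF-8 log lines.

```python
import json
import os


def read_new_lines(log_path: str, state_path: str) -> list[str]:
    """Return lines appended since the last call, persisting the read offset."""
    # Load the last saved byte offset, defaulting to the start of the file.
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = json.load(f).get("offset", 0)
    # If the file is now shorter than the offset (truncated or rotated), start over.
    if offset > os.path.getsize(log_path):
        offset = 0
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partial trailing line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    with open(state_path, "w") as f:
        json.dump({"offset": offset + end}, f)
    return lines
```

Calling `read_new_lines` repeatedly, for example from a polling loop, yields each complete line once, even if the process restarts between calls, because the offset lives in the state file rather than in memory.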

The goal of this tutorial is to set up a proper environment to ship Linux system logs to Elasticsearch with Filebeat. It then shows helpful tips to make good use of the environment in Kibana.

Keep reading