In the previous tutorial, we learned how to set up a Qbox cluster with the ES-Hadoop connector to interface with Hadoop’s data warehouse component, Hive, and run SQL queries on top of Elasticsearch. Offloading and manipulating ES indices with Hive opens up many possibilities for high-performing, deeper analysis across large data sets.

In this tutorial, we take it a step further: we use Logstash to import an existing data set, in the form of a CSV file, into Elasticsearch so that we can later run batch analytics on it in Hadoop’s powerful ecosystem.


For the past four years or so, the term “Big Data” has been thrown around loosely at marketing and tech conferences, in publications and blog articles, and everywhere in between. The buzzword has since been defined and classified, but one distributed storage and processing ecosystem in particular has become all but synonymous with it: Apache Hadoop.

Hadoop comprises a wide array of packages and tools that can bulk-ingest and process data on distributed clusters of commodity hardware and/or container technologies. It comes as no surprise, then, that organizations have been pairing Hadoop, for deeper analytics and “actionable insights,” with Elasticsearch, for robust log and performance metric analysis.

In this tutorial, we use the ES-Hadoop connector to integrate Elasticsearch with a Hadoop cluster and introduce how external tables in Hive work with Elasticsearch mappings and bulk-loaded documents.



The Suggest API is one of the important APIs in Elasticsearch. It is used extensively in search solutions to improve the user experience. Its use cases, which we will explore, range from plain autocomplete to context-based suggestions. In this tutorial, we show how to implement a simple autocomplete with Elasticsearch.
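To give a feel for what a simple autocomplete involves, here is a minimal sketch in Python that only builds the JSON bodies and reads a response. It assumes the completion suggester as documented for recent Elasticsearch versions (a `completion` field queried through the `suggest` section of a search request); older releases used a different endpoint and body shape. The field name `suggest`, the suggestion name `title-suggest`, and the helper functions are hypothetical and chosen for illustration only.

```python
import json


def completion_mapping(field="suggest"):
    """Index mapping with one field of type `completion` (field name is an assumption)."""
    return {"mappings": {"properties": {field: {"type": "completion"}}}}


def autocomplete_body(prefix, field="suggest", name="title-suggest", size=5):
    """Search body asking the completion suggester for terms starting with `prefix`."""
    return {
        "suggest": {
            name: {
                "prefix": prefix,
                "completion": {"field": field, "size": size},
            }
        }
    }


def extract_options(response, name="title-suggest"):
    """Collect the suggested texts from a search response, in the order returned."""
    entries = response.get("suggest", {}).get(name, [])
    return [opt["text"] for entry in entries for opt in entry.get("options", [])]


if __name__ == "__main__":
    # Print the request body that would be sent as the user types "ela".
    print(json.dumps(autocomplete_body("ela"), indent=2))
```

The bodies can be sent with any HTTP client: the mapping when creating the index, and the autocomplete body to that index's search endpoint on each keystroke.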


Sometimes a query is delayed or its response time is slow. The sluggishness can have a number of causes, ranging from shard issues to the cost of computing certain elements of the query. Since version 2.2, Elasticsearch has provided the Profile API, which lets users inspect query execution time and other details. In this blog post, we explore how the Profile API can be used to look into query timings.
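As a rough illustration of looking into query timings, the Python sketch below switches profiling on by adding `"profile": true` to a search body and then flattens the `profile` section of a response into one row per query node. It assumes the response layout of recent Elasticsearch versions (`profile.shards[].searches[].query[]`, with `type`, `description`, `time_in_nanos` and nested `children`); version 2.x reported timings under a different key. The helper names are hypothetical.

```python
def profiled_search_body(query):
    """Wrap a query in a search body with profiling switched on."""
    return {"profile": True, "query": query}


def flatten_profile(response):
    """Yield (shard_id, depth, type, description, time_in_nanos) for each profiled query node.

    Nodes are visited depth-first, parents before their children.
    """
    for shard in response.get("profile", {}).get("shards", []):
        for search in shard.get("searches", []):
            # Explicit stack instead of recursion; reversed() keeps the original order.
            stack = [(0, node) for node in reversed(search.get("query", []))]
            while stack:
                depth, node = stack.pop()
                yield (
                    shard.get("id"),
                    depth,
                    node.get("type"),
                    node.get("description"),
                    node.get("time_in_nanos"),
                )
                children = node.get("children", [])
                stack.extend((depth + 1, child) for child in reversed(children))


def slowest_nodes(response, top=3):
    """Return the `top` rows with the largest reported time, slowest first."""
    rows = [row for row in flatten_profile(response) if row[4] is not None]
    return sorted(rows, key=lambda row: row[4], reverse=True)[:top]
```

Sorting the flattened rows by time is one simple way to see which part of a compound query dominates; the per-node `breakdown` section of the response, not used here, gives finer detail.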


In this blog post, we explain in detail the memory-related settings that can improve Elasticsearch performance, especially when scaling. We also go over issues caused by poor memory settings and ways to overcome them.


Are you looking for full-text search and highlighting on .pdf, .doc, or .epub files that you have in your system? In this tutorial, we show you how with the mapper-attachments plugin.
