Tutorial Series: Elastic Stack 5.0 Ingest APIs

Automatic Keyword Extraction via the Elasticsearch 5.0 Ingest API

Before setting up Elasticsearch for entity extraction, it is worth looking at what made this such an easy task. There is a lot of buzz around the new Ingest API that ships with Elasticsearch 5.x.

The Ingest API allows data manipulation and enrichment by defining a pipeline through which every incoming document passes. The pipeline is built from a set of processors, each of which performs a specific task that enriches the data. A typical example is the grok processor, which lets you structure an unstructured log line using pattern matching. Elasticsearch 5 ships with many built-in processors, which you can read about here.
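As a minimal sketch of what that looks like in practice (the pipeline name, field, and log format below are illustrative, not taken from the articles), a pipeline with one grok processor is created with a single PUT request:

    # Create a pipeline with one grok processor (hypothetical names)
    curl -XPUT 'localhost:9200/_ingest/pipeline/apache-logs' -d '
    {
      "description": "Parse Apache access log lines",
      "processors": [
        {
          "grok": {
            "field": "message",
            "patterns": ["%{COMBINEDAPACHELOG}"]
          }
        }
      ]
    }'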

Keep reading

Introduction to Indexing via the Elastic Stack 5.0 Ingest APIs

With the first alpha release of Elasticsearch 5.0 comes a ton of new and awesome features, and if you've been paying attention, then you know that one of the more prominent of these is the shiny new ingest node. Simply put, ingest aims to provide a lightweight solution for pre-processing and enriching documents within Elasticsearch itself before they are indexed.

We can use ingest node to pre-process documents before the actual indexing takes place. This pre-processing happens by an ingest node that intercepts bulk and index requests, applies the transformations, and then passes the documents back to the index or bulk APIs.
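Concretely, routing a document through a pipeline is just a matter of naming it in the request; a minimal sketch, assuming a pipeline called apache-logs like the grok example above (the index, type, and ID are made up):

    # Index one document through the apache-logs pipeline
    curl -XPUT 'localhost:9200/logs/log/1?pipeline=apache-logs' -d '
    {
      "message": "127.0.0.1 - - [05/May/2016:16:04:15 +0000] \"GET / HTTP/1.1\" 200 9247 \"-\" \"curl/7.43.0\""
    }'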

Keep reading

How to Index CSV Files with the Elasticsearch 5.0 Ingest API

We previously discussed Automatic Keyword Extraction via the Elasticsearch 5.0 Ingest API. Here, we will go over what an Ingest Node is and what types of operations it can perform, and walk through a specific example, starting from scratch, that parses and displays CSV data using Elasticsearch and Kibana.

Indexing documents into the cluster can be done in several ways:

  • Logstash to read from source and send documents to the cluster

  • Filebeat to read a log file and send documents to Kafka, with Logstash consuming from Kafka, transforming the log events, and sending the documents to the cluster

  • curl and the Bulk API to index a pre-formatted file (a sketch follows this list)

  • Java Transport Client from within a custom application
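As a sketch of the curl-and-Bulk-API option mentioned above (the file contents and index name here are illustrative):

    # requests.json: one action line, then one source line, ending with a newline
    #   { "index" : { "_index" : "people", "_type" : "person", "_id" : "1" } }
    #   { "first_name" : "John", "last_name" : "Doe", "age" : 29 }
    curl -XPOST 'localhost:9200/_bulk' --data-binary @requests.json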

Before Elasticsearch version 5.x, however, there were essentially two ways to transform the source data into documents: use Logstash filters, or do it yourself in your own application code.

In Elasticsearch 5.x the concept of the Ingest Node has been introduced. It is just a node in the cluster like any other, but with the ability to create a pipeline of processors that can modify incoming documents. The most frequently used Logstash filters have been implemented as processors.

As described above, this pre-processing happens on an ingest node that intercepts bulk and index requests, applies the transformations, and then passes the documents back to the index or bulk APIs.
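There is no dedicated CSV processor in 5.x, so a common approach is a grok processor that names each column; a minimal sketch, assuming three-column lines arriving in a message field (all field and column names are hypothetical):

    # Split a line like "John,Doe,29" into named, typed fields
    curl -XPUT 'localhost:9200/_ingest/pipeline/parse-csv' -d '
    {
      "description": "Parse three-column CSV lines",
      "processors": [
        {
          "grok": {
            "field": "message",
            "patterns": ["%{DATA:first_name},%{DATA:last_name},%{NUMBER:age:int}"]
          }
        }
      ]
    }'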

Keep reading

How to Index Attachments and Files to Elasticsearch 5.0 using Ingest API

Elasticsearch is generally used to index data of types like string, number, date, etc. However, what if you wanted to index a file like a .pdf or a .doc directly and make it searchable? This is a common real-world requirement in applications like HCM, ERP, and ecommerce.
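The usual route in 5.x is the ingest-attachment plugin, whose attachment processor extracts text and metadata from a base64-encoded field; a minimal sketch (the pipeline and field names are assumptions):

    # First: bin/elasticsearch-plugin install ingest-attachment
    curl -XPUT 'localhost:9200/_ingest/pipeline/attachments' -d '
    {
      "description": "Extract text and metadata from base64-encoded files",
      "processors": [
        {
          "attachment": {
            "field": "data"
          }
        }
      ]
    }'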

Keep reading

How to Extract and Index User-Agent Header Details to Elasticsearch 5.0

Web servers typically record the raw User-Agent header of every request, but that opaque string is hard to search or aggregate on. What if you could extract structured details such as browser, operating system, and device from it at index time?
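This is what the ingest-user-agent plugin's user_agent processor does; a minimal sketch, assuming the raw header lives in a field called agent:

    # First: bin/elasticsearch-plugin install ingest-user-agent
    curl -XPUT 'localhost:9200/_ingest/pipeline/agent-details' -d '
    {
      "description": "Parse the raw User-Agent header into structured fields",
      "processors": [
        {
          "user_agent": {
            "field": "agent"
          }
        }
      ]
    }'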

Keep reading

How to Detect Language in Elasticsearch via the Langdetect Ingest Processor

Many document collections mix several languages, and knowing each document's language is the first step toward applying the right analyzer. What if you could detect the language of a document at index time and store it alongside the text?
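A minimal sketch of what a langdetect pipeline might look like (the processor comes from a community plugin, and the pipeline and field names here are assumptions):

    curl -XPUT 'localhost:9200/_ingest/pipeline/detect-language' -d '
    {
      "description": "Detect the language of the text field",
      "processors": [
        {
          "langdetect": {
            "field": "text",
            "target_field": "language"
          }
        }
      ]
    }'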

Keep reading

How to Index the Geographical Location of IP Addresses to Elasticsearch 5.0

Log events usually carry only a client IP address, yet dashboards often need to show where traffic is coming from. What if you could enrich each document with the geographical location of its IP address at index time?
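In 5.x this is done with the ingest-geoip plugin, whose geoip processor looks the address up in a bundled GeoLite2 database; a minimal sketch, assuming the address arrives in a field called ip:

    # First: bin/elasticsearch-plugin install ingest-geoip
    curl -XPUT 'localhost:9200/_ingest/pipeline/geoip-info' -d '
    {
      "description": "Add geographical details for the ip field",
      "processors": [
        {
          "geoip": {
            "field": "ip"
          }
        }
      ]
    }'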

Keep reading

Accessing Data and Handling Failures in Ingest Pipelines

We have already discussed Elasticsearch 5.0 and its ton of new and awesome features, and if you've been paying attention, then you know that one of the more prominent of these is the shiny new ingest node. Simply put, ingest aims to provide a lightweight solution for pre-processing and enriching documents within Elasticsearch itself before they are indexed.

Recall that an ingest node pre-processes documents before the actual indexing takes place: it intercepts bulk and index requests, applies the transformations, and then passes the documents back to the index or bulk APIs. This post looks at how processors access the data of the in-flight document and how pipelines handle failures.
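As a preview, each processor can carry an on_failure block that runs when the processor fails; a minimal sketch that records the failure message instead of rejecting the document (the field names are illustrative):

    # Rename src to dest, tagging the document if src is missing
    curl -XPUT 'localhost:9200/_ingest/pipeline/safe-rename' -d '
    {
      "description": "Rename a field, recording any failure",
      "processors": [
        {
          "rename": {
            "field": "src",
            "target_field": "dest",
            "on_failure": [
              {
                "set": {
                  "field": "error",
                  "value": "{{_ingest.on_failure_message}}"
                }
              }
            ]
          }
        }
      ]
    }'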

Keep reading