How to Export Data from Elasticsearch into a CSV File
Posted by Adam Vanderbush, November 16, 2016

This tutorial shows you how to export data from Elasticsearch into a CSV file. Imagine that you have some data in Elasticsearch that you would like to open up in Excel and create pivot tables from. This is just one use case where exporting data from Elasticsearch into a CSV file would be useful.
First, install the logstash-input-elasticsearch plugin. You can install it by running the following:
# cd /opt/logstash/
# bin/plugin install logstash-input-elasticsearch
The use of bin/plugin is deprecated and will be removed in a feature release. Please use bin/logstash-plugin.
Validating logstash-input-elasticsearch
Installing logstash-input-elasticsearch
Installation successful
Next you need to install the Logstash-output-csv plugin. Install it with:
# cd /opt/logstash
# bin/logstash-plugin install logstash-output-csv
Validating logstash-output-csv
Installing logstash-output-csv
Installation successful
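If you want to double-check that both plugins were picked up, you can list the installed plugins; the output should include the two we just installed:

# bin/logstash-plugin list | grep -E 'input-elasticsearch|output-csv'
logstash-input-elasticsearch
logstash-output-csv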
You should be ready to go now. We are going to write an Elasticsearch query in the input section of the Logstash configuration file; that query will return a bunch of JSON (the search results), which we then write out as CSV.
The logstash-output-csv plugin lets us specify which fields from the query results to write to the CSV file. Below is a skeleton of what the Logstash configuration file will look like. This is just to give you an idea of the structure and what each section does.
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "index-we-are-reading-from"
    query => '
      {
        "query": {
          # Insert your Elasticsearch query here
        }
      }
    '
  }
}
output {
  csv {
    # These are the fields that you would like to output in CSV format.
    # Each field needs to be one of the fields shown in the output when you run your
    # Elasticsearch query.
    fields => ["field1", "field2", "field3", "field4", "field5"]
    # This is where we store the output. We could use several files to store the output,
    # for example by using a timestamp in the filename.
    path => "/tmp/csv-export.csv"
  }
}
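If you are not sure which field names to list in the fields option, one quick way to see them (assuming Elasticsearch is on the default local host and port) is to ask for the index mapping, which lists every field in the index you are reading from:

curl -s 'http://localhost:9200/index-we-are-reading-from/_mapping?pretty'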
Helpful Shortcut
If you are lazy when it comes to writing Elasticsearch queries in Sense or with curl, there is a shortcut. One shortcut I like is to build a visualization in Kibana containing the data I want in the query output, and then view the query that Kibana generates for it. You can then verify that the query works by adjusting it slightly and running it in Sense. Once the query works, you can build a config file from it and try running it.
Let me show you the steps I use. Take note that I am building on a previous blog post where I imported the Ashley Madison data dumps into Elasticsearch; simply adapt the steps to your own use case.
Related Tutorial: How to Import from CSV into Elasticsearch
First, we start out by building a visualization in Kibana and viewing the query behind that visualization. I created a visualization showing a data table of the email field in the aminno_member_email index that I created in the Ashley Madison post. This is what the data table looks like:
While building the visualization, click the upward-facing arrow at the bottom of the panel to view the request that Kibana builds in order to construct the graph from the data in Elasticsearch. This is where we view the query. See the screenshot below:
This is the query as Kibana generated it:
{ "size": 0, "query": { "filtered": { "query": { "query_string": { "query": "*", "analyze_wildcard": true } }, "filter": { "bool": { "must": [ { "range": { "@timestamp": { "gte": 1443607774987, "lte": 1475230174987, "format": "epoch_millis" } } } ], "must_not": [] } } } }, "aggs": { "2": { "terms": { "field": "email", "size": 100, "order": { "_count": "desc" } } } } }
We can use this query almost directly in Sense to view its output by copying and pasting it there. Take note of how I added the index to search and made a few other small changes:
This is what I put into Sense in order to see what the query actually returns:
GET /aminno_member_email/_search?pretty
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "*",
          "analyze_wildcard": true
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "gte": 1443607774987,
                  "lte": 1475230174987,
                  "format": "epoch_millis"
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  },
  "aggs": {
    "2": {
      "terms": {
        "field": "email",
        "size": 100,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
I had to remove the aggregation section in order to use the query in my Logstash config. You can try this query in Sense too; it has no aggregation section, yet it pretty much returns the same thing:
GET /aminno_member_email/_search?pretty
{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "*",
          "analyze_wildcard": true
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "gte": 1443607774987,
                  "lte": 1475230174987,
                  "format": "epoch_millis"
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  }
}
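If you prefer curl over Sense (as mentioned earlier), roughly the same request can be sent straight to Elasticsearch; the host and port below assume a default local install:

curl -s -XGET 'http://localhost:9200/aminno_member_email/_search?pretty' -d '
{
  "query": {
    "filtered": {
      "query": { "query_string": { "query": "*", "analyze_wildcard": true } },
      "filter": {
        "bool": {
          "must": [
            { "range": { "@timestamp": { "gte": 1443607774987, "lte": 1475230174987, "format": "epoch_millis" } } }
          ],
          "must_not": []
        }
      }
    }
  }
}'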
This is my final configuration file. I will explain shortly what it does and how it works:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "aminno_member_email"
    query => '
      {
        "query": {
          "filtered": {
            "query": {
              "query_string": {
                "query": "*",
                "analyze_wildcard": true
              }
            },
            "filter": {
              "bool": {
                "must": [
                  {
                    "range": {
                      "@timestamp": {
                        "gte": 1443607774987,
                        "lte": 1475230174987,
                        "format": "epoch_millis"
                      }
                    }
                  }
                ],
                "must_not": []
              }
            }
          }
        }
      }
    '
  }
}
output {
  csv {
    fields => ["_id", "_score", "pnum", "email", "isvalid"]
    path => "/tmp/csv-export.csv"
  }
}
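One thing worth knowing about the elasticsearch input: it pages through the full result set using the scroll API. If you need to tune how it reads, the plugin exposes size and scroll options; the snippet below is only a sketch with illustrative values, so check the plugin documentation for the defaults:

input {
  elasticsearch {
    hosts  => "localhost:9200"
    index  => "aminno_member_email"
    size   => 500      # hits fetched per scroll request
    scroll => "5m"     # how long Elasticsearch keeps each scroll context alive
    query  => '...'    # same query as above
  }
}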
You can save this configuration to a file named output-csv.conf and run it with Logstash:
$ /opt/logstash/bin/logstash -f output-csv.conf
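If you just want to verify that the configuration parses before kicking off the export, the Logstash 2.x used in this post supports a --configtest flag (newer releases use -t / --config.test_and_exit instead); it reports whether the config is valid and then exits:

$ /opt/logstash/bin/logstash -f output-csv.conf --configtest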
Running the pipeline creates a CSV file in the /tmp directory which contains a few fields, most notably the email address. You can watch the contents of the file while Logstash is running on your configuration by running:
$ tail -f /tmp/csv-export.csv
,,9815976,lisaweatherley321@btinernet.com,false
,,9815980,logan23weaponx@live.com,true
,,9815985,sonarc35@yahoo.com,true
,,9815987,matkwas85@gmail.com,true
,,9815988,andersonkthrn5@aol.com,false
,,9815993,jovanears@yahoo.com,false
,,9816000,PettyOfficerRaoul@gmail.com,true
,,9816004,infinitiqxx@gmail.com,false
,,9816006,greaser_tom@yahoo.com,false
,,9816016,o_castro_s@hotmail.com,true
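Notice that the first two columns (_id and _score) come out empty: they are hit metadata rather than fields inside each document's _source, so the input plugin does not expose them as event fields by default. If you want the document ID in the CSV, one approach (a sketch using the plugin's docinfo option together with a mutate filter; the doc_id field name is just something I made up) looks like this:

input {
  elasticsearch {
    hosts   => "localhost:9200"
    index   => "aminno_member_email"
    docinfo => true    # copies _index, _type and _id into [@metadata]
    query   => '...'   # same query as above
  }
}
filter {
  mutate {
    # promote the document id from @metadata into a regular field
    add_field => { "doc_id" => "%{[@metadata][_id]}" }
  }
}
output {
  csv {
    fields => ["doc_id", "pnum", "email", "isvalid"]
    path   => "/tmp/csv-export.csv"
  }
}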
As you can see, this is a very simple example. Have a look at the documentation for the logstash-output-csv plugin. The plugin has an option that you should consider using: spreadsheet_safe. From what I've read, this option helps to prevent what has been coined CEMI (CSV Excel Macro Injection), where user-submitted data in your application is not properly escaped, which could lead to macros being executed on a victim's computer. You can read more about CEMI here.
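For reference, enabling it is just another option on the csv output; this is a sketch I have not tested against the data above:

output {
  csv {
    fields => ["_id", "_score", "pnum", "email", "isvalid"]
    path   => "/tmp/csv-export.csv"
    spreadsheet_safe => true   # escape values that spreadsheet software would treat as formulas
  }
}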
Conclusion
This tutorial explains one method of exporting data from Elasticsearch into a CSV file. Naturally, there are many more options for optimization, but we hope this is enough to give you a good understanding of the methodology. Questions/Comments? Drop us a line below.