This tutorial shows you how to export data from Elasticsearch into a CSV file. Imagine that you have some data in Elasticsearch that you would like to open up in Excel and create pivot tables from the data. This is just one use case where exporting data from Elasticsearch into a CSV file would be useful. 

First, install the logstash-input-elasticsearch plugin. You can install it by running the following:

# cd /opt/logstash/
# bin/plugin install logstash-input-elasticsearch
The use of bin/plugin is deprecated and will be removed in a feature release. Please use bin/logstash-plugin.
Validating logstash-input-elasticsearch
Installing logstash-input-elasticsearch
Installation successful


Next you need to install the Logstash-output-csv plugin. Install it with:

# cd /opt/logstash 
# bin/logstash-plugin install logstash-output-csv
Validating logstash-output-csv
Installing logstash-output-csv
Installation successful

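If you want to confirm that both plugins were picked up, you can list the installed plugins and grep for them:

# bin/logstash-plugin list | grep -E 'elasticsearch|csv'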

You should be ready to go now. We are going to write an Elasticsearch query in the input section of the Logstash configuration file; the input plugin runs that query and returns the matching documents as JSON.


The logstash-output-csv plugin lets us specify which fields from the query results to write to the CSV file. Below is a skeleton of the Logstash configuration file, to give you an idea of what each section does.

input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "index-we-are-reading-from"
    # Insert your Elasticsearch query between the single quotes below
    query => '
      {
        "query": {
          ...
        }
      }'
  }
}
output {
  csv {
    # These are the fields that you would like to output in CSV format.
    # Each field needs to be one of the fields shown in the output when you run
    # your Elasticsearch query.
    fields => ["field1", "field2", "field3", "field4", "field5"]
    # This is where we store the output. We can split the output across several
    # files by using a timestamp in the filename (see the note below).
    path => "/tmp/csv-export.csv"
  }
}
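As a side note, the path option accepts Logstash's sprintf date formatting (assuming I have the syntax right), so you could roll the export into one file per day, for example:

output {
  csv {
    fields => ["field1", "field2", "field3"]
    # %{+YYYY-MM-dd} is expanded from each event's @timestamp, giving one CSV file per day
    path => "/tmp/csv-export-%{+YYYY-MM-dd}.csv"
  }
}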

Helpful Shortcut

If you are lazy when it comes to writing Elasticsearch queries in Sense or with curl, there is a shortcut. Create a visualization in Kibana for the data you would like in the output of your query, then view the request that Kibana generates for that visualization. You can verify that the query works by adjusting it slightly and running it in Sense, and once it does, drop it into a Logstash config file and run it.

Let me show you the steps I use. Note that I am building on a previous blog post where I imported the Ashley Madison data dumps into Elasticsearch; just adapt the steps to your own use case.

Related Tutorial: How to Import from CSV into Elasticsearch 

First, we build a visualization in Kibana and view the query behind it. I created a data table visualization of the email field in the aminno_member_email index from the Ashley Madison post. This is what the data table looks like:

(Screenshot: export3.png, the Kibana data table)

While building the visualization, click the upward-facing arrow at the bottom of the panel to view the request that Kibana builds to construct the graph from the data in Elasticsearch. This is where we view the query. See the screenshot below:

(Screenshot: export4.png, viewing the request behind the visualization)

This is the query as I got it from Kibana:

{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "*",
"analyze_wildcard": true
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "gte": 1443607774987,
                  "lte": 1475230174987,
                  "format": "epoch_millis"
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  },
  "aggs": {
    "2": {
      "terms": {
        "field": "email",
        "size": 100,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}

We can use the query almost as-is to view its output in Sense, simply by copying and pasting it. Take note of how I added the index to search and made a few other small changes:

(Screenshot: export5.png, the query pasted into Sense)

This is what I put into Sense in order to see what the query actually returns:

GET /aminno_member_email/_search?pretty
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "*",
          "analyze_wildcard": true
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "gte": 1443607774987,
                  "lte": 1475230174987,
                  "format": "epoch_millis"
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  },
"aggs": {
    "2": {
      "terms": {
        "field": "email",
        "size": 100,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}

I had to remove the aggregation section in order to use the query in my Logstash config. You can try this query in Sense too; without the aggregation section (and the "size": 0 setting) it returns the matching documents themselves, which is what Logstash needs:

GET /aminno_member_email/_search?pretty
{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "*",
          "analyze_wildcard": true
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "gte": 1443607774987,
                  "lte": 1475230174987,
                  "format": "epoch_millis"
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
  }
}
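For reference, the matching documents come back under hits.hits in the search response, and the fields you can reference later in the CSV output are the ones inside each document's _source. (Metadata fields such as _id and _score are, if I remember correctly, only added to the Logstash event when the input plugin's docinfo option is enabled, which is why those columns come out empty in the CSV further down.) An abridged response might look roughly like this, with made-up values:

{
  "hits": {
    "total": 12345,
    "hits": [
      {
        "_index": "aminno_member_email",
        "_id": "AVxyz123",
        "_score": 1,
        "_source": {
          "pnum": 1234567,
          "email": "someone@example.com",
          "isvalid": false
        }
      }
    ]
  }
}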

This is my final configuration file. I will explain shortly what it does and how it works:

input {
 elasticsearch {
    hosts => "localhost:9200"
    index => "aminno_member_email"
    query => '
  {"query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "*",
          "analyze_wildcard": true
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "gte": 1443607774987,
                  "lte": 1475230174987,
                  "format": "epoch_millis"
                }
              }
            }
          ],
          "must_not": []
        }
      }
    }
}}'
  }
}
output {
  csv {
    fields => ["_id", "_score", "pnum","email","isvalid"]
    path => "/tmp/csv-export.csv"
  }
}


Save this configuration to a file named output-csv.conf and run it with Logstash:

$ /opt/logstash/bin/logstash -f output-csv.conf
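If you want to catch syntax mistakes before the pipeline starts, Logstash 2.x also has a config test flag (if I recall the option correctly) that validates the file without running it:

$ /opt/logstash/bin/logstash --configtest -f output-csv.conf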

This creates a CSV file in the /tmp directory containing the fields we selected, most notably an email address. You can view the contents of the file while Logstash is running against your configuration by running:

$ tail -f  /tmp/csv-export.csv 
,,9815976,lisaweatherley321@btinernet.com,false
,,9815980,logan23weaponx@live.com,true
,,9815985,sonarc35@yahoo.com,true
,,9815987,matkwas85@gmail.com,true
,,9815988,andersonkthrn5@aol.com,false
,,9815993,jovanears@yahoo.com,false
,,9816000,PettyOfficerRaoul@gmail.com,true
,,9816004,infinitiqxx@gmail.com,false
,,9816006,greaser_tom@yahoo.com,false
,,9816016,o_castro_s@hotmail.com,true
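One thing to be aware of: as far as I can tell, the logstash-output-csv plugin does not write a header row, so if you want column names when you open the file in Excel you can prepend one yourself, for example with GNU sed:

$ sed -i '1i _id,_score,pnum,email,isvalid' /tmp/csv-export.csv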


As you can see, this is a very simple example. Have a look at the documentation for the logstash-output-csv plugin. The plugin has an option that you should consider using: spreadsheet_safe. From what I've read, this option helps prevent what has been coined CEMI (CSV Excel Macro Injection). This is where user-submitted data in your application is not properly escaped, which could lead to macros being executed on a victim's computer. You can read more about CEMI here.
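Based on the plugin documentation, enabling it should just be a matter of adding the option to the csv output block, along these lines:

output {
  csv {
    fields => ["_id", "_score", "pnum", "email", "isvalid"]
    path => "/tmp/csv-export.csv"
    # Guard values so spreadsheet programs do not interpret them as formulas
    spreadsheet_safe => true
  }
}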

Conclusion

This tutorial explains one method to export data from Elasticsearch into a CSV file. Naturally, there are many more options for optimization, but we hope this is enough to give you a good understanding of the methodology. Questions/Comments? Drop us a line below.