Is there a simple way to index emails to Elasticsearch? Logstash is the answer. Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite “stash.” Here, “stash” means destinations such as Elasticsearch, PagerDuty, email, Nagios, Jira, and more.

The Logstash event processing pipeline has three stages: inputs → filters → outputs. Inputs generate events, filters modify them, and outputs ship them anywhere. Inputs and outputs support codecs that enable you to encode or decode the data as it enters or exits the pipeline without having to use a separate filter.

Tutorial

To index emails to Elasticsearch, we need the Logstash input plugin named “logstash-input-imap”. This plugin periodically reads emails from an IMAP server. Logstash ships with many plugins by default, and “imap” (logstash-input-imap) is no exception.

Let’s start Logstash with the basic pipeline as shown below:

bin/logstash -e 'input { stdin { } } output { stdout {} }'

When a Logstash instance runs, apart from starting the configured pipelines, it also starts the Logstash monitoring API endpoint on port 9600. Note that the monitoring APIs are only available in Logstash 5.0 and later. We have covered the Logstash monitoring APIs in depth in an earlier post.

Open the browser and access http://localhost:9600/_node/plugins to verify the list of plugins installed. You should see the list of plugins activated in the current Logstash instance in the response. You can scroll down to verify that “logstash-input-imap” plugin is available/installed, as shown below:

[Screenshot: /_node/plugins response listing “logstash-input-imap” among the installed plugins]
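If you would rather not scroll through the response by hand, the same check can be scripted. Below is a minimal sketch, assuming the response is a JSON object with a "plugins" array whose entries carry a "name" key. The helper names (fetch_plugins, has_plugin) and the sample data are our own illustrations, not part of Logstash.

```python
import json
from urllib.request import urlopen

# Assumed address of the Logstash monitoring API (port 9600, as described above).
PLUGINS_URL = "http://localhost:9600/_node/plugins"


def fetch_plugins(url=PLUGINS_URL):
    """Fetch and decode the plugin listing from a running Logstash instance."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


def has_plugin(listing, name):
    """Return True if `name` appears in the "plugins" array of the listing."""
    return any(plugin.get("name") == name for plugin in listing.get("plugins", []))


# Trimmed, made-up sample of the assumed response shape; the version
# numbers are placeholders, not real plugin versions.
sample = {
    "total": 2,
    "plugins": [
        {"name": "logstash-input-imap", "version": "0.0.0"},
        {"name": "logstash-output-elasticsearch", "version": "0.0.0"},
    ],
}

print(has_plugin(sample, "logstash-input-imap"))
```

Against a live instance, you would call has_plugin(fetch_plugins(), "logstash-input-imap") instead of passing the sample.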

Next, we need to configure the Logstash pipeline with “logstash-input-imap” as the input. The only required settings for this plugin are “host”, “password”, and “user”.

Depending on the requirements of the IMAP server you want to connect to, you might need to set additional options such as “port” and “secure”. The “host” setting holds the address of your IMAP server, while “user” and “password” hold the credentials used to authenticate against it.

# email_log.conf
input {
  imap {
    host => "imap.qbox.com"
    password => "secretpassword"
    user => "user1@qbox.com"
    port => 993
    # Poll the IMAP server every 10 seconds
    check_interval => 10
    folder => "Inbox"
  }
}
output {
  # Print each event to the console for debugging
  stdout { codec => rubydebug }
  # Index each event into the "emails" index
  elasticsearch {
    index => "emails"
    document_type => "email"
    hosts => "localhost:9200"
  }
}

By default, the “logstash-input-imap” plugin reads from the “INBOX” folder and polls the IMAP server every 300 seconds. The configuration above overrides both settings and also sets the port explicitly.

Now, let’s start the Logstash agent so that it begins polling the IMAP server for incoming emails:

bin/logstash -f email_log.conf --config.reload.automatic

Note: During development, it’s good to enable automatic config reloading (--config.reload.automatic), so you don't have to restart Logstash every time you change the pipeline configuration.

Also, notice that the configuration file enables stdout as an output. Therefore, shortly after a new email arrives in the “Inbox” folder, we should see the corresponding event in the console too.

Below is the console output for the new email:

{
           "date" => "Fri, 24 Mar 2017 22:16:59 -0700",
   "mime-version" => "1.0",
     "x-priority" => "3",
        "subject" => "Testing IMAP",
        "message" => "Logstash is awesome!!\n",
     "@timestamp" => 2017-03-25T05:16:59.000Z,
       "x-mailer" => "Outlook 2.0.1.9.1 (1003210) [OL 15.0.4551.0 (x86)]",
         "sender" => "Qbox User 2 <user2@qbox.com>",
       "@version" => "1",
     "message-id" => "<8655d752-75ec-4c15-a98f-de8bf5b554d7@default>",
           "from" => "Qbox User 2 <user2@qbox.com>",
   "content-type" => "multipart/alternative; boundary=__1490419010695311623abhmp0005.qbox.com",
             "to" => "Qbox User 1 <user1@qbox.com>"
}

Let’s verify if the same event/incoming email was indexed to Elasticsearch, too. To do this, let’s call the Elasticsearch “Search” endpoint API:

curl -XGET "http://localhost:9200/emails/_search?pretty=true"
{
 "took" : 1,
 "timed_out" : false,
 "_shards" : {
   "total" : 5,
   "successful" : 5,
   "failed" : 0
 },
 "hits" : {
   "total" : 1,
   "max_score" : 1.0,
   "hits" : [
     {
       "_index" : "emails",
       "_type" : "email",
       "_id" : "AVsD5YVtDkLTLNh_lPBb",
       "_score" : 1.0,
       "_source" : {
         "date" : "Fri, 24 Mar 2017 22:16:59 -0700",
         "mime-version" : "1.0",
         "x-priority" : "3",
         "subject" : "Testing IMAP",
         "message" : "Logstash is awesome!!\n",
         "@timestamp" : "2017-03-25T05:16:59.000Z",
         "x-mailer" : "Outlook 2.0.1.9.1 (1003210) [OL 15.0.4551.0 (x86)]",
         "sender" : "Qbox User 2 <user2@qbox.com>",
         "@version" : "1",
         "message-id" : "<8655d752-75ec-4c15-a98f-de8bf5b554d7@default>",
         "from" : "Qbox User 2 <user2@qbox.com>",
         "content-type" : "multipart/alternative; boundary=__1490419010695311623abhmp0005.qbox.com",
         "to" : "Qbox User 1 <user1@qbox.com>"
       }
     }
   ]
 }
}

Great! The incoming email was successfully indexed.
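When there are more than a handful of hits, reading the raw JSON gets tedious. Here is a small sketch that pulls the subject and sender out of a search response shaped like the one above. The function name summarize_hits is our own, and the sample dictionary is a trimmed copy of the response shown earlier.

```python
def summarize_hits(search_response):
    """Return (subject, sender) pairs for each hit in a _search response."""
    hits = search_response.get("hits", {}).get("hits", [])
    return [
        (hit["_source"].get("subject"), hit["_source"].get("from"))
        for hit in hits
    ]


# Trimmed copy of the search response shown above.
response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "emails",
                "_type": "email",
                "_source": {
                    "subject": "Testing IMAP",
                    "from": "Qbox User 2 <user2@qbox.com>",
                },
            }
        ],
    }
}

for subject, sender in summarize_hits(response):
    print(f"{subject} (from {sender})")
```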

Conditional Tagging

Now that we have the basic setup to index emails to Elasticsearch, we can add new fields, filters, conditional tags and more. Say we want to tag every email whose subject contains the keyword “critical” or “error”. Depending on the tag, the output plugins can then take actions such as sending an email to the support team, creating a Jira issue, or sending a PagerDuty event.

Below is a sample configuration showcasing conditional tagging:

# email_log.conf
input {
  imap {
    host => "imap.qbox.com"
    password => "secretpassword"
    user => "user1@qbox.com"
    port => 993
    check_interval => 10
    folder => "Inbox"
    add_field => { "parser" => "logstash" }
  }
}
filter {
  # Tag the event based on keywords in the subject line
  if "critical" in [subject] {
    mutate { add_tag => ["critical"] }
  } else if "error" in [subject] {
    mutate { add_tag => ["error"] }
  }
}
output {
  # Notify the support team only for events tagged "critical"
  if "critical" in [tags] {
    email {
      to => "support@qbox.com"
      subject => "%{subject}"
      body => "%{message}"
    }
  }
  stdout { codec => rubydebug }
  elasticsearch {
    index => "emails"
    document_type => "email"
    hosts => "localhost:9200"
  }
}
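To make the branching explicit, here is a rough Python model of the conditional logic in the configuration above. It is only an illustration of the if / else-if chain, not how Logstash evaluates conditionals internally; the function names are hypothetical. It assumes the keyword check is a case-sensitive substring match on the subject.

```python
def tags_for_subject(subject):
    """Mirror the filter's if / else-if chain: at most one tag per email."""
    if "critical" in subject:
        return ["critical"]
    elif "error" in subject:
        return ["error"]
    return []


def should_notify_support(tags):
    """Mirror the output condition that guards the email output."""
    return "critical" in tags


for subject in ["critical: disk full", "error in nightly job", "Testing IMAP"]:
    tags = tags_for_subject(subject)
    print(subject, tags, should_notify_support(tags))
```

Note that because the second branch is an else-if, a subject containing both keywords is tagged only as “critical”, and under the case-sensitivity assumption a subject such as “Critical outage” would not be tagged at all.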

Conclusion

That's it! As you've seen, the Logstash IMAP plugin makes it easy to ship emails to Elasticsearch or any other output. Now that we have our emails in Elasticsearch, we can write a simple search client or call the Elasticsearch search endpoints directly from curl or a REST client to start mining and analyzing them. Even better, we can apply metrics and aggregations to the indexed emails and build visualizations on top of them in Kibana, which makes email one more data source in your ELK analytics pipeline.
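As a starting point for that kind of analysis, here is a hedged sketch of a terms aggregation that counts emails per tag. It assumes the “tags” field was dynamically mapped with a “tags.keyword” sub-field, which depends on your index mapping. The aggregation name by_tag, the helper functions, and the sample response are all made up for illustration.

```python
import json


def tag_count_request(size=10):
    """Build a _search body that returns only per-tag document counts."""
    return {
        "size": 0,  # skip the hits themselves; only the aggregation is needed
        "aggs": {
            # "tags.keyword" is an assumption about the index mapping
            "by_tag": {"terms": {"field": "tags.keyword", "size": size}}
        },
    }


def bucket_counts(search_response, agg_name="by_tag"):
    """Turn the aggregation buckets into a {tag: doc_count} dictionary."""
    aggregation = search_response.get("aggregations", {}).get(agg_name, {})
    return {b["key"]: b["doc_count"] for b in aggregation.get("buckets", [])}


# Made-up response fragment, shaped like a terms aggregation result.
sample_response = {
    "aggregations": {
        "by_tag": {
            "buckets": [
                {"key": "error", "doc_count": 3},
                {"key": "critical", "doc_count": 1},
            ]
        }
    }
}

print(json.dumps(tag_count_request(), indent=2))
print(bucket_counts(sample_response))
```

The request body can be sent to the same emails/_search endpoint used earlier, for example with curl's -d option.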
