Is there a simple way to index emails to Elasticsearch? Logstash is the answer. Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite "stash." Here, “stash” means products like Elasticsearch, PagerDuty, Email, Nagios, Jira, and more.

The Logstash event processing pipeline has three stages: inputs → filters → outputs. Inputs generate events, filters modify them, and outputs ship them elsewhere. Inputs and outputs support codecs that enable you to encode or decode the data as it enters or exits the pipeline without having to use a separate filter.
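The three stages can be pictured as a simple relay of events. The following is a toy Python sketch of that flow (all names are illustrative, not part of Logstash): an input generates events, a filter mutates them, and an output ships them along.

```python
def input_stage():
    # An input generates events; a Logstash event is essentially a map of fields.
    yield {"message": "Logstash is awesome!!"}

def filter_stage(event):
    # A filter modifies the event in place, e.g. adding a tag.
    event.setdefault("tags", []).append("demo")
    return event

def output_stage(event):
    # An output ships the event elsewhere; here we simply hand it back.
    return event

events = [output_stage(filter_stage(e)) for e in input_stage()]
print(events)
```

Codecs would sit at the boundary of `input_stage` and `output_stage`, encoding or decoding the raw data as it enters or exits the pipeline.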

Tutorial

In order to index emails to Elasticsearch, we use the Logstash input plugin named “logstash-input-imap”. This plugin periodically reads emails from an IMAP server. Logstash ships with many plugins by default, and “logstash-input-imap” is among them.

Let’s start Logstash with the basic pipeline shown below:

$logstash_home>bin/logstash -e 'input { stdin { } } output { stdout {} }'

When a Logstash instance runs, apart from starting the configured pipelines, it also exposes the Logstash monitoring API endpoint on port 9600. The monitoring APIs are available from Logstash 5.0 onwards. We have covered the Logstash monitoring APIs in depth elsewhere.

Open the browser and visit http://localhost:9600/_node/plugins to see the list of installed plugins. The response lists every plugin the current Logstash instance is running with; scroll through and verify that “logstash-input-imap” is present.
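If you prefer to check programmatically, a small Python sketch can scan the JSON returned by the `_node/plugins` endpoint. The payload below is an abridged, hypothetical sample of that response; only the `plugins` array with `name` fields matters here.

```python
import json

# Abridged, hypothetical response from http://localhost:9600/_node/plugins
sample = json.dumps({
    "plugins": [
        {"name": "logstash-input-imap", "version": "3.0.2"},
        {"name": "logstash-input-stdin", "version": "3.2.5"},
        {"name": "logstash-output-elasticsearch", "version": "6.2.6"},
    ]
})

def has_plugin(payload, name):
    """Return True if the named plugin appears in a _node/plugins response."""
    return any(p["name"] == name for p in json.loads(payload)["plugins"])

print(has_plugin(sample, "logstash-input-imap"))  # True
```

Against a live instance you would fetch the payload from the endpoint instead of using the inline sample.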


Next, configure a Logstash pipeline with “logstash-input-imap” as the input. The only required settings for this plugin are “host”, “user”, and “password”: “host” is where you specify your IMAP server, while “user” and “password” supply the credentials used to authenticate against it. Depending on the requirements of the IMAP server you want to connect to, you might also need to set additional options such as “port” and “secure”.

#email_log.conf
input {
  imap {
    host => "imap.qbox.com"
    password => "secretpassword"
    user => "user1@qbox.com"
    port => 993
    check_interval => 10
    folder => "Inbox"
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    index => "emails"
    document_type => "email"
    hosts => "localhost:9200"
  }
}

By default, “logstash-input-imap” plugin reads from the folder “INBOX” and it polls the IMAP server every 300 seconds. In the above configuration, I have overridden those settings, as well as the port.

Now, let’s start the logstash agent so that it starts listening to the incoming emails from IMAP server:

$logstash_home>bin/logstash -f email_log.conf --config.reload.automatic

Note: In development, it’s good to enable automatic config reloading (--config.reload.automatic) so you don’t have to restart Logstash every time the pipeline/configuration changes.


Apart from indexing incoming emails to Elasticsearch through the elasticsearch output plugin, stdout is also enabled in the output section, so as soon as new emails arrive in the “Inbox” folder we should see them printed to the console as well.

Below is the console output for the new email:

{
           "date" => "Fri, 24 Mar 2017 22:16:59 -0700",
   "mime-version" => "1.0",
     "x-priority" => "3",
        "subject" => "Testing IMAP",
        "message" => "Logstash is awesome!!\n",
     "@timestamp" => 2017-03-25T05:16:59.000Z,
       "x-mailer" => "Outlook 2.0.1.9.1 (1003210) [OL 15.0.4551.0 (x86)]",
         "sender" => "Qbox User 2 <user2@qbox.com>",
       "@version" => "1",
     "message-id" => "<8655d752-75ec-4c15-a98f-de8bf5b554d7@default>",
           "from" => "Qbox User 2 <user2@qbox.com>",
   "content-type" => "multipart/alternative; boundary=__1490419010695311623abhmp0005.qbox.com",
             "to" => "Qbox User 1 <user1@qbox.com>"
}

Let’s verify that the same event/incoming email got indexed to Elasticsearch too, by calling the Elasticsearch search endpoint:

C:\>curl -XGET "http://localhost:9200/_search?pretty=true"
{
 "took" : 1,
 "timed_out" : false,
 "_shards" : {
   "total" : 5,
   "successful" : 5,
   "failed" : 0
 },
 "hits" : {
   "total" : 1,
   "max_score" : 1.0,
   "hits" : [
     {
       "_index" : "emails",
       "_type" : "email",
       "_id" : "AVsD5YVtDkLTLNh_lPBb",
       "_score" : 1.0,
       "_source" : {
         "date" : "Fri, 24 Mar 2017 22:16:59 -0700",
         "mime-version" : "1.0",
         "x-priority" : "3",
         "subject" : "Testing IMAP",
         "message" : "Logstash is awesome!!\n",
         "@timestamp" : "2017-03-25T05:16:59.000Z",
         "x-mailer" : "Outlook 2.0.1.9.1 (1003210) [OL 15.0.4551.0 (x86)]",
         "sender" : "Qbox User 2 <user2@qbox.com>",
         "@version" : "1",
         "message-id" : "<8655d752-75ec-4c15-a98f-de8bf5b554d7@default>",
         "from" : "Qbox User 2 <user2@qbox.com>",
         "content-type" : "multipart/alternative; boundary=__1490419010695311623abhmp0005.qbox.com",
         "to" : "Qbox User 1 <user1@qbox.com>"
       }
     }
   ]
 }
}

As seen above, incoming emails are being indexed to Elasticsearch.

Conditional Tagging

Now that we have the basic setup to index emails to Elasticsearch, we can add extra fields and filters, and conditionally add tags and more. Say we want to tag every email whose subject contains the keyword “critical” or “error”. Depending on the tags, we can then take action in the output plugins: sending an email to the support team, raising a Jira bug, sending a PagerDuty event, and so on. The possibilities are limitless.

Below is a sample conf showcasing conditional tagging:

#email_log.conf
input {
  imap {
    host => "imap.qbox.com"
    password => "secretpassword"
    user => "user1@qbox.com"
    port => 993
    check_interval => 10
    folder => "Inbox"
    add_field => { "parser" => "logstash" }
  }
}
filter {
  if "critical" in [subject] {
    mutate { add_tag => "critical" }
  } else if "error" in [subject] {
    mutate { add_tag => "error" }
  }
}
output {
  if "critical" in [tags] {
    email {
      to => "support@qbox.com"
      subject => "%{subject}"
      body => "%{message}"
    }
  }
  stdout { codec => rubydebug }
  elasticsearch {
    index => "emails"
    document_type => "email"
    hosts => "localhost:9200"
  }
}
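The filter’s decision logic is just a case-sensitive substring check on the subject, with the first match winning. A minimal Python sketch of that logic (purely illustrative, not Logstash code):

```python
def tag_email(event):
    """Mirror of the conditional-tagging filter: case-sensitive substring
    match on the subject, 'critical' checked before 'error'."""
    subject = event.get("subject", "")
    tags = event.setdefault("tags", [])
    if "critical" in subject:
        tags.append("critical")
    elif "error" in subject:
        tags.append("error")
    return event

print(tag_email({"subject": "critical: disk almost full"})["tags"])  # ['critical']
```

Note that, like Logstash’s `in` operator on a field, the match is case-sensitive, so a subject containing “CRITICAL” would not be tagged by this filter.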

Conclusion

You can play around further, such as sending a PagerDuty event or raising a Jira issue; just make sure the corresponding Logstash plugins are installed before you proceed. Now that we have our emails in Elasticsearch, we can write a simple client to search them, or call the Elasticsearch search endpoints directly from curl or any REST client, and start mining our humongous amount of email. There are plenty of other configuration options for “logstash-input-imap”.
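As a starting point for such a client, the sketch below builds a search request body and pulls fields out of a `_search` response. The helper names and the sample response are illustrative; only the request/response shapes follow the Elasticsearch search API.

```python
import json

def match_query(field, text):
    """Build a request body for the Elasticsearch _search endpoint
    matching a single field."""
    return {"query": {"match": {field: text}}}

def subjects(search_response):
    """Extract the subject of every hit from a _search response."""
    return [hit["_source"]["subject"] for hit in search_response["hits"]["hits"]]

body = json.dumps(match_query("subject", "Testing IMAP"))
# Against a live cluster, e.g.:
#   curl -XGET "http://localhost:9200/emails/_search" -d "$body"
```

Feeding the earlier console response through `subjects` would yield `["Testing IMAP"]`.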


Give It a Whirl!

It's easy to spin up a standard hosted Elasticsearch cluster on any of our 47 Rackspace, Softlayer, Amazon, or Microsoft Azure data centers. And you can now provision your own AWS Credits on Qbox Private Hosted Elasticsearch.

Questions? Drop us a note, and we'll get you a prompt response.

Not yet enjoying the benefits of a hosted ELK stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment in our cloud hosting service.
