Parsing Logs Using Logstash

Posted by Vineeth Mohan March 17, 2016

In this tutorial series we are going to use the ELK (Elasticsearch-Logstash-Kibana) stack to parse, index, visualize, and analyze logs. Nearly every process on a server or in an application writes to a log file, and these files are a critical source of information for everything from troubleshooting to anomaly detection, provided we can analyze them effectively.

To analyze logs, we first need to parse them into smaller components with appropriate fields and values, index those components in a database, and then run the required analysis. One of the most reliable and scalable stacks for this purpose is the ELK stack: Logstash parses the logs and splits them into proper individual documents, those documents are indexed into the powerful text analytics engine Elasticsearch, and finally the results are explored in the visualization tool Kibana.

In this edition of the ELK blog series we are going to see the setup, configuration, and a basic example of how to parse and index logs using Logstash.

For this post, we will be using hosted Elasticsearch on Qbox.io. You can sign up or launch your cluster here, or click "Get Started" in the header navigation. If you need help setting up, refer to "Provisioning a Qbox Elasticsearch Cluster."

Logstash Installation

Logstash can be installed in a number of ways. The easiest is to install it from a zip file. Run the following commands in the terminal and you are done with the installation.

$ wget https://download.elastic.co/logstash/logstash/logstash-1.5.5.zip
$ unzip logstash-1.5.5.zip

After the above steps you will see a folder named logstash-1.5.5, where we need to do some configuration before running Logstash.
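
Before writing a full configuration, you can sanity-check the installation with a quick inline pipeline. This assumes a Java runtime is already available on the machine, which Logstash needs in order to run:

$ cd logstash-1.5.5
$ bin/logstash -e 'input { stdin { } } output { stdout { } }'

Type a line such as hello world and press Enter; Logstash should echo it back as a timestamped event, which confirms the install works. Stop it with Ctrl+C when you are done.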

Logstash Configuration

Inside the logstash-1.5.5 folder create a configuration file named logstash.conf. A typical logstash configuration file has 3 parts as shown in the figure below:

[Figure: the three sections of a Logstash configuration file: input, filter, and output]
Now let us see what each section does in detail and how to make them functional.

1. Input

Inputs are how we get logs into Logstash. This can be done in a number of ways, such as pointing Logstash at a file path or making it listen on specific ports. In our example we use the former approach and point it at a log file. Here is how we do it:

input {
    file {
        path => "/home/user/nginxAccess.log"
        start_position => "beginning"
        type => "logs"
    }
}

In the code above you can see that we have specified the path to the log file and set start_position to "beginning" so that the file is processed from its first line; by default Logstash only reads new lines as they are appended. Lastly, the type parameter sets the type name under which the logs will be saved in Elasticsearch.
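
For comparison, if you would rather have Logstash listen on a network port instead of tailing a file, an input along the following lines should work; the port number here is arbitrary and only for illustration:

input {
    tcp {
        host => "0.0.0.0"   # listen on all interfaces
        port => 5000        # arbitrary example port
        type => "logs"
    }
}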

2. Filter

Filters are where we define what processing should be applied to the input in Logstash, and we can write rules and conditions to control it. The three major filters we use in this tutorial are grok, date, and geoip. The grok filter parses arbitrary text, structures it, and extracts the fields we specify; the date filter parses timestamps out of a field so that the event carries the time it was logged; and the geoip filter adds geographical information derived from the IP addresses parsed out of the logs. The following is the filter we are going to use in this example:

filter {
    grok {
        match => {
            "message" => "%{IP:clientip} \- \- \[%{NOTSPACE:date} \-%{INT}\] \"%{WORD:action} /%{WORD}/%{WORD}/%{NOTSPACE:login} %{WORD:protocol}/%{NUMBER:protocolNum}\" %{NUMBER:status} %{NUMBER} \"%{NOTSPACE}\" \"%{NOTSPACE:client} \(%{WORD}; %{WORD:clientOs}%{GREEDYDATA}"
        }
        add_field => {
            "eventName" => "grok"
        }
    }
    geoip {
        source => "clientip"
    }
}
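
Note that the filter above captures the timestamp into a field named date but does not yet use the date filter. If you want the event's @timestamp to reflect the time in the log line rather than the time of indexing, a date filter roughly like the following could be added inside the same filter block (the format string follows Joda-Time conventions and should be adjusted to your own log format):

date {
    match  => [ "date", "dd/MMM/yyyy:HH:mm:ss" ]   # "date" is the field captured by the grok pattern above
    target => "@timestamp"
}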

In the example above you can see that we have used grok rules in the filter section. Let us look at how to write grok rules effectively, taking the log line below as the sample case:

"122.164.121.231 - - [23/Nov/2015:07:19:54 -0500] "GET /configs/config/thecelloserenades@gmail.com HTTP/1.1" 200 1135 "http://localhost/octoviz-auth/public/octoviz/admin/admin.html" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:39.0) Gecko/20100101 Firefox/39.0""

We need to extract specific fields from this nginx log line. Say we want the client IP; as you can see in the grok pattern, we wrote %{IP:clientip} to capture it. Let us explore this in detail using the figure below:


[Figure: how %{IP:clientip} extracts the IP address 122.164.121.231 from the sample log line into the clientip field]
The figure above only covers the extraction of the IP field. Using the same method, and with knowledge of the appropriate grok patterns (which can be found here), we can extract the required fields from the log files with great flexibility. If you want to check that a grok pattern works before indexing anything, you can go to this website, paste your sample log, and write the grok rule just below it, as shown in the figure.


[Figure: testing the sample log line and grok pattern in the online grok debugger]
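
If you paste the sample log line and the pattern above into such a debugger, the first few captures should come out roughly like this (the output below is illustrative, not copied from the tool):

clientip => 122.164.121.231
date     => 23/Nov/2015:07:19:54
action   => GET
login    => thecelloserenades@gmail.com
protocol => HTTP
status   => 200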

Lastly, the filter section contains the geoip filter, where we specify the source field from which the IP should be taken. As mentioned earlier, the geoip filter adds geographical information based on that IP.
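
For reference, the geoip filter nests its results under a geoip field on each event. The exact values depend on the GeoIP database it consults, but the indexed document ends up with subfields along these lines (the values here are placeholders, not a real lookup):

"geoip" => {
    "ip"           => "122.164.121.231",
    "country_name" => "...",
    "city_name"    => "...",
    "location"     => [ longitude, latitude ]
}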

3. Output

Output is the last stage of the Logstash pipeline. Here we specify where the parsed logs should go: indexed into Elasticsearch, written to a file, sent to another server, and so on. In this case we are indexing into our Qbox-hosted Elasticsearch cluster, for which the configuration is below:

output {
    elasticsearch {
        protocol => "http"
        host => "QBOX_ES_IP:ES_PORT"
        index => "logstash-test-01"
    }
}
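
While you are still tuning the grok pattern, it can also help to print events to the terminal instead of (or alongside) sending them to Elasticsearch. The stdout output with the rubydebug codec does exactly that; for the rest of this post we stick with the elasticsearch output above:

output {
    stdout {
        codec => rubydebug   # pretty-print each parsed event to the terminal
    }
}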

So putting it all together, our logstash.conf would look like this:


[Figure: the complete logstash.conf, which is simply the input, filter, and output sections shown above combined in one file]

Indexing to Elasticsearch

Now it is time to index the parsed data into Elasticsearch. Since we are using the Qbox Elasticsearch server QBOX_ES_IP:ES_PORT for indexing (you can read about how to create a cluster and host an Elasticsearch server on Qbox in my previous blog here), all we need to do is start Logstash, which is done as follows:

$ bin/logstash -f logstash.conf

This starts parsing the data and indexing it into the Qbox-hosted Elasticsearch server. After running the command you should see the message Logstash startup completed in the terminal. Logstash then keeps running and watches whether the input file is growing; if new log lines are appended, it parses and indexes those as well.

Now you can verify that the data has been indexed by querying the logstash-test-01 index from the command line, like below:

curl -XGET 'http://52.90.123.188:80/logstash-test-01/_search?pretty=1' -d '{}'
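
If you only want to confirm that documents are arriving, a count query against the same index is enough; substitute your own Elasticsearch endpoint:

curl -XGET 'http://QBOX_ES_IP:ES_PORT/logstash-test-01/_count?pretty=1'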

Logstash helps us with the mapping too. If the index name starts with logstash, Logstash assumes the documents are logs and applies a predefined, optimized set of mappings to the indexed documents via its default index template. This is a great help because it removes the need to define the mapping by hand and saves us a lot of time.
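
To see the mapping that was actually applied to the index, you can query the mapping API (again, replace the endpoint with your own):

curl -XGET 'http://QBOX_ES_IP:ES_PORT/logstash-test-01/_mapping?pretty=1'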

Conclusion

In this article we have seen how to parse nginx access logs using filters in Logstash and how to index them into Elasticsearch, along with a detailed look at grok filters and log parsing techniques. In the next installment of this series we are going to see how this indexed data can be visualized using Kibana. We have done a Kibana 3 tutorial series before, but here we use Kibana 4.3, as it has plenty of new features to explore.