A comprehensive log management and analysis strategy is mission critical: it lets organisations understand the relationships between operational, security, and change management events and maintain a complete picture of their infrastructure. Log files from web servers, applications, and operating systems provide valuable data, but they arrive in differing formats, scattered across many hosts.

Why is Apache Web Server so popular? It’s free and open source, and open source software continues to gain ground on proprietary alternatives. It’s maintained by a dedicated developer community, has a solid security track record, suits small and large websites alike, can be set up easily on all major operating systems, and is extremely powerful and flexible. Does that sound about right?

Provisioning an Elasticsearch cluster in Qbox is easy. In this article, we walk you through the initial steps and show you how simple it is to start and configure your cluster. We then install and configure Logstash to ship our Apache logs to Elasticsearch. Once the Apache logs are in Elasticsearch, they can be visualized and analyzed via Kibana dashboards.

Our Goal

The goal of this tutorial is to use Qbox as a centralized logging and monitoring solution for Apache logs. Qbox provides out-of-the-box solutions for Elasticsearch, Kibana, and many Elasticsearch analysis and monitoring plugins. We will set up Logstash on a separate node to gather Apache logs from one or more servers, and use Qbox’s provisioned Kibana to visualize the gathered logs.

Our ELK stack setup has three main components:

  • Elasticsearch: Stores all of the application and monitoring logs (provisioned by Qbox).

  • Logstash: The server component that processes incoming logs and feeds them to Elasticsearch.

  • Kibana: A web interface for searching and visualizing logs (Provisioned by Qbox).

For this post, we will be using hosted Elasticsearch on Qbox.io. You can sign up or launch your cluster here, or click "Get Started" in the header navigation. If you need help setting up, refer to "Provisioning a Qbox Elasticsearch Cluster."

Overview

Let’s review how the ELK pipeline will work:

  1. We issue log-generating requests to the running Apache web server.

  2. Logstash, running as a persistent daemon, monitors the Apache logs for new lines and processes them.

  3. Logstash sends the parsed logs to Elasticsearch as JSON documents, where they are stored and made available for analytics.

  4. Kibana uses Elasticsearch as a back-end for dashboarding and searching.

Prerequisites

The amount of CPU, RAM, and storage that your Elasticsearch server will require depends on the volume of logs you intend to gather. For this tutorial, we will be using a Qbox-provisioned Elasticsearch cluster with the following minimum specs:

  • Provider: AWS

  • Version: 5.1.1

  • RAM: 1 GB

  • CPU: 1 vCPU

  • Replicas: 0

The above specs can be changed to suit your requirements. Select the appropriate names, versions, and regions for your needs. For this example we used Elasticsearch version 5.1.1; the most current version at the time of writing is 5.3. We support all versions of Elasticsearch on Qbox. (To learn more about the major differences between 2.x and 5.x, click here.)

In addition to our Elasticsearch server, we will need a separate Logstash server to process incoming Apache logs from client servers and ship them to Elasticsearch. You can ship logs from a single client server or from many; for simplicity or testing, the Logstash server can also act as the client server itself. The Endpoint and Transport addresses for our Qbox-provisioned Elasticsearch cluster are as follows:


Endpoint: REST API

https://ec18487808b6908009d3:efcec6a1e0@eb843037.qb0x.com:32563

Authentication

  • Username = ec18487808b6908009d3

  • Password = efcec6a1e0

TRANSPORT (NATIVE JAVA)

eb843037.qb0x.com:30543

Note: Please make sure to whitelist the Logstash server IP in your Qbox Elasticsearch cluster. The Logstash server must also be able to reach every client server from which it collects Apache logs.
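Once the Logstash server is whitelisted, it is worth a quick sanity check that the cluster is reachable from it. A minimal check using the endpoint and credentials above (curl should return a small JSON blob with the cluster name and version):

curl -u ec18487808b6908009d3:efcec6a1e0 "https://eb843037.qb0x.com:32563/"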

Install Apache Web Server

Apache is free, open-source software that runs roughly half of the world’s web servers.

To install Apache, open a terminal and run these commands:

sudo apt-get update
sudo apt-get install apache2

That’s it. To check whether Apache is installed, point your browser at your server’s IP address (e.g., http://12.34.56.789). The page should display the words “It works!”.
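To give Logstash something to ship later, you can also generate a few log entries right away. A minimal sketch: the loop issues ten normal requests, and the request for a nonexistent page typically seeds the error log as well:

for i in $(seq 1 10); do curl -s http://localhost/ > /dev/null; done
curl -s http://localhost/no-such-page > /dev/null
tail -3 /var/log/apache2/access.log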

How to find your server’s IP address

You can run the following command to reveal your server’s IP address.

ifconfig eth0 | grep inet | awk '{ print $2 }'
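On newer distributions where ifconfig is deprecated or missing, either of the following works as well:

hostname -I
ip -4 addr show eth0 | grep inet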

The Apache File Hierarchy in Ubuntu and Debian

On Ubuntu and Debian, Apache keeps its main configuration files within the "/etc/apache2" folder:

cd /etc/apache2
ls -F
apache2.conf  envvars     magic            mods-enabled/  sites-available/
conf.d/       httpd.conf  mods-available/  ports.conf     sites-enabled/

There are a number of plain text files and some sub-directories in this directory. These are some of the more useful locations to be familiar with:

  • apache2.conf: This is the main configuration file for the server. Almost all configuration can be done from within this file, although it is recommended to use separate, designated files for simplicity.

  • ports.conf: This file is used to specify the ports that virtual hosts should listen on. Be sure to check that this file is correct if you are configuring SSL.

  • conf.d/: This directory is used for controlling specific aspects of the Apache configuration. For example, it is often used to define SSL configuration and default security choices.

  • sites-available/: This directory contains all of the virtual host files that define different web sites. These will establish which content gets served for which requests. These are available configurations, not active configurations.

  • sites-enabled/: This directory establishes which virtual host definitions are actually being used. Usually, this directory consists of symbolic links to files defined in the "sites-available" directory.

  • mods-[enabled,available]/: These directories are similar in function to the sites directories, but they define optional modules that can be loaded instead (see the example below).
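Debian and Ubuntu ship helper scripts that manage these symlinks for you. A quick illustration (the site name example.com.conf is hypothetical):

sudo a2ensite example.com.conf   # symlink sites-available/example.com.conf into sites-enabled/
sudo a2enmod rewrite             # symlink the rewrite module from mods-available/ into mods-enabled/
sudo service apache2 reload      # apply the changes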

Install Logstash

Download and install the Public Signing Key:

wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

We will use Logstash version 2.4.x, which is compatible with our Elasticsearch version 5.1.x. Refer to the Elastic Community Product Support Matrix to clear up any version compatibility questions.

Add the repository definition to your /etc/apt/sources.list file:

echo "deb https://packages.elastic.co/logstash/2.4/debian stable main" | sudo tee -a /etc/apt/sources.list

Update the package index and install Logstash:

sudo apt-get update && sudo apt-get install logstash

Alternatively, the Logstash tarball can be downloaded from the Elastic Product Releases site. The steps for setting up and running Logstash are then pretty simple (a concrete sketch follows the list):

  • Download and unzip Logstash

  • Prepare a logstash.conf config file

  • Run bin/logstash -f logstash.conf -t to validate the config file

  • Run bin/logstash -f logstash.conf
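A concrete sketch of those steps (the download URL is an assumption; copy the exact link for your version from the releases page):

wget https://download.elastic.co/logstash/logstash/logstash-2.4.1.tar.gz
tar -xzf logstash-2.4.1.tar.gz
cd logstash-2.4.1
bin/logstash -f logstash.conf -t   # validate the config
bin/logstash -f logstash.conf      # run the pipeline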

Configure Logstash

Logstash configuration files are written in a JSON-like format and reside in /etc/logstash/conf.d. The configuration consists of three sections: inputs, filters, and outputs.
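At its simplest, a pipeline file is just those three sections in order:

input {
  # where events come from (files, beats, tcp, ...)
}
filter {
  # how events are parsed and enriched
}
output {
  # where events are sent (Elasticsearch, stdout, ...)
}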

Let's create a configuration file called 02-apache-input.conf and set up our "apache" input:

sudo vi /etc/logstash/conf.d/02-apache-input.conf

Insert the following input configuration:

input {
 file {
   path => ["/var/log/apache2/access.log"]
   type => "apache_access"   # tags each event so the filters below can route it
 }
 file {
   path => ["/var/log/apache2/error.log"]
   type => "apache_error"
 }
}
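Note that the file input tails files from the end by default, so only lines written after Logstash starts will be shipped. If you also want the existing contents of the logs indexed on the first run, a variant like the following should work (sincedb_path => "/dev/null" makes Logstash forget its read position between restarts, which is useful for testing only):

file {
   path => ["/var/log/apache2/access.log"]
   type => "apache_access"
   start_position => "beginning"   # read the file from the top on first run
   sincedb_path => "/dev/null"     # testing only: re-read on every restart
}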

Note: The Apache log file paths may differ depending on your environment and underlying OS.

Save and quit. This configures Logstash to watch both Apache log files for new lines. Now let's create a configuration file called 10-apache-filter.conf, where we will add a filter for Apache messages:

sudo vi /etc/logstash/conf.d/10-apache-filter.conf

Insert the following apache filter configuration:

filter {
   if [type] in [ "apache" , "apache_access" , "apache-access" ]  {
      # Parse the combined (or common) Apache log format into named fields
      grok {
         match => [
         "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}",
         "message" , "%{COMMONAPACHELOG}+%{GREEDYDATA:extra_fields}"
         ]
         overwrite => [ "message" ]
      }
      # Cast numeric fields so Elasticsearch can aggregate on them
      mutate {
         convert => ["response", "integer"]
         convert => ["bytes", "integer"]
         convert => ["responsetime", "float"]
      }
      # Enrich events with the geographic location of the client IP
      geoip {
         source => "clientip"
         target => "geoip"
         add_tag => [ "apache-geoip" ]
      }
      # Use the request timestamp from the log line as the event timestamp
      date {
         match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
         remove_field => [ "timestamp" ]
      }
      # Break the user-agent string into browser/OS fields
      useragent {
         source => "agent"
      }
   }
   if [type] in ["apache_error","apache-error"] {
      # Parse the default Apache error-log format
      grok {
         match => ["message", "\[%{WORD:dayname} %{WORD:month} %{DATA:day} %{DATA:hour}:%{DATA:minute}:%{DATA:second} %{YEAR:year}\] \[%{NOTSPACE:loglevel}\] (?:\[client %{IPORHOST:clientip}\] ){0,1}%{GREEDYDATA:message}"]
         overwrite => [ "message" ]
      }
      # Reassemble the date parts into a single timestamp field
      mutate
      {
         add_field =>
         {
            "time_stamp" => "%{day}/%{month}/%{year}:%{hour}:%{minute}:%{second}"
         }
      }
      date {
         match => ["time_stamp", "dd/MMM/YYYY:HH:mm:ss"]
         remove_field => [ "time_stamp","day","dayname","month","hour","minute","second","year"]
      }
   }
}

Save and quit. This filter looks for logs labeled with an Apache type and uses grok to parse the incoming lines into structured, queryable fields.
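If you want to see what grok extracts before wiring everything together, you can pipe a sample line through a throwaway pipeline from a tarball install (a sketch; the log line is illustrative):

echo '127.0.0.1 - - [15/Apr/2017:11:45:26 +0000] "GET / HTTP/1.1" 200 11764 "-" "curl/7.47.0"' | \
bin/logstash -e 'input { stdin {} } filter { grok { match => { "message" => "%{COMBINEDAPACHELOG}" } } } output { stdout { codec => rubydebug } }'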

Lastly, we will create a configuration file called 30-elasticsearch-output.conf:

sudo vi /etc/logstash/conf.d/30-elasticsearch-output.conf

Insert the following output configuration:

output {
 elasticsearch {
   hosts => ["https://eb843037.qb0x.com:32563/"]
   user => "ec18487808b6908009d3"
   password => "efcec6a1e0"
   index => "apache-%{+YYYY.MM.dd}"
   document_type => "apache_logs"
 }
 stdout { codec => rubydebug }   # also print each event to the console, handy for debugging
}

Save and exit. This output configures Logstash to store the log data in the Elasticsearch instance running at https://eb843037.qb0x.com:32563/, in a daily index named apache-YYYY.MM.dd.
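Once Logstash is running (see below), you can confirm that documents are arriving with a quick search against the daily index:

curl -u ec18487808b6908009d3:efcec6a1e0 "https://eb843037.qb0x.com:32563/apache-*/_search?size=1&pretty"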

If you downloaded the Logstash tarball or zip, you can instead create a single logstash.conf file containing the input, filter, and output all in one place.

sudo vi LOGSTASH_HOME/logstash.conf

Insert the following input, filter and output configuration in logstash.conf:

input {
  file {
    path => ["/var/log/apache2/access.log"]
    type => "apache_access"
  }
  file {
    path => ["/var/log/apache2/error.log"]
    type => "apache_error"
  }
}
filter {
   if [type] in [ "apache" , "apache_access" , "apache-access" ]  {
      grok {
         match => [
         "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}",
         "message" , "%{COMMONAPACHELOG}+%{GREEDYDATA:extra_fields}"
         ]
         overwrite => [ "message" ]
      }
      mutate {
         convert => ["response", "integer"]
         convert => ["bytes", "integer"]
         convert => ["responsetime", "float"]
      }
      geoip {
         source => "clientip"
         target => "geoip"
         add_tag => [ "apache-geoip" ]
      }
      date {
         match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
         remove_field => [ "timestamp" ]
      }
      useragent {
         source => "agent"
      }
   }
   if [type] in ["apache_error","apache-error"] {
      grok {
         match => ["message", "\[%{WORD:dayname} %{WORD:month} %{DATA:day} %{DATA:hour}:%{DATA:minute}:%{DATA:second} %{YEAR:year}\] \[%{NOTSPACE:loglevel}\] (?:\[client %{IPORHOST:clientip}\] ){0,1}%{GREEDYDATA:message}"]
         overwrite => [ "message" ]
      }
      mutate
      {
         add_field =>
         {
            "time_stamp" => "%{day}/%{month}/%{year}:%{hour}:%{minute}:%{second}"
         }
      }
      date {
         match => ["time_stamp", "dd/MMM/YYYY:HH:mm:ss"]
         remove_field => [ "time_stamp","day","dayname","month","hour","minute","second","year"]
      }
   }
}
output {
 elasticsearch {
   hosts => ["https://eb843037.qb0x.com:32563/"]
   user => "ec18487808b6908009d3"
   password => "efcec6a1e0"
   index => "apache-%{+YYYY.MM.dd}"
   document_type => "apache_logs"
 }
 stdout { codec => rubydebug }
}

If you want to add filters for other applications, be sure to name the files so that they sort between the input and the output configuration (i.e., between 02- and 30-).

Test your Logstash configuration with this command:

sudo service logstash configtest

It should display Configuration OK if there are no syntax errors. Otherwise, read the error output to see what's wrong with your Logstash configuration.

Restart Logstash, and enable it to start at boot, to put our configuration changes into effect:

sudo service logstash restart
sudo update-rc.d logstash defaults 96 9

If you downloaded the Logstash tarball or zip, it can be run with the following command:

bin/logstash -f logstash.conf

Next, we'll load the sample Kibana dashboards.

Load Kibana Dashboards


When you have finished setting up the Logstash server to collect logs from client servers, let's look at Kibana, the web interface provisioned by Qbox. The Kibana user interface can be used for filtering, sorting, discovering, and visualizing the logs stored in Elasticsearch. Go ahead and click Visualize data with Kibana from your cluster configuration dashboard.


Go ahead and select [apache]-YYYY.MM.DD from the Index Patterns menu (left side), then click the Star (Set as default index) button to set the Apache index as the default.

Now click the Discover link in the top navigation bar. By default, this will show you all of the log data over the last 15 minutes. You should see a histogram with log events, with log messages below:


Right now there won't be much in there, because you have only just begun gathering Apache logs from your client servers. Here, you can search and browse through your logs. You can also customize your dashboard.

Try the following things:


  • Calculate the average or sum of total bytes sent over the web server over a period of time.

  • Aggregate (rank occurrences of) the top response codes sent by your web server (i.e., 200, 404, etc.), as sketched in the query after this list.

  • Perform fast free text search across all your logs using Elasticsearch.
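The response-code ranking, for instance, maps directly to a terms aggregation, and the byte totals to a sum. A sketch against our cluster (field names assume the grok filter configured earlier):

curl -u ec18487808b6908009d3:efcec6a1e0 -H 'Content-Type: application/json' \
  "https://eb843037.qb0x.com:32563/apache-*/_search?pretty" -d '{
  "size": 0,
  "aggs": {
    "response_codes": { "terms": { "field": "response" } },
    "total_bytes":    { "sum":   { "field": "bytes" } }
  }
}'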


Kibana has many other features, such as graphing and filtering, so feel free to poke around!


Conclusion

Qbox-provisioned Elasticsearch makes it very easy for us to visualize centralized logs using Logstash and Kibana. Remember that we can send pretty much any type of log or indexed data to Logstash, but the data becomes even more useful once it is parsed and structured with grok.

What do we look for in centralized logging? As it happens, many things, but the most important are as follows.

  • A way to parse data and send them to a central database in near real-time.

  • The capacity of the database to handle near real-time data querying and analytics.

  • A visual representation of the data through filtered tables, dashboards, and so on.

Log analysis for operational intelligence, business intelligence, and technical SEO are just three examples of why Apache users need to monitor logs. There are many more use cases, such as log-driven development and application monitoring. The ELK stack (Elasticsearch, Logstash, and Kibana) can do all of that, and it can easily be extended to satisfy the particular needs we set in front of it.


Give It a Whirl!

It's easy to spin up a standard hosted Elasticsearch cluster on any of our 47 Rackspace, Softlayer, Amazon, or Microsoft Azure data centers. And you can now provision your own AWS Credits on Qbox Private Hosted Elasticsearch.

Questions? Drop us a note, and we'll get you a prompt response.

Not yet enjoying the benefits of a hosted ELK stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment in our cloud hosting service.
