A comprehensive log management and analysis strategy is mission critical: it enables organizations to understand the relationship between operational, security, and change management events and to maintain a clear picture of their infrastructure. Log files from web servers, applications, and operating systems also provide valuable data, although they come in different formats and are scattered across many systems.

Logs are a crucial part of any system because they give you insight into what a system is doing as well as what happened. Virtually every process running on a system generates logs in some form or another. These logs are usually written to files on local disks. When your system grows to multiple hosts, managing the logs and accessing them can get complicated.

Searching for a particular error across hundreds of log files on hundreds of servers is difficult without good tools. A common approach to this problem is to set up a centralized logging solution so that multiple logs can be aggregated in a central location. To effectively consolidate, manage, and analyze these different logs, many customers choose to implement centralized logging solutions using Elasticsearch, Logstash, and Kibana, popularly known as the ELK Stack.

For this post, we will be using hosted Elasticsearch on Qbox.io. You can sign up or launch your cluster here, or click "Get Started" in the header navigation. If you need help setting up, refer to "Provisioning a Qbox Elasticsearch Cluster."

Provisioning an Elasticsearch cluster in Qbox is easy. In this article, we walk you through the initial steps and show you how simple it is to start and configure your cluster. We then install and configure Logstash to ship our syslogs to Elasticsearch. Syslogs shipped to Elasticsearch can then be visualized and analyzed via Kibana dashboards.

Our Goal

The goal of the tutorial is to use Qbox as a centralized logging and monitoring solution. Qbox provides an out-of-the-box solution for Elasticsearch, Kibana, and many Elasticsearch analysis and monitoring plugins. We will set up Logstash on a separate node or machine to gather syslogs from one or more servers, and use Qbox's provisioned Kibana to visualize the gathered logs.

Our ELK stack setup has three main components:

  • Elasticsearch: It is used to store all of the application and monitoring logs (Provisioned by Qbox).

  • Logstash: The server component that processes incoming logs and feeds them to Elasticsearch.

  • Kibana: A web interface for searching and visualizing logs (Provisioned by Qbox).

Prerequisites

The amount of CPU, RAM, and storage that your Elasticsearch server will require depends on the volume of logs that you intend to gather. For this tutorial, we will be using a Qbox-provisioned Elasticsearch with the following minimum specs:

  • Provider: AWS

  • Version: 2.3.4

  • RAM: 1 GB

  • CPU: 1 vCPU

  • Replicas: 0

The above specs can be changed per your desired requirements. Please select the appropriate names, versions, regions for your needs. For this example, we used Elasticsearch version 2.3.4. The most current version is 5.3. We support all versions of Elasticsearch on Qbox. (To learn more about the major differences between 2.x and 5.x, click here.) 

In addition to our Elasticsearch server, we will require a separate Logstash server to process incoming syslogs from client servers and ship them to Elasticsearch. You can ship logs to Elasticsearch from a single client server or from several. For simplicity or testing purposes, the Logstash server can also act as the client server itself. The Endpoint and Transport addresses for our Qbox-provisioned Elasticsearch cluster are as follows:

cluster_config.png

Note: Please make sure to whitelist the Logstash server IP in the Qbox Elasticsearch cluster settings. Also, the Logstash server must have access to all client servers to collect syslogs.
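
One way to confirm that the whitelist is in effect is to query the cluster health endpoint from the Logstash server. The following is a minimal sketch and not part of the Qbox tooling: the function name check_cluster and the variables ES_URL, ES_USER, ES_PASSWORD, and CURL are illustrative assumptions, and you would substitute your own cluster endpoint and credentials.

```shell
# Hypothetical helper: all names and defaults here are assumptions for illustration.
# Replace the endpoint and credentials with the values for your own cluster.
ES_URL=${ES_URL:-https://eb843037.qb0x.com:30024}
ES_USER=${ES_USER:-your-username}
ES_PASSWORD=${ES_PASSWORD:-your-password}
CURL=${CURL:-curl}   # overridable, e.g. CURL=echo to preview the request

# Ask Elasticsearch for its cluster health. A JSON reply containing a
# "status" field suggests the Logstash server can reach the cluster.
check_cluster() {
  "$CURL" -sS -u "$ES_USER:$ES_PASSWORD" "$ES_URL/_cluster/health?pretty"
}

# Usage (run on the Logstash server):
#   check_cluster
```

If the request times out or is refused, the whitelist entry for the Logstash server is a reasonable first thing to check.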

Install Logstash

Download and install the Public Signing Key:

wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

We will use Logstash version 2.3.x, which is compatible with our Elasticsearch version 2.3.4. Refer to the Elastic Community Product Support Matrix to resolve any version compatibility questions.

Add the repository definition to your /etc/apt/sources.list file:

echo "deb https://packages.elastic.co/logstash/2.3/debian stable main" | sudo tee -a /etc/apt/sources.list

Run sudo apt-get update and the repository is ready for use. You can install it with:

sudo apt-get update && sudo apt-get install logstash

Alternatively, the Logstash tar archive can be downloaded from the Elastic Product Releases site. The steps for setting up and running Logstash are then quite simple:

  • Download and unzip Logstash

  • Prepare a logstash.conf config file

  • Run bin/logstash -f logstash.conf -t to check config (logstash.conf)

  • Run bin/logstash -f logstash.conf

Configure Logstash

Logstash configuration files use Logstash's own configuration syntax (it resembles JSON but is not JSON) and reside in /etc/logstash/conf.d. The configuration consists of three sections: inputs, filters, and outputs.

Let's create a configuration file called 02-syslog-input.conf and set up our "syslog" input:

sudo vi /etc/logstash/conf.d/02-syslog-input.conf

Insert the following input configuration:

input {
  file {
    path => ["/var/log/syslog"]
    type => "syslog"
  }
}

NOTE: The syslog file path may differ depending on your environment, configuration, and underlying OS.

Save and quit. This specifies a file input that reads /var/log/syslog and tags each event with the type "syslog". Now let's create a configuration file called 10-syslog-filter.conf, where we will add a filter for syslog messages:

sudo vi /etc/logstash/conf.d/10-syslog-filter.conf

Insert the following syslog filter configuration:

filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}

Save and quit. This filter looks for logs that are labeled as "syslog" type, and it uses grok to parse incoming syslog lines so that they become structured and queryable.
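
To get a feel for what the grok pattern extracts, the following sketch applies a rough regular-expression equivalent to one sample line using sed. It is only an approximation of grok's SYSLOGTIMESTAMP, SYSLOGHOST, DATA, and POSINT patterns, not grok itself, and the function name parse_syslog_line and the sample log line are made up for illustration.

```shell
# Rough ERE stand-in for the grok pattern above (an approximation, not grok itself).
# Groups: 1=timestamp 2=hostname 3=program 5=pid (optional) 6=message
SYSLOG_RE='^([A-Z][a-z]{2} +[0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2}) ([^ ]+) ([^][ :]+)(\[([0-9]+)\])?: (.*)$'

# Print the fields grok would add, one per line.
# Prints nothing when the line does not match the pattern.
parse_syslog_line() {
  printf '%s\n' "$1" | sed -n -E 's/'"$SYSLOG_RE"'/syslog_timestamp=\1\
syslog_hostname=\2\
syslog_program=\3\
syslog_pid=\5\
syslog_message=\6/p'
}

# Hypothetical sample line in the usual /var/log/syslog layout.
parse_syslog_line 'Mar  7 14:02:11 web01 sshd[1234]: Failed password for admin from 10.0.0.5 port 22 ssh2'
```

For the sample line, the sketch prints five name=value lines, one for each field that the filter stores on the event. A line that does not follow the syslog layout produces no output, which loosely mirrors the way grok leaves unparsed events without these fields.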

Lastly, we will create a configuration file called 30-elasticsearch-output.conf:

sudo vi /etc/logstash/conf.d/30-elasticsearch-output.conf

Insert the following output configuration:

output {
  elasticsearch {
    hosts => ["https://eb843037.qb0x.com:30024/"]
    user => "5d53675f1e0dd8be3ada"
    password => "3b193023f7"
    index => "syslog-%{+YYYY.MM.dd}"
    document_type => "system_logs"
  }
  stdout { codec => rubydebug }
}

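Save and exit. This output configures Logstash to store the log data in Elasticsearch, which is running at https://eb843037.qb0x.com:30024/, in a daily index whose name is the syslog- prefix followed by the event date (syslog-YYYY.MM.dd). The stdout output additionally prints each event to the console, which is useful for debugging.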
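
Because the index name contains a date pattern, Logstash writes each event to a per-day index based on the event's @timestamp, which is in UTC. As a small sketch, the name of the index for events timestamped today can be derived as shown here. The function name daily_index and the ES_URL, ES_USER, and ES_PASSWORD variables in the comment are assumptions for illustration.

```shell
# Hypothetical helper: prints the index name that events timestamped today (UTC)
# would land in, mirroring index => "syslog-%{+YYYY.MM.dd}".
daily_index() {
  date -u +'syslog-%Y.%m.%d'
}

daily_index

# To confirm that the index exists on the cluster, something like the following
# could be run with your own endpoint and credentials (placeholder variables):
#   curl -u "$ES_USER:$ES_PASSWORD" "$ES_URL/_cat/indices/syslog-*?v"
```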

If you have downloaded the Logstash tar or zip archive, you can create a single logstash.conf file containing the input, filter, and output sections all in one place.

sudo vi LOGSTASH_HOME/logstash.conf

Insert the following input, filter and output configuration in logstash.conf:

input {
  file {
    path => ["/var/log/syslog"]
    type => "syslog"
  }
}
filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
output {
  elasticsearch { 
    hosts => ["https://eb843037.qb0x.com:30024/"]
    user => "5d53675f1e0dd8be3ada"
    password => "3b193023f7"
    index => "syslog-%{+YYYY.MM.dd}"
    document_type => "system_logs"
  }
  stdout { codec => rubydebug }
}

If you want to add filters for other applications that use the syslog input, be sure to name the files so they sort between the input and the output configuration (i.e., between 02- and 30-).
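
Logstash reads the files in the configuration directory in alphabetical order of their names and treats them as one combined configuration, which is why the numeric prefixes matter. The sketch below illustrates the resulting order. The function name conf_order and the file 15-nginx-filter.conf are hypothetical.

```shell
# Hypothetical helper: print config file names in the order they would be read
# (plain byte-wise sort, so 02- sorts before 10-, 15-, and 30-).
conf_order() {
  printf '%s\n' "$@" | LC_ALL=C sort
}

# 15-nginx-filter.conf is a made-up example of an additional filter file.
conf_order 30-elasticsearch-output.conf 15-nginx-filter.conf 02-syslog-input.conf 10-syslog-filter.conf
```

The additional filter file sorts after the syslog input and before the Elasticsearch output, so its filters apply before events are shipped.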

Test your Logstash configuration with this command:

sudo service logstash configtest

It should display Configuration OK if there are no syntax errors. Otherwise, read the error output to see what's wrong with your Logstash configuration.

Restart Logstash, and enable it, to put our configuration changes into effect:

sudo service logstash restart
sudo update-rc.d logstash defaults 96 9

If you have downloaded the Logstash tar or zip archive, it can be run using the following command:

bin/logstash -f logstash.conf

Next, we'll load the sample Kibana dashboards.

Load Kibana Dashboards

When you have finished setting up the Logstash server to collect logs from client servers, let's look at Kibana, the web interface provisioned by Qbox. The Kibana user interface can be used for filtering, sorting, discovering, and visualizing logs that are stored in Elasticsearch. Go ahead and click on Visualize data with Kibana from your cluster configuration dashboard.

kibana_home.png

Now select [syslog]-YYYY.MM.DD from the Index Patterns menu (left side), then click the Star (Set as default index) button to set the syslog index as the default.

Now click the Discover link in the top navigation bar. By default, this will show you all of the log data over the last 15 minutes. You should see a histogram with log events, with log messages below:

3.png

Right now, there won't be much in there because you are only gathering syslogs from your client servers. Here, you can search and browse through your logs. You can also customize your dashboard.

Try the following things:

  • Search for "admin" to see if anyone is trying to log into your servers as admin
  • Search for a particular hostname (search for host: "adam@qbox")

  • Change the time frame by selecting an area on the histogram or from the menu above
  • Create visualizations and populate them in specific dashboards.
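
The same kind of search can also be run directly against Elasticsearch with its URI search API, which can help when checking that events are arriving before you build dashboards. The sketch below only builds the request URL. The function name search_url, the variables ES_URL, ES_USER, and ES_PASSWORD, and the hostname web01 are assumptions for illustration, while the field names are the ones added by the grok filter earlier in this post.

```shell
# Hypothetical helper: build a URI-search URL for the daily syslog indices.
# ES_URL is a placeholder; substitute your own cluster endpoint.
ES_URL=${ES_URL:-https://eb843037.qb0x.com:30024}

search_url() {
  # $1 is a simple query of the form field:value, without spaces or
  # characters that would need URL encoding.
  printf '%s/syslog-*/_search?q=%s&size=5&pretty\n' "$ES_URL" "$1"
}

search_url 'syslog_message:admin'
search_url 'syslog_hostname:web01'

# To run a search, pass the URL to curl with your credentials, for example:
#   curl -u "$ES_USER:$ES_PASSWORD" "$(search_url 'syslog_message:admin')"
```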

kibana.gif

Kibana has many other features, such as graphing and filtering, so feel free to explore!

Conclusion

Monitoring many services on a single server poses some difficulties. Monitoring many services on many servers requires a whole new way of thinking and a new set of tools. As you start embracing microservices, containers, and clusters, the number of deployed containers will begin increasing rapidly. The same holds true for servers that form the cluster. We cannot log into a node and look at logs anymore. Centralized logging can be very useful when attempting to identify problems with your servers or applications because it allows you to search through all of your logs in a single place.

Qbox-provisioned Elasticsearch makes it very easy for us to visualize centralized logs using Logstash and Kibana. Remember that we can send essentially any type of log or indexed data to Logstash, but the data becomes even more useful if it is parsed and structured with grok.

What do we look for in centralized logging? As it happens, many things, but the most important are as follows:

  • A way to parse data and send them to a central database in near real time.
  • The capacity of the database to handle near real-time data querying and analytics.
  • A visual representation of the data through filtered tables, dashboards, etc.

The ELK stack (Logstash, Elasticsearch, and Kibana) can do all that, and it can easily be extended to satisfy our particular needs.

Give It a Whirl!

It's easy to spin up a standard hosted Elasticsearch cluster in any of our 47 Rackspace, Softlayer, Amazon, or Microsoft Azure data centers. And you can now provision your own AWS Credits on Qbox Private Hosted Elasticsearch.

Questions? Drop us a note, and we'll get you a prompt response.

Not yet enjoying the benefits of a hosted ELK stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment in our cloud hosting service.
