Filebeat is extremely lightweight compared to its predecessors when it comes to efficiently sending log events. It uses the Lumberjack protocol and compression, and is easy to configure using a single YAML file. It can send events directly to Elasticsearch as well as to Logstash, and it keeps track of the files it reads and its position within them, so that it can resume where it left off.

The goal of this tutorial is to set up a working environment in which Filebeat ships Linux system logs to Elasticsearch. It then offers helpful tips for making good use of that environment in Kibana.

On Filebeat and its Predecessors 

During the initial days of ELK (Elasticsearch, Logstash, Kibana), a single Logstash jar file handled both shipping and aggregating log events. The same Java jar ran on every server that needed to ship logs, and again on the server that aggregated them into Elasticsearch for indexing.

The way the jar file was started determined whether it acted as a “log shipper” or a “log server”. Things changed gradually as developers kept improving it.

The legacy Logstash shipper jar was eventually replaced by “Lumberjack”, which was later renamed “logstash-forwarder”. The rename was done to convey the use case clearly rather than keeping an ambiguous name.

Logstash-forwarder is written in Go and uses a secure method to ship highly compressed log data. Compared to the legacy Logstash Java jar, logstash-forwarder used very minimal resources to collect and send log events.

Lumberjack is also the name of the protocol that logstash-forwarder used, designed to ship log events more efficiently than plain TCP. The primary goals behind the protocol’s development were:

  1. Acknowledgement of messages should happen at the application level.

  2. Network latency should not impact the amount of data sent during log forwarding.

  3. Data should be heavily compressed so that bandwidth utilization remains minimal.

More recently, Elastic launched Beats, a set of tools for shipping data. Filebeat is the log shipping component of the Beats tool set.

Tutorial

Filebeat introduces many improvements over logstash-forwarder. It can be installed on a server and configured to send events either to Logstash (and from there to Elasticsearch), or directly to Elasticsearch, as shown in the diagram below.


[Figure: Filebeat shipping events either through Logstash or directly to Elasticsearch]

To complete the working example, we use:

  1. An Ubuntu 16.04 Linux Machine, which can be a VM/Cloud instance.

  2. An Elasticsearch Cluster with a Kibana Interface using Qbox.

The main advantages of using Qbox over a self-hosted cluster are its flexibility, scalability, free support, 24/7 uptime, and pay-as-you-go model.

For this post, we will be using hosted Elasticsearch on Qbox.io. You can sign up or launch your cluster here, or click "Get Started" in the header navigation. If you need help setting up, refer to “Provisioning a Qbox Elasticsearch Cluster”.

Filebeat

Let’s get started by installing Filebeat on our Ubuntu 16.04 host. This node will act as an agent, like any server in an environment that needs to send its logs to a central location.

Filebeat can be installed from an RPM or DEB package, or even from source. You can download the latest release here. We will be installing Filebeat using apt. First, we need to add the apt key provided by Elastic, as shown below.

# wget -O - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Traditionally, apt repository URLs are all http, not https. Recently, apt started supporting repository URLs with SSL/TLS. For that to work, we need to install another package called apt-transport-https.

# apt-get install apt-transport-https

Now we need to add a .list file to the /etc/apt directory that contains the repository URL for the Filebeat package. This can be done by creating a file named /etc/apt/sources.list.d/elastic-5.x.list with the content shown below.

# cat /etc/apt/sources.list.d/elastic-5.x.list
deb https://artifacts.elastic.co/packages/5.x/apt stable main
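
One way to create this file in a single step (assuming your user has sudo rights) is shown below.

# echo "deb https://artifacts.elastic.co/packages/5.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-5.x.list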

Now run apt-get update so that apt picks up the new repository and its filebeat package.

# apt-get update

To install filebeat, fire the below command:

# apt-get install filebeat

Similar to other programs in Linux, the default configuration for Filebeat resides inside the /etc/filebeat directory. Let’s see what’s inside that directory.

root@ip-10-12-2-64:/etc/filebeat# ls
filebeat.full.yml            filebeat.template-es6x.json  filebeat.yml
filebeat.template-es2x.json  filebeat.template.json

We can see a couple of YAML files with the .yml extension, and a couple of JSON files as well. The default configuration file is filebeat.yml, seen in the directory listing above.
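
The JSON files are Elasticsearch index templates that define mappings for the fields Filebeat sends. When shipping directly to Elasticsearch, Filebeat loads the matching template automatically, but you can also load one by hand with curl; a sketch, assuming a cluster at localhost:9200 (substitute your own cluster URL and credentials):

# curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_template/filebeat' -d@/etc/filebeat/filebeat.template.json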

The filebeat.yml file ships with several example configurations for demonstration purposes, all commented out by default. Create a backup of this default file, then create a fresh filebeat.yml with a handful of options so we can walk through each of them.

# mv /etc/filebeat/filebeat.yml /tmp
# touch /etc/filebeat/filebeat.yml

To start, let us try sending our syslog messages (i.e., everything inside the /var/log directory) directly to Elasticsearch using Filebeat. The initial configuration file for Filebeat will have the content below:

root@ip-10-12-2-64:~# cat /etc/filebeat/filebeat.yml
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/*.log
  document_type: syslog
filebeat.registry_file: /var/lib/filebeat/registry
output.elasticsearch:
  hosts: ["https://eb835637.qb0x.com:32563"]
  protocol: "https"
  username: "ec18487808b87359124d3"
  password: "wetihnsdt453"

To restart the filebeat service, run the below command.

# service filebeat restart
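
To verify that the service came up cleanly, check its status and tail Filebeat’s own log file (the deb package’s init defaults point path.logs at /var/log/filebeat):

# service filebeat status
# tail /var/log/filebeat/filebeat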

There are two main sections inside the filebeat.yml shown above:

  1. filebeat.prospectors describes the log files to watch, and their locations, when looking for messages/events.
  2. output tells Filebeat where these messages should be sent.

In the example shown above, we grabbed all files inside the /var/log directory that end with .log. The type field seen in Logstash and Elasticsearch is controlled by the document_type option, which helps identify where the logs originated.

Filebeat also supports wildcard entries for log locations under the paths setting. For example, /var/log/*/*.log covers every file ending in .log inside any immediate subdirectory of /var/log.

You can even specify multiple locations as shown below.

paths:
  - /var/log/app1/*.log
  - /opt/app2/*.log
  - /opt/app3/somedir/*.log

Compared to its predecessors, Filebeat also handles the following with much better efficiency:

  • It can manage log rotation. If you have log rotation configured for an application, Filebeat will recognize it and automatically start reading from the new file after rotation. This was buggy in earlier implementations.

  • Filebeat tracks how far it has read into each log file. It keeps a registry file in which it continuously notes down the read position. Note the filebeat.registry_file setting in our example configuration file above, filebeat.registry_file: /var/lib/filebeat/registry.

root@ip-10-12-2-64:/var/log# cat /var/lib/filebeat/registry 
[{"source":"/var/log/auth.log","offset":8626,"FileStateOS":{"inode":50276,"device":51713},"timestamp":"2017-05-24T07:17:03.271000117Z","ttl":-1},{"source":"/var/log/cloud-init-output.log","offset":6097,"FileStateOS":{"inode":50256,"device":51713},"timestamp":"2017-05-24T06:36:28.138568056Z","ttl":-1},{"source":"/var/log/cloud-init.log","offset":205954,"FileS

You can clearly see from the above output that the registry file holds a full set of details for each file: the path being read, the current offset, the device the file is stored on, the inode number of the file, and so on.

This registry file is what makes restarting the service safe. Filebeat will resume sending logs from exactly the place it left off, because those details are persisted in the registry file.

Kibana

Let us look at the Kibana interface to see if the logs we are sending using filebeat are actually being populated.

[Figure: Kibana index pattern configuration screen]

We first need to provide an index pattern for Kibana to query against Elasticsearch. As we are using Filebeat, the default pattern is filebeat-*.

Once the pattern/index name is saved, the Kibana interface should show you log events on the dashboard, as shown below.

[Figure: Log events shipped by Filebeat displayed in Kibana]

You can add “fields” and “tags” to the logs sent to Elasticsearch from the Filebeat configuration. This is helpful for identifying the type of logs, as well as for more granular filtering. For example, you can attach custom fields to all messages sent by Filebeat for a particular prospector.

Below is an example of such a configuration:

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/*.log
  tags: [ "an-example-tag" ]
  fields:
    region: AsiaPacific
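
By default, Filebeat nests such custom fields under a top-level fields key, so in the Kibana search bar you would filter on them with a query along these lines (this exact query assumes the configuration above):

fields.region: AsiaPacific AND tags: an-example-tag

If you would rather have the custom fields at the top level of each event, Filebeat also supports a fields_under_root: true option per prospector.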

[Figure: Custom fields and tags visible on log events in Kibana]

Apart from this, Filebeat can do a certain level of parsing/filtering of standard application logs. This is handled by modules that can be enabled when Filebeat starts.

By default, Filebeat does not load these modules, to keep the process lightweight. You can, however, enable them either by editing the init script or by adding them directly to the Filebeat configuration file.

DAEMON_ARGS="-c /etc/filebeat/filebeat.yml -path.home /usr/share/filebeat -path.config /etc/filebeat -path.data /var/lib/filebeat -path.logs /var/log/filebeat -e -modules=nginx"

Note the -e -modules=nginx above. You can now restart the filebeat service for the changes to take effect. Similarly, you can enable modules from the /etc/filebeat/filebeat.yml file.

filebeat.modules:
- module: system
- module: nginx
- module: auditd
- module: mysql

You can also exclude log messages using Filebeat. For example, if a log file contains lines that do not add much value for long-term storage, you can write a regex that prevents them from being sent to Elasticsearch, as shown below.

- paths:
    - /opt/tomcat/logs/*.log
  exclude_lines: ['^10-12-2.64']
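
The reverse is also supported: include_lines ships only the lines that match a pattern, and the two options can be combined. A sketch, assuming an application that prefixes each line with its severity:

- paths:
    - /opt/tomcat/logs/*.log
  include_lines: ['^ERROR', '^WARN']
  exclude_lines: ['^DEBUG']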

As mentioned earlier and shown in the diagram, you can also send log messages to a Logstash server, where you can filter and modify events before Logstash finally ingests them into Elasticsearch. Such a Filebeat configuration file is shown below.

filebeat.prospectors:
- input_type: log
  paths:
    - /opt/app1/*.log
    - /opt/app2/*.log
  document_type: syslog
filebeat.registry_file: /var/lib/filebeat/registry
output.logstash:
  hosts: ["mylogstashurl.example.com:5044"]

Conclusion

Filebeat is extremely lightweight compared to its predecessors when it comes to efficiently sending log events. It uses the Lumberjack protocol, compresses data, and is configured through a single YAML file. It can send events directly to Elasticsearch or to Logstash, and because it records how far it has read into each file, it can always resume where it left off.


Give It a Whirl!

It's easy to spin up a standard hosted Elasticsearch cluster on any of our 47 Rackspace, Softlayer, or Amazon data centers. And you can now provision your own AWS Credits on Qbox Private Hosted Elasticsearch.

Questions? Drop us a note, and we'll get you a prompt response.

Not yet enjoying the benefits of a hosted ELK-stack enterprise search on Qbox? We invite you to create an account today and discover how easy it is to manage and scale your Elasticsearch environment in our cloud hosting service.
