Logstash ships with many input, codec, filter, and output plugins that can be used to retrieve, transform, filter, and send logs and events from various applications, servers, and network channels. 

In previous tutorials, we discussed how to use Logstash to ship Redis logs, index emails using the Logstash IMAP input plugin, and handle many other use cases. 

In this article, we continue our journey into the rich world of Logstash input plugins, focusing on the Beats family (e.g., Filebeat and Metricbeat), various file and system input plugins, network, email, and chat protocols, cloud platforms, web applications, and message brokers/platforms. Logstash currently supports over 50 input plugins -- and more are coming -- so covering all of them in one article is not possible. Therefore, we decided to overview some of the most popular input plugin categories to give you a general picture of what you can do with Logstash. 

What Are Logstash Input Plugins?

As you remember from our previous tutorials, Logstash works as a logging pipeline that listens for events from the configured logging sources (e.g., apps, databases, message brokers), transforms and formats them using filters and codecs, and ships them to the output location (e.g., Elasticsearch or Kafka) (see the image below).

Logstash pipeline


Logstash is so powerful because it can aggregate logs from multiple sources (like Redis, Apache HTTP, or Apache Kafka) sitting on multiple nodes and put them in an efficient log processing queue managed by multiple workers and threads. Logstash optimizes log streaming between the input and output destinations, ensuring fault-tolerant performance and data integrity. One of the biggest advantages of Logstash is the availability of numerous filters and codecs that can extract patterns from logs and transform them into rich data objects suitable for analysis in Elasticsearch and Kibana. These features enable blazing-fast transformation of raw logs into actionable insights that benefit your business.

Input plugins are important components of the Logstash pipeline that work as middleware between input log sources and Logstash's filtering functionality. In general, each input plugin connects to a specified log source provider and ingests logs using its API. In Logstash, input plugins can be installed and managed by the plugin manager located at bin/logstash-plugin. However, some of the most popular input plugins are installed out of the box. You can see the list of plugins installed in your Logstash instance by running:

bin/logstash-plugin list
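
If a plugin you need is not bundled, the same tool can install it from the public plugin repository. As an example, installing the Google Pub/Sub input discussed later in this article might look like the following (assuming your Logstash host has network access to fetch plugins):

bin/logstash-plugin install logstash-input-google_pubsub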

Beats Family

The Elastic Beats family includes a number of shippers for all kinds of data (logs, metrics, events, etc.). In previous tutorials, we extensively covered Filebeat for log shipping, Packetbeat for network data shipping, and Metricbeat for system and application metrics monitoring. Elastic also offers Winlogbeat for Windows Event Logs, Auditbeat for audit data, and Heartbeat for uptime monitoring. Some of these shippers, such as Filebeat, can ship logs to your Elasticsearch indexes directly. 

The natural question arises: why then use Logstash at all? 

There are two main reasons:

  1. Log aggregation. If you have multiple nodes and many application instances producing logs, you'll need a centralized logging destination for log management. The basic motivation behind log aggregation is that you want to have logs in one place for better processing. You also want to control the number of indexing connections to Elasticsearch. If there are too many connections, your Elasticsearch cluster might experience timeouts, high bulk queues, and poor responsiveness, all of which have an adverse effect on performance. To avoid this, you can use Logstash to aggregate logs collected by Beats and rely on Logstash worker threads to create efficient log queues and batches, which can dramatically reduce the number of indexing operations and, hence, the load on Elasticsearch.
  2. Logstash is great for log filtering and log enhancement. Log filtering is useful in production environments where you want to exclude certain log data to save storage and make your analytics more focused. Also, much log data is shipped as unstructured plain text that is hard to process. To prepare this raw data for analysis and aggregation in Elasticsearch, you'll need to transform log messages into structured data objects with human-readable fields that map to particular types, like strings, dates, integers, etc. Once your log data is transformed, you can apply various metrics and aggregations to produce valuable visualizations in Kibana or discover data patterns via your Big Data analytics tools or ML algorithms.

Using Beats components in Logstash is very simple:

input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["IP Address 1:port1", "IP Address 2:port2", "IP Address 3"]
  }
}

For example, the input configuration above tells Logstash to listen for Beats events on port 5044 and ship them directly to Elasticsearch.

File and Exec Input Plugins

Logstash is great for shipping logs from files, shell commands, syslogs, and other common log sources in your OS. We'll discuss just two input plugins in this category: the Exec input plugin and the File input plugin, with a short syslog sketch below. Other available options include the Pipe input plugin for reading events from a long-running command pipe and the Syslog input plugin for reading syslog messages and events over the network.
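
As a quick illustration, a minimal Syslog input might look like the sketch below (the port is an assumption; the plugin listens on port 514 by default):

input {
  syslog {
    # Listen for RFC3164 syslog messages over TCP and UDP on this port
    port => 5514
  }
}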

Let's start with the Exec input plugin. This plugin allows you to periodically run a shell command on your system and ship its output to Logstash. This functionality may be useful for monitoring the state of your system and visualizing it in Kibana. For example, if you are interested in the top processes currently running on the host, you can tell Logstash to run the Linux top command, which displays currently running processes, every 30 seconds.

input {
  exec {
    command => "top"
    interval => 30
  }
}

You can use any command supported by your system in a similar way. 

Another plugin in this group, the File input plugin, allows streaming events from files by tailing them in a way similar to the tail -0F command on Unix-like systems. In essence, this plugin works as a file watcher that treats a new line at the end of the file as a new event. This makes the plugin useful for tracking changing log files as new lines are appended. As an added benefit, the plugin stores the current position in each file it tracks, so it can start from where it left off when Logstash is stopped and restarted.

input {
  file {
    path => "/var/log/*.log"
  }
}

The plugin supports glob patterns, as in the example above. This input configuration tells the plugin to watch all files with the .log extension in the /var/log/ folder.

Input Plugins for Tracking Network Events, Chat, and Email Servers

Logstash provides excellent support for events and logs generated by various network, Inter-Process Communication (IPC), chat, and email protocols. Let's start with the common network and IPC protocols. Logstash supports UDP, Unix domain sockets, WebSockets, HTTP, and more.

UDP Plugin

This plugin allows reading messages as events over the network via UDP. The only required configuration field for the plugin is port, which specifies the UDP port on which Logstash listens for events.

input {
  udp {
    port => 25000
    workers => 4
    codec => json
  }
}

As in the example above, you can optionally use a JSON codec to transform UDP messages into JSON objects for better processing in Elasticsearch. You can also control the size of the message queue with the queue_size parameter and specify the number of worker threads that process UDP packets using the workers parameter.

Unix Domain Sockets

A Unix domain socket is an inter-process communication (IPC) mechanism that allows bidirectional data exchange between processes running on the same machine. You can use this plugin to capture Unix domain socket events (messages) emitted by applications. Similarly to the File input plugin, each event corresponds to one line of text emitted over the socket. The plugin supports two modes: server and client. In server mode, it listens for client connections; in client mode, it connects to a server and reads events from it.

input {
  unix {
    mode => "server"
    path => "/var/logstash/ls"
    data_timeout => 2
  }
}

In the input configuration above, we use the plugin in server mode and configure it to listen for Unix domain socket events at the path /var/logstash/ls.

Websocket Input Plugin

The WebSocket protocol enables interaction between a web client (e.g., a browser) and a web server with lower overhead, enabling real-time data transfer and allowing messages to be passed back and forth while keeping a connection open. The WebSocket input plugin allows reading events from an open WebSocket connection. The only required parameter is the URL where the WebSocket connection is opened.

input {
  websocket {
    url => "ws://127.0.0.1:3000"
  }
}

The only mode currently supported by the plugin is client mode, in which the plugin connects to a WebSocket server and receives events from that server as WebSocket messages.

HTTP Input Plugin

The HTTP input plugin listens for HTTP POST requests sent by applications to the endpoint it exposes and converts the request body into a Logstash event. Applications can pass JSON, plain text, or any formatted data to the endpoint, and a corresponding codec transforms the messages. The plugin can also be used to receive webhook requests to integrate with other applications and services (similarly to what the GitHub webhook input does). 

By taking advantage of the vast plugin ecosystem available in Logstash, you can trigger actionable events in Logstash right from your application and send them to Elasticsearch. The plugin supports HTTP basic authentication headers and SSL for sending data securely over HTTPS, with an option to validate the client's certificate.
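
A minimal sketch of such an input might look like this (the bind address, port, and basic-auth credentials below are assumptions for illustration, not required values):

input {
  http {
    # Address and port on which Logstash accepts HTTP POST requests
    host => "0.0.0.0"
    port => 8080
    # Optional HTTP basic authentication
    user => "logstash"
    password => "changeme"
  }
}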

HTTP Poller

The HTTP Poller plugin is another HTTP-based input plugin in Logstash that allows calling an HTTP API, transforming the response message into an event, and sending it up the pipeline (e.g., to Elasticsearch).

As an example, the plugin configured below reads from URLs routing to the Elasticsearch cluster, decodes and transforms the body of each response with a JSON codec, and stores a hash of request metadata in the field specified by metadata_target. The config looks like this:

input {
  http_poller {
    urls => {
      test1 => "http://localhost:9200"
      test2 => {
        method => get
        user => "Qbox"
        password => "qbox9238"
        url => "http://localhost:9200/_cluster/health"
        headers => {
          Accept => "application/json"
        }
      }
    }
    request_timeout => 60
    # Supports "cron", "every", "at" and "in" schedules by rufus scheduler
    schedule => { cron => "* * * * * UTC"}
    codec => "json"
    # A hash of request metadata info will be stored here
    metadata_target => "http_poller_metadata"
  }
}

In addition to the plugins discussed above, Logstash has good support for various email protocols and chat servers. For example, using the IRC input plugin, you can read events from Internet Relay Chat (IRC) servers; IRC is an application-layer protocol that facilitates communication in the form of text. 

In turn, the XMPP input plugin is designed to receive events over XMPP/Jabber, a protocol for instant messaging, multi-party chat, collaboration, voice and video calls, and generalized routing of XML data. This plugin can be used to accept events from humans or applications over XMPP, or you can employ it for PubSub or general message passing to Logstash. Finally, the IMAP input plugin allows reading mail from Internet Message Access Protocol (IMAP) servers. Along with POP3, IMAP is one of the most commonly used Internet mail protocols for retrieving emails and is supported by all modern email clients and mail servers.
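
As a rough sketch, an IMAP input that polls a mailbox might look like the following (the server, credentials, and port are placeholders, not values from a real setup):

input {
  imap {
    # Mail server to poll for new messages
    host => "imap.example.com"
    port => 993
    secure => true
    user => "alerts@example.com"
    password => "secret"
  }
}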

Support for Pub/Sub Pipelines, Data Streaming, and Messaging Brokers

Data streaming and real-time messaging between applications are very popular in the age of IoT, microservices, and real-time applications. Logstash has good support for various kinds of message brokers and data streaming platforms. In this section, we are going to discuss just a few of them. Let's get started!

Google Pub/Sub

Google Cloud Pub/Sub implements the publish/subscribe messaging pattern for applications in the cloud. It is a middleware that provides APIs for publishers to create topics to send messages to and for subscribers to create subscriptions to those topics. The platform supports many-to-many and asynchronous messaging with low latency and strong security. With the Google Pub/Sub input plugin, you can easily ingest events from the Google Pub/Sub API, and you can also consume Stackdriver Logging messages if needed.

To use the plugin, you must have a Google project set up and a Google Cloud Platform Service Account if you are running Logstash outside the Google platform. For the plugin to work, you should also manually create a topic, specify a subscription, and refer to them in the Logstash config file.
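
If you manage your project with the gcloud CLI, creating the topic and subscription referenced in the sample config might look like this (the names are assumptions that simply match the sample below):

gcloud pubsub topics create logstash-test-log
gcloud pubsub subscriptions create logstash-sub --topic logstash-test-log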

Below is the basic configuration for the plugin:

input {
    google_pubsub {
        # Your GCP project id (name)
        project_id => "my-project-99234"
        # A topic to read messages from
        topic => "logstash-test-log"
        subscription => "logstash-sub"
        # If you are running logstash within GCE, it will use
        # Application Default Credentials and use GCE's metadata
        # service to fetch tokens.  However, if you are running logstash
        # outside of GCE, you will need to specify the service account's
        # JSON key file below.
        #json_key_file => "/home/erjohnso/pkey.json"
    }
}
output { stdout { codec => rubydebug } }

Apache Kafka Plugin

Apache Kafka is a streaming platform that combines a messaging queue with publish/subscribe functionality. Kafka is very useful for building real-time streaming data pipelines between systems or applications and for building real-time streaming applications that transform or react to streams of data. Similarly to Google Pub/Sub, Kafka has producers that publish arbitrary data to topics and consumers that subscribe to one or more of these topics. In addition, Kafka includes the Streams API, which allows applications to operate as stream processors that transform data and pass it to some output. Kafka is a distributed and scalable system where topics can be split into multiple partitions distributed across multiple nodes in the cluster.

The Logstash Kafka plugin easily integrates with Kafka Producer and Consumer APIs. You can specify multiple topics to subscribe to while using the default offset management strategy. A great feature of the plugin is that you can run multiple Logstash instances reading the same topic in order to distribute load across multiple physical machines. To use this feature, you need to specify a group_id which creates a single logical subscriber made of multiple processors. Messages in a topic will be distributed to all Logstash instances with the same group_id.

input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["test"]
    group_id => "99ds9932"
  }
}

RabbitMQ

RabbitMQ is a popular message broker for storing and exchanging messages. It supports data delivery using various approaches such as the pub/sub pattern, work queues, and asynchronous processing. All this comes with support for a wide variety of messaging protocols and a distributed messaging environment.

By default, the RabbitMQ input plugin listens for all messages in a RabbitMQ queue and saves message properties in the [@metadata][rabbitmq_properties] field if the metadata_enabled setting is turned on. For example, to save the RabbitMQ message's timestamp property into the Logstash event's @timestamp field, you can use the date filter to parse the [@metadata][rabbitmq_properties][timestamp] field:

filter {
  if [@metadata][rabbitmq_properties][timestamp] {
    date {
      match => ["[@metadata][rabbitmq_properties][timestamp]", "UNIX"]
    }
  }
}
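
For completeness, a minimal rabbitmq input that exposes these message properties might look like the sketch below (the host and queue name are placeholders):

input {
  rabbitmq {
    host => "localhost"
    queue => "logstash-queue"
    durable => true
    # Store message properties under [@metadata][rabbitmq_properties]
    metadata_enabled => true
  }
}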

Working with Cloud Providers and Popular Web Applications

Logstash has great support for various cloud provider services like Amazon S3 and web applications like Salesforce and Twitter. There are too many input plugins in this category to mention them all, so we'll focus on just a few.

Amazon S3

The Amazon S3 input plugin integrates with Amazon S3 -- object storage built to store and retrieve data for websites and mobile apps, corporate software, and IoT sensors and devices.

The Amazon S3 input plugin can stream events from files in S3 buckets in a way similar to the File input plugin discussed above. As with the File input plugin, each line from each file in the S3 bucket generates an event that Logstash captures. To use this plugin, you'll need an S3 bucket configured and AWS credentials to access that bucket.

input {
  s3 {
    bucket => 'bucket_name'
    region => 'eu-west-1'
    access_key_id => 'YOUR_AWS_ACCESS_KEY'
    secret_access_key => 'YOUR_AWS_SECRET_KEY'
  }
}

Salesforce Input

The Logstash Salesforce input plugin integrates with Salesforce -- a popular cloud platform for CRM, task management, and marketing. The plugin allows querying Salesforce using the Salesforce Object Query Language (SOQL) -- a query language designed to retrieve information from the Salesforce system. In order to use this plugin, you will need to create a new SFDC application using OAuth. Other prerequisites are your Salesforce credentials and the security token for your Salesforce instance. 

Example:

input {
  salesforce {
    client_id => 'OAUTH CLIENT ID FROM YOUR SFDC APP'
    client_secret => 'OAUTH CLIENT SECRET FROM YOUR SFDC APP'
    username => 'email@example.com'
    password => 'yourpassword'
    security_token => 'SECURITY TOKEN'
    sfdc_object_name => 'Sales'
  }
}

The example above demonstrates a basic usage of the Salesforce input plugin. The input is configured to retrieve "Sales" object data from the SFDC app.

Twitter Input

The Twitter input plugin makes it simple to use the ELK stack to ship Twitter data to Elasticsearch and use it for the analysis of Twitter trends. The plugin ingests events from the Twitter Streaming API and ships them directly to Elasticsearch. The API allows tracking tweets and retweets of multiple users and replies to any tweet created by those users, as well as filtering tweets by language, the location of the user, keywords found in the text, etc. You can also use Logstash filters to further refine the events and create fields that can later be analyzed in Elasticsearch.

Example:

input {
  twitter {
      consumer_key => "consumer_key"
      consumer_secret => "consumer_secret"
      oauth_token => "access_token"
      oauth_token_secret => "access_token_secret"
      keywords => ["AWS","Qbox","Elasticsearch"]
      full_tweet => true
  }
}

AWS CloudWatch

Amazon CloudWatch is an AWS cloud monitoring service that allows monitoring AWS applications and instances to get actionable insights about your cloud deployments. The platform collects various types of operational data such as logs, metrics, and events. You can use CloudWatch to configure high-resolution alarms, visualize logs, take automated actions, troubleshoot issues, and discover insights.

You can directly connect to the stream of events from AWS CloudWatch using the Logstash CloudWatch input plugin. As with other AWS plugins, you'll need AWS credentials to use the CloudWatch input.

A sample configuration that streams EC2 metrics data to Logstash may look as follows:

input {
  cloudwatch {
    namespace => "AWS/EC2"
    metrics => [ "CPUUtilization" ]
    filters => { "tag:Group" => "Production" }
    region => "ap-southeast-2"
  }
}

As you can see from the example above, the plugin allows configuring various filters to define which metrics to fetch from CloudWatch and to set the granularity of the returned data points.

Conclusion

It's clear that Logstash supports a wide variety of popular log and event sources. In particular, you can use input plugins for a number of major network protocols, messaging brokers, IRC servers, databases, cloud services, and web applications. 

Even more, Logstash can ingest the output of shell commands, system logs and other system information, and local files as well as files over the network, converting them into valuable events and logs that are filtered and enriched for subsequent analysis in the ELK stack or any other log analysis solution you prefer. All these features make Logstash a powerful component of the ELK stack for shipping and normalizing data. 

The plugins discussed in this overview are just the tip of the iceberg of what the Logstash community offers. Check out the official Logstash documentation to find out more -- and stay tuned for the upcoming Logstash tutorials to learn about specific use cases of Logstash plugins.