
Natural Language Processing, or NLP, is one of the most active areas of research in data analytics, due to the large volume of data available across the web and the need to analyze it for insights that drive business development and growth. NLP can be thought of as a superset of areas like entity extraction, event classification, sentiment analysis, and more. Here we consider how Elasticsearch can be used to visualize the end product of these tasks. This series introduces basic prototypes of the functional areas of NLP to help you get started.

Sentiment Analysis

Sentiment Analysis is a part of NLP that tries to compute the emotional value a human would associate with a text. For example, when a new film is released, people express their views and rate the film through Twitter. Some rate the movie highly, others feel it is average, and some feel it is not up to the mark. Wouldn't it be nice, at least from the movie director's point of view, to be able to analyze the views of the audience? Similarly, imagine a new vehicle is launched and the media is actively discussing it. The manufacturers want to know the launch's impact, positive or negative. This is where sentiment analysis comes in.

For the purpose of this tutorial, consider collecting Twitter data based on a hashtag, which ensures relevant feeds. A quick Google search will turn up numerous ways to get tweets. The Logstash tool ships alongside Elasticsearch and Kibana to make the perfect toolkit for collecting, indexing, and visualizing data.

Logstash provides a number of plugins categorized as input and output plugins. The input plugins provide alternatives for fetching data to be imported into Elasticsearch. For example, the twitter input plugin fetches feeds from Twitter, and the jdbc input plugin lets you import SQL databases. The output plugins send your events to various destinations such as a file, a CSV file, syslog, Kafka, MongoDB, and more.

Filters give you the option to apply intermediary alterations to fetched events before sending them to output destinations. You can add or remove fields from events, and perform regex matching to find patterns and process them accordingly. Logstash is written in Ruby, so you can even write arbitrary Ruby code in filters, as you will see in a little while.
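As a small illustration of the kind of alterations filters allow, a hypothetical filter section using the built-in mutate plugin might look like this (the field names here are made up for the example, not part of our pipeline):

```
filter {
  mutate {
    add_field => { "source" => "twitter" }   # attach a custom field to every event
    remove_field => [ "unwanted_field" ]     # drop a field we do not need
  }
}
```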

We are going to use the twitter input plugin, a ruby filter that lets you write Ruby code, and the elasticsearch output plugin to index data into Elasticsearch. To find the sentiment of each tweet, we use the Alchemy API. I chose the Alchemy API because you can easily find a Ruby integration for it. There is a Ruby SDK here.

First, create the Logstash config file that sets up the appropriate plugins and filter. You can see the input plugin documentation here. You need an input section like this to get started:

input {
  twitter {
    consumer_key => "Your Consumer Key"
    consumer_secret => "Your secret access token"
    oauth_token => "Your oauth token"
    oauth_token_secret => "Your oauth secret password"
    keywords => [ "Euro" ]
  }
}

You can get the authentication tokens from Twitter. If you are not sure how to do that, this will help you. Besides the tokens, the only field in the configuration is keywords, which is an array of values that we look for in the tweets.

Similarly, the output section sends each event to an Elasticsearch instance, and its configuration is self-explanatory.

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "twitter"
    document_type => "tweet"
  }
  stdout { codec => rubydebug }
}

The stdout plugin instructs Logstash to print a debug message for each event to the console.

Now, let us find the sentiment of the feeds we are receiving. As already said, we are going to use the ruby filter, which lets you run Ruby code. You can see the official documentation here.

filter {
  ruby {
    code => "
      require '/home/neil/blogs/src/twitter_sentiment/alchemyapi_ruby/alchemyapi.rb'
      alchemyapi = AlchemyAPI.new()
      tweet = event['message']
      response = alchemyapi.sentiment('text', tweet)
      event['sentiment'] = response['docSentiment']['type']
    "
  }
}

Although it is better to create a custom plugin for complex Ruby computations, we use the ruby filter here for simplicity. We add a field named sentiment to hold the sentiment value (positive, negative, or neutral). So once the feed event gets indexed into Elasticsearch through the output plugin, we will have a field containing the sentiment value of the tweet. It is up to us to decide what to do with it next. How about collecting some insights from our feed index? Let us set up Kibana and create some visualizations.
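To make the positive/negative/neutral idea concrete without an Alchemy API key, here is a toy keyword-based scorer in Ruby. It is a hypothetical stand-in for the alchemyapi.sentiment call, for illustration only; a real service uses far more sophisticated models.

```ruby
# Toy stand-in for a sentiment API: counts positive vs. negative
# keywords and returns 'positive', 'negative', or 'neutral'.
POSITIVE = %w[great good love excellent awesome]
NEGATIVE = %w[bad terrible hate awful poor]

def naive_sentiment(text)
  words = text.downcase.scan(/[a-z']+/)
  score = words.count { |w| POSITIVE.include?(w) } -
          words.count { |w| NEGATIVE.include?(w) }
  if score > 0
    'positive'
  elsif score < 0
    'negative'
  else
    'neutral'
  end
end
```

Inside the ruby filter, the returned string would simply be assigned to event['sentiment'] in place of the API response.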

We assume you know how to set up Kibana. If not, download and extract the latest Kibana from the Elastic download page and run bin/kibana. You can then access the interface at http://localhost:5601.

Now configure an index pattern. For example, type in twitter and you will see our twitter index. Create a simple terms aggregation on our sentiment field, and it will give you a visualization like the one shown below.
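If you want to verify the same counts outside Kibana, a terms aggregation on the sentiment field can be issued directly against Elasticsearch. This is a sketch of the query (the index and field names match our pipeline; adjust the host if yours differs):

```
curl -XGET 'localhost:9200/twitter/_search?pretty' -d '{
  "size": 0,
  "aggs": {
    "sentiments": {
      "terms": { "field": "sentiment" }
    }
  }
}'
```

The response buckets give the document count per sentiment value, which is exactly what the Kibana visualization plots.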



This is just a starting point, as Elasticsearch opens up plenty of room for data analytics. This post introduced a basic prototype for one functional area of NLP, a simple sentiment analysis pipeline, to help you get started.