In Part I of the article, we reviewed the Elasticsearch plugins that enable Unicode text operations, phonetic analysis and decompounding in certain languages, and a number of plugins for native scripting, text concatenation, tokenization, and extracting metadata from files.

In Part II, we'll focus on plugins that provide various third-party integrations including SQL, graph-aided search with Neo4j, and Couchbase Transport, and we'll examine language plugins that enhance querying and analysis of the text in Korean, Chinese, Polish, Hebrew, and several other languages.

Elasticsearch SQL

This plugin lets you query Elasticsearch indices with SQL syntax. If you are not using Qbox-hosted Elasticsearch, you can also opt for the SQL Access module in X-Pack, which supports SQL queries in JSON format and ships with a CLI for connecting to an ES instance and executing SQL queries.

To illustrate how Elasticsearch SQL works, let's first index some sample data to experiment with:

PUT /library/book/_bulk?refresh
{"index":{"_id": "The Call of the Wild"}}
{"name": "The Call of the Wild", "author": "Jack London", "release_date": "1990-06-02", "page_count": 541}
{"index":{"_id": "Oliver Twist"}}
{"name": "Oliver Twist", "author": "Charles Dickens", "release_date": "1985-05-26", "page_count": 478}
{"index":{"_id": "The Picture of Dorian Gray"}}
{"name": "The Picture of Dorian Gray", "author": "Oscar Wilde", "release_date": "1965-06-01", "page_count": 349}

As you see, we did a bulk upload of three books to our library index. We can test whether the plugin works by making a SQL query in Kibana "Dev Tools" to return all books published after 1966-01-01:

GET /_sql?sql=select * from library where release_date >'1966-01-01' 

Kibana will automatically URL-encode this query for you and return the two matching books from the library index:

{
  "took": 76,
  "timed_out": false,
  "_shards": {
    "total": 4,
    "successful": 4,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": [
      {
        "_index": "library",
        "_type": "book",
        "_id": "The Call of the Wild",
        "_score": 0,
        "_source": {
          "name": "The Call of the Wild",
          "author": "Jack London",
          "release_date": "1990-06-02",
          "page_count": 541
        }
      },
      {
        "_index": "library",
        "_type": "book",
        "_id": "Oliver Twist",
        "_score": 0,
        "_source": {
          "name": "Oliver Twist",
          "author": "Charles Dickens",
          "release_date": "1985-05-26",
          "page_count": 478
        }
      }
    ]
  }
}

That's it! You can experiment with other SQL query types and aggregations that are also supported by the Elasticsearch SQL plugin.
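
Outside Kibana, the query string has to be URL-encoded by hand. The following is a minimal Python sketch of that step using only the standard library. The host and port (localhost:9200) are assumptions for illustration, not something the plugin requires.

```python
from urllib.parse import quote

# SQL statement from the example above.
sql = "select * from library where release_date > '1966-01-01'"

# Percent-encode every character that is not safe in a query string
# (spaces, '*', '>', and single quotes).
encoded = quote(sql, safe="")

# Assumption: the node with the SQL plugin listens on localhost:9200.
url = "http://localhost:9200/_sql?sql=" + encoded
print(url)
```

The resulting URL can be passed to any HTTP client, such as curl.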

Delete by Query Plugin

This plugin adds support for deleting all documents that match a specified query, using the Scan/Scroll and Bulk APIs under the hood. It is available for Elasticsearch 2.x and was replaced in later versions by the Delete by Query API implemented in the core. With the plugin, documents can be deleted using a simple query string as a URL parameter:

DELETE /library/book/_query?q=author:Jack London

The same can be done with the Query DSL syntax like this:

DELETE /library/book/_query
{
  "query": { 
    "term": {
      "author": "Jack London"
    }
  }
}
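
To make the difference between the plugin and the later core API concrete, here is a small Python sketch that builds both requests without sending them. The helper names plugin_request and core_request are hypothetical. The core endpoint shown, POST /<index>/_delete_by_query, is the one available in Elasticsearch 5.0 and later.

```python
import json

def plugin_request(index, doc_type, query):
    """Request for the Delete by Query plugin (Elasticsearch 2.x)."""
    return "DELETE", f"/{index}/{doc_type}/_query", json.dumps({"query": query})

def core_request(index, query):
    """Request for the core Delete by Query API (Elasticsearch 5.0 and later)."""
    return "POST", f"/{index}/_delete_by_query", json.dumps({"query": query})

# Same Query DSL body as in the example above.
query = {"term": {"author": "Jack London"}}

for method, path, body in (
    plugin_request("library", "book", query),
    core_request("library", query),
):
    print(method, path, body)
```

Only the HTTP method and path change between the two; the Query DSL body stays the same.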

Couchbase Transport

This plugin makes your Elasticsearch nodes appear as a Couchbase Server, allowing you to coordinate data searches across both platforms and retrieve more complex document structures. For example, you can use Elasticsearch to store the product names of your product catalog app and, once they are retrieved from the index, use the obtained data to access the corresponding detailed product descriptions from the Couchbase Server. In this way, you can combine the power of the Elasticsearch API with the Engagement Database philosophy of Couchbase to better customize and optimize the user experience. The plugin also supports real-time replication of data from Couchbase Server to Elasticsearch, recovers from network failures by tracking the current replication status, and monitors node failures in both the Elasticsearch and Couchbase clusters to ensure that data is always available.
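
The two-step lookup described above (search by name first, then fetch the full record by ID) can be sketched in Python as follows. Both stores are in-memory stand-ins with made-up data, and the function names are hypothetical. A real application would call the Elasticsearch and Couchbase clients instead.

```python
# In-memory stand-ins (hypothetical data): a search index that stores only
# product names, and a document store that holds the full product records.
search_index = {
    "p1": "Trail running shoes",
    "p2": "Road running shoes",
    "p3": "Hiking boots",
}
document_store = {
    "p1": {"name": "Trail running shoes", "description": "Grippy outsole for mud.", "price": 120},
    "p2": {"name": "Road running shoes", "description": "Light and cushioned.", "price": 95},
    "p3": {"name": "Hiking boots", "description": "Waterproof leather upper.", "price": 150},
}

def search_ids(term):
    """Step 1: find matching product IDs by name (the Elasticsearch role)."""
    term = term.lower()
    return [doc_id for doc_id, name in search_index.items() if term in name.lower()]

def fetch_details(doc_ids):
    """Step 2: load the full documents by ID (the Couchbase role)."""
    return [document_store[doc_id] for doc_id in doc_ids]

results = fetch_details(search_ids("running"))
for product in results:
    print(product["name"], "-", product["description"])
```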

GraphAware Graph-Aided Search

This plugin provides bidirectional integration between Neo4j and Elasticsearch. Neo4j is a graph database queried with the Cypher graph query language. Storing data as connected graphs in Neo4j makes it possible to capture and represent relationships between data points, such as the social connections of users or all running services that depend on a particular server. Using the GraphAware plugin with Elasticsearch, you can enhance ES results by boosting and filtering them with Neo4j graph data. After Elasticsearch returns its response, the plugin requests additional information from Neo4j and enriches the search results with graph information.

For example, you can use the plugin to boost the results from your movie collection with user movie preferences and interests obtained via the GraphAware Recommendation Plugin running on top of Neo4j.

curl -X POST http://localhost:9200/neo4j-index/Movie/_search -d '{
    "query": {
        "match_all": {}
    },
    "gas-booster": {
        "name": "SearchResultNeo4jBooster",
        "target": "2",
        "maxResultSize": 10,
        "keyProperty": "objectId",
        "neo4j.endpoint": "/graphaware/recommendation/movie/filter/"
    }
}'

In this way, you can improve the quality of response results and customize them for your application's users.
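
Conceptually, boosting means re-ranking the hits returned by Elasticsearch with scores that come from the graph. The Python sketch below illustrates the idea only. It assumes the two scores are multiplied, which is an assumption made for illustration and not a description of the plugin's internal scoring. The function name and the sample data are hypothetical.

```python
def boost_with_graph(hits, graph_scores, max_result_size=10):
    """Re-rank search hits using per-document scores from a graph database.

    hits: list of dicts with "objectId" and "score" (the search engine score).
    graph_scores: dict mapping objectId to a graph-derived boost factor.
    """
    boosted = []
    for hit in hits:
        # Documents unknown to the graph keep their original score.
        factor = graph_scores.get(hit["objectId"], 1.0)
        boosted.append({**hit, "score": hit["score"] * factor})
    # Highest combined score first, truncated to the requested size.
    boosted.sort(key=lambda h: h["score"], reverse=True)
    return boosted[:max_result_size]

# Hypothetical data: a match_all query gives every movie the same score.
hits = [
    {"objectId": "1", "title": "Movie A", "score": 1.0},
    {"objectId": "2", "title": "Movie B", "score": 1.0},
    {"objectId": "3", "title": "Movie C", "score": 1.0},
]
graph_scores = {"2": 3.0, "3": 1.5}  # e.g. recommendations for user "2"

for hit in boost_with_graph(hits, graph_scores):
    print(hit["title"], hit["score"])
```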

Elasticsearch Migration

This plugin helps you prepare for upgrading to the next major version of Elasticsearch. It performs the following tasks to determine whether you can upgrade directly or need to make changes to your cluster or data beforehand:

  • runs regular checks on your Elasticsearch cluster and alerts on any issues that should be fixed before upgrading.
  • reindexes indices created prior to Elasticsearch v2.0.0 before they are used in Elasticsearch 5.x or later versions.
  • logs a message whenever deprecated functionality is used. With the deprecation logging tool, you can enable or disable such logging on your Elasticsearch cluster.

Carrot2 Results Clustering

The plugin adds on-the-fly text clustering functionality to your Qbox-hosted Elasticsearch nodes. In its basic usage, the plugin attempts to group together similar documents and then assigns human-readable labels that reflect the most common themes/topics appearing in the document groups. As a result, this plugin can create dozens of document clusters that share similar content. In the image below, for example, we can see the clusters created by the Carrot2 algorithm on the search for web pages mentioning "Elasticsearch."

[Image: Carrot2 clusters for the search "Elasticsearch"]

As you see, the algorithm has grouped documents found on the web into multiple clusters and labeled them accordingly. This functionality can be efficiently used to simplify the user interaction with search results and structure information from your ES indices in a meaningful way.

Language Plugins in Qbox

Qbox ships with a number of language plugins including tools for morphological analysis, tokenization, and stemming in Korean, Japanese (Kuromoji), Russian, English, Chinese, and Polish languages.

Russian and English Morphological Analysis

This plugin integrates Russian and English morphology for Lucene and Java frameworks into Elasticsearch by adding two analyzers and two token filters for these languages. The plugin uses dictionary-based morphological analysis with some heuristics for unknown words.

Smart Chinese Analysis Plugin

This plugin integrates Lucene's Smart Chinese analysis module into Elasticsearch. To find the optimal word-level segmentation of Chinese text, the analyzer uses a probabilistic model based on the Hidden Markov Model and trained on a large corpus of Chinese text. Using this model, the analyzer first breaks the text into sentences and then segments each sentence into words, preparing the text for further linguistic analysis.
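
To give a feel for probability-based segmentation, here is a deliberately simplified Python sketch. It is not the Smart Chinese algorithm: instead of a Hidden Markov Model it uses a plain unigram word-frequency model with dynamic programming, and the frequency table is made up for illustration.

```python
import math

# Toy word frequencies (hypothetical numbers, not from a real corpus).
WORD_FREQ = {"我": 50, "喜欢": 30, "喜": 5, "欢": 5, "北京": 40, "北": 10, "京": 8}
TOTAL = sum(WORD_FREQ.values())
MAX_WORD_LEN = max(len(word) for word in WORD_FREQ)
UNKNOWN_FREQ = 0.5  # smoothing for single characters missing from the table

def segment(sentence):
    """Split a sentence into the most probable sequence of known words."""
    n = len(sentence)
    # best[i] = (best log-probability of sentence[:i], start of its last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = sentence[start:end]
            freq = WORD_FREQ.get(word)
            if freq is None:
                if len(word) > 1:
                    continue  # unknown multi-character strings are not words
                freq = UNKNOWN_FREQ
            score = best[start][0] + math.log(freq / TOTAL)
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk back through the stored start positions to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(sentence[start:end])
        end = start
    return words[::-1]

print(segment("我喜欢北京"))
```

With this table, the two-character words score higher than their single-character splits, so the sentence is segmented into whole words where possible.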

Japanese (Kuromoji) Analysis

The plugin integrates Kuromoji, an open source Japanese morphological analyzer for Apache Lucene and Apache Solr. Kuromoji enables a number of operations on Japanese text, including word segmentation, part-of-speech tagging (assigning word categories such as verbs, adjectives, and particles), lemmatization (getting dictionary forms of inflected adjectives and verbs), and extracting readings for kanji (adopted logographic Chinese characters).

Stempel Polish Analysis Plugin

This plugin integrates Lucene's Stempel analysis module for the Polish language. The module offers high-quality stemming tables and a universal algorithmic stemmer designed for the highly inflected Polish language and built from an extensive corpus of Polish text. The plugin is useful for converting inflected words to their stems, which improves search results and decreases index sizes.

Hebrew Plugin

The plugin ships with the Hebrew dictionary files and a "hebrew" analyzer that enables efficient querying and analysis of Hebrew text in Elasticsearch. The plugin also enables matching of inflected words using its built-in lemmatization and supports exact matching with the exact query search.

The Hebrew analyzer may be enabled and tested using the following configuration:

PUT hebrew-tut
{
    "mappings": {
        "test": {
            "properties": {
                "content": {
                    "type": "text",
                    "analyzer": "hebrew"
                }
            }
        }
    }
}
PUT hebrew-tut/test/1
{
    "content": "בדיקות"
}
POST hebrew-tut/_search
{
    "query": {
        "match": {
           "content": "בדיקה"
        }
    }
}

Korean Analysis

Korean is an agglutinative language: the predicate changes its form according to its ending, and a noun is usually followed by one or several postpositions. As a result, querying Korean documents directly will match only a single form of a predicate or noun and omit other relevant documents that we might be searching for. Qbox offers the Seunjeon Korean analyzer, which solves this issue by enabling efficient search and analysis of Korean documents in Elasticsearch. It is a mecab-ko-dic-based Korean analyzer that runs on the JVM, provides Java and Scala interfaces, and ships with a built-in Korean dictionary.
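
The effect of postpositions on matching can be shown with a toy Python example. The suffix list and the strip_particle helper are hypothetical and far simpler than what a dictionary-based analyzer such as Seunjeon does. The point is only that several surface forms of the same noun need to be reduced to one term before they can match the same query.

```python
# A few common Korean postpositions (particles); a deliberately tiny list.
PARTICLES = ("에서", "에", "는", "은", "를", "을", "가", "이")

def strip_particle(word):
    """Remove one trailing particle, trying the longest candidates first."""
    for particle in sorted(PARTICLES, key=len, reverse=True):
        if word.endswith(particle) and len(word) > len(particle):
            return word[: -len(particle)]
    return word

# Surface forms of 학교 ("school") with different postpositions attached.
forms = ["학교", "학교에", "학교는", "학교를", "학교에서"]
stems = [strip_particle(word) for word in forms]

# Without analysis, an exact query for 학교 matches only the first form.
print([word == "학교" for word in forms])
# After stripping particles, every form matches.
print([stem == "학교" for stem in stems])
```

A naive suffix list like this one would also damage nouns that happen to end in one of these syllables, which is why real analyzers rely on a dictionary.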

Conclusion

We've completed a review of all the major plugins available for Qbox-hosted Elasticsearch. We hope you now have a better understanding of which plugins you might need to make your Elasticsearch application more productive. Installing Elasticsearch plugins on your Qbox clusters is ridiculously easy, so don't miss the opportunity to immediately enhance your Elasticsearch data search and analysis.