Sloan Ahrens

Elasticsearch 2.0.0 introduced a number of breaking changes. When I set out to install ES 2.0 to do some local testing, I found that the techniques that I had been using to set up virtual machines for local development (for instance, here) were no longer adequate. So I set out to discover what the “proper” method should be, and along the way I ran into a few problems. I’ll outline those issues here, hopefully saving some other people a little bit of trouble.

The instructions that follow will assume you are using OSX. It should be straightforward to adapt them to other operating systems, but I will not address those considerations here.

Keep reading

There is growing interest in the power of Apache Spark to do large-scale data analytics, including tests of machine-learning algorithms against large datasets. We are also interested in Spark as part of a larger technical solution featuring a web front-end that allows users to start jobs on the back end. In this article, we take you through the building of a software-as-a-service application.

Keep reading

After answering a question about autocomplete on StackOverflow, we thought it best to come over to the Qbox blog and write more extensively about the different ways of approaching autocomplete.

In this article, we include an example of how to get autocomplete up and running quickly in Elasticsearch with the Completion Suggest feature. We don’t intend for this to be a complete treatment of the topic, but we do aim to give you enough information to get going as painlessly as possible.
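To give a flavor of what the article covers, here is a minimal sketch of the Completion Suggest workflow, written in Python against the Elasticsearch REST API of that era. The index, type, and field names are made up for illustration, and we assume a node running locally on port 9200; see the article itself for the full treatment.

```python
# A minimal sketch of the Completion Suggest workflow (illustrative names,
# local node assumed at http://localhost:9200).
import json
import requests

ES = "http://localhost:9200"

# Create an index with a "completion"-type field for suggestions.
mapping = {
    "mappings": {
        "product": {
            "properties": {
                "name": {"type": "string"},
                "name_suggest": {"type": "completion"}
            }
        }
    }
}
requests.put(ES + "/autocomplete_test", data=json.dumps(mapping))

# Index a document, supplying the inputs the suggester should match against.
doc = {
    "name": "Elasticsearch cluster",
    "name_suggest": {"input": ["elasticsearch", "cluster"]}
}
requests.put(ES + "/autocomplete_test/product/1", data=json.dumps(doc))
requests.post(ES + "/autocomplete_test/_refresh")

# Ask for suggestions for a partial term.
query = {
    "product_suggest": {
        "text": "elas",
        "completion": {"field": "name_suggest"}
    }
}
resp = requests.post(ES + "/autocomplete_test/_suggest", data=json.dumps(query))
print(resp.json())
```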

Keep reading

In this post we will walk through the basics of using ngrams in Elasticsearch.

Wikipedia has this to say about ngrams:

In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus. When the items are words, n-grams may also be called shingles.

In the fields of machine learning and data mining, “ngram” often refers to a sequence of n words. In Elasticsearch, however, an “ngram” is a sequence of n characters. There are various ways these sequences can be generated and used. We’ll take a look at some of the most common.
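As a quick preview, here is a sketch of one common setup: an ngram token filter wired into a custom analyzer, with the `_analyze` API used to inspect the tokens it produces. The index and analyzer names are made up for illustration, and a local node on port 9200 is assumed.

```python
# Illustrative sketch: define an ngram analyzer and inspect its output.
# Assumes a local Elasticsearch node; "ngram_test" is a made-up index name.
import json
import requests

ES = "http://localhost:9200"

settings = {
    "settings": {
        "analysis": {
            "filter": {
                "my_ngram_filter": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 3
                }
            },
            "analyzer": {
                "my_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_ngram_filter"]
                }
            }
        }
    }
}
requests.put(ES + "/ngram_test", data=json.dumps(settings))

# See what tokens the analyzer produces for a sample string.
resp = requests.get(
    ES + "/ngram_test/_analyze",
    params={"analyzer": "my_ngram_analyzer", "text": "spark"}
)
print(resp.json())  # expect tokens like "sp", "spa", "pa", "par", ...
```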

Keep reading

In the last article, Sparse Matrix Multiplication with Elasticsearch and Apache Spark, we went through a method of doing large-scale, sparse matrix multiplication using Apache Spark and Elasticsearch clusters in the cloud. This article is an extension of that tutorial, in which we generalize the method to rectangular matrices. This is important because a future article will make use of the technique, and we need it to work for non-square matrices.
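To make the rectangular case concrete, here is a toy plain-Python illustration (not the article’s Spark code) of multiplying sparse matrices stored as coordinate maps; the only requirement on the shapes is that the inner dimensions agree.

```python
# Toy illustration: multiply sparse, rectangular matrices stored as
# {(row, col): value} dictionaries. A is m x k, B is k x n, C is m x n.
from collections import defaultdict

def sparse_multiply(A, B):
    """Multiply sparse matrices given as {(i, j): value} dictionaries."""
    # Group B's entries by row so each A entry (i, j) only meets B's row j.
    B_by_row = defaultdict(list)
    for (j, k), b in B.items():
        B_by_row[j].append((k, b))

    C = defaultdict(float)
    for (i, j), a in A.items():
        for k, b in B_by_row[j]:
            C[(i, k)] += a * b
    return dict(C)

# A is 2 x 3 and B is 3 x 2 -- only the inner dimension has to match.
A = {(0, 0): 1.0, (0, 2): 2.0, (1, 1): 3.0}
B = {(0, 0): 4.0, (2, 1): 5.0, (1, 1): 6.0}
print(sparse_multiply(A, B))  # {(0, 0): 4.0, (0, 1): 10.0, (1, 1): 18.0}
```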

Keep reading

In this article, we continue the work from Deploying Elasticsearch and Apache Spark to the Cloud, Machine-Learning Series, Part 3.

In previous posts, we’ve gone through a number of steps for creating a basic infrastructure for large-scale data analytics using Apache Spark and Elasticsearch clusters in the cloud. In this post, we will use that infrastructure for a task that is common in machine learning and data mining: sparse matrix multiplication.
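As a rough preview of the idea, here is a minimal PySpark sketch of sparse matrix multiplication expressed as a join on the shared inner index. The article reads its matrix entries from Elasticsearch; here they are hard-coded toy triples (the same small matrices used above), and a local Spark installation is assumed.

```python
# Minimal PySpark sketch: sparse matrix multiplication as a join on the
# shared inner index. Toy data stands in for entries read from Elasticsearch.
from pyspark import SparkContext

sc = SparkContext(appName="sparse-matmul-sketch")

# Matrices as (row, col, value) triples; only nonzero entries are stored.
A = sc.parallelize([(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0)])   # m x k
B = sc.parallelize([(0, 0, 4.0), (2, 1, 5.0), (1, 1, 6.0)])   # k x n

# Key A by its column and B by its row, join on that shared index,
# multiply the paired values, then sum the contributions for each cell.
A_by_col = A.map(lambda t: (t[1], (t[0], t[2])))   # (k, (i, a))
B_by_row = B.map(lambda t: (t[0], (t[1], t[2])))   # (k, (j, b))

C = (A_by_col.join(B_by_row)
     .map(lambda kv: ((kv[1][0][0], kv[1][1][0]), kv[1][0][1] * kv[1][1][1]))
     .reduceByKey(lambda x, y: x + y))

print(C.collect())  # [((0, 0), 4.0), ((0, 1), 10.0), ((1, 1), 18.0)]
sc.stop()
```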

Keep reading