Analyzers are made up of two main components: a tokenizer and a set of token filters. The tokenizer splits text into tokens according to a set of rules, and the token filters each perform operations on those tokens. The result is a stream of processed tokens, which are either stored in the index or used to match queries.
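The pipeline described above can be sketched in plain Python. This is an illustration of the concept, not Elasticsearch's actual implementation: the tokenizer and filter functions here are hypothetical stand-ins for the built-in ones.

```python
import re

def whitespace_tokenizer(text):
    # Split on non-word characters, keeping only the tokens themselves.
    return [t for t in re.split(r"\W+", text) if t]

def lowercase_filter(tokens):
    # A token filter: transforms each token in the stream.
    return [t.lower() for t in tokens]

def stop_filter(tokens, stopwords=frozenset({"the", "a", "an"})):
    # Another token filter: drops common stopwords.
    return [t for t in tokens if t not in stopwords]

def analyze(text, tokenizer, filters):
    # Tokenize once, then pass the token stream through each filter in turn.
    tokens = tokenizer(text)
    for f in filters:
        tokens = f(tokens)
    return tokens

print(analyze("The Quick Brown Fox", whitespace_tokenizer,
              [lowercase_filter, stop_filter]))
# → ['quick', 'brown', 'fox']
```

The key point is the composition: each filter receives the previous stage's output, so the order of filters matters.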

Keep reading

A fuzzy search is a process that locates web pages or documents that are likely to be relevant to a search argument even when the argument does not exactly match the desired information.

A fuzzy search is done by means of a fuzzy matching query, which returns a list of results based on likely relevance even though search argument words and spellings may not exactly match. Exact and highly relevant matches appear near the top of the list.
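One common way to implement this kind of likely-relevance matching is edit distance: terms within a small Levenshtein distance of the query count as matches, and smaller distances rank higher. The sketch below uses a hypothetical word list; Elasticsearch's fuzzy queries work on a similar principle, bounded by a `fuzziness` parameter.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def fuzzy_search(query, terms, max_edits=2):
    # Rank candidate terms by edit distance; exact and close matches first.
    scored = [(levenshtein(query, t), t) for t in terms]
    return [t for d, t in sorted(scored) if d <= max_edits]

print(fuzzy_search("serch", ["search", "sketch", "church", "searching"]))
# → ['search', 'sketch']
```

Note how the misspelled query "serch" still finds "search" (one edit away), while distant terms are filtered out entirely.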

Keep reading

In this post, we discuss Elasticsearch analyzers. Creating and configuring analyzers is a key step in improving search efficiency; they are applied both when documents are indexed and when queries are run.

The main job of any analyzer is to take a stream of characters, often cluttered with unnecessary detail, squeeze out the needed information, and produce a list of tokens that reflects it. Let's look at an analyzer's structure.
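Configuring an analyzer in Elasticsearch means declaring it in the index settings. The sketch below shows the standard settings shape as a Python dict; the analyzer name `my_analyzer` and the particular tokenizer/filter choices are illustrative, not prescribed.

```python
# Index settings declaring a custom analyzer: a tokenizer plus an
# ordered list of token filters. This dict would be sent as the JSON
# body of an index-creation request (e.g. PUT /my_index) via the
# Elasticsearch REST API or a client library.
settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",         # split on word boundaries
                    "filter": ["lowercase", "stop"]  # normalize case, drop stopwords
                }
            }
        }
    }
}
```

Once the index exists, the analyzer can be referenced by name in a field mapping, and the same chain runs at index time and query time.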

Keep reading

The last two blogs in the analyzer series covered a lot of topics, ranging from the basics of analyzers to creating a custom analyzer with multiple components for our purposes. In this blog we are going to look at a few special tokenizers, like the email-link tokenizer, and token filters like the edge-n-gram and phonetic token filters.

These tokenizers and filters provide very useful functionality that is immensely beneficial in making our search more precise.
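To give a feel for what the edge-n-gram filter produces, here is a small sketch: for each token, it emits every prefix between a minimum and maximum length, which is what makes search-as-you-type matching possible. The `min_gram`/`max_gram` values here are illustrative defaults, not Elasticsearch's.

```python
def edge_ngrams(token, min_gram=2, max_gram=5):
    # Emit prefixes of the token from min_gram up to max_gram characters
    # (capped at the token's own length).
    end = min(len(token), max_gram)
    return [token[:i] for i in range(min_gram, end + 1)]

print(edge_ngrams("search"))
# → ['se', 'sea', 'sear', 'searc']
```

Indexing these prefixes means a partial query like "sea" matches the stored token "search" directly, without any fuzzy matching at query time.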

Keep reading

Imagine we have a website where users can submit a query in any number of formats. We need to map those queries to a standard form using the words that appear in the fields of our data set.

The main problems we can meet are:

- Misspelled words

- Plural/singular forms

- Too many stopwords

In this post we will show you how to solve these problems in order to improve your free-text query results.

Keep reading

In the previous blog in our analyzer series we learned, in detail, about the creation of an inverted index and the components of an analyzer, and walked through a simple example of using those components as a single entity to analyze input text.

Now, in this blog, we will move on to applying analyzers by creating a custom analyzer with multiple components. We will look at analyzer design and at further components that are crucial for more accurate search results, with examples along the way.
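As a refresher on the inverted index mentioned above, here is a minimal sketch of how one is built: each analyzed token maps to the set of document ids containing it. Analysis here is just lowercase-and-split, standing in for a full analyzer chain, and the documents are made up for the example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    # Map each token to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

docs = {1: "Quick brown fox", 2: "Quick red car"}
index = build_inverted_index(docs)
print(sorted(index["quick"]))  # → [1, 2]
print(sorted(index["fox"]))    # → [1]
```

This is why the same analyzer must run at both index time and query time: a query term only finds its posting list if it is normalized the same way the stored tokens were.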

Keep reading