Analyzers are made up of two main components: a Tokenizer and a set of Token Filters. The tokenizer splits text into tokens according to some set of rules, and the token filters each perform operations on those tokens. The result is a stream of processed tokens, which are either stored in the index or used at query time to match documents.
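As a rough illustration, the snippet below runs a tokenizer and a couple of token filters over sample text using Elasticsearch's _analyze API. It is a minimal sketch assuming a Python environment with the requests library and a cluster reachable at http://localhost:9200; adjust the URL (and the tokenizer/filter choices) for your own setup.

```python
import requests

# Ask Elasticsearch to run a tokenizer plus token filters over sample text.
# Assumes a cluster reachable at http://localhost:9200 (adjust for your setup).
resp = requests.post(
    "http://localhost:9200/_analyze",
    json={
        "tokenizer": "standard",                  # split on word boundaries
        "filter": ["lowercase", "asciifolding"],  # normalize each token
        "text": "Qbox Hosted Elasticsearch",
    },
)

for token in resp.json()["tokens"]:
    print(token["token"])   # e.g. qbox, hosted, elasticsearch
```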

Keep reading

A comprehensive log management and analysis strategy is mission-critical, enabling organizations to understand the relationship between operational, security, and change management events and to maintain a complete picture of their infrastructure. Log files from web servers, applications, and operating systems also provide valuable data, although in different formats and scattered across hosts in no predictable order.

As with any web server, handling NGINX logging is something of a challenge. NGINX access and error logs can produce thousands of log lines every second, and this data, if monitored properly, can provide valuable information not only on what has already happened but also on what is about to happen. But how do you extract actionable insights from this information? How do you effectively monitor such a large amount of data?

NGINX access logs contain a wealth of information, including client requests and currently active client connections, that, if monitored efficiently, can provide a clear picture of how the web server and the application it serves are behaving. This tutorial describes how Qbox can be used to overcome this challenge by monitoring NGINX access logs with a Qbox-provisioned Elasticsearch Stack.
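To make the idea concrete, here is a minimal sketch of parsing one access log line in NGINX's default "combined" format and indexing the result. The regular expression, the "nginx-access" index name, and the local cluster URL are assumptions for illustration; in a full setup a shipper such as Filebeat or Logstash would do this work, and the document endpoint can differ between Elasticsearch versions.

```python
import re
import requests

# Default NGINX "combined" access log format; fields may differ per configuration.
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) \S+ \S+ \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

line = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" '
        '200 612 "-" "Mozilla/5.0"')

match = LOG_PATTERN.match(line)
if match:
    doc = match.groupdict()
    doc["status"] = int(doc["status"])
    doc["body_bytes_sent"] = int(doc["body_bytes_sent"])
    # Index the parsed entry; "nginx-access" is a hypothetical index name.
    requests.post("http://localhost:9200/nginx-access/_doc", json=doc)
```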

Keep reading

Logs are a crucial part of any system because they give you insight into what a system is doing as well as what has happened. Virtually every process running on a system generates logs in some form or another. These logs are usually written to files on local disks. When your system grows to multiple hosts, managing and accessing the logs can get complicated.

Searching for a particular error across hundreds of log files on hundreds of servers is difficult without good tools. A common approach to this problem is to set up a centralized logging solution so that multiple logs can be aggregated in a central location. To effectively consolidate, manage, and analyze these different logs, many customers choose to implement centralized logging solutions using Elasticsearch, Logstash, and Kibana, popularly known as the ELK Stack.
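As a sketch of the underlying idea (not a replacement for Logstash or Filebeat), the snippet below gathers lines from several local log files and sends them to a central Elasticsearch index in one _bulk request. The log directory, the "central-logs" index name, and the cluster URL are hypothetical, and older Elasticsearch versions also expect a _type in the bulk metadata.

```python
import glob
import json
import requests

# Minimal sketch of shipping log lines from several local files into one
# central Elasticsearch index ("central-logs" is a hypothetical name).
actions = []
for path in glob.glob("/var/log/myapp/*.log"):   # hypothetical log directory
    with open(path) as f:
        for line in f:
            actions.append(json.dumps({"index": {"_index": "central-logs"}}))
            actions.append(json.dumps({"source_file": path,
                                        "message": line.rstrip()}))

if actions:
    body = "\n".join(actions) + "\n"   # bulk API expects newline-delimited JSON
    requests.post(
        "http://localhost:9200/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
    )
```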

Keep reading

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected. It is a near-real-time search platform, which means there is a slight latency (normally one second) from the time you index a document until the time it becomes searchable.
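The short sketch below illustrates that behavior: a freshly indexed document only becomes visible after a refresh, which Elasticsearch performs automatically about once per second, or which you can trigger explicitly as done here. The "articles" index and the local cluster URL are assumptions for the example.

```python
import requests

base = "http://localhost:9200"   # assumed local cluster address

# Index a document; it is not searchable until the next refresh,
# which Elasticsearch performs automatically (by default every second).
requests.post(f"{base}/articles/_doc", json={"title": "Near real time search"})

# Force a refresh so the example is deterministic instead of waiting ~1 second.
requests.post(f"{base}/articles/_refresh")

# Now the document shows up in search results.
resp = requests.post(
    f"{base}/articles/_search",
    json={"query": {"match": {"title": "search"}}},
)
print(resp.json()["hits"]["total"])
```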

Provisioning an Elasticsearch cluster in Qbox is easy. Please refer to Easy How-To: Provisioning an Elasticsearch Cluster on Qbox to walk through the initial steps to start and configure your cluster.

Keep reading

A fuzzy search is a process that locates web pages or documents that are likely to be relevant to a search argument even when the argument does not exactly correspond to the desired information. 

A fuzzy search is done by means of a fuzzy matching query, which returns a list of results based on likely relevance even though search argument words and spellings may not exactly match. Exact and highly relevant matches appear near the top of the list.
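For illustration, the snippet below issues a fuzzy query against a hypothetical "products" index with a deliberately misspelled term; "fuzziness": "AUTO" lets Elasticsearch choose an edit distance based on the length of the term. The index name, field name, and cluster URL are assumptions.

```python
import requests

# A fuzzy query tolerates small spelling differences, measured in edit distance.
# "products" and the "name" field are hypothetical; adjust to your own index.
query = {
    "query": {
        "fuzzy": {
            "name": {
                "value": "elasticsaerch",   # misspelled on purpose
                "fuzziness": "AUTO",        # edit distance based on term length
            }
        }
    }
}

resp = requests.post("http://localhost:9200/products/_search", json=query)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["name"])
```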

Keep reading

This post is part 3 of a 3-part series about tuning Elasticsearch Indexing. Part 1 can be found here and Part 2 can be found here.

This tutorial series focuses specifically on tuning Elasticsearch to achieve maximum indexing throughput and reduce monitoring and management load.

Elasticsearch provides sharding and replication as the recommended way to scale and increase the availability of an index. A little over-allocation is good, but a bazillion shards is bad. It is difficult to define what constitutes too many shards, as it depends on their size and how they are being used. A hundred shards that are seldom used may be fine, while two shards experiencing very heavy usage could be too many. Monitor your nodes to ensure that they have enough spare capacity to deal with exceptional conditions.
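As a concrete (and purely illustrative) example, the snippet below creates an index with an explicit shard and replica count and later raises the replica count on the live index. The index name and the numbers are placeholders; the point is that the shard count is fixed at creation time, while replicas can be adjusted afterwards.

```python
import requests

# Create an index with an explicit shard and replica layout.
# "logs-2023.10" and the counts are illustrative only.
settings = {
    "settings": {
        "number_of_shards": 3,     # how the index is split across nodes
        "number_of_replicas": 1,   # one extra copy of each shard for availability
    }
}
requests.put("http://localhost:9200/logs-2023.10", json=settings)

# Replicas can be adjusted on a live index; the shard count cannot.
requests.put(
    "http://localhost:9200/logs-2023.10/_settings",
    json={"index": {"number_of_replicas": 2}},
)
```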

Keep reading