This post is part 2 of a 3-part series about tuning Elasticsearch Indexing. Part 1 can be found here.

This tutorial series focuses specifically on tuning Elasticsearch to achieve maximum indexing throughput and reduce monitoring and management load. Elasticsearch is near-realtime, in the sense that when you index a document, you need to wait for the next refresh before that document appears in search.

Refreshing is an expensive operation, which is why it is performed at a regular interval by default instead of after each indexing operation. If you are planning to index a lot of documents and you don’t need the new information to be immediately available for search, you can optimize for indexing performance over search performance by decreasing the refresh frequency until you are done indexing.
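As a rough illustration of lowering the refresh frequency, the sketch below builds the JSON body that could be sent to the index settings endpoint (PUT /<index>/_settings) before and after a bulk load. The helper name `refresh_settings_body` and the "30s" value are assumptions chosen for illustration; `index.refresh_interval` is the Elasticsearch setting, whose default is "1s".

```python
import json


def refresh_settings_body(interval: str) -> str:
    """Return a JSON body for PUT /<index>/_settings that sets the refresh interval.

    Hypothetical helper for illustration; it only builds the request body
    and does not contact a cluster.
    """
    return json.dumps({"index": {"refresh_interval": interval}})


# Before a large bulk load: refresh rarely ("30s" is an example value;
# "-1" would disable periodic refresh entirely).
before_bulk = refresh_settings_body("30s")

# After the bulk load: restore the default interval of one second.
after_bulk = refresh_settings_body("1s")

print(before_bulk)
print(after_bulk)
```

The body can be sent with any HTTP client; remember to restore the interval once indexing is finished, otherwise new documents stay invisible to search for longer than expected.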

Keep reading

This post is part 1 of a 3-part series about tuning Elasticsearch Indexing. This series focuses specifically on tuning Elasticsearch to achieve maximum indexing throughput and reduce monitoring and management load. 

As a starting point, assume that you start Elasticsearch, create an index, and feed it JSON documents without defining a schema. Elasticsearch will then iterate over each field of the JSON document, guess its type, and create a corresponding mapping. While this may seem ideal, the mappings Elasticsearch infers are not always accurate. If, for example, the wrong field type is chosen, indexing errors will occur.
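One way to avoid wrongly guessed field types is to declare the mapping when the index is created. The sketch below builds such a request body, assuming the typeless mapping format of Elasticsearch 7 and later (older versions nest the properties under a type name). The field names `user_id`, `created_at`, and `message` and the helper name are hypothetical.

```python
import json


def explicit_mapping_body() -> dict:
    """Build a body for PUT /<index> that declares field types up front.

    Hypothetical example fields; "dynamic": "strict" makes Elasticsearch
    reject documents containing fields that are not declared here.
    """
    return {
        "mappings": {
            "dynamic": "strict",
            "properties": {
                "user_id": {"type": "keyword"},   # exact-match identifier
                "created_at": {"type": "date"},   # parsed as a date, not a string
                "message": {"type": "text"},      # analyzed full-text field
            },
        }
    }


print(json.dumps(explicit_mapping_body(), indent=2))
```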

Keep reading

Today we’re excited to announce a much-requested addition to the Qbox dashboard: teams can now manage multiple dashboard users, so that billing can be centralized and passwords never need to be shared.

Keep reading

Qbox is proud to offer Qbox Private Cloud Elasticsearch Hosting, which allows even lower pricing and even tighter integration with the Amazon Web Services ecosystem. Contact Qbox Sales for custom pricing.

Qbox Private Cloud Hosting lets you take advantage of AWS Reserved Instance pricing and AWS Promotional Credits while retaining 24/7 support and maintenance from the Qbox Support team. Count on your clusters being up when you need them and still receive Qbox Support, including: fully managed upgrades, automatic data backups, managed migrations, and Elasticsearch plugins on AWS. Development support is also available.

Keep reading

Editor's Note: This post is part 3 of a 3-part series on tuning Elasticsearch performance. Part 1 can be found here and Part 2 can be found here.

Shard allocation, rebalancing, and awareness are crucial for preventing data loss and for avoiding the painful Cluster Status: RED (a sign that the cluster is missing some primary shards). Apart from shard allocation, everyone loves to tweak threadpools. For whatever reason, it seems people cannot resist increasing thread counts.

The default threadpool settings in Elasticsearch are very sensible. For all threadpools (except search) the thread count is set to the number of CPU cores. If we have eight cores, we can run only eight threads simultaneously, so it makes sense to assign only eight threads to any particular threadpool.

In this tutorial, we’ll be focusing on Shard Allocation and Threadpool Configuration settings to keep our cluster's health green and improve overall performance.

Keep reading

Editor's Note: This post is part 2 of a 3-part series on tuning Elasticsearch performance. Part 1 can be found here.

If we are using Elasticsearch mainly for search, or if search is a customer-facing feature that is key to our organization, we should monitor query latency and take action if it surpasses a threshold. 

It’s important to monitor relevant metrics about queries and fetches that can help us determine how our searches perform over time. For example, we may want to track the cluster's health to ensure high availability, or track spikes and long-term increases in query requests, so that we can be prepared to tweak our configuration for better performance and reliability.
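A minimal sketch of one such metric: the search section of the Elasticsearch stats APIs reports cumulative `query_total` and `query_time_in_millis` counters, so average query latency over an interval can be estimated from two samples. The helper name and the sample numbers below are made up for illustration; fetching the samples from a cluster is left out.

```python
def avg_query_latency_ms(prev: dict, curr: dict) -> float:
    """Estimate average query latency (ms) between two stats samples.

    Both arguments are the "search" section of a stats response, taken at
    two points in time. The counters are cumulative, so we use their deltas.
    """
    queries = curr["query_total"] - prev["query_total"]
    millis = curr["query_time_in_millis"] - prev["query_time_in_millis"]
    # No queries ran during the interval: report zero instead of dividing by zero.
    return millis / queries if queries > 0 else 0.0


# Made-up sample values: 500 queries took 15,000 ms in total.
prev_sample = {"query_total": 1000, "query_time_in_millis": 20000}
curr_sample = {"query_total": 1500, "query_time_in_millis": 35000}

latency = avg_query_latency_ms(prev_sample, curr_sample)
print(latency)
```

Comparing this value against a chosen threshold on every sample is a simple way to trigger an alert when latency drifts upward.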

In this tutorial, we continue focusing on performance tuning strategies to keep our cluster's health green and improve overall performance.

Keep reading