Qbox to join forces with Instaclustr. Read about it on our blog post here.

Naturally, a question that quickly comes to mind is: Which full-text search solution has the best performance for use cases like this one?

Read this short article to see our approach in answering this question.

Most developers will come to learn that there are several good solutions with sufficient performance to handle a full-text search on more than a million data entries:

  • MongoDB, using custom functions
  • MySQL
  • PostgreSQL
  • Elasticsearch

All of these are respectable data management systems. But there are two major considerations to keep in mind when choosing the best full-text search solution. First, SQL full-text search is rather simple to set up for indexing and queries — but there are significant drawbacks:

  • You have virtually no control over the indexing.
  • You can do very little when specifying the index keys, lexers, and stemmers.
  • The search will run on the DBMS server, which is often your least-scalable infrastructure.

Second, we know that Elasticsearch does require more work up front because you need to set up and maintain a dedicated cluster of nodes. You also need to provide code that performs the index operations—which may also involve a scheduled job that works from a change-log (processing new/changed data) to build the fragments for indexing. As with SQL, you’ll also need to then take time to build the queries.

But there is huge upside. The reward of your efforts with Elasticsearch will be:

  • Precise control over the index and query
  • Fantastic scalability, since you can provide a cluster for whatever size you need
  • Native document access and indexing

Now, we’ll let you in on a “secret.” In contrast to SQL systems, with Elasticsearch it’s entirely unnecessary to modify the native form of your data (or to perform a preliminary mapping) to accommodate the structure of your queries.

With most applications and data layers, a critical pre-development task is to design the data retrieval methods. This is still an essential task in Elasticsearch development, but here’s what really cool: you can keep your data in native form (the original document) and index it according to that form. Then, you configure Elasticsearch to map the data directly from that original document. The advantage is that you won’t need to burden the document with a bunch of extra fields.

Now, you may be thinking, “Hey, I do find Elasticsearch compelling, but I’m so comfortable with conventional database management systems.” We get it. In fact, we once thought in the same way.

Maybe you’re looking for other use cases so that you can boost confidence before you commit. Check this out: some of our staff here at Qbox are regular contributors over on StackOverflow.com. We learned that StackOverflow was originally built on a SQL full-text search platform. As their platform grew, they had to move over onto Elasticsearch when the feature and performance limitations became too restrictive.

We can say with the highest confidence that it’s worth it. Yes, you’ve got to absorb the extra initial configuration effort. But, when you consider the total effort and cost to get Elasticsearch returning sub-second responses on huge datasets, it’s a very small fraction of the overall effort that’s necessary with a SQL database or Mongo solution. For SQL systems, there’s so much pre-processing that you need to do outside of the datastore to make the data searchable. With those conventional systems, it would be a heroic effort to achieve the balance of performance and query flexibility that Elasticsearch offers.