We discussed the “Langdetect Ingest Plugin” in a previous post. In this post, we focus on the “elasticsearch-langdetect” plugin, which is based on Nakatani Shuyo’s language detector.
The “elasticsearch-langdetect” plugin offers a mapping type to specify fields where we want to enable language detection. Detected languages are indexed into a subfield of the field named ‘lang’. The field can be queried for language codes.
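To make the mapping and the query more concrete, here is a minimal Python sketch that only builds the two JSON request bodies. It rests on several assumptions that are not confirmed in this post: that the mapping type is registered under the name "langdetect", that the field is called "content" (a hypothetical name), and that the detected codes can be queried at "content.lang". The exact names can differ between plugin versions, so check them against the README of the version you have installed.

```python
import json


def build_langdetect_mapping(field_name):
    """Build a mapping body that enables language detection on one field.

    Assumption: the plugin's mapping type is named "langdetect".
    """
    return {"properties": {field_name: {"type": "langdetect"}}}


def build_language_query(lang_field, language_code):
    """Build a term query that matches documents by detected language code.

    `lang_field` is the path where the language codes are indexed. It is
    passed in by the caller because the path may differ by plugin version.
    """
    return {"query": {"term": {lang_field: language_code}}}


# "content" and "content.lang" are hypothetical names used for illustration.
mapping_body = build_langdetect_mapping("content")
query_body = build_language_query("content.lang", "en")

print(json.dumps(mapping_body, indent=2))
print(json.dumps(query_body, indent=2))
```

The first body would be sent when creating the mapping, and the second to the search endpoint of the index. Keeping the field path as a parameter makes it easy to adjust if your plugin version indexes the codes somewhere else.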
We can use the multi_field mapping type to combine this plugin with the attachment mapper plugin, which enables language detection in base64-encoded binary data. Currently, only UTF-8 text is supported.
The plugin also offers a REST endpoint: we can post a short UTF-8 text to it, and the plugin responds with a list of recognized languages.
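A call to the REST endpoint could look roughly like the Python sketch below, which uses only the standard library. Treat the details as assumptions: the endpoint path "/_langdetect", the host "http://localhost:9200", and the response shape (a "languages" list whose entries carry "language" and "probability") are not stated in this post and may differ in your plugin version. The sample response is made up purely to show the parsing step.

```python
import json
import urllib.request


def build_detect_request(text, base_url="http://localhost:9200"):
    """Build a POST request carrying the text as a UTF-8 encoded body.

    Assumption: the plugin exposes its endpoint at "/_langdetect".
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_langdetect",
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def parse_languages(response_body):
    """Extract (language, probability) pairs from a JSON response string.

    Assumption: the response looks like
    {"languages": [{"language": "...", "probability": ...}, ...]}.
    """
    payload = json.loads(response_body)
    return [
        (entry["language"], entry["probability"])
        for entry in payload.get("languages", [])
    ]


def detect_languages(text, base_url="http://localhost:9200"):
    """Send the text to a running cluster and return the parsed languages."""
    request = build_detect_request(text, base_url)
    with urllib.request.urlopen(request) as response:
        return parse_languages(response.read().decode("utf-8"))


# Hypothetical response body, used here only to illustrate parsing.
sample_response = '{"languages": [{"language": "en", "probability": 0.99}]}'
print(parse_languages(sample_response))
```

With a cluster running and the plugin installed, calling detect_languages("This is a short text") would perform the actual request. Building the request and parsing the response are kept in separate functions so that each can be checked without a live cluster.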