Autocompletion Using Elasticsearch

Autocompletion provides an efficient, user-friendly way to look up a value when the list of possible values is too long for a standard drop-down. I use autocompletion extensively for filtering work orders and assets in our asset tracking and work order system. Other examples are filtering a list of products on an ecommerce site or finding a document by ‘tag’.

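I’d like to cover two ways to implement autocompletion in Elasticsearch. The first is an edge n-gram field. The second is a completion suggester. To follow along you need access to a working Elasticsearch instance and the command line. Note that the examples declare a mapping type (“fruit”), so they assume a version that still supports mapping types (6.x or earlier). On 7.x and later you will likely need to leave the type name out of the mappings, use _doc in place of the type in document URLs, and drop it from search URLs.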

Edge N-Gram

Understanding edge n-grams is best accomplished by working through an example. Let’s start by creating a test index with the settings and mappings we need for an edge n-gram field. Don’t worry, we’ll go through the settings / mappings next. We’ll also delete the index at the end of the exercise to keep things tidy.

curl -XPUT 'localhost:9200/ngramidx' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete_analyzer": {
          "tokenizer": "autocomplete_tokenizer",
          "filter": ["lowercase"]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  },
  "mappings": {
    "fruit": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "autocomplete_analyzer",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
'

Setting up the index may seem a little complicated so let’s go through it beginning with the settings field. The analysis section contains two custom analyzers, “autocomplete_analyzer” and “autocomplete_search”. It also contains a custom tokenizer, “autocomplete_tokenizer”. The names are arbitrary. You can name them whatever you like.

The autocomplete_analyzer will be used to tokenize and lower case input strings saved to our edge n-gram field. The tokenizer it uses, autocomplete_tokenizer, is where the magic happens. It splits input strings into words on any character that does not belong to a class listed in the token_chars array, then emits the leading substrings (edge n-grams) of each word as tokens. In the above example, the token_chars array is set to “letter” and “digit” so tokens will only contain letters and digits. Other characters will cause Elasticsearch to start a new token (fr-ed will become “fr” and “ed”). The tokenizer also sets the minimum length for an n-gram to 2, and the maximum to 10.
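To make the tokenizer’s behavior concrete, here is a minimal Python sketch that approximates it outside Elasticsearch. The edge_ngrams function is a hypothetical helper, not part of any Elasticsearch client, and it only handles ASCII letters and digits, so treat it as an illustration rather than a faithful reimplementation.

```python
import re


def edge_ngrams(text, min_gram=2, max_gram=10):
    """Approximate an edge_ngram tokenizer with token_chars ["letter", "digit"]
    followed by a lowercase filter (ASCII-only simplification)."""
    terms = []
    # Any character that is not a letter or digit ends the current word.
    for word in re.findall(r"[A-Za-z0-9]+", text):
        word = word.lower()
        # Emit each prefix from min_gram up to max_gram characters long.
        for size in range(min_gram, min(len(word), max_gram) + 1):
            terms.append(word[:size])
    return terms


print(edge_ngrams("fr-ed"))         # ['fr', 'ed']
print(edge_ngrams("strawberries"))  # longest term is 'strawberri' (10 chars)
```

Words shorter than min_gram produce no terms at all, and words longer than max_gram are only indexed up to their first max_gram characters.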

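The second analyzer, “autocomplete_search”, will be applied to text when querying the edge n-gram field. It uses the built-in lowercase tokenizer, which splits the text on non-letter characters and converts it to lower case. Why not just use the autocomplete_analyzer when searching as well as indexing? Because we will already be searching for partial strings, like ‘blu’, when we are trying to find a document with ‘blueberry’. If the search text were also split into n-grams, ‘blu’ would become ‘bl’ and ‘blu’, and the ‘bl’ term would match every document containing a word that starts with ‘bl’. We only need to lower case the search strings, not tokenize and lower case them like the autocomplete_analyzer does. One caveat: the lowercase tokenizer treats digits as separators, so if your values contain digits you may prefer a search analyzer built from the standard tokenizer and a lowercase filter.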
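The effect of the search analyzer can be sketched in a few lines of Python. This is a toy model under stated assumptions: the index is a plain dict of term sets, ‘blackberry’ is a hypothetical second document added for contrast, and match mimics only the default behavior of a match query (a document matches if it shares at least one term with the query).

```python
def edge_ngrams(word, min_gram=2, max_gram=10):
    """Set of lowercase edge n-grams for a single word."""
    word = word.lower()
    return {word[:n] for n in range(min_gram, min(len(word), max_gram) + 1)}


# Terms stored per document, as the index-time analyzer would produce them.
index = {name: edge_ngrams(name) for name in ["blueberry", "blackberry"]}


def match(query_terms):
    """Return documents that share at least one term with the query."""
    return sorted(name for name, terms in index.items() if terms & query_terms)


# Search analyzer that only lowercases: the query is the single term 'blu'.
lowercase_only = match({"blu"})
# Search analyzer that also n-grams: the query becomes {'bl', 'blu'}.
ngram_at_search = match(edge_ngrams("blu"))

print(lowercase_only)   # ['blueberry']
print(ngram_at_search)  # ['blackberry', 'blueberry']
```

With n-grams applied at search time, the shorter ‘bl’ term pulls in ‘blackberry’, which is exactly the noise a separate search analyzer avoids.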

The mappings section sets up our edge n-gram field. It tells Elasticsearch that the “name” field in documents of type “fruit” is a text field and sets the analyzer and search_analyzer for the field to the custom analyzers we defined in the settings section.

After you create the index using the above curl command, run a query against the index’s _analyze endpoint to see how Elasticsearch tokenizes a string.

curl -XPOST 'localhost:9200/ngramidx/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "analyzer": "autocomplete_analyzer",
  "text": "I love bananas"
}
'

Elasticsearch produces the terms [‘lo’, ‘lov’, ‘love’, ‘ba’, ‘ban’, ‘bana’, ‘banan’, ‘banana’, ‘bananas’]. “I” produces no term because it is shorter than the min_gram of 2.

Now let’s add a document we can query.

curl -XPUT 'localhost:9200/ngramidx/fruit/1?pretty' -H 'Content-Type: application/json' -d'
{
  "name" : "blueberry"
}
'

Phew… We’re finally ready to run a query to see our edge n-gram field in action.

curl -XGET 'localhost:9200/ngramidx/fruit/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "name": {
        "query": "blu"
      }
    }
  }
}
'

If all goes according to plan, you should get the blueberry document back when you run the query.

To keep things tidy, delete the index we just created.

curl -XDELETE 'http://localhost:9200/ngramidx'

Completion Suggester

Let’s turn now to using a completion suggester. A completion suggester is just one of the suggesters available in Elasticsearch. It finds matches by comparing the search string against predefined ‘suggestions’ that you store in a completion field.

As an example, say you’ve indexed the following document:

{
  "name": "banana",
  "suggestions": {
    "input": [
      "yellow",
      "plantain"
    ]
  }
}

In the above document “name” is a keyword field. “suggestions” is a completion field. The “input” property of the suggestions field contains the suggestions that you want suggesters to match.

There are different types of completion suggester queries but for our purpose let’s stick with a prefix query. A completion suggester using a prefix query for ‘yel’ would match the above ‘banana’ document because ‘yel’ matches the first three characters of the ‘yellow’ suggestion listed in the input array.
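The prefix matching described above can be imitated with a short Python sketch. It is only a model of the matching rule: Elasticsearch does not scan documents like this, prefix_suggest is a hypothetical helper, the ‘cherry’ document is made up for contrast, and the case-insensitive comparison is an assumption of the sketch.

```python
# "banana" is the example document from above; "cherry" is hypothetical.
DOCS = [
    {"name": "banana", "suggestions": {"input": ["yellow", "plantain"]}},
    {"name": "cherry", "suggestions": {"input": ["red", "tart"]}},
]


def prefix_suggest(prefix, docs=DOCS):
    """Return names of documents with at least one input starting with prefix."""
    prefix = prefix.lower()
    return [
        doc["name"]
        for doc in docs
        if any(s.lower().startswith(prefix) for s in doc["suggestions"]["input"])
    ]


print(prefix_suggest("yel"))  # ['banana']
print(prefix_suggest("low"))  # [] because 'low' is not at the start of any input
```

The second call shows the key limitation of a prefix query: text in the middle of a suggestion never matches.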

A concrete example should make things a little clearer. To start things off let’s create a test index with a mapping that includes a completion field. We’ll delete the index later to keep things nice and neat.

curl -XPUT 'localhost:9200/completionidx' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "fruit": {
      "properties": {
        "name": {
          "type": "keyword"
        },
        "suggestions": {
          "type": "completion"
        }
      }
    }
  }
}
'

Next, let’s index two documents that contain suggestions.

curl -XPUT 'localhost:9200/completionidx/fruit/1?pretty' -H 'Content-Type: application/json' -d'
{
  "name": "apple",
  "suggestions": {
    "input": ["red", "yellow", "granny", "delicious"]
  }
}
'

curl -XPUT 'localhost:9200/completionidx/fruit/2?pretty' -H 'Content-Type: application/json' -d'
{
  "name" : "banana",
  "suggestions": {
    "input": ["yellow", "plantain"]
  }
}
'

You can verify the index has been created and the documents have been indexed with curl:

curl -XGET 'localhost:9200/completionidx/fruit/_search?pretty'

You can also take a look at the mapping with curl:

curl -XGET 'localhost:9200/completionidx/_mapping?pretty'

Now that we’ve set up our index and added some data, let’s run a query to see how the completion suggester works.

curl -XPOST 'localhost:9200/completionidx/fruit/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "suggest": {
    "name-of-this-suggest": {
      "prefix": "yel",
      "completion": {
        "field": "suggestions"
      }
    }
  }
}
'

Breaking down the query, “suggest” tells Elasticsearch that the request contains suggesters. “name-of-this-suggest” is an arbitrary name you provide for the suggestion. It will be a key under “suggest” in the response from Elasticsearch, allowing you to access the results for that particular suggester. The value under that key is an array of entries, and the matching documents are in each entry’s “options” array, so the _source of the first match is at suggest.name-of-this-suggest[0].options[0]._source. “prefix” tells Elasticsearch that we want to use a prefix completion suggester as opposed to a regex.
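If you consume the response in code, pulling out the matching documents might look like the following Python sketch. The suggested_sources helper is hypothetical, and sample_response is a hand-written, abbreviated stand-in for a real response, not captured output.

```python
def suggested_sources(response, suggest_name):
    """Collect the _source of every option returned for one named suggester."""
    sources = []
    # Each named suggester maps to a list of entries, each with its own options.
    for entry in response.get("suggest", {}).get(suggest_name, []):
        for option in entry.get("options", []):
            sources.append(option["_source"])
    return sources


# Illustrative response shape only; real responses carry more fields.
sample_response = {
    "suggest": {
        "name-of-this-suggest": [
            {
                "text": "yel",
                "offset": 0,
                "length": 3,
                "options": [
                    {"text": "yellow", "_id": "1", "_source": {"name": "apple"}},
                    {"text": "yellow", "_id": "2", "_source": {"name": "banana"}},
                ],
            }
        ]
    }
}

names = [src["name"] for src in suggested_sources(sample_response, "name-of-this-suggest")]
print(names)  # ['apple', 'banana']
```

Using .get with empty defaults means a response without any suggestions yields an empty list instead of raising a KeyError.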

If you run the above query you should get back both the apple and banana documents. Try the query again using the prefix ‘gra’.

curl -XPOST 'localhost:9200/completionidx/fruit/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "suggest": {
    "name-of-this-suggest": {
      "prefix": "gra",
      "completion": {
        "field": "suggestions"
      }
    }
  }
}
'

You should only get back the apple document because ‘gra’ matches the ‘granny’ suggestion for that document.

When would you use a completion suggester? One great use case is a tagging system. Say you’re building a music library and you want to add tags to your songs so users can find them by searching for the music genre. For example:

{
  "name": "Shadow Of The Day",
  "band": "Linkin Park",
  "tags": {
    "input": ["rock", "alternative"]
  }
}

Searching for ‘roc’ would return all songs tagged as ‘rock’ songs like the one above.

To free up resources delete the index we just created with curl.

curl -XDELETE 'http://localhost:9200/completionidx'

Choosing Between The Two Autocompletion Methods

The following is excerpted from the Elasticsearch website.

When you need search-as-you-type for text which has a widely known order, such as movie or song titles, the completion suggester is a much more efficient choice than edge N-grams. Edge N-grams have the advantage when trying to autocomplete words that can appear in any order.

In other words, if users will type the text from its beginning, the way they would a movie or song title, use a completion suggester, which matches against the start of each whole suggestion. If they might start typing any word in the text, prefer an edge n-gram field, which matches against the start of every word.
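One way to read the excerpt is in terms of word order within the text itself. The Python sketch below contrasts the two matching rules on the song title used earlier. Both functions are hypothetical simplifications: the completion side assumes the whole title is stored as a single suggestion, and the edge n-gram side handles only single-word, ASCII queries with the min_gram and max_gram values from this article.

```python
import re


def completion_matches(prefix, suggestion):
    """Prefix rule: the query must match the start of the whole suggestion."""
    return suggestion.lower().startswith(prefix.lower())


def edge_ngram_matches(query, text, min_gram=2, max_gram=10):
    """Edge n-gram rule: the query must be an indexed prefix of any word."""
    query = query.lower()
    # Only prefixes between min_gram and max_gram characters are indexed.
    if not min_gram <= len(query) <= max_gram:
        return False
    words = re.findall(r"[a-z0-9]+", text.lower())
    return any(word.startswith(query) for word in words)


title = "Shadow Of The Day"
print(completion_matches("shad", title))  # True
print(completion_matches("day", title))   # False
print(edge_ngram_matches("day", title))   # True
```

A user who remembers only ‘day’ gets a hit from the edge n-gram field but nothing from the completion suggester, unless ‘Day’ is also stored as its own suggestion.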

Conclusion

There you have it. Two different methods for autocompletion with Elasticsearch. There’s quite a bit I didn’t cover, such as how you’d wire things up on the front end. I’ve used Kendo autocomplete fields with good results in the past. There’s also much more to completion suggesters, and suggesters in general, than I’ve covered here. This should be enough to get you started though. Good luck with your projects!