Information Retrieval Systematic 4  Fuzzy Search, Edit Distance, q-Gram Index

Give a query q and a string s

What is fuzzy search, according to wiki, “locates Web pages that are likely to be relevant to a search argument even when the argument does not exactly correspond to the desired information”. Per definition above, we need to compute “edit distance” between q and s for fuzzy search.

So what is “edit distance”? So Vladimir Lavenshitein invented this concept in 1965, where for two strings x and y, ED(x,Y) := minimal number of tra’fo’s to get from x to y

It comes naturally to apply dynamic programming algorithm to compute ED(x,y):

q-gram of a string are simply all substrings of length q

Similar to inverse index or document index, where each word has an document id, q-gram index is recording each q-gram with words ids.

q-gram index is useful in fuzzy search which can be illustrated by the following example.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.