Fuzzy String Matching

Fuzzy string matching in a dictionary using a Levenshtein Automaton, implemented in Java.
It retrieves all words that are similar to an incorrect query word. It can be used for spell checking, automatic correction of query words in search engines (Google's "Did you mean: X") and other NLP tasks. Compared to the classic dynamic-programming algorithm for computing the Levenshtein distance, this approach scales very well to dictionaries with more than 2 million words.

Uses a pre-built dictionary (represented by a Trie) and a DFA built from the query word which accepts candidates with at most K edit distance errors (insertion, deletion, substitution). For the most common edit distance (K = 2) it can use an inverted-word dictionary and parallel search in both dictionaries to reduce the number of tested candidates and substantially increase the query speed. Optionally it can use a cache for automatons requested frequently.

More details about the algorithm can be found in the following blog post:
Damn Cool Algorithms: Levenshtein Automata

A more theoretical presentation of the algorithm can be found in the following paper
(not all ideas from the paper are implemented yet):
Fast string correction with Levenshtein automata

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fuzzy String Matching

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

gratianlup/FuzzyStringMatching

Folders and files

Latest commit

History

Repository files navigation

Fuzzy String Matching

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages