Python script for fast fuzzy text search (based on Levenshtein distance)
This script searches large text (.txt) files for phrases, tolerating small differences between the phrase and the text.
It is optimised for natural, human-readable text (articles, books, etc.), as it relies heavily on words being separated by whitespace.
**Note:** it does not match substrings (e.g. it will not find 'brown' in 'quickbrownfox').
Run with -h for details

| option | description |
|---|---|
| -src | source file path (required) |
| -f | string to find (required) |
| -msl | max word length difference |
| -mld | max Levenshtein distance |

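
The options above could be declared roughly as follows. This is only a sketch: it assumes the script uses `argparse` (the README does not say), `build_parser` is a hypothetical helper name, and `book.txt` is a placeholder path. The defaults mirror the ones stated in this README.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(
        description="Fuzzy phrase search in a text file"
    )
    parser.add_argument("-src", required=True, help="source file path")
    parser.add_argument("-f", required=True, help="string to find")
    # Defaults taken from the README: WLD = 1, LD = 2.
    parser.add_argument("-msl", type=int, default=1,
                        help="max word length difference")
    parser.add_argument("-mld", type=int, default=2,
                        help="max Levenshtein distance")
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(["-src", "book.txt", "-f", "brown fox"])
print(args.src, args.f, args.msl, args.mld)  # book.txt brown fox 1 2
```
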
Default max Levenshtein distance (LD) is 2 and max word length difference (WLD) is 1:
| text part | to find | match | reason |
|---|---|---|---|
| QUICK BROWN FOX JUMPS | brown fox | yes | case insensitive |
| quick bronw fox jumps | brown fox | yes | LD=2, WLD=0 |
| quick brownn fox jumps | brown fox | yes | LD=1, WLD=1 |
| quick green fox jumps | brown fox | no | LD=3, WLD=0 |
| quick brownnn fox jumps | brown fox | no | LD=2, WLD=2 |
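
The rows in the table can be reproduced with a small word-by-word matcher. This is a minimal sketch, not the script's actual code. It assumes both thresholds are applied to each word separately and that the text is split on whitespace. The names `levenshtein`, `words_match` and `fuzzy_find` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def words_match(found: str, wanted: str, max_ld: int = 2, max_wld: int = 1) -> bool:
    """Cheap length check first, then the case-insensitive edit distance."""
    if abs(len(found) - len(wanted)) > max_wld:
        return False
    return levenshtein(found.lower(), wanted.lower()) <= max_ld


def fuzzy_find(text: str, phrase: str, max_ld: int = 2, max_wld: int = 1) -> list:
    """Return the word indexes at which the phrase fuzzily matches."""
    words = text.split()
    wanted = phrase.split()
    if not wanted:
        return []
    hits = []
    # Slide a window of len(wanted) words across the text.
    for start in range(len(words) - len(wanted) + 1):
        window = words[start:start + len(wanted)]
        if all(words_match(w, p, max_ld, max_wld)
               for w, p in zip(window, wanted)):
            hits.append(start)
    return hits


print(fuzzy_find("QUICK BROWN FOX JUMPS", "brown fox"))    # [1]
print(fuzzy_find("quick bronw fox jumps", "brown fox"))    # [1]
print(fuzzy_find("quick brownnn fox jumps", "brown fox"))  # []
```

Checking the length difference before computing the distance skips most words cheaply. Because whole words are compared, a word embedded in a longer token such as 'quickbrownfox' is never matched.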