Levenshtein distance

The Levenshtein distance measures the “distance” between two character strings, and therefore how similar they are. Other similarity algorithms can be supplied to the code that does the matching.
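As a rough illustration of what the distance measures, the sketch below computes it with the textbook dynamic-programming method: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. This is not the toolkit's own code. The names `levenshtein_distance` and `similarity` are hypothetical, and the percentage normalisation in `similarity` is an assumption chosen for illustration only.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalisation (an assumption, not the toolkit's formula):
    100.0 for identical strings, 0.0 when every character must change."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (longest - levenshtein_distance(a, b)) / longest


if __name__ == "__main__":
    print(levenshtein_distance("kitten", "sitting"))
    print(similarity("The first string.", "The second string"))
```

Keeping only two rows of the table at a time limits memory use to the length of the shorter string, which matters when many candidate strings are compared.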

This code is used in pot2po, tmserver and Virtaal. It is implemented in the toolkit, but can optionally use the fast C implementation provided by python-Levenshtein if that package is installed. Installing python-Levenshtein is strongly recommended.

To exercise the code, the class file “Levenshtein.py” can be executed directly with:

python Levenshtein.py "The first string." "The second string"

(Remember to quote the two parameters.)

The following things should be noted:

Shortcomings

The following shortcomings have been identified: