similarity_real(self, a, b, stoppercentage=40)
Returns the similarity between a and b based on Levenshtein distance.
It can stop early, as soon as it is clear that a and b cannot reach
the similarity percentage specified in stoppercentage.
The Levenshtein distance is calculated, but the following should be
noted:
- Only the first MAX_LEN characters are considered. Long strings
  that differ only at the end will therefore seem to match better than
  they should. See the use of the variable penalty to lessen this
  effect.
- Strings with widely different lengths offer a shortcut. This follows
  from the definition of the Levenshtein distance: the distance is at
  least as large as the difference in string length.
- Calculation stops as soon as a similarity of stoppercentage
  becomes unattainable. See the use of the variable stopvalue.
- The implementation uses O(min(len(a), len(b))) memory.
- Execution time is O(len(a)*len(b)).
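
The points above can be illustrated with a minimal sketch. It is not the
actual source of similarity_real: the function name similarity_sketch, the
max_len default of 1000, the formula that turns a distance into a percentage,
and the choice to return 0.0 on an early stop are all assumptions made for
illustration. The penalty variable is not modelled, because its use is not
described here.

```python
def similarity_sketch(a, b, stoppercentage=40, max_len=1000):
    """Hypothetical sketch: similarity in percent based on Levenshtein distance."""
    # Assumption: only the first max_len characters are compared.
    a, b = a[:max_len], b[:max_len]
    # Make b the shorter string so each row has min(len(a), len(b)) + 1 cells.
    if len(a) < len(b):
        a, b = b, a
    longest = len(a)
    if longest == 0:
        return 100.0  # two empty strings are treated as identical

    # Assumption: similarity = 100 * (1 - distance / longest), so this is the
    # largest distance that still allows a similarity of stoppercentage.
    stopvalue = longest * (100 - stoppercentage) / 100.0

    # Shortcut: the distance is at least the difference in string length.
    if len(a) - len(b) > stopvalue:
        return 0.0

    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        # The final distance can never be smaller than the minimum of a row,
        # so stop once every cell in the row exceeds stopvalue.
        if min(current) > stopvalue:
            return 0.0
        previous = current

    return 100.0 * (1 - previous[-1] / float(longest))
```

Only two rows are kept at any time, which gives the O(min(len(a), len(b)))
memory bound, while the nested loops give the O(len(a)*len(b)) running time
in the worst case.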