Package translate :: Package search :: Module lshtein :: Class LevenshteinComparer

Class LevenshteinComparer

Instance Methods

__init__(self, max_len=200)

similarity(self, a, b, stoppercentage=40)

similarity_real(self, a, b, stoppercentage=40)
Returns the similarity between a and b based on Levenshtein distance.

Method Details

similarity_real(self, a, b, stoppercentage=40)

Returns the similarity between a and b based on Levenshtein distance. It can stop early as soon as it sees that a and b will be no more similar than the percentage specified in stoppercentage.
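
As a rough illustration of how an edit distance can be mapped to a percentage, the sketch below normalizes by the length of the longer string. The helper name percent_similarity, the normalization, and the treatment of two empty strings are assumptions made for illustration; they are not taken from the module's source.

```python
def percent_similarity(distance, a, b):
    """Map an edit distance to a 0-100 similarity score (hypothetical helper)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # assumption: two empty strings count as identical
    return 100.0 - 100.0 * distance / longest


# "kitten" -> "sitting" needs 3 edits; the longer string has 7 characters.
score = percent_similarity(3, "kitten", "sitting")
```

With the default stoppercentage of 40, a score like this one would still be above the cut-off, so a full calculation would be worthwhile.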

The Levenshtein distance is calculated, but the following should be noted:

  • Only the first MAX_LEN characters are considered, so long strings that differ only near the end will appear to match better than they should. See the use of the variable penalty, which lessens this effect.
  • Strings of widely different lengths allow a shortcut. This follows from the definition of the Levenshtein distance: the distance is at least the difference in string length.
  • Calculation stops as soon as a similarity of stoppercentage becomes unattainable. See the use of the variable stopvalue.
  • The implementation uses O(min(len(a), len(b))) memory.
  • Execution time is O(len(a)*len(b)).
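
The points above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the module's implementation: the names bounded_distance and sketch_similarity are hypothetical, the score is normalized by the longer string, the truncation penalty is omitted, and the sketch simply reports 0.0 whenever stoppercentage cannot be reached.

```python
def bounded_distance(a, b, stop_value):
    """Levenshtein distance kept in two rows (hypothetical helper).

    Returns None as soon as a whole row exceeds stop_value, because the
    final distance can never be smaller than the minimum of any row.
    """
    if len(a) > len(b):
        a, b = b, a  # rows are sized by the shorter string
    previous = list(range(len(a) + 1))
    for i, ch_b in enumerate(b, start=1):
        current = [i]
        for j, ch_a in enumerate(a, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        if min(current) > stop_value:
            return None  # threshold is already unattainable
        previous = current
    return previous[-1]


def sketch_similarity(a, b, stoppercentage=40, max_len=200):
    """Percentage similarity with the shortcuts described above (a sketch)."""
    a, b = a[:max_len], b[:max_len]  # only the first max_len characters count
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # assumption: two empty strings count as identical
    # The distance is at least the length difference, which caps the score.
    ceiling = 100.0 - 100.0 * abs(len(a) - len(b)) / longest
    if ceiling < stoppercentage:
        return 0.0
    # Largest distance that still reaches stoppercentage.
    stop_value = (100 - stoppercentage) * longest / 100.0
    dist = bounded_distance(a, b, stop_value)
    if dist is None or dist > stop_value:
        return 0.0
    return 100.0 - 100.0 * dist / longest
```

Only two rows of length min(len(a), len(b)) + 1 are alive at any time, which gives the memory bound, while the nested loops account for the O(len(a)*len(b)) running time.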