Source Code for Module translate.search.terminology

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright 2006-2009 Zuza Software Foundation
#
# This file is part of the Translate Toolkit.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.

"""A class that does terminology matching"""


class TerminologyComparer:

    def __init__(self, max_len=500):
        self.match_info = {}
        self.MAX_LEN = max_len

    def similarity(self, text, term, stoppercentage=40):
        """Returns the match quality of C{term} in the C{text}"""
        # We could segment the words, but mostly it will give less ideal
        # results, since we'll miss plurals, etc. Then we also can't search for
        # multiword terms, such as "Free Software". Ideally we should use a
        # stemmer, like the Porter stemmer.

        # So we just see if the word occurs anywhere. This is not perfect since
        # we might get more than we bargained for. The term "form" will be found
        # in the word "format", for example. A word like "at" will trigger too
        # many false positives.

        text = text[:self.MAX_LEN]

        pos = text.find(term)
        if pos >= 0:
            self.match_info[term] = {'pos': pos}
            return 100
        return 0
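
The comments in similarity() note that plain substring search finds the term "form" inside the word "format". The sketch below illustrates that trade-off. It is not part of the Translate Toolkit: substring_similarity is a standalone stand-in that mirrors the find() logic of the listing, and word_boundary_similarity is a hypothetical variant that only accepts matches at regular-expression word boundaries. Both names are assumptions introduced for illustration.

```python
import re


def substring_similarity(text, term, max_len=500):
    """Stand-in mirroring the listing: 100 if term occurs anywhere
    in the first max_len characters of text, otherwise 0."""
    return 100 if text[:max_len].find(term) >= 0 else 0


def word_boundary_similarity(text, term, max_len=500):
    """Hypothetical variant (not in the Translate Toolkit): require the
    term to start and end at a word boundary."""
    # re.escape keeps characters in the term from being read as regex syntax.
    pattern = r"\b" + re.escape(term) + r"\b"
    return 100 if re.search(pattern, text[:max_len]) else 0


if __name__ == "__main__":
    # Substring search reports a hit for "form" inside "format".
    print(substring_similarity("Choose a file format", "form"))
    # The word-boundary variant rejects that hit...
    print(word_boundary_similarity("Choose a file format", "form"))
    # ...but still finds the standalone word and multiword terms.
    print(word_boundary_similarity("Fill in the form", "form"))
    print(word_boundary_similarity("We support Free Software", "Free Software"))
```

The word-boundary variant removes the "form"/"format" false positive and still handles multiword terms, but it also stops matching plurals such as "forms", which is the loss the original comments warn about when they suggest a stemmer as the better long-term answer.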