Source Code for Module translate.search.terminology

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright 2006-2009 Zuza Software Foundation
#
# This file is part of the Translate Toolkit.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.

"""A class that does terminology matching"""


class TerminologyComparer:

    def __init__(self, max_len=500):
        self.match_info = {}
        self.MAX_LEN = max_len

    def similarity(self, text, term, stoppercentage=40):
        """Returns the match quality of C{term} in the C{text}"""
        # We could segment the words, but mostly it will give less ideal
        # results, since we'll miss plurals, etc. Then we also can't search for
        # multiword terms, such as "Free Software". Ideally we should use a
        # stemmer, like the Porter stemmer.

        # So we just see if the word occurs anywhere. This is not perfect since
        # we might get more than we bargained for. The term "form" will be found
        # in the word "format", for example. A word like "at" will trigger too
        # many false positives.

        text = text[:self.MAX_LEN]

        pos = text.find(term)
        if pos >= 0:
            self.match_info[term] = {'pos': pos}
            return 100
        return 0
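The comments in similarity() note that a plain substring search finds the term "form" inside the word "format". The sketch below illustrates that trade-off. It is not part of the Translate Toolkit: both function names are hypothetical, and the whole-word variant is only one possible alternative, assuming that a term should count as a match only when it is not surrounded by other word characters.

```python
import re


def substring_similarity(text, term, max_len=500):
    """Standalone mirror of the substring check: 100 on a hit, else 0."""
    return 100 if text[:max_len].find(term) >= 0 else 0


def whole_word_similarity(text, term, max_len=500):
    """Hypothetical variant: match the term only as a whole word or phrase."""
    # re.escape keeps any punctuation in the term literal. The lookarounds
    # require that no word character touches either end of the term.
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return 100 if re.search(pattern, text[:max_len]) else 0


if __name__ == "__main__":
    sentence = "Choose a file format"
    print(substring_similarity(sentence, "form"))   # substring hit inside "format"
    print(whole_word_similarity(sentence, "form"))  # no whole-word hit
```

The whole-word check removes the "format" false positive and still handles multiword terms such as "Free Software", but it also misses plurals such as "forms", which is the weakness the original comments attribute to word segmentation. A stemmer, as those comments suggest, would be needed to address both.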