#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright 2006-2009 Zuza Software Foundation
#
# This file is part of the Translate Toolkit.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, see <http://www.gnu.org/licenses/>.

"""A class that does terminology matching"""

    # ... (class and method header not preserved in this listing)
        """Returns the match quality of C{term} in the C{text}"""
        # Segmenting the text into words would mostly give worse results,
        # since we would miss plurals, etc. We also could not search for
        # multiword terms such as "Free Software". Ideally we should use a
        # stemmer, such as the Porter stemmer.

        # So we just check whether the term occurs anywhere in the text. This
        # is not perfect, since we might get more than we bargained for: the
        # term "form" will be found in the word "format", for example, and a
        # short word like "at" will trigger too many false positives.

        # Only search the first MAX_LEN characters of the text.
        text = text[:self.MAX_LEN]

        # str.find() is a case-sensitive substring search; it returns -1
        # when the term does not occur.
        pos = text.find(term)
        if pos >= 0:
            # Remember where the term was found.
            self.match_info[term] = {'pos': pos}
            return 100
        return 0
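The matching logic above can be tried outside its class with a small standalone sketch. The function name `substring_match`, its `match_info` argument, and the default `max_len=500` are hypothetical stand-ins for the method, `self.match_info`, and `self.MAX_LEN` (the listing does not show the real default). Like the original, it returns only 100 (found) or 0 (not found), and it reproduces the "form" in "format" false positive the comments warn about.

```python
def substring_match(text, term, max_len=500, match_info=None):
    """Hypothetical standalone version of the listing's matching logic.

    Returns 100 if term occurs anywhere in the first max_len characters
    of text, otherwise 0. On a match, records the position in match_info.
    The max_len default of 500 is an assumption, not taken from the source.
    """
    if match_info is None:
        match_info = {}
    text = text[:max_len]      # only the first max_len characters are searched
    pos = text.find(term)      # plain, case-sensitive substring search
    if pos >= 0:
        match_info[term] = {'pos': pos}
        return 100
    return 0


info = {}
print(substring_match("Change the file format", "form", match_info=info))  # 100
print(info)  # {'form': {'pos': 16}}
print(substring_match("x" * 600 + "form", "form"))  # 0: term lies past max_len
```

The truncation keeps the cost bounded on long texts, at the price of missing terms that appear only after the cut-off.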
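The trade-off described in the comments (segmenting words versus plain substring search) can be made concrete by comparing the two approaches on a few inputs. This is a sketch under assumptions: `token_match` is a hypothetical word-segmenting matcher that splits on the regular expression `\w+`, and `contains_term` is a hypothetical one-line version of the substring test; neither is part of the Translate Toolkit.

```python
import re


def contains_term(text, term):
    """Hypothetical substring test, as in the listing (100 or 0)."""
    return 100 if text.find(term) >= 0 else 0


def token_match(text, term):
    """Hypothetical word-segmenting alternative: term must equal a whole word."""
    words = set(re.findall(r"\w+", text))  # split the text into word tokens
    return 100 if term in words else 0


examples = [
    ("Change the file format", "form"),           # substring gives a false positive
    ("Fill in the forms", "form"),                # plural: only substring finds it
    ("Free Software is great", "Free Software"),  # multiword term: tokens never match
]
for text, term in examples:
    print(term, contains_term(text, term), token_match(text, term))
```

Word segmentation removes the "form"/"format" false positive but also loses plurals and multiword terms, which is why the listing settles for substring search; a stemmer would address the plural case.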
Generated by Epydoc 3.0.1 on Wed Mar 3 16:38:27 2010 (http://epydoc.sourceforge.net)