This module contains English-language linguistic functions for the Linguistics module. It can be loaded either directly or by passing some variant of 'en' or 'eng' to the Linguistics::use method.
The functions contained by the module provide:
Plural forms of all nouns, most verbs, and some adjectives are provided. Where appropriate, "classical" variants (for example: "brother" -> "brethren", "dogma" -> "dogmata", etc.) are also provided.
These can be accessed via the #plural, #plural_noun, #plural_verb, and #plural_adjective methods.
Pronunciation-based "a"/"an" selection is provided for all English words, and most initialisms.
Conversion from Numeric values to words is supported using the American "thousands" system. E.g., 2561 => "two thousand, five hundred and sixty-one".
See the #numwords method.
It is also possible to inflect numerals (1, 2, 3) and number words ("one", "two", "three") to ordinals (1st, 2nd, 3rd) and ordinates ("first", "second", "third"). See the #ordinal and #ordinate methods.
This module also supports the creation of English conjunctions from Arrays of Strings or objects which respond to the #to_s message. E.g.,
%w{cow pig chicken cow dog cow duck duck moose}.en.conjunction ==> "three cows, two ducks, a pig, a chicken, a dog, and a moose"
Returns the infinitive form of English verbs:
"dodging".en.infinitive ==> "dodge"
Michael Granger <ged@FaerieMUD.org>
The inflection functions of this module were adapted from Damian Conway's Lingua::EN::Inflect Perl module:
Copyright (c) 1997-2000, Damian Conway. All Rights Reserved. This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.
The conjunctions code was adapted from the Lingua::Conjunction Perl module written by Robert Rothenberg and Damian Conway, which has no copyright statement included.
Copyright (c) 2003-2008, Michael Granger All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the author/s, nor the names of the project's contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
This file contains functions for finding relations for English words. It requires the Ruby-WordNet module to be installed; if it is not installed, calling the functions defined by this file will raise a NotImplementedError. Requiring this file adds functions and constants to the Linguistics::EN module.
# Test to be sure the WordNet module loaded okay.
Linguistics::EN.has_wordnet?
# => true

# Fetch the default synset for the word "balance"
"balance".en.synset
# => #<WordNet::Synset:0x40376844 balance (noun): "a state of equilibrium" (derivations: 3, antonyms: 1, hypernyms: 1, hyponyms: 3)>

# Fetch the synset for the first verb sense of "balance"
"balance".en.synset( :verb )
# => #<WordNet::Synset:0x4033f448 balance, equilibrate, equilibrize, equilibrise (verb): "bring into balance or equilibrium; "She has to balance work and her domestic duties"; "balance the two weights"" (derivations: 7, antonyms: 1, verbGroups: 2, hypernyms: 1, hyponyms: 5)>

# Fetch the second noun sense
"balance".en.synset( 2, :noun )
# => #<WordNet::Synset:0x404ebb24 balance (noun): "a scale for weighing; depends on pull of gravity" (hypernyms: 1, hyponyms: 5)>

# Fetch the second noun sense's hypernyms (more-general words, like a superclass)
"balance".en.synset( 2, :noun ).hypernyms
# => [#<WordNet::Synset:0x404e5620 scale, weighing machine (noun): "a measuring instrument for weighing; shows amount of mass" (derivations: 2, hypernyms: 1, hyponyms: 2)>]

# A simpler way of doing the same thing:
"balance".en.hypernyms( 2, :noun )
# => [#<WordNet::Synset:0x404e5620 scale, weighing machine (noun): "a measuring instrument for weighing; shows amount of mass" (derivations: 2, hypernyms: 1, hyponyms: 2)>]

# Fetch the first hypernym's hypernyms
"balance".en.synset( 2, :noun ).hypernyms.first.hypernyms
# => [#<WordNet::Synset:0x404c60b8 measuring instrument, measuring system, measuring device (noun): "instrument that shows the extent or amount or quantity or degree of something" (hypernyms: 1, hyponyms: 83)>]

# Find the synset to which both the second noun sense of "balance" and the
# default sense of "shovel" belong.
("balance".en.synset( 2, :noun ) | "shovel".en.synset)
# => #<WordNet::Synset:0x40473da4 instrumentality, instrumentation (noun): "an artifact (or system of artifacts) that is instrumental in accomplishing some end" (derivations: 1, hypernyms: 1, hyponyms: 13)>

# Fetch just the words for the other kinds of "instruments"
"instrument".en.hyponyms.collect {|synset| synset.words}.flatten
# => ["analyzer", "analyser", "cautery", "cauterant", "drafting instrument", "extractor", "instrument of execution", "instrument of punishment", "measuring instrument", "measuring system", "measuring device", "medical instrument", "navigational instrument", "optical instrument", "plotter", "scientific instrument", "sonograph", "surveying instrument", "surveyor's instrument", "tracer", "weapon", "arm", "weapon system", "whip"]
Michael Granger <ged@FaerieMUD.org>
$Id: wordnet.rb,v 2640c845eb5c 2009/11/17 16:59:25 ged $
This file contains the extensions to the Linguistics::EN module which provide support for the Ruby LinkParser module. LinkParser enables grammatical queries of English-language sentences.
# Test to see whether or not the link parser is loaded.
Linguistics::EN.has_link_parser?
# => true

# Diagram the first linkage for a test sentence
puts "he is a big dog".en.sentence.linkages.first.to_s
#    +---O*---+
#    |  +--Ds-+
# +Ss+  | +-A-+
# |  |  | |   |
# he is a big dog

# Find the verb in the sentence
"he is a big dog".en.sentence.verb.to_s
# => "is"

# Combined infinitive + LinkParser: Find the infinitive form of the verb of the
# given sentence.
"he is a big dog".en.sentence.verb.infinitive
# => "be"

# Find the direct object of the sentence
"he is a big dog".en.sentence.object.to_s
# => "dog"

# Combine WordNet + LinkParser to find the definition of the direct object of
# the sentence
"he is a big dog".en.sentence.object.gloss
# => "a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds; \"the dog barked all night\""
Martin Chase <stillflame@FaerieMUD.org>
Michael Granger <ged@FaerieMUD.org>
This file contains functions for deriving the infinitive forms of conjugated English words. Requiring this file adds functions and constants to the Linguistics::EN module.
Michael Granger <ged@FaerieMUD.org>
This code was ported from the excellent 'Lingua::EN::Infinitive' Perl module by Ron Savage, which is distributed under the following license:
Australian copyright (c) 1999-2002 Ron Savage. All Programs of mine are 'OSI Certified Open Source Software'; you can redistribute them and/or modify them under the terms of The Artistic License, a copy of which is available at: http://www.opensource.org/licenses/index.html
Register the specified method (either a Method object or a Symbol naming a method of this module) as the lprintf formatter for the escape name.
# File lib/linguistics/en.rb, line 108
def self::def_lprintf_formatter( name, meth )
  meth = self.method( meth ) unless meth.is_a?( Method )
  self.lprintf_formatters[ name ] = meth
end
Make a function that calls the method meth on the synset of an input word.
# File lib/linguistics/en/wordnet.rb, line 121
def def_synset_function( meth )
  (class << self; self; end).instance_eval do
    define_method( meth ) {|*args|
      word, pos, sense = *args
      raise ArgumentError,
        "wrong number of arguments (0 for 1)" unless word
      sense ||= 1

      syn = synset( word.to_s, pos, sense )
      return syn.nil? ? nil : syn.send( meth )
    }
  end
end
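The metaprogramming in def_synset_function can be tried without Ruby-WordNet installed. The sketch below is a minimal, self-contained approximation: LexiconSketch, Sense, and the SENSES table are hypothetical stand-ins for Linguistics::EN and WordNet synsets; only the shape of the generated methods mirrors the code above.

```ruby
# Hypothetical stand-in for WordNet::Synset.
Sense = Struct.new( :gloss, :words )

module LexiconSketch
  # Hypothetical sense table standing in for a WordNet lexicon.
  SENSES = {
    "balance" => [
      Sense.new( "a state of equilibrium", ["balance"] ),
      Sense.new( "a scale for weighing", ["balance", "scale"] ),
    ],
  }

  # Return the requested 1-based sense of +word+, or nil if there is none.
  def self.synset( word, pos = nil, sense = 1 )
    senses = SENSES[ word ] or return nil
    senses[ sense - 1 ]
  end

  # Define a singleton method named +meth+ that looks up a synset and
  # forwards +meth+ to it, following the shape of def_synset_function.
  def self.def_synset_function( meth )
    (class << self; self; end).instance_eval do
      define_method( meth ) do |*args|
        word, pos, sense = *args
        raise ArgumentError, "wrong number of arguments (0 for 1)" unless word
        sense ||= 1
        syn = synset( word.to_s, pos, sense )
        syn.nil? ? nil : syn.send( meth )
      end
    end
  end

  def_synset_function :gloss
  def_synset_function :words
end
```

Defining the methods on the singleton class, as the original does, makes them callable as module functions (LexiconSketch.gloss( "balance" )) rather than as instance methods.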
Returns true if LinkParser was loaded okay
# File lib/linguistics/en/linkparser.rb, line 75
def has_link_parser? ; @has_link_parser ; end
Returns true if WordNet was loaded okay
# File lib/linguistics/en/wordnet.rb, line 101
def has_wordnet? ; @has_wordnet; end
The instance of LinkParser used for all Linguistics LinkParser functions.
# File lib/linguistics/en/linkparser.rb, line 83
def lp_dict
  if @lp_error
    raise NotImplementedError,
      "LinkParser functions are not loaded: %s" % @lp_error.message
  end

  return @lp_dict ||= LinkParser::Dictionary.new( :verbosity => 0 )
end
If #has_link_parser? returns false, this can be called to fetch the exception which was raised when trying to load LinkParser.
# File lib/linguistics/en/linkparser.rb, line 79
def lp_error ; @lp_error ; end
Wrap one or more parts in a non-capturing alternation Regexp group.
# File lib/linguistics/en.rb, line 95
def self::matchgroup( *parts )
  re = parts.flatten.join("|")
  "(?:#{re})"
end
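As a usage sketch, the same body can build a reusable alternation and embed it in a larger pattern. ORDINAL_SUFFIX and ORDINAL_RE are illustrative names, not constants from the module.

```ruby
# Same body as matchgroup above, as a plain method.
def matchgroup( *parts )
  re = parts.flatten.join( "|" )
  "(?:#{re})"
end

# Hypothetical alternation of ordinal suffixes, for illustration only.
ORDINAL_SUFFIX = matchgroup( %w{st nd rd th} )

# The non-capturing group can be interpolated into a larger pattern
# without shifting the numbering of the capturing groups around it.
ORDINAL_RE = /\A(\d+)#{ORDINAL_SUFFIX}\z/
```

Because the parts are joined without escaping, they are treated as regex fragments; literal strings containing metacharacters would need Regexp.escape first.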
Return a LinkParser::Sentence for the stringified obj.
# File lib/linguistics/en/linkparser.rb, line 104
def sentence( obj )
  return Linguistics::EN::lp_dict.parse( obj.to_s )
end
If #has_wordnet? returns false, this can be called to fetch the exception which was raised when trying to load WordNet.
# File lib/linguistics/en/wordnet.rb, line 105
def wn_error ; @wn_error; end
The instance of the WordNet::Lexicon used for all Linguistics WordNet functions.
# File lib/linguistics/en/wordnet.rb, line 109
def wn_lexicon
  if @wn_error
    raise NotImplementedError,
      "WordNet functions are not loaded: %s" % @wn_error.message
  end

  @wn_lexicon ||= WordNet::Lexicon::new
end
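Both wn_lexicon and lp_dict follow the same pattern: remember why an optional library failed to load, raise NotImplementedError when a dependent function is used, and otherwise memoize the expensive object with ||=. A minimal sketch of that pattern follows; OptionalBackend and the required library name are hypothetical, and Object.new stands in for the real lexicon or dictionary.

```ruby
module OptionalBackend
  # Try to load a (hypothetical) optional dependency, keeping the error.
  begin
    require "some_optional_library_that_is_not_installed"
    @load_error = nil
  rescue LoadError => err
    @load_error = err
  end

  # Analogous to has_wordnet? / has_link_parser?.
  def self.available?
    @load_error.nil?
  end

  # Analogous to wn_lexicon / lp_dict: fail loudly if the library is
  # missing, otherwise build the backend once and reuse it.
  def self.backend
    if @load_error
      raise NotImplementedError,
        "backend functions are not loaded: %s" % @load_error.message
    end
    @backend ||= Object.new
  end
end
```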
Return the given phrase with the appropriate indefinite article ("a" or "an") prepended.
# File lib/linguistics/en.rb, line 1168
def a( phrase, count=nil )
  md = /\A(\s*)(.+?)(\s*)\Z/.match( phrase.to_s )
  pre, word, post = md.to_a[1,3]
  return phrase if word.nil? or word.empty?

  result = indef_article( word, count )
  return pre + result + post
end
Turns a camel-case string ("camelCaseToEnglish") to plain English ("camel case to english"). Each word is decapitalized.
# File lib/linguistics/en.rb, line 1601
def camel_case_to_english( string )
  string.to_s.
    gsub( /([A-Z])([A-Z])/ ) { "#$1 #$2" }.
    gsub( /([a-z])([A-Z])/ ) { "#$1 #$2" }.downcase
end
Return the specified obj (which must support the #collect method) as a conjunction. Each item is converted to a String (using #to_s) unless a block is given, in which case the block is called once for each object and its stringified return value is used instead. Returning nil from the block causes that element to be omitted from the resulting conjunction. The following options can be used to control the makeup of the returned conjunction String:
:separator
Specify one or more characters to separate items in the resulting list. Defaults to ', '.
:altsep
An alternate separator to use if any of the resulting conjunction's clauses contain the :separator character/s. Defaults to '; '.
:penultimate
Flag that indicates whether or not to join the last clause onto the rest of the conjunction using a penultimate :separator. E.g.,
%w{duck cow dog}.en.conjunction
# => "a duck, a cow, and a dog"
%w{duck cow dog}.en.conjunction( :penultimate => false )
# => "a duck, a cow and a dog"
Defaults to true.
:conjunctive
Sets the word used as the conjunctive (separating word) of the resulting string. Defaults to 'and'.
:combine
If set to true (the default), items which are identical (after surrounding spaces are stripped) will be combined in the resulting conjunction. E.g.,
%w{goose cow goose dog}.en.conjunction
# => "two geese, a cow, and a dog"
%w{goose cow goose dog}.en.conjunction( :combine => false )
# => "a goose, a cow, a goose, and a dog"
:casefold
If set to true (the default), then items are compared case-insensitively when combining them. This has no effect if :combine is false.
:generalize
If set to true, then quantities of combined items are turned into general descriptions instead of exact amounts.
ary = %w{goose pig dog horse goose reindeer goose dog horse}
ary.en.conjunction
# => "three geese, two dogs, two horses, a pig, and a reindeer"
ary.en.conjunction( :generalize => true )
# => "several geese, several dogs, several horses, a pig, and a reindeer"
See the #quantify method for specifics on how quantities are generalized. Generalization defaults to false, and has no effect if :combine is false.
:quantsort
If set to true (the default), items which are combined in the resulting conjunction will be listed in order of amount, with greater quantities sorted first. If :quantsort is false, combined items will appear where the first instance of them occurred in the list. This sort is also the fallback for identical quantities (i.e., items of the same quantity will be listed in the order they appeared in the source list).
# File lib/linguistics/en.rb, line 1490
def conjunction( obj, args={} )
  config = ConjunctionDefaults.merge( args )
  phrases = []

  # Transform items in the obj to phrases
  if block_given?
    phrases = obj.collect {|item| yield(item) }.compact
  else
    phrases = obj.collect {|item| item.to_s }
  end

  # No need for a conjunction if there's only one thing
  return a(phrases[0]) if phrases.length < 2

  # Set up a Proc to derive a collector key from a phrase depending on the
  # configuration
  keyfunc =
    if config[:casefold]
      proc {|key| key.downcase.strip}
    else
      proc {|key| key.strip}
    end

  # Count and delete phrases that hash the same when the keyfunc munges
  # them into the same thing if we're combining (:combine => true).
  collector = {}
  if config[:combine]

    phrases.each_index do |i|
      # Stop when reaching the end of a truncated list
      break if phrases[i].nil?

      # Make the key using the configured key function
      phrase = keyfunc[ phrases[i] ]

      # If the collector already has this key, increment its count,
      # eliminate the duplicate from the phrase list, and redo the loop.
      if collector.key?( phrase )
        collector[ phrase ] += 1
        phrases.delete_at( i )
        redo
      end

      collector[ phrase ] = 1
    end
  else
    # If we're not combining, just make everything have a count of 1.
    phrases.uniq.each {|key| collector[ keyfunc[key] ] = 1}
  end

  # If sort-by-quantity is turned on, sort the phrases first by how many
  # there are (most-first), and then by the order they were specified in.
  if config[:quantsort] && config[:combine]
    origorder = {}
    phrases.each_with_index {|phrase,i| origorder[ keyfunc[phrase] ] ||= i }
    phrases.sort! {|a,b|
      (collector[ keyfunc[b] ] <=> collector[ keyfunc[a] ]).nonzero? ||
      (origorder[ keyfunc[a] ] <=> origorder[ keyfunc[b] ])
    }
  end

  # Set up a filtering function that adds either an indefinite article, an
  # indefinite quantifier, or a definite quantifier to each phrase
  # depending on the configuration and the count of phrases in the
  # collector.
  filter =
    if config[:generalize]
      proc {|phrase, count| quantify(phrase, count) }
    else
      proc {|phrase, count|
        if count > 1
          "%s %s" % [
            # :TODO: Make this threshold settable
            count < 10 ? count.en.numwords : count.to_s,
            plural(phrase, count)
          ]
        else
          a( phrase )
        end
      }
    end

  # Now use the configured filter to turn each phrase into its final
  # form. Hmmm... square-bracket Lisp?
  phrases.collect! {|phrase| filter[phrase, collector[ keyfunc[phrase] ]] }

  # Prepend the conjunctive to the last element unless it's empty or
  # there's only one element
  phrases[-1].insert( 0, config[:conjunctive] + " " ) unless
    config[:conjunctive].strip.empty? or
    phrases.length < 2

  # Concatenate the last two elements if there's no penultimate separator,
  # and pick a separator based on how many phrases there are and whether
  # or not there's already an instance of it in the phrases.
  phrase_count = phrases.length
  phrases[-2] << " " << phrases.pop unless config[:penultimate]
  sep = config[:separator]
  if phrase_count <= 2
    sep = ' '
  elsif phrases.find {|str| str.include?(config[:separator]) }
    sep = config[:altsep]
  end

  return phrases.join( sep )
end
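The counting and ordering that :combine, :casefold, and :quantsort perform can be sketched on their own. This is a simplified illustration, not the module's code: combine_and_sort is a hypothetical helper that uses a counting Hash and sort_by instead of the in-place delete_at/redo loop above, and returns [key, count] pairs rather than finished phrases.

```ruby
# Count items whose stripped, downcased forms match (like :combine with
# :casefold), then order by descending count, breaking ties by first
# appearance in the input (like :quantsort).
def combine_and_sort( items )
  counts = Hash.new( 0 )
  first_seen = {}

  items.each_with_index do |item, i|
    key = item.to_s.strip.downcase
    counts[ key ] += 1
    first_seen[ key ] ||= i
  end

  ordered = counts.keys.sort_by {|key| [ -counts[key], first_seen[key] ] }
  ordered.map {|key| [ key, counts[key] ] }
end
```

Applied to the %w{cow pig chicken cow dog cow duck duck moose} example at the top of this module, it yields cow (3), duck (2), then pig, chicken, dog, and moose in their original order, which is the same ordering as "three cows, two ducks, a pig, a chicken, a dog, and a moose".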
Turns an English language string into a CamelCase word.
# File lib/linguistics/en.rb, line 1609
def english_to_camel_case( string )
  string.to_s.gsub( /\s+([a-z])/ ) { $1.upcase }
end
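A quick round trip through camel_case_to_english and english_to_camel_case, with both bodies copied from above so the sketch runs without the gem:

```ruby
# Copy of camel_case_to_english: split at case changes, then downcase.
def camel_case_to_english( string )
  string.to_s.
    gsub( /([A-Z])([A-Z])/ ) { "#$1 #$2" }.
    gsub( /([a-z])([A-Z])/ ) { "#$1 #$2" }.downcase
end

# Copy of english_to_camel_case: drop whitespace before a lowercase
# letter and capitalize that letter.
def english_to_camel_case( string )
  string.to_s.gsub( /\s+([a-z])/ ) { $1.upcase }
end

phrase = camel_case_to_english( "camelCaseToEnglish" )
camel  = english_to_camel_case( phrase )
```

Note that english_to_camel_case only joins a lowercase letter that follows whitespace, so words that are already capitalized (as in "Two Words") are left separated.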
Returns the given word with a prepended indefinite article, unless count is non-nil and not singular, in which case the word is returned prefixed with the count instead.
# File lib/linguistics/en.rb, line 940
def indef_article( word, count )
  count ||= Linguistics::num
  return "#{count} #{word}" if count && /^(#{PL_count_one})$/ !~ count.to_s

  # Handle user-defined variants
  # return value if value = ud_match( word, A_a_user_defined )

  case word

  # Handle special cases
  when /^(#{A_explicit_an})/
    return "an #{word}"

  # Handle abbreviations
  when /^(#{A_abbrev})/
    return "an #{word}"
  when /^[aefhilmnorsx][.-]/
    return "an #{word}"
  when /^[a-z][.-]/
    return "a #{word}"

  # Handle consonants
  when /^[^aeiouy]/
    return "a #{word}"

  # Handle special vowel-forms
  when /^e[uw]/
    return "a #{word}"
  when /^onc?e\b/
    return "a #{word}"
  when /^uni([^nmd]|mo)/
    return "a #{word}"
  when /^u[bcfhjkqrst][aeiou]/
    return "a #{word}"

  # Handle vowels
  when /^[aeiou]/
    return "an #{word}"

  # Handle y... (before certain consonants implies (unnaturalized) "i.." sound)
  when /^(#{A_y_cons})/
    return "an #{word}"

  # Otherwise, guess "a"
  else
    return "a #{word}"
  end
end
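The core of the vowel/consonant cascade in indef_article can be tried in isolation. The sketch below is a reduced version, not the module's method: simple_indef_article is a hypothetical name, and it drops the count handling and the A_explicit_an, A_abbrev, and A_y_cons tables, so it mishandles words such as "hour" or "FBI".

```ruby
# Reduced cascade: same regexes and order as the consonant and
# vowel branches of indef_article, without the lookup tables.
def simple_indef_article( word )
  case word
  when /^[^aeiouy]/            then "a #{word}"   # consonant: "a dog"
  when /^e[uw]/                then "a #{word}"   # "a euro", "a ewe"
  when /^onc?e\b/              then "a #{word}"   # "a one-off"
  when /^uni([^nmd]|mo)/       then "a #{word}"   # "a unicorn"
  when /^u[bcfhjkqrst][aeiou]/ then "a #{word}"   # "a usage"
  when /^[aeiou]/              then "an #{word}"  # "an apple"
  else                              "a #{word}"   # "y..." falls through
  end
end
```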
Return the infinitive form of the given word
# File lib/linguistics/en/infinitive.rb, line 1048
def infinitive( word )
  word = word.to_s
  word1 = word2 = suffix = rule = newword = ''

  if IrregularInfinitives.key?( word )
    word1 = IrregularInfinitives[ word ]
    rule = 'irregular'
  else
    # Build up $prefix{$suffix} as an array of prefixes, from longest to shortest.
    prefix, suffix = nil
    prefixes = Hash::new {|hsh,key| hsh[key] = []}

    # Build the hash of prefixes for the word
    1.upto( word.length ) {|i|
      prefix = word[0, i]
      suffix = word[i..-1]

      (suffix.length - 1).downto( 0 ) {|j|
        newword = prefix + suffix[0, j]
        prefixes[ suffix ].push( newword )
      }
    }

    $stderr.puts "prefixes: %p" % prefixes if $DEBUG

    # Now check for rules covering the prefixes for this word, picking
    # the first one if one was found.
    if (( suffix = ((InfSuffixRuleOrder & prefixes.keys).first) ))
      rule = InfSuffixRules[ suffix ][:rule]
      shortestPrefix = InfSuffixRules[ suffix ][:word1]
      $stderr.puts "Using rule %p (%p) for suffix %p" % [ rule, shortestPrefix, suffix ] if $DEBUG

      case shortestPrefix
      when 0
        word1 = prefixes[ suffix ][ 0 ]
        word2 = prefixes[ suffix ][ 1 ]
        $stderr.puts "For sp = 0: word1: %p, word2: %p" % [ word1, word2 ] if $DEBUG

      when -1
        word1 = prefixes[ suffix ].last + InfSuffixRules[ suffix ][:suffix1]
        word2 = ''
        $stderr.puts "For sp = -1: word1: %p, word2: %p" % [ word1, word2 ] if $DEBUG

      when -2
        word1 = prefixes[ suffix ].last + InfSuffixRules[ suffix ][:suffix1]
        word2 = prefixes[ suffix ].last
        $stderr.puts "For sp = -2: word1: %p, word2: %p" % [ word1, word2 ] if $DEBUG

      when -3
        word1 = prefixes[ suffix ].last + InfSuffixRules[ suffix ][:suffix1]
        word2 = prefixes[ suffix ].last + InfSuffixRules[ suffix ][:suffix2]
        $stderr.puts "For sp = -3: word1: %p, word2: %p" % [ word1, word2 ] if $DEBUG

      when -4
        word1 = word
        word2 = ''
        $stderr.puts "For sp = -4: word1: %p, word2: %p" % [ word1, word2 ] if $DEBUG

      else
        raise IndexError, "Couldn't find rule for shortest prefix %p" % shortestPrefix
      end

      # Rules 12b and 15: Strip off 'ed' or 'ing'.
      if rule == '12b' or rule == '15'
        # Do we have a monosyllable of this form:
        # o 0+ Consonants
        # o 1+ Vowel
        # o 2 Non-wx
        # E.g.: tipped => tipp?
        # Then return tip and tipp.
        # E.g.: swimming => swimm?
        # Then return swim and swimm.
        if /^([^aeiou]*[aeiou]+)([^wx])\2$/ =~ word2
          word1 = $1 + $2
          word2 = $1 + $2 + $2
        end
      end
    end
  end

  return Infinitive::new( word1, word2, suffix, rule )
end
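The doubled-consonant check for rules 12b and 15 can be isolated as a small helper. undouble is a hypothetical name; it applies the same regular expression (anchored with \A and \z) to a candidate stem and returns both the single- and double-consonant forms, or nil when the stem does not end in a doubled non-w/x consonant after a vowel.

```ruby
# Split a stem like "swimm" into [ "swim", "swimm" ]:
#   ([^aeiou]*[aeiou]+)  leading consonants plus at least one vowel
#   ([^wx])\2            the same non-w/x consonant, doubled
def undouble( stem )
  md = /\A([^aeiou]*[aeiou]+)([^wx])\2\z/.match( stem ) or return nil
  [ md[1] + md[2], md[1] + md[2] + md[2] ]
end
```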
Return the name of the language this module is for.
# File lib/linguistics/en.rb, line 1099
def language( unused=nil )
  "English"
end
Format the given fmt string by replacing %-escaped sequences with the result of performing a specified operation on the corresponding argument, in the style of Kernel.sprintf.
%PL
Plural.
%A, %AN
Prepend indefinite article.
%NO
Zero-quantified phrase.
%NUMWORDS
Convert a number into the corresponding words.
%CONJUNCT
Conjunction.
# File lib/linguistics/en.rb, line 1684
def lprintf( fmt, *args )
  fmt.to_s.gsub( /%([A-Z_]+)/ ) do |match|
    op = $1.to_s.upcase.to_sym
    if self.lprintf_formatters.key?( op )
      arg = args.shift
      self.lprintf_formatters[ op ].call( arg )
    else
      raise "no such formatter %p" % op
    end
  end
end
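A self-contained sketch of the lprintf dispatch: each %-escape is looked up in a table of callables and consumes the next argument. FORMATTERS and lprintf_sketch are hypothetical; the two formatters are deliberately naive placeholders (the real ones call methods such as #plural and #conjunction), and this version raises ArgumentError rather than a plain RuntimeError for unknown escapes.

```ruby
# Hypothetical formatter table keyed by escape name.
FORMATTERS = {
  :PL       => ->( arg ) { arg.to_s + "s" },   # naive plural, illustration only
  :CONJUNCT => ->( arg ) { arg[0..-2].join( ", " ) + ", and " + arg[-1] },
}

# Replace each %ESCAPE with its formatter's result for the next argument.
def lprintf_sketch( fmt, *args )
  fmt.to_s.gsub( /%([A-Z_]+)/ ) do
    op = $1.to_sym
    formatter = FORMATTERS[ op ] or
      raise ArgumentError, "no such formatter %p" % op
    formatter.call( args.shift )
  end
end

out = lprintf_sketch( "I saw %CONJUNCT and two %PL.", %w{a b c}, "cat" )
```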
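Return the phrase quantified by count: if count is zero (a nil count defaults to zero), return "no" followed by the plural of the phrase (e.g., "no dogs"); otherwise return the count followed by the appropriately inflected phrase.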
# File lib/linguistics/en.rb, line 1182
def no( phrase, count=nil )
  md = /\A(\s*)(.+?)(\s*)\Z/.match( phrase.to_s )
  pre, word, post = md.to_a[1,3]
  count ||= Linguistics::num || 0

  unless /^#{PL_count_zero}$/ =~ count.to_s
    return "#{pre}#{count} " + plural( word, count ) + post
  else
    return "#{pre}no " + plural( word, 0 ) + post
  end
end
Normalize a count to either 1 or 2 (singular or plural)
# File lib/linguistics/en.rb, line 669
def normalize_count( count, default=2 )
  return default if count.nil? # Default to plural
  if /^(#{PL_count_one})$/ =~ count.to_s ||
      Linguistics::classical? &&
      /^(#{PL_count_zero})$/ =~ count.to_s
    return 1
  else
    return default
  end
end
Return the specified number num as an array of number phrases.
# File lib/linguistics/en.rb, line 1045
def number_to_words( num, config )
  return [config[:zero]] if num.to_i.zero?
  chunks = []

  # Break into word-groups if groups is set
  if config[:group].nonzero?

    # Build a Regexp with <config[:group]> number of digits. Any past
    # the first are optional.
    re = Regexp::new( "(\\d)" + ("(\\d)?" * (config[:group] - 1)) )

    # Scan the string, and call the word-chunk function that deals with
    # chunks of the found number of digits.
    num.to_s.scan( re ) {|digits|
      debug_msg " digits = #{digits.inspect}"
      fn = NumberToWordsFunctions[ digits.compact.length ]
      numerals = digits.flatten.compact.collect {|i| i.to_i}
      debug_msg " numerals = #{numerals.inspect}"
      chunks.push fn.call( config[:zero], *numerals ).strip
    }
  else
    phrase = num.to_s
    phrase.sub!( /\A\s*0+/, '' )
    mill = 0

    # Match backward from the end of the digits in the string, turning
    # chunks of three, of two, and of one into words.
    mill += 1 while phrase.sub!( /(\d)(\d)(\d)(?=\D*\Z)/ ) {
      words = to_hundreds( $1.to_i, $2.to_i, $3.to_i, mill, config[:and] )
      chunks.unshift words.strip.squeeze(' ') unless words.nil?
      ''
    }

    phrase.sub!( /(\d)(\d)(?=\D*\Z)/ ) {
      chunks.unshift to_tens( $1.to_i, $2.to_i, mill ).strip.squeeze(' ')
      ''
    }
    phrase.sub!( /(\d)(?=\D*\Z)/ ) {
      chunks.unshift to_units( $1.to_i, mill ).strip.squeeze(' ')
      ''
    }
  end

  return chunks
end
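When :group is nonzero, number_to_words relies on String#scan with optional capture groups, so the last chunk can be short and carry nil entries. The sketch below (digit_groups is a hypothetical helper) shows the digit arrays that would be handed to the chunk functions; compact drops the unmatched optional groups, which is why counting the non-nil entries selects the right chunk function.

```ruby
# Split a number's digits into groups of +group+ (1..3), mirroring the
# Regexp built in number_to_words, and convert each group to Integers.
def digit_groups( num, group )
  re = Regexp.new( "(\\d)" + ("(\\d)?" * (group - 1)) )
  num.to_s.scan( re ).map {|digits| digits.compact.map( &:to_i ) }
end
```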
Return the specified number as English words. One or more configuration values may be passed to control the returned String:
:group
Controls how many digits at a time are grouped together. Valid values are 0 (normal grouping), 1 (single-digit grouping, e.g., "one, two, three, four"), 2 (double-digit grouping, e.g., "twelve, thirty-four"), or 3 (triple-digit grouping, e.g., "one twenty-three, four").
:comma
Set the character/s used to separate word groups. Defaults to ", ".
:and
Set the word and/or characters used where ' and ' (the default) is normally used. Setting :and to ' ', for example, will cause 2556 to be returned as "two thousand, five hundred fifty-six" instead of "two thousand, five hundred and fifty-six".
:zero
Set the word used to represent the numeral 0 in the result. 'zero' is the default.
:decimal
Set the translation of any decimal points in the number; the default is 'point'.
:asArray
If set to a true value, the number will be returned as an array of word groups instead of a String.
# File lib/linguistics/en.rb, line 1242
def numwords( number, hashargs={} )
  num = number.to_s
  config = NumwordDefaults.merge( hashargs )
  raise "Bad chunking option: #{config[:group]}" unless config[:group].between?( 0, 3 )

  # Array of number parts: first is everything to the left of the first
  # decimal, followed by any groups of decimal-delimited numbers after that
  parts = []

  # Wordify any sign prefix
  sign = (/\A\s*\+/ =~ num) ? 'plus' : (/\A\s*\-/ =~ num) ? 'minus' : ''

  # Strip any ordinal suffixes
  ord = true if num.sub!( /(st|nd|rd|th)\Z/, '' )

  # Split the number into chunks delimited by '.'
  chunks = if !config[:decimal].empty? then
    if config[:group].nonzero?
      num.split(/\./)
    else
      num.split(/\./, 2)
    end
  else
    [ num ]
  end

  # Wordify each chunk, pushing arrays into the parts array
  chunks.each_with_index {|chunk,section|
    chunk.gsub!( /\D+/, '' )

    # If there's nothing in this chunk of the number, set it to zero
    # unless it's the whole-number part, in which case just push an
    # empty array.
    if chunk.empty?
      if section.zero?
        parts.push []
        next
      end
    end

    # Split the number section into wordified parts unless this is the
    # second or succeeding part of a non-group number
    unless config[:group].zero? && section.nonzero?
      parts.push number_to_words( chunk, config )
    else
      parts.push number_to_words( chunk, config.merge(:group => 1) )
    end
  }

  debug_msg "Parts => #{parts.inspect}"

  # Turn the last word of the whole-number part back into an ordinal if
  # the original number came in that way.
  if ord && !parts[0].empty?
    parts[0][-1] = ordinal( parts[0].last )
  end

  # If the caller's expecting an Array return, just flatten and return the
  # parts array.
  if config[:asArray]
    unless sign.empty?
      parts[0].unshift( sign )
    end
    return parts.flatten
  end

  # Catenate each sub-parts array into a whole number part and one or more
  # post-decimal parts. If grouping is turned on, all sub-parts get joined
  # with commas, otherwise just the whole-number part is.
  if config[:group].zero?
    if parts[0].length > 1

      # Join all but the last part together with commas
      wholenum = parts[0][0...-1].join( config[:comma] )

      # If the last part is just a single word, append it to the
      # wholenum part with an 'and'. This is to get things like 'three
      # thousand and three' instead of 'three thousand, three'.
      if /^\s*(\S+)\s*$/ =~ parts[0].last
        wholenum += config[:and] + parts[0].last
      else
        wholenum += config[:comma] + parts[0].last
      end
    else
      wholenum = parts[0][0]
    end
    decimals = parts[1..-1].collect {|part| part.join(" ")}

    debug_msg "Wholenum: #{wholenum.inspect}; decimals: #{decimals.inspect}"

    # Join with the configured decimal; if it's empty, just join with
    # spaces.
    unless config[:decimal].empty?
      return sign + ([ wholenum ] + decimals).
        join( " #{config[:decimal]} " ).strip
    else
      return sign + ([ wholenum ] + decimals).
        join( " " ).strip
    end
  else
    return parts.compact.
      separate( config[:decimal] ).
      delete_if {|el| el.empty?}.
      join( config[:comma] ).
      strip
  end
end
Transform the given number into an ordinal word. The number object can be either an Integer or a String.
# File lib/linguistics/en.rb, line 1355
def ordinal( number )
  case number
  when Integer
    return number.to_s + (Nth[ number % 100 ] || Nth[ number % 10 ])
  else
    return number.to_s.sub( /(#{OrdinalSuffixes})\Z/ ) { Ordinals[$1] }
  end
end
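For Integer arguments, ordinal checks number % 100 first so that 11, 12, and 13 take "th", then falls back to number % 10. The contents of the Nth table are not shown here, so the sketch below assumes a table covering 0 through 13; NTH_SKETCH and ordinal_sketch are hypothetical names.

```ruby
# Assumed suffix table: "th" for 0..13 except 1, 2, and 3. Keys 11-13
# stay "th", which is what makes the % 100 lookup handle the teens.
NTH_SKETCH = Hash[ (0..13).map {|n| [ n, "th" ] } ].
  merge( 1 => "st", 2 => "nd", 3 => "rd" )

# Look up the last two digits first, then fall back to the last digit.
def ordinal_sketch( number )
  number.to_s + ( NTH_SKETCH[ number % 100 ] || NTH_SKETCH[ number % 10 ] )
end
```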
Transform the given number into an ordinate word.
# File lib/linguistics/en.rb, line 1368
def ordinate( number )
  return Linguistics::EN.ordinal( Linguistics::EN.numwords(number) )
end
Return the plural of the given phrase if count indicates it should be plural.
# File lib/linguistics/en.rb, line 1106
def plural( phrase, count=nil )
  phrase = numwords( phrase ) if phrase.is_a?( Numeric )

  md = /\A(\s*)(.+?)(\s*)\Z/.match( phrase.to_s )
  pre, word, post = md.to_a[1,3]
  return phrase if word.nil? or word.empty?

  plural = postprocess( word,
    pluralize_special_adjective(word, count) ||
    pluralize_special_verb(word, count) ||
    pluralize_noun(word, count) )

  return pre + plural + post
end
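#plural preserves surrounding whitespace and tries each specialised pluralizer in turn, taking the first non-nil answer. A minimal sketch of that dispatch pattern, assuming tiny hypothetical lookup tables in place of pluralize_special_adjective, pluralize_special_verb and pluralize_noun:

```ruby
# Hypothetical stand-ins for the three pluralizers used by #plural.
SPECIAL_ADJECTIVES = { 'this' => 'these', 'that' => 'those' }.freeze
SPECIAL_VERBS      = { 'is' => 'are', 'was' => 'were', 'has' => 'have' }.freeze

def plural_sketch(phrase)
  # Same split as #plural: leading space, the word(s), trailing space.
  md = /\A(\s*)(.+?)(\s*)\z/.match(phrase.to_s)
  return phrase if md.nil?            # empty input: nothing to inflect
  pre, word, post = md.captures

  # First non-nil answer wins, like the || chain in #plural.
  plural = SPECIAL_ADJECTIVES[word] || SPECIAL_VERBS[word] || "#{word}s"
  pre + plural + post
end

p plural_sketch("  cat ")   # "  cats "
```

Ordering matters: a word such as "that" is claimed by the adjective table before the noun fallback ever sees it.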
Return the plural of the given adjectival phrase if count indicates it should be plural.
# File lib/linguistics/en.rb, line 1153
def plural_adjective( phrase, count=nil )
  md = /\A(\s*)(.+?)(\s*)\Z/.match( phrase.to_s )
  pre, word, post = md.to_a[1,3]
  return phrase if word.nil? or word.empty?

  plural = postprocess( word,
    pluralize_special_adjective(word, count) || word )

  return pre + plural + post
end
Return the plural of the given noun phrase if count indicates it should be plural.
# File lib/linguistics/en.rb, line 1125
def plural_noun( phrase, count=nil )
  md = /\A(\s*)(.+?)(\s*)\Z/.match( phrase.to_s )
  pre, word, post = md.to_a[1,3]
  return phrase if word.nil? or word.empty?

  plural = postprocess( word, pluralize_noun(word, count) )

  return pre + plural + post
end
Return the plural of the given verb phrase if count indicates it should be plural.
# File lib/linguistics/en.rb, line 1138
def plural_verb( phrase, count=nil )
  md = /\A(\s*)(.+?)(\s*)\Z/.match( phrase.to_s )
  pre, word, post = md.to_a[1,3]
  return phrase if word.nil? or word.empty?

  plural = postprocess( word,
    pluralize_special_verb(word, count) ||
    pluralize_general_verb(word, count) )

  return pre + plural + post
end
Pluralize regular verbs
# File lib/linguistics/en.rb, line 878
def pluralize_general_verb( word, count )
  count ||= Linguistics::num
  count = normalize_count( count )

  return word if /^(#{PL_count_one})$/ =~ count.to_s

  case word

  # Handle ambiguous present tenses (simple and compound)
  when /^(#{PL_v_ambiguous_pres})((\s.*)?)$/
    return PL_v_ambiguous_pres_h[ $1.downcase ] + $2

  # Handle ambiguous preterite and perfect tenses
  when /^(#{PL_v_ambiguous_non_pres})((\s.*)?)$/
    return word

  # Otherwise, 1st or 2nd person is uninflected
  else
    return word
  end
end
Pluralize nouns
# File lib/linguistics/en.rb, line 705
def pluralize_noun( word, count=nil )
  value = nil
  count ||= Linguistics::num
  count = normalize_count( count )

  return word if count == 1

  # Handle user-defined nouns
  #if value = ud_match( word, PL_sb_user_defined )
  #  return value
  #end

  # Handle empty word, singular count and uninflected plurals
  case word
  when ''
    return word
  when /^(#{PL_sb_uninflected})$/
    return word
  else
    if Linguistics::classical? &&
       /^(#{PL_sb_uninflected_herd})$/ =~ word
      return word
    end
  end

  # Handle compounds ("Governor General", "mother-in-law", "aide-de-camp", etc.)
  case word
  when /^(?:#{PL_sb_postfix_adj})$/
    value = $2
    return pluralize_noun( $1, 2 ) + value

  when /^(?:#{PL_sb_prep_dual_compound})$/
    value = [ $2, $3 ]
    return pluralize_noun( $1, 2 ) + value[0] + pluralize_noun( value[1] )

  when /^(?:#{PL_sb_prep_compound})$/
    value = $2
    return pluralize_noun( $1, 2 ) + value

  # Handle pronouns
  when /^((?:#{PL_prep})\s+)(#{PL_pron_acc})$/
    return $1 + PL_pron_acc_h[ $2.downcase ]

  when /^(#{PL_pron_nom})$/
    return PL_pron_nom_h[ word.downcase ]

  when /^(#{PL_pron_acc})$/
    return PL_pron_acc_h[ $1.downcase ]

  # Handle isolated irregular plurals
  when /(.*)\b(#{PL_sb_irregular})$/
    return $1 + PL_sb_irregular_h[ $2.downcase ]

  when /(#{PL_sb_U_man_mans})$/
    return "#{$1}s"

  # Handle families of irregular plurals
  when /(.*)man$/       ; return "#{$1}men"
  when /(.*[ml])ouse$/  ; return "#{$1}ice"
  when /(.*)goose$/     ; return "#{$1}geese"
  when /(.*)tooth$/     ; return "#{$1}teeth"
  when /(.*)foot$/      ; return "#{$1}feet"

  # Handle unassimilated imports
  when /(.*)ceps$/      ; return word
  when /(.*)zoon$/      ; return "#{$1}zoa"
  when /(.*[csx])is$/   ; return "#{$1}es"
  when /(#{PL_sb_U_ex_ices})ex$/ ; return "#{$1}ices"
  when /(#{PL_sb_U_ix_ices})ix$/ ; return "#{$1}ices"
  when /(#{PL_sb_U_um_a})um$/    ; return "#{$1}a"
  when /(#{PL_sb_U_us_i})us$/    ; return "#{$1}i"
  when /(#{PL_sb_U_on_a})on$/    ; return "#{$1}a"
  when /(#{PL_sb_U_a_ae})$/      ; return "#{$1}e"
  end

  # Handle incompletely assimilated imports
  if Linguistics::classical?
    case word
    when /(.*)trix$/       ; return "#{$1}trices"
    when /(.*)eau$/        ; return "#{$1}eaux"
    when /(.*)ieu$/        ; return "#{$1}ieux"
    when /(.{2,}[yia])nx$/ ; return "#{$1}nges"
    when /(#{PL_sb_C_en_ina})en$/  ; return "#{$1}ina"
    when /(#{PL_sb_C_ex_ices})ex$/ ; return "#{$1}ices"
    when /(#{PL_sb_C_ix_ices})ix$/ ; return "#{$1}ices"
    when /(#{PL_sb_C_um_a})um$/    ; return "#{$1}a"
    when /(#{PL_sb_C_us_i})us$/    ; return "#{$1}i"
    when /(#{PL_sb_C_us_us})$/     ; return "#{$1}"
    when /(#{PL_sb_C_a_ae})$/      ; return "#{$1}e"
    when /(#{PL_sb_C_a_ata})a$/    ; return "#{$1}ata"
    when /(#{PL_sb_C_o_i})o$/      ; return "#{$1}i"
    when /(#{PL_sb_C_on_a})on$/    ; return "#{$1}a"
    when /#{PL_sb_C_im}$/          ; return "#{word}im"
    when /#{PL_sb_C_i}$/           ; return "#{word}i"
    end
  end

  # Handle singular nouns ending in ...s or other sibilants
  case word
  when /^(#{PL_sb_singular_s})$/ ; return "#{$1}es"
  when /^([A-Z].*s)$/            ; return "#{$1}es"
  when /(.*)([cs]h|[zx])$/       ; return "#{$1}#{$2}es"
  # when /(.*)(us)$/i ; return "#{$1}#{$2}es"

  # Handle ...f -> ...ves
  when /(.*[eao])lf$/  ; return "#{$1}lves"
  when /(.*[^d])eaf$/  ; return "#{$1}eaves"
  when /(.*[nlw])ife$/ ; return "#{$1}ives"
  when /(.*)arf$/      ; return "#{$1}arves"

  # Handle ...y
  when /(.*[aeiou])y$/ ; return "#{$1}ys"
  when /([A-Z].*y)$/   ; return "#{$1}s"
  when /(.*)y$/        ; return "#{$1}ies"

  # Handle ...o
  when /#{PL_sb_U_o_os}$/ ; return "#{word}s"
  when /[aeiou]o$/        ; return "#{word}s"
  when /o$/               ; return "#{word}es"

  # Otherwise just add ...s
  else
    return "#{word}s"
  end
end
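The final stage of #pluralize_noun is an ordered list of suffix rules in which the first match wins. The sketch below reproduces a small subset of those rules so the ordering is easy to experiment with. It is a simplification under stated assumptions: there is no irregular, classical or compound handling, and a final "s" is treated as an ordinary sibilant, whereas the module consults PL_sb_singular_s.

```ruby
# Simplified subset of pluralize_noun's suffix rules, in the same order.
def plural_noun_sketch(word)
  case word
  when /\A(.*)([cs]h|[zxs])\z/ then "#{$1}#{$2}es"  # church, box, glass
  when /\A(.*[eao])lf\z/       then "#{$1}lves"     # wolf, half
  when /\A(.*[nlw])ife\z/      then "#{$1}ives"     # knife, wife
  when /\A(.*[aeiou])y\z/      then "#{$1}ys"       # day, key
  when /\A(.*)y\z/             then "#{$1}ies"      # city
  when /[aeiou]o\z/            then "#{word}s"      # radio, zoo
  when /o\z/                   then "#{word}es"     # hero, potato
  else                              "#{word}s"      # cat
  end
end

p %w[church glass wolf knife day city radio hero cat].map { |w| plural_noun_sketch(w) }
```

The "-ay/-ey" rule has to precede the general "-y" rule; swapping them would turn "day" into "daies".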
Handle special adjectives
# File lib/linguistics/en.rb, line 902
def pluralize_special_adjective( word, count )
  count ||= Linguistics::num
  count = normalize_count( count )

  return word if /^(#{PL_count_one})$/ =~ count.to_s

  # Handle user-defined adjectives
  #if value = ud_match( word, PL_adj_user_defined )
  #  return value
  #end

  case word

  # Handle known cases
  when /^(#{PL_adj_special})$/
    return PL_adj_special_h[ $1.downcase ]

  # Handle possessives
  when /^(#{PL_adj_poss})$/
    return PL_adj_poss_h[ $1.downcase ]

  when /^(.*)'s?$/
    pl = plural_noun( $1 )
    if /s$/ =~ pl
      return "#{pl}'"
    else
      return "#{pl}'s"
    end

  # Otherwise, no idea
  else
    return nil
  end
end
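The generic possessive branch pluralizes the noun in front of the apostrophe and then chooses between a bare apostrophe and "'s" depending on whether the plural ends in "s". A minimal sketch of that branch, assuming a hypothetical three-entry irregular table in place of the call to plural_noun:

```ruby
# Hypothetical irregular table standing in for a call to plural_noun.
IRREGULAR_NOUNS = { 'child' => 'children', 'man' => 'men', 'mouse' => 'mice' }.freeze

def plural_possessive(word)
  md = /\A(.*)'s?\z/.match(word)
  return nil unless md                 # not a possessive: "no idea"

  base = md[1]
  pl = IRREGULAR_NOUNS.fetch(base) { "#{base}s" }
  # Plurals ending in "s" take a bare apostrophe; others take "'s".
  pl.end_with?('s') ? "#{pl}'" : "#{pl}'s"
end

p plural_possessive("child's")   # "children's"
```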
Pluralize special verbs
# File lib/linguistics/en.rb, line 835
def pluralize_special_verb( word, count )
  count ||= Linguistics::num
  count = normalize_count( count )

  return nil if /^(#{PL_count_one})$/ =~ count.to_s

  # Handle user-defined verbs
  #if value = ud_match( word, PL_v_user_defined )
  #  return value
  #end

  case word

  # Handle irregular present tense (simple and compound)
  when /^(#{PL_v_irregular_pres})((\s.*)?)$/
    return PL_v_irregular_pres_h[ $1.downcase ] + $2

  # Handle irregular future, preterite and perfect tenses
  when /^(#{PL_v_irregular_non_pres})((\s.*)?)$/
    return word

  # Handle special cases
  when /^(#{PL_v_special_s})$/, /\s/
    return nil

  # Handle standard 3rd person (chop the ...(e)s off single words)
  when /^(.*)([cs]h|[x]|zz|ss)es$/
    return $1 + $2
  when /^(..+)ies$/
    return "#{$1}y"
  when /^(.+)oes$/
    return "#{$1}o"
  when /^(.*[^s])s$/
    return $1

  # Otherwise, a regular verb (handle elsewhere)
  else
    return nil
  end
end
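The "standard 3rd person" rules can be tried in isolation. The sketch below copies just those four patterns (anchored with \A and \z) and returns nil for anything they do not cover, which in the module is the signal for pluralize_general_verb to take over. The irregular and special-case tables are deliberately left out.

```ruby
# The 3rd-person rules from pluralize_special_verb, on their own:
# strip the "-(e)s" from a single word, or return nil.
def plural_verb_sketch(word)
  case word
  when /\A(.*)([cs]h|x|zz|ss)es\z/ then $1 + $2    # watches -> watch
  when /\A(..+)ies\z/              then "#{$1}y"   # carries -> carry
  when /\A(.+)oes\z/               then "#{$1}o"   # goes -> go
  when /\A(.*[^s])s\z/             then $1         # runs -> run
  end
end

p plural_verb_sketch("fizzes")   # "fizz"
```

The "..+" in the second rule is what keeps short words like "dies" out of the "-ies" rule, so they fall through to the plain "-s" rule and become "die".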
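Perform normal/classical switching and match capitalization. If inflected contains a "normal|classical" pair, keep the side selected by Linguistics::classical?; then adjust the capitalization of inflected to match that of the original input.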
# File lib/linguistics/en.rb, line 683
def postprocess( original, inflected )
  inflected.sub!( /([^|]+)\|(.+)/ ) {
    Linguistics::classical? ? $2 : $1
  }

  case original
  when "I"
    return inflected
  when /^[A-Z]+$/
    return inflected.upcase
  when /^[A-Z]/
    # Can't use #capitalize, as it will downcase the rest of the string,
    # too.
    inflected[0,1] = inflected[0,1].upcase
    return inflected
  else
    return inflected
  end
end
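The two steps of #postprocess are easy to reproduce outside the module. In this sketch the classical setting is passed in as a keyword argument (an assumption for self-containment; the module reads Linguistics::classical? instead), and the string is not modified in place.

```ruby
# Sketch of postprocess: pick one side of "normal|classical", then
# copy the capitalization pattern of the original word.
def postprocess_sketch(original, inflected, classical: false)
  result = inflected.sub(/([^|]+)\|(.+)/) { classical ? $2 : $1 }

  case original
  when 'I'          then result                   # "I" is not treated as all caps
  when /\A[A-Z]+\z/ then result.upcase            # all caps in, all caps out
  when /\A[A-Z]/    then result.sub(/\A./) { |ch| ch.upcase }  # first letter only
  else                   result
  end
end

p postprocess_sketch("Brother", "brothers|brethren", classical: true)   # "Brethren"
```

Upcasing only the first character, rather than calling #capitalize, keeps any inner capitals in the inflected form intact.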
Return the present participle of the given verb (e.g., "run" -> "running").
# File lib/linguistics/en.rb, line 1197
def present_participle( word )
  plural = plural_verb( word.to_s, 2 )

  # Each sub! returns nil when it doesn't match, so the first rule that
  # fires wins. Backreferences must be written as "\\1" (or '\1'); a
  # literal '$1' would be inserted verbatim.
  plural.sub!( /ie$/, 'y' ) or
    plural.sub!( /ue$/, 'u' ) or
    plural.sub!( /([auy])e$/, "\\1" ) or
    plural.sub!( /i$/, '' ) or
    plural.sub!( /([^e])e$/, "\\1" ) or
    /er$/.match( plural ) or
    plural.sub!( /([^aeiou][aeiouy]([bdgmnprst]))$/, "\\1\\2" )

  return "#{plural}ing"
end
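The participle rules form a short-circuiting chain: each sub! returns nil when it does not match, so "or" moves on to the next rule. The sketch below runs the same chain on a verb that is assumed to already be in its base form (the method above first calls plural_verb to get there), which keeps it self-contained.

```ruby
# Same rule chain as present_participle, applied to a base-form verb.
def present_participle_sketch(verb)
  stem = verb.dup
  stem.sub!(/ie\z/, 'y') or                 # die   -> dy
    stem.sub!(/ue\z/, 'u') or               # argue -> argu
    stem.sub!(/([auy])e\z/, '\1') or        # eye   -> ey
    stem.sub!(/i\z/, '') or
    stem.sub!(/([^e])e\z/, '\1') or         # make  -> mak
    stem.match?(/er\z/) or                  # enter: leave alone
    stem.sub!(/([^aeiou][aeiouy]([bdgmnprst]))\z/, '\1\2')  # run -> runn
  "#{stem}ing"
end

p %w[die make run walk].map { |v| present_participle_sketch(v) }
```

The last rule doubles a final consonant after a single short vowel, which is right for "run" but would also double the consonant in words stressed on an earlier syllable; the original rules share that limitation.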
Returns the proper noun form of a string by capitalizing most of the words.
Examples:
English.proper_noun("bosnia and herzegovina") -> "Bosnia and Herzegovina"
English.proper_noun("macedonia, the former yugoslav republic of") -> "Macedonia, the Former Yugoslav Republic of"
English.proper_noun("virgin islands, u.s.") -> "Virgin Islands, U.S."
# File lib/linguistics/en.rb, line 1662
def proper_noun( string )
  return string.split(/([ .]+)/).collect {|word|
    # %w{} builds an Array of words; a %{} String would make #include?
    # match substrings such as "a" or "he".
    next word unless /^[a-z]/.match( word ) &&
      ! (%w{and the of}.include?( word ))
    word.capitalize
  }.join
end
Return a phrase describing the specified number of objects in the given phrase in general terms. The following options can be used to control the makeup of the returned quantity String:
:joinword
    Sets the word (and any surrounding spaces) used to separate the quantity from the noun in the resulting string. Defaults to ' of '.
# File lib/linguistics/en.rb, line 1381
def quantify( phrase, number=0, args={} )
  num = number.to_i
  config = QuantifyDefaults.merge( args )

  case num
  when 0
    no( phrase )
  when 1
    a( phrase )
  when SeveralRange
    "several " + plural( phrase, num )
  when NumberRange
    "a number of " + plural( phrase, num )
  when NumerousRange
    "numerous " + plural( phrase, num )
  when ManyRange
    "many " + plural( phrase, num )
  else

    # Anything bigger than the ManyRange gets described like
    # "hundreds of thousands of..." or "millions of..."
    # depending, of course, on how many there are.
    thousands, subthousands = Math::log10( num ).to_i.divmod( 3 )
    stword =
      case subthousands
      when 2
        "hundreds"
      when 1
        "tens"
      else
        nil
      end
    thword = plural( to_thousands(thousands).strip )
    thword = nil if thword.empty?

    [ # Hundreds (of)...
      stword,

      # thousands (of)
      thword,

      # stars.
      plural(phrase, number)
    ].compact.join( config[:joinword] )
  end
end
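For quantities above ManyRange, #quantify derives its wording from the number of decimal digits: log10 gives the exponent, and divmod(3) splits it into a count of thousand-groups and a remainder that selects "tens" or "hundreds". A sketch of just that branch, assuming a hypothetical table of already-pluralized scale words and a noun that is already plural:

```ruby
# Hypothetical scale-word table standing in for plural(to_thousands(...)).
SCALE_PLURALS = ['', 'thousands', 'millions', 'billions'].freeze

def magnitude_phrase(num, plural_noun, joinword = ' of ')
  # e.g. 250_000: log10 -> 5, and 5.divmod(3) -> [1, 2]
  thousands, subthousands = Math.log10(num).to_i.divmod(3)
  stword = { 2 => 'hundreds', 1 => 'tens' }[subthousands]
  thword = SCALE_PLURALS.fetch(thousands)
  thword = nil if thword.empty?
  [stword, thword, plural_noun].compact.join(joinword)
end

puts magnitude_phrase(250_000, 'stars')   # hundreds of thousands of stars
```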
Look up the synset associated with the given word or collocation in the WordNet lexicon and return a WordNet::Synset object.
# File lib/linguistics/en/wordnet.rb, line 148
def synset( word, pos=nil, sense=1 )
  lex = Linguistics::EN::wn_lexicon

  # Allow the sense number to be passed in place of the part of speech.
  # (Integer rather than Fixnum, which no longer exists as of Ruby 3.2.)
  if pos.is_a?( Integer )
    sense = pos
    pos = nil
  end
  postries = pos ? [pos] : [:noun, :verb, :adjective, :adverb, :other]
  syn = nil

  postries.each do |pos|
    break if syn = lex.lookup_synsets( word.to_s, pos, sense )
  end

  return syn
end
Look up all the synsets associated with the given word or collocation in the WordNet lexicon and return an Array of WordNet::Synset objects. If pos is nil, return synsets for all parts of speech.
# File lib/linguistics/en/wordnet.rb, line 168
def synsets( word, pos=nil )
  lex = Linguistics::EN::wn_lexicon
  postries = pos ? [pos] : [:noun, :verb, :adjective, :adverb, :other]
  syns = []

  postries.each {|pos|
    syns << lex.lookup_synsets( word.to_s, pos )
  }

  return syns.flatten.compact
end
Transform the specified hundreds-, tens-, and units-place digits into a word phrase, joining the hundreds part to the rest with joinword (default " and "). If thousands is greater than 0, the magnitude word for that group (such as "thousand" or "million", from #to_thousands) is appended to the phrase. Returns nil if all three digits are zero.
# File lib/linguistics/en.rb, line 1014
def to_hundreds( hundreds, tens=0, units=0, thousands=0, joinword=" and " )
  joinword = ' ' if joinword.empty?
  if hundreds.nonzero?
    return to_units( hundreds ) + " hundred" +
      (tens.nonzero? || units.nonzero? ? joinword : '') +
      to_tens( tens, units ) +
      to_thousands( thousands )
  elsif tens.nonzero? || units.nonzero?
    return to_tens( tens, units ) + to_thousands( thousands )
  else
    return nil
  end
end
Transform the specified number of tens- and units-place numerals into a word-phrase at the given number of thousands places.
# File lib/linguistics/en.rb, line 1000
def to_tens( tens, units, thousands=0 )
  unless tens == 1
    return Tens[ tens ] + ( tens.nonzero? && units.nonzero? ? '-' : '' ) +
      to_units( units, thousands )
  else
    return Teens[ units ] + to_thousands( thousands )
  end
end
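Transform the specified number of thousands-groups (1 for thousands, 2 for millions, and so on) into one or more magnitude words such as 'thousand' or 'million'. Uses the thousands (American) system; magnitudes beyond the largest entry in the Thousands table are expressed by repeating that last word.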
# File lib/linguistics/en.rb, line 1030
def to_thousands( thousands=0 )
  parts = []
  (0..thousands).step( Thousands.length - 1 ) {|i|
    if i.zero?
      parts.push Thousands[ thousands % (Thousands.length - 1) ]
    else
      parts.push Thousands.last
    end
  }

  return parts.join(" ")
end
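The step loop in #to_thousands is what lets it name magnitudes past the end of its table: the remainder picks one word, and every full pass through the table appends another copy of the largest word. The sketch below shows that wrap-around with a hypothetical four-entry table (the real Thousands constant is not shown in this section), and it drops empty entries so the result has no stray spaces.

```ruby
# Hypothetical, deliberately short scale table; index = thousands-groups.
SCALE_WORDS = ['', 'thousand', 'million', 'billion'].freeze

def to_thousands_sketch(thousands)
  period = SCALE_WORDS.length - 1   # groups covered by the largest word
  parts = (0..thousands).step(period).map do |i|
    i.zero? ? SCALE_WORDS[thousands % period] : SCALE_WORDS.last
  end
  parts.reject(&:empty?).join(' ')
end

p (0..6).map { |t| to_thousands_sketch(t) }
```

With this table, 4 groups (10**12) comes out as "thousand billion" and 6 groups (10**18) as "billion billion".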