class Ferret::Analysis::AsciiLetterTokenizer
Summary
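An AsciiLetterTokenizer is a tokenizer that divides text at every character that is not an ASCII letter. That is to say, it defines tokens as maximal strings of adjacent ASCII letters, as defined by the regular expression _/[A-Za-z]+/_. Accented letters such as "é" are not ASCII, so they act as token boundaries.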
Example
"Dave's résumé, at http://www.davebalmain.com/ 1234" => ["Dave", "s", "r", "sum", "at", "http", "www", "davebalmain", "com"]
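The splitting rule can be reproduced in plain Ruby without Ferret. This is only a sketch of the rule described above, not Ferret's implementation: the helper name `ascii_letter_tokens` is hypothetical, and it relies on `String#scan` with the pattern `/[A-Za-z]+/`.

```ruby
# Hypothetical helper, not part of Ferret: it approximates the tokenizer's
# rule by collecting every maximal run of ASCII letters in the input.
def ascii_letter_tokens(text)
  text.scan(/[A-Za-z]+/)
end

sample = "Dave's résumé, at http://www.davebalmain.com/ 1234"
p ascii_letter_tokens(sample)
# => ["Dave", "s", "r", "sum", "at", "http", "www", "davebalmain", "com"]
```

Unlike a real token stream, this sketch returns bare strings and carries no offset information.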
Public Class Methods
new() → tokenizer
Create a new AsciiLetterTokenizer
static VALUE
frb_a_letter_tokenizer_init(VALUE self, VALUE rstr)
{
    return get_wrapped_ts(self, rstr, letter_tokenizer_new());
}