Package org.apache.solr.analysis

Interface Summary
CharFilterFactory  
TokenFilterFactory A TokenFilterFactory creates a TokenFilter to transform one TokenStream into another.
TokenizerFactory A TokenizerFactory breaks up a stream of characters into tokens.
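
The relationship between the two factory kinds above can be illustrated with a minimal, self-contained sketch. It does not use the Solr or Lucene types: `SketchTokenizerFactory`, `SketchTokenFilterFactory`, and `FactoryChainSketch` are hypothetical stand-ins, and a "token stream" is assumed here to be a plain `Iterator<String>`. The point is only the shape of the chain: a tokenizer factory turns characters into tokens, and a filter factory wraps one token stream to produce another.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins, NOT the Solr interfaces: a "token stream" here is
// simply an Iterator<String>.
interface SketchTokenizerFactory {
    Iterator<String> create(Reader input) throws IOException;
}

interface SketchTokenFilterFactory {
    Iterator<String> create(Iterator<String> input);
}

class FactoryChainSketch {
    // Tokenizer stand-in: read all characters, then split on whitespace.
    static final SketchTokenizerFactory WHITESPACE = input -> {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = input.read()) != -1) {
            text.append((char) c);
        }
        List<String> tokens = new ArrayList<>();
        for (String t : text.toString().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens.iterator();
    };

    // Filter stand-in: lazily lower-cases each token of the wrapped stream.
    static final SketchTokenFilterFactory LOWER_CASE = input -> new Iterator<String>() {
        public boolean hasNext() { return input.hasNext(); }
        public String next() { return input.next().toLowerCase(Locale.ROOT); }
    };

    // Chains tokenizer -> filter and collects the resulting tokens.
    static List<String> analyze(String text) {
        try {
            Iterator<String> stream =
                LOWER_CASE.create(WHITESPACE.create(new StringReader(text)));
            List<String> out = new ArrayList<>();
            while (stream.hasNext()) {
                out.add(stream.next());
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(analyze("Hello  Solr World"));
    }
}
```

In this sketch further filters would be added by wrapping the iterator again, which mirrors the idea of transforming one TokenStream into another.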
 

Class Summary
ArabicLetterTokenizerFactory  
ArabicNormalizationFilterFactory  
ArabicStemFilterFactory  
ASCIIFoldingFilterFactory  
BaseCharFilterFactory  
BaseTokenFilterFactory Simple abstract implementation that handles init arg processing.
BaseTokenizerFactory Simple abstract implementation that handles init arg processing.
BrazilianStemFilterFactory  
BufferedTokenStream Handles input and output buffering of a TokenStream.
CapitalizationFilterFactory A filter to apply normal capitalization rules to Tokens.
ChineseFilterFactory  
ChineseTokenizerFactory  
CJKTokenizerFactory  
CommonGramsFilter Construct bigrams for frequently occurring terms while indexing.
CommonGramsFilterFactory Constructs a CommonGramsFilter.
CommonGramsQueryFilter Wraps a CommonGramsFilter, optimizing phrase queries by returning single words only when they are not a member of a bigram.
CommonGramsQueryFilterFactory Constructs a CommonGramsQueryFilter. This is pretty close to a straight copy from StopFilterFactory.
DelimitedPayloadTokenFilterFactory  
DictionaryCompoundWordTokenFilterFactory  
DoubleMetaphoneFilter  
DoubleMetaphoneFilterFactory  
DutchStemFilterFactory  
EdgeNGramFilterFactory Creates new instances of EdgeNGramTokenFilter.
EdgeNGramTokenizerFactory Creates new instances of EdgeNGramTokenizer.
ElisionFilterFactory  
EnglishPorterFilterFactory Deprecated. Use SnowballPorterFilterFactory with language="English" instead
FrenchStemFilterFactory  
GermanStemFilterFactory  
GreekLowerCaseFilterFactory  
HTMLStripCharFilter A CharFilter that wraps another Reader and attempts to strip out HTML constructs.
HTMLStripCharFilterFactory  
HTMLStripReader Deprecated. Use HTMLStripCharFilter
HTMLStripStandardTokenizerFactory Deprecated. Use HTMLStripCharFilterFactory and StandardTokenizerFactory
HTMLStripWhitespaceTokenizerFactory Deprecated. Use HTMLStripCharFilterFactory and WhitespaceTokenizerFactory
HyphenatedWordsFilter When the plain text is extracted from documents, we will often have many words hyphenated and broken into two lines.
HyphenatedWordsFilterFactory Factory for HyphenatedWordsFilter
ISOLatin1AccentFilterFactory Factory for ISOLatin1AccentFilter.
KeepWordFilter A TokenFilter that only keeps tokens with text contained in the required words.
KeepWordFilterFactory  
KeywordTokenizerFactory  
LengthFilter Deprecated. Use org.apache.lucene.analysis.LengthFilter instead.
LengthFilterFactory  
LetterTokenizerFactory  
LowerCaseFilterFactory  
LowerCaseTokenizerFactory  
MappingCharFilterFactory  
NGramFilterFactory Creates new instances of NGramTokenFilter.
NGramTokenizerFactory Creates new instances of NGramTokenizer.
NumericPayloadTokenFilterFactory  
PatternReplaceFilter A TokenFilter which applies a Pattern to each token in the stream, replacing match occurrences with the specified replacement string.
PatternReplaceFilterFactory  
PatternTokenizer This tokenizer uses regex pattern matching to construct distinct tokens for the input stream.
PatternTokenizerFactory This tokenizer uses regex pattern matching to construct distinct tokens for the input stream.
PersianNormalizationFilterFactory  
PhoneticFilter Create tokens for phonetic matches.
PhoneticFilterFactory Create tokens based on phonetic encoders (see http://jakarta.apache.org/commons/codec/api-release/org/apache/commons/codec/language/package-summary.html). This takes two arguments: "encoder" (required), one of "DoubleMetaphone", "Metaphone", "Soundex", or "RefinedSoundex"; and "inject" (default=true), which adds tokens to the stream with the offset=0.
PorterStemFilterFactory  
PositionFilterFactory Sets the positionIncrement of all tokens to the configured "positionIncrement" value, except the first returned token, which retains its original positionIncrement value.
RemoveDuplicatesTokenFilter A TokenFilter which filters out Tokens that have the same position and term text as the previous token in the stream.
RemoveDuplicatesTokenFilterFactory  
ReversedWildcardFilter This class produces a special form of reversed tokens, suitable for better handling of leading wildcards.
ReversedWildcardFilterFactory Factory for ReversedWildcardFilter instances.
ReverseStringFilterFactory A FilterFactory which reverses the input.
RussianCommon Deprecated.
RussianLetterTokenizerFactory  
RussianLowerCaseFilterFactory  
RussianStemFilterFactory  
ShingleFilterFactory  
SnowballPorterFilterFactory Factory for SnowballFilters, with configurable language. Browsing the code, SnowballFilter uses reflection to adapt to Lucene...
SolrAnalyzer  
SolrAnalyzer.TokenStreamInfo  
StandardFilterFactory  
StandardTokenizerFactory  
StopFilterFactory  
SynonymFilter SynonymFilter handles multi-token synonyms with variable position increment offsets.
SynonymFilterFactory  
SynonymMap Mapping rules for use with SynonymFilter
ThaiWordFilterFactory  
TokenizerChain  
TokenOffsetPayloadTokenFilterFactory  
TrieTokenizerFactory Tokenizer for trie fields.
TrimFilter Trims leading and trailing whitespace from Tokens in the stream.
TrimFilterFactory  
TypeAsPayloadTokenFilterFactory  
WhitespaceTokenizerFactory  
WordDelimiterFilterFactory  
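
To make a few of the one-line summaries above concrete, here is a minimal sketch of the behaviors described for TrimFilter, PatternReplaceFilter, and KeepWordFilter. It operates on plain lists of strings instead of Lucene token streams, and the class `FilterBehaviorSketch` and its method names are hypothetical illustrations, not part of this package. It assumes every pattern match in a token is replaced; the real filters may offer options that this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration only: tokens are plain Strings, not Lucene Tokens.
class FilterBehaviorSketch {

    // As in the TrimFilter summary: strip leading and trailing whitespace.
    static List<String> trim(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.trim());
        }
        return out;
    }

    // As in the PatternReplaceFilter summary: apply a Pattern to each token and
    // replace matches with the replacement string. Assumption: all matches are
    // replaced. Note that "$" and "\" are special in java.util.regex replacements.
    static List<String> patternReplace(List<String> tokens, Pattern pattern, String replacement) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(pattern.matcher(t).replaceAll(replacement));
        }
        return out;
    }

    // As in the KeepWordFilter summary: keep only tokens whose text is in the
    // set of required words.
    static List<String> keepWords(List<String> tokens, Set<String> words) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (words.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(" Foo-Bar ", "baz", "qux-1");
        List<String> cleaned = patternReplace(trim(tokens), Pattern.compile("-"), "");
        Set<String> keep = new HashSet<>(Arrays.asList("FooBar", "qux1"));
        System.out.println(keepWords(cleaned, keep));
    }
}
```

The order of the steps matters in this sketch: trimming and pattern replacement run first so that the keep-word check sees the normalized token text.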
 



Copyright © 2009 Apache Software Foundation. All Rights Reserved.