java.lang.Object
  org.biojava.utils.Unchangeable
    org.biojava.bio.symbol.SoftMaskedAlphabet
public final class SoftMaskedAlphabet
Soft masking is usually displayed by making the masked regions visibly different from the non-masked regions. Typically the masked regions are lower case, but other schemes could be invented. For example, a soft masked DNA sequence may look like this:
>DNA_sequence
ATGGACGCTAGCATggtggtggtggtggtggtggtGCATAGCGAGCAAGTGGAGCGT
Here the lowercase regions are masked because of low complexity.
SoftMaskedAlphabets come with SymbolTokenizers that understand how to read and write the softmasking. The interpretation of what constitutes a masked region is governed by an implementation of a MaskingDetector. The DEFAULT field of the MaskingDetector interface defines lower case tokens as masked.
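As a rough illustration of the default rule described above (lower case means masked), the following self-contained sketch locates masked regions in a plain string. It does not use the BioJava API: MaskingSketch, Detector, LOWER_CASE and maskedRegions are hypothetical names chosen for this example, and the real MaskingDetector interface may well have a different signature.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the BioJava MaskingDetector.
final class MaskingSketch {

    /** Decides whether a single token character counts as soft masked. */
    interface Detector {
        boolean isMasked(char token);
    }

    /** Assumed default rule for this sketch: lower case tokens are masked. */
    static final Detector LOWER_CASE = new Detector() {
        public boolean isMasked(char token) {
            return Character.isLowerCase(token);
        }
    };

    /** Returns half-open [start, end) index pairs, one per masked run. */
    static List<int[]> maskedRegions(String seq, Detector detector) {
        List<int[]> regions = new ArrayList<int[]>();
        int start = -1; // -1 means "not currently inside a masked run"
        for (int i = 0; i < seq.length(); i++) {
            boolean masked = detector.isMasked(seq.charAt(i));
            if (masked && start < 0) {
                start = i;                         // a masked run begins
            } else if (!masked && start >= 0) {
                regions.add(new int[] {start, i}); // a masked run ends
                start = -1;
            }
        }
        if (start >= 0) {
            regions.add(new int[] {start, seq.length()}); // run reaches the end
        }
        return regions;
    }
}
```

Applied to the example sequence above, maskedRegions(seq, MaskingSketch.LOWER_CASE) would report a single region covering the run of lowercase ggt repeats. Passing a different Detector is how another masking scheme could be modelled in this sketch.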
Copyright (c) 2004 Novartis Institute for Tropical Diseases
Nested Class Summary | |
---|---|
class | SoftMaskedAlphabet.CaseSensitiveTokenization: This SymbolTokenizer works with a delegate to softmask symbol tokenization as appropriate. |
static interface | SoftMaskedAlphabet.MaskingDetector: Implementations will define how soft masking looks. |
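A minimal sketch of the idea behind a case-sensitive tokenization, assuming the lower-case-is-masked convention: parsing separates each character into an upper case token plus a mask flag, and writing restores the case from the flag. The names CaseTokenSketch, Token, parse, write and roundTrip are hypothetical and do not reflect the actual CaseSensitiveTokenization API, which works with a delegate tokenization rather than raw characters.

```java
// Hypothetical illustration; not the BioJava CaseSensitiveTokenization class.
final class CaseTokenSketch {

    /** A parsed token: the upper case character plus a soft-mask flag. */
    static final class Token {
        final char base;
        final boolean masked;

        Token(char base, boolean masked) {
            this.base = base;
            this.masked = masked;
        }
    }

    /** Reads one character; lower case input is treated as masked. */
    static Token parse(char c) {
        return new Token(Character.toUpperCase(c), Character.isLowerCase(c));
    }

    /** Writes a token back out, in lower case when it is masked. */
    static char write(Token t) {
        return t.masked ? Character.toLowerCase(t.base) : t.base;
    }

    /** Parses and re-writes a sequence; plain ASCII letters come back unchanged. */
    static String roundTrip(String seq) {
        StringBuilder out = new StringBuilder(seq.length());
        for (int i = 0; i < seq.length(); i++) {
            out.append(write(parse(seq.charAt(i))));
        }
        return out.toString();
    }
}
```

Keeping the mask as a separate flag, rather than as a distinct lower case symbol, mirrors the description of a wrapped alphabet combined with a binary masked/unmasked indicator.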
Nested classes/interfaces inherited from interface org.biojava.bio.Annotatable |
---|
Annotatable.AnnotationForwarder |
Field Summary |
---|
Fields inherited from interface org.biojava.bio.symbol.Alphabet |
---|
EMPTY_ALPHABET, PARSERS, SYMBOLS |
Fields inherited from interface org.biojava.bio.Annotatable |
---|
ANNOTATION |
Method Summary | |
---|---|
void | addSymbol(Symbol s): SoftMaskedAlphabets cannot add new Symbols. |
boolean | contains(Symbol s): Returns whether or not this Alphabet contains the symbol. |
List | getAlphabets(): Gets the components of the Alphabet. |
Symbol | getAmbiguity(Set s): This is not supported. |
Annotation | getAnnotation(): The SoftMaskedAlphabet has no annotation. |
protected FiniteAlphabet | getDelegate(): The compound alpha that holds the symbols used by this wrapper. |
Symbol | getGapSymbol(): Get the 'gap' ambiguity symbol that is most appropriate for this alphabet. |
static SoftMaskedAlphabet | getInstance(FiniteAlphabet alphaToMask): Generates a soft masked Alphabet where lowercase tokens are assumed to be soft masked. |
static SoftMaskedAlphabet | getInstance(FiniteAlphabet alphaToMask, SoftMaskedAlphabet.MaskingDetector maskingDetector): Creates a compound alphabet that is a hybrid of the alphabet that is to be soft masked and a binary alphabet that indicates if any Symbol is soft masked or not. |
FiniteAlphabet | getMaskedAlphabet(): Gets the Alphabet upon which masking is being applied. |
SoftMaskedAlphabet.MaskingDetector | getMaskingDetector(): Getter for the MaskingDetector. |
String | getName(): The name of the Alphabet. |
Symbol | getSymbol(List l): Gets the compound symbol composed of the Symbols in the List. |
SymbolTokenization | getTokenization(String type): Get a SymbolTokenization by name. |
boolean | isMasked(BasisSymbol s): Determines if a Symbol is masked. |
Iterator | iterator(): Retrieve an Iterator over the AtomicSymbols in this FiniteAlphabet. |
void | removeSymbol(Symbol s): SoftMaskedAlphabets cannot remove Symbols. |
int | size(): The number of symbols in the alphabet. |
void | validate(Symbol s): Throws a precanned IllegalSymbolException if the symbol is not contained within this Alphabet. |
Methods inherited from class org.biojava.utils.Unchangeable |
---|
addChangeListener, addChangeListener, addForwarder, getForwarders, getListeners, isUnchanging, removeChangeListener, removeChangeListener, removeForwarder |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface org.biojava.utils.Changeable |
---|
addChangeListener, addChangeListener, isUnchanging, removeChangeListener, removeChangeListener |
Method Detail |
---|
public static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask) throws IllegalAlphabetException
Generates a soft masked Alphabet where lowercase tokens are assumed to be soft masked.
- Parameters:
- alphaToMask - for example the DNA alphabet.
- Returns:
- a SoftMaskedAlphabet.
- Throws:
- IllegalAlphabetException - if it cannot be constructed
public static SoftMaskedAlphabet getInstance(FiniteAlphabet alphaToMask, SoftMaskedAlphabet.MaskingDetector maskingDetector) throws IllegalAlphabetException
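Creates a compound alphabet that is a hybrid of the alphabet that is to be soft masked and a binary alphabet that indicates if any Symbol is soft masked or not.
- Parameters:
- alphaToMask - for example the DNA alphabet.
- maskingDetector - to define masking behaviour
- Returns:
- a SoftMaskedAlphabet.
- Throws:
- IllegalAlphabetException - if it cannot be constructed
public FiniteAlphabet getMaskedAlphabet()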
Gets the Alphabet upon which masking is being applied.
- Returns:
- a FiniteAlphabet
protected FiniteAlphabet getDelegate()
The compound alpha that holds the symbols used by this wrapper.
- Returns:
- a FiniteAlphabet
public Annotation getAnnotation()
The SoftMaskedAlphabet has no annotation.
- Specified by:
- getAnnotation in interface Annotatable
public String getName()
The name of the Alphabet.
- Specified by:
- getName in interface Alphabet
- Returns:
- a String in the form of "Softmasked {"+alphaToMask.getName()+"}"
public List getAlphabets()
Gets the components of the Alphabet.
- Specified by:
- getAlphabets in interface Alphabet
- Returns:
- a List with two members: the first is the wrapped Alphabet, the second is the binary SubIntegerAlphabet.
public Symbol getSymbol(List l) throws IllegalSymbolException
Gets the compound symbol composed of the Symbols in the List. The Symbols in the List must be from alpha (defined in the constructor) and SUBINTEGER[0..1].
- Specified by:
- getSymbol in interface Alphabet
- Parameters:
- l - a List of Symbols
- Returns:
- a Symbol from this alphabet.
- Throws:
- IllegalSymbolException - if l is not as expected (see above)
public Symbol getAmbiguity(Set s) throws UnsupportedOperationException
This is not supported. Use getSymbol(List l) instead and provide it with an ambiguity and a masking symbol.
- Specified by:
- getAmbiguity in interface Alphabet
- Parameters:
- s - a Set of Symbols
- Throws:
- UnsupportedOperationException
- See Also:
- getSymbol(List l)
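The advice above, to go through getSymbol(List l) with an ambiguity and a masking symbol, can be pictured with a small self-contained model. This is only a sketch under stated assumptions: PairSketch, compose and token are hypothetical names, symbols are modelled as upper case characters, and the mask is modelled as the Integer 0 or 1 with 1 assumed to mean masked (the source does not say which value is which). It does not call BioJava.

```java
import java.util.List;

// Hypothetical model of a (symbol, mask) pair; not BioJava code.
final class PairSketch {
    final char symbol; // member of the wrapped alphabet, e.g. 'A' or the ambiguity 'N'
    final int mask;    // assumption for this sketch: 0 = unmasked, 1 = masked

    private PairSketch(char symbol, int mask) {
        this.symbol = symbol;
        this.mask = mask;
    }

    /**
     * Builds a pair from a two-member list: first a Character found in
     * allowed, second an Integer that is 0 or 1. Anything else is rejected.
     */
    static PairSketch compose(List<?> parts, String allowed) {
        if (parts.size() != 2
                || !(parts.get(0) instanceof Character)
                || !(parts.get(1) instanceof Integer)) {
            throw new IllegalArgumentException("expected [symbol, mask]");
        }
        char symbol = ((Character) parts.get(0)).charValue();
        int mask = ((Integer) parts.get(1)).intValue();
        if (allowed.indexOf(symbol) < 0 || (mask != 0 && mask != 1)) {
            throw new IllegalArgumentException("symbol or mask out of range");
        }
        return new PairSketch(symbol, mask);
    }

    /** Renders the pair the way a lower-case-is-masked writer would. */
    char token() {
        return mask == 1 ? Character.toLowerCase(symbol) : symbol;
    }
}
```

In this model a masked ambiguity is simply the ambiguity character paired with mask 1, which is why a separate ambiguity lookup on the masked alphabet is not needed.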
public Symbol getGapSymbol()
Description copied from interface: Alphabet
Get the 'gap' ambiguity symbol that is most appropriate for this alphabet.
In general, this will be a BasisSymbol that represents a list of AlphabetManager.getGapSymbol() the same length as the getAlphabets list.
- Specified by:
- getGapSymbol in interface Alphabet
public boolean contains(Symbol s)
Description copied from interface: Alphabet
Returns whether or not this Alphabet contains the symbol.
An alphabet contains an ambiguity symbol iff the ambiguity symbol's getMatches() returns an alphabet that is a proper sub-set of this alphabet. That means that every one of the symbols that could match the ambiguity symbol is also a member of this alphabet.
- Specified by:
- contains in interface Alphabet
- Parameters:
- s - the Symbol to check
public void validate(Symbol s) throws IllegalSymbolException
Description copied from interface: Alphabet
Throws a precanned IllegalSymbolException if the symbol is not contained within this Alphabet.
This function is used all over the code to validate symbols as they enter a method. Also, the code is littered with catches for IllegalSymbolException. There is a preferred style of handling this, which should be covered in the package documentation.
- Specified by:
- validate in interface Alphabet
- Parameters:
- s - the Symbol to validate
- Throws:
- IllegalSymbolException - if s is not contained in this alphabet
public SoftMaskedAlphabet.MaskingDetector getMaskingDetector()
Getter for the MaskingDetector.
- Returns:
- the MaskingDetector
public SymbolTokenization getTokenization(String type) throws BioException
Description copied from interface: Alphabet
Get a SymbolTokenization by name.
The parser returned is guaranteed to return Symbols and SymbolLists that conform to this alphabet.
Every alphabet should have a SymbolTokenization under the name 'token' that uses the symbol token characters to translate a string into a SymbolList. Likewise, there should be a SymbolTokenization under the name 'name' that uses symbol names to identify symbols. Any other names may also be defined, but the behaviour of the returned SymbolTokenization is not defined here.
A SymbolTokenization under the name 'default' should be defined for all sequences; it determines the behaviour when printing out a sequence. Standard behaviour is to define the 'token' SymbolTokenization as default if it exists, else to define the 'name' SymbolTokenization as the default, but others are possible.
- Specified by:
- getTokenization in interface Alphabet
- Parameters:
- type - the name of the parser
- Throws:
- BioException - if for any reason the tokenization could not be built
public int size()
Description copied from interface: FiniteAlphabet
The number of symbols in the alphabet.
- Specified by:
- size in interface FiniteAlphabet
public Iterator iterator()
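Description copied from interface: FiniteAlphabet
Retrieve an Iterator over the AtomicSymbols in this FiniteAlphabet.
Each AtomicSymbol as for which this.contains(as) is true will be returned exactly once by this iterator in no specified order.
- Specified by:
- iterator in interface FiniteAlphabet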
public void addSymbol(Symbol s) throws ChangeVetoException
SoftMaskedAlphabets cannot add new Symbols. A ChangeVetoException will be thrown.
- Specified by:
- addSymbol in interface FiniteAlphabet
- Parameters:
- s - the Symbol to add.
- Throws:
- ChangeVetoException - when called.
public void removeSymbol(Symbol s) throws ChangeVetoException
SoftMaskedAlphabets cannot remove Symbols. A ChangeVetoException will be thrown.
- Specified by:
- removeSymbol in interface FiniteAlphabet
- Parameters:
- s - the Symbol to remove.
- Throws:
- ChangeVetoException - when called.
public boolean isMasked(BasisSymbol s) throws IllegalSymbolException
Determines if a Symbol is masked.
- Parameters:
- s - the Symbol to test.
- Returns:
- true if s is masked.
- Throws:
- IllegalSymbolException