org.biojava.bio.symbol
Class UkkonenSuffixTree

java.lang.Object
  extended by org.biojava.bio.symbol.UkkonenSuffixTree

public class UkkonenSuffixTree
extends Object

A suffix tree is an efficient method for encoding the frequencies of motifs in a sequence. They are sometimes used to quickly screen for similar sequences. For instance, all motifs of length up to 2 in the sequence AAGT could be encoded as:

 root(4)
 |
 A(2)--------G(1)-----T(1)
 |           |
 A(1)--G(1)  T(1)
 

It supports addition of elements both as String and SymbolList. They should not be mixed together. The strings are also terminated internally, so it is possible to go and add more than one string to the suffix tree.

Some more work need to be done on how data should be generated from this class. If you need something that's not in there, please e-mail the list at biojava-dev@biojava.org and I'll add it in there. <\p>

Version:
1.3
Author:
Francois Pepin

Nested Class Summary
static class UkkonenSuffixTree.SuffixNode
          end Tree modification methods
 
Field Summary
static char DEFAULT_TERM_CHAR
           
static int TO_A_LEAF
           
 
Constructor Summary
UkkonenSuffixTree()
          Initializes a new UkkonenSuffixTree instance.
UkkonenSuffixTree(FiniteAlphabet alpha)
           
UkkonenSuffixTree(String seqs)
           
 
Method Summary
 void addSequence(String seq, String name, boolean doNotTerminate)
          Add a sequence into the tree.
 void addSymbolList(SymbolList list, String name, boolean doNotTerminate)
           
protected  ArrayList getAllNodes(UkkonenSuffixTree.SuffixNode root, ArrayList list, boolean leavesOnly)
           
protected  CharSequence getEdgeLabel(UkkonenSuffixTree.SuffixNode child)
           
protected  int getEdgeLength(UkkonenSuffixTree.SuffixNode child)
          Tree navigation methods
protected  CharSequence getLabel(UkkonenSuffixTree.SuffixNode node)
           
protected  int getPathEnd(UkkonenSuffixTree.SuffixNode node)
           
protected  int getPathLength(UkkonenSuffixTree.SuffixNode node)
           
 UkkonenSuffixTree.SuffixNode getRoot()
           
 UkkonenSuffixTree.SuffixNode jumpTo(UkkonenSuffixTree.SuffixNode starting, CharSequence source, int from, int to)
          Just like walkTo, but faster when used during tree construction, as it assumes that a mismatch can only occurs with the last character of the target string.
 void printTree()
           
 boolean subStringExists(String str)
           
 UkkonenSuffixTree.SuffixNode walkTo(UkkonenSuffixTree.SuffixNode starting, String source, int from, int to)
          This method is used to walk down the tree, from a given node.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_TERM_CHAR

public static final char DEFAULT_TERM_CHAR
See Also:
Constant Field Values

TO_A_LEAF

public static final int TO_A_LEAF
See Also:
Constant Field Values
Constructor Detail

UkkonenSuffixTree

public UkkonenSuffixTree()
Initializes a new UkkonenSuffixTree instance.


UkkonenSuffixTree

public UkkonenSuffixTree(String seqs)

UkkonenSuffixTree

public UkkonenSuffixTree(FiniteAlphabet alpha)
Method Detail

addSymbolList

public void addSymbolList(SymbolList list,
                          String name,
                          boolean doNotTerminate)
                   throws IllegalSymbolException
Throws:
IllegalSymbolException

addSequence

public void addSequence(String seq,
                        String name,
                        boolean doNotTerminate)
Add a sequence into the tree. If there are more sequences, they should be separated by a terminationChar ($ by default). If none exist, it is assumed that there is only 1 continuous sequence to be added.

Parameters:
seq - the sequence/sequences to be added into the tree.
doNotTerminate - whether we should terminate the sequence if it's non-terminated.

walkTo

public UkkonenSuffixTree.SuffixNode walkTo(UkkonenSuffixTree.SuffixNode starting,
                                           String source,
                                           int from,
                                           int to)
This method is used to walk down the tree, from a given node. The rule variable can be used to check where the walk stopped. Note that rule 3 means that the string used to walk down the tree does not match (which is a bit different from the construction where rule 3 implies that only the last character doesn't match.

The String is encoded as a substring of a given source. This is done to avoid replicating the string. To send walk down the string x from the root, one would call walkTo(root,x,0,x.length()).

Parameters:
starting - the root of the subtree we're walking down form.
source - a superstring that contains the string we're using to walking down. source.subtring(from,to) should give the string we're walking down from.
from - the start position (inclusive) of the target string in the source.
to - the end position (exclusive) of the target string in the node.
Returns:
a SuffixNode that the walk stopped at. If the walk stopped inside an edge. (check the rule variable to see where it stopped).

jumpTo

public UkkonenSuffixTree.SuffixNode jumpTo(UkkonenSuffixTree.SuffixNode starting,
                                           CharSequence source,
                                           int from,
                                           int to)
Just like walkTo, but faster when used during tree construction, as it assumes that a mismatch can only occurs with the last character of the target string.

Parameters:
starting - the root of the subtree we're walking down form.
source - a superstring that contains the string we're using to walking down. source.subtring(from,to) should give the string we're walking down from.
from - the start position (inclusive) of the target string in the source.
to - the end position (exclusive) of the target string in the node.
Returns:
a SuffixNode that the walk stopped at. If the walk stopped inside an edge. (check the rule variable to see where it stopped).

getEdgeLength

protected int getEdgeLength(UkkonenSuffixTree.SuffixNode child)
Tree navigation methods


getEdgeLabel

protected CharSequence getEdgeLabel(UkkonenSuffixTree.SuffixNode child)

getPathLength

protected int getPathLength(UkkonenSuffixTree.SuffixNode node)

getPathEnd

protected int getPathEnd(UkkonenSuffixTree.SuffixNode node)

getLabel

protected CharSequence getLabel(UkkonenSuffixTree.SuffixNode node)

getAllNodes

protected ArrayList getAllNodes(UkkonenSuffixTree.SuffixNode root,
                                ArrayList list,
                                boolean leavesOnly)

printTree

public void printTree()

getRoot

public UkkonenSuffixTree.SuffixNode getRoot()

subStringExists

public boolean subStringExists(String str)