class Bio::Nexus
DESCRIPTION
Bio::Nexus is a parser for nexus formatted data. It contains classes and constants enabling the representation and processing of nexus data.
USAGE
  # Parsing a nexus formatted string nexus_str:
  nexus = Bio::Nexus.new( nexus_str )

  # Obtaining the nexus blocks as an array of GenericBlock or
  # any of its subclasses (such as DistancesBlock):
  blocks = nexus.get_blocks

  # Getting a block by name:
  my_blocks = nexus.get_blocks_by_name( "my_block" )

  # Getting distance blocks:
  distances_blocks = nexus.get_distances_blocks

  # Getting trees blocks:
  trees_blocks = nexus.get_trees_blocks

  # Getting data blocks:
  data_blocks = nexus.get_data_blocks

  # Getting characters blocks:
  character_blocks = nexus.get_characters_blocks

  # Getting taxa blocks:
  taxa_blocks = nexus.get_taxa_blocks
Constants
- BEGIN_BLOCK
- BEGIN_COMMENT
- BEGIN_NEXUS
- CHARACTERS
- CHARACTERS_BLOCK
- DATA
- DATATYPE
- DATA_BLOCK
- DELIMITER
- DIMENSIONS
- DISTANCES
- DISTANCES_BLOCK
- DOUBLE_QUOTE
- END_BLOCK
- END_COMMENT
- END_OF_LINE
- FORMAT
- INDENTENTION
- MATRIX
- NCHAR
- NTAX
- SINGLE_QUOTE
- TAXA
- TAXA_BLOCK
- TAXLABELS
- TREES
- TREES_BLOCK
Public Class Methods
Creates a new nexus parser for 'nexus_str'.
Arguments:
- (required) nexus_str: String - nexus formatted data
  # File lib/bio/db/nexus.rb, line 177
  def initialize( nexus_str )
    @blocks = Array.new
    @current_cmd = nil
    @current_subcmd = nil
    @current_block_name = nil
    @current_block = nil
    parse( nexus_str )
  end
Public Instance Methods
A convenience method that returns an array of all nexus blocks whose name equals 'name', taken from the String 'nexus_str' passed to ::new( nexus_str ). The comparison is case-sensitive.
Arguments:
- (required) name: String
Returns:
- Array of GenericBlocks or any of its subclasses
  # File lib/bio/db/nexus.rb, line 204
  def get_blocks_by_name( name )
    found_blocks = Array.new
    @blocks.each do | block |
      if ( name == block.get_name )
        found_blocks.push( block )
      end
    end
    found_blocks
  end
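The lookup above is a linear filter over the parsed blocks. A minimal, self-contained sketch of the same idea follows. `DemoBlock` and `demo_blocks_by_name` are hypothetical stand-ins introduced for illustration, not part of Bio::Nexus; only the `get_name` accessor is taken from the source.

```ruby
# Hypothetical stand-in for GenericBlock; only get_name mirrors the source.
DemoBlock = Struct.new(:name) do
  def get_name
    name
  end
end

# Same filtering as get_blocks_by_name, written with Array#select.
def demo_blocks_by_name(blocks, name)
  blocks.select { |block| name == block.get_name }
end

blocks = [DemoBlock.new("taxa"), DemoBlock.new("trees"), DemoBlock.new("taxa")]
puts demo_blocks_by_name(blocks, "taxa").length
puts demo_blocks_by_name(blocks, "Taxa").length # exact match only, so nothing is found
```

Because the match uses `==`, the name has to be passed in the same case the parser stored it in.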
A convenience method that returns an array of all characters blocks.
Returns:
- Array of CharactersBlocks
  # File lib/bio/db/nexus.rb, line 228
  def get_characters_blocks
    get_blocks_by_name( CHARACTERS_BLOCK.chomp( ";").downcase )
  end
A convenience method that returns an array of all data blocks.
Returns:
- Array of DataBlocks
  # File lib/bio/db/nexus.rb, line 219
  def get_data_blocks
    get_blocks_by_name( DATA_BLOCK.chomp( ";").downcase )
  end
A convenience method that returns an array of all distances blocks.
Returns:
- Array of DistancesBlocks
  # File lib/bio/db/nexus.rb, line 246
  def get_distances_blocks
    get_blocks_by_name( DISTANCES_BLOCK.chomp( ";").downcase )
  end
A convenience method that returns an array of all taxa blocks.
Returns:
- Array of TaxaBlocks
  # File lib/bio/db/nexus.rb, line 255
  def get_taxa_blocks
    get_blocks_by_name( TAXA_BLOCK.chomp( ";").downcase )
  end
A convenience method that returns an array of all trees blocks.
Returns:
- Array of TreesBlocks
  # File lib/bio/db/nexus.rb, line 237
  def get_trees_blocks
    get_blocks_by_name( TREES_BLOCK.chomp( ";").downcase )
  end
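The five getters above share one pattern: strip a trailing ";" from the block constant, lowercase it, and pass the result to get_blocks_by_name. The sketch below shows only that string step. The value "Trees;" is an assumption, since this page lists the constant names but not their contents.

```ruby
# Assumed value: the page does not show what TREES_BLOCK actually contains.
DEMO_TREES_BLOCK = "Trees;"

# chomp(";") removes one trailing semicolon if present; downcase normalizes case.
lookup_name = DEMO_TREES_BLOCK.chomp(";").downcase
puts lookup_name
```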
Returns a String listing how many blocks of each type were parsed.
Returns:
- String
  # File lib/bio/db/nexus.rb, line 263
  def to_s
    str = String.new
    if get_blocks.length < 1
      str << "empty"
    else
      str << "number of blocks: " << get_blocks.length.to_s
      if get_characters_blocks.length > 0
        str << " [characters blocks: " << get_characters_blocks.length.to_s << "] "
      end
      if get_data_blocks.length > 0
        str << " [data blocks: " << get_data_blocks.length.to_s << "] "
      end
      if get_distances_blocks.length > 0
        str << " [distances blocks: " << get_distances_blocks.length.to_s << "] "
      end
      if get_taxa_blocks.length > 0
        str << " [taxa blocks: " << get_taxa_blocks.length.to_s << "] "
      end
      if get_trees_blocks.length > 0
        str << " [trees blocks: " << get_trees_blocks.length.to_s << "] "
      end
    end
    str
  end
Private Instance Methods
Helper method for make_matrix.
Arguments:
- (required) token: String
- (required) scan_token: true or false - add the whole token, or scan it into single characters
- (required) matrix: NexusMatrix - the matrix to which to add token
- (required) row: Integer - the row for matrix
- (required) col: Integer - the starting column
Returns:
- Integer - ending column
  # File lib/bio/db/nexus.rb, line 686
  def add_token_to_matrix( token, scan_token, matrix, row, col )
    if ( scan_token )
      token.scan(/./) { |w|
        col += 1
        matrix.set_value( row, col, w )
      }
    else
      col += 1
      matrix.set_value( row, col, token )
    end
    col
  end
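To make the two modes concrete, here is a small self-contained sketch. `DemoMatrix` is a hypothetical stand-in for NexusMatrix (only a `set_value( row, col, value )` call is known from the source, and `get_value` is added here purely for inspection), and `demo_add_token` is an illustrative copy of the logic rather than the library method.

```ruby
# Hypothetical stand-in for NexusMatrix: stores values keyed by [row, col].
class DemoMatrix
  def initialize
    @cells = {}
  end

  def set_value(row, col, value)
    @cells[[row, col]] = value
  end

  # Added for this sketch only, so the result can be inspected.
  def get_value(row, col)
    @cells[[row, col]]
  end
end

# Either one cell per character (scan_token) or one cell for the whole token.
def demo_add_token(token, scan_token, matrix, row, col)
  if scan_token
    token.scan(/./) do |ch|
      col += 1
      matrix.set_value(row, col, ch)
    end
  else
    col += 1
    matrix.set_value(row, col, token)
  end
  col # last column written
end

m = DemoMatrix.new
puts demo_add_token("ACG", true, m, 0, 0)  # fills columns 1 to 3 of row 0
puts demo_add_token("0.5", false, m, 1, 0) # fills column 1 of row 1
```

Character data uses the scanning mode so that each residue lands in its own column, while distance values are stored one token per cell.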
Operations required when the beginning of a block is encountered.
  # File lib/bio/db/nexus.rb, line 341
  def begin_block()
    if @current_block_name != nil
      raise NexusParseError, "Cannot have nested nexus blocks (\"end;\" might be missing)"
    end
    reset_command_state()
  end
Returns true if @current_cmd == command and @current_subcmd == subcommand, false otherwise.
Arguments:
- (required) command: String
- (required) subcommand: String or nil
Returns:
- true or false
  # File lib/bio/db/nexus.rb, line 736
  def cmds_equal_to?( command, subcommand )
    return ( @current_cmd == command && @current_subcmd == subcommand )
  end
Creates a GenericBlock (or any of its subclasses), the type of which is determined by the state of @current_block_name.
Returns:
- GenericBlock (or any of its subclasses) object
  # File lib/bio/db/nexus.rb, line 395
  def create_block()
    case @current_block_name
      when TAXA_BLOCK.downcase
        return Bio::Nexus::TaxaBlock.new( @current_block_name )
      when CHARACTERS_BLOCK.downcase
        return Bio::Nexus::CharactersBlock.new( @current_block_name )
      when DATA_BLOCK.downcase
        return Bio::Nexus::DataBlock.new( @current_block_name )
      when DISTANCES_BLOCK.downcase
        return Bio::Nexus::DistancesBlock.new( @current_block_name )
      when TREES_BLOCK.downcase
        return Bio::Nexus::TreesBlock.new( @current_block_name )
      else
        return Bio::Nexus::GenericBlock.new( @current_block_name )
    end
  end
Operations required when the end of a block is encountered.
  # File lib/bio/db/nexus.rb, line 351
  def end_block()
    if @current_block_name == nil
      raise NexusParseError, "Cannot have two or more \"end;\" tokens in sequence"
    end
    @current_block_name = nil
  end
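Together with begin_block, this check means blocks can never nest and every "end;" must close an open block. The sketch below isolates that bookkeeping. `DemoParseError` and `DemoBlockState` are hypothetical names, and unlike the parser (which assigns the block name in parse), the sketch sets the name inside `begin_block` to stay self-contained.

```ruby
class DemoParseError < RuntimeError; end

# Tracks only the current block name, like @current_block_name in the parser.
class DemoBlockState
  attr_reader :current

  def initialize
    @current = nil
  end

  def begin_block(name)
    raise DemoParseError, "nested block (an end might be missing)" unless @current.nil?
    @current = name.downcase
  end

  def end_block
    raise DemoParseError, "end without an open block" if @current.nil?
    @current = nil
  end
end

state = DemoBlockState.new
state.begin_block("Taxa")
state.end_block
begin
  state.end_block # second end in a row
rescue DemoParseError => e
  puts "rejected: #{e.message}"
end
```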
Returns true if Strings str1 and str2 are equal, ignoring case. Returns false if either argument is nil.
Arguments:
- (required) str1: String
- (required) str2: String
Returns:
- true or false
  # File lib/bio/db/nexus.rb, line 721
  def equal?( str1, str2 )
    if ( str1 == nil || str2 == nil )
      return false
    else
      return ( str1.downcase == str2.downcase )
    end
  end
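A short sketch of the same nil-safe, case-insensitive comparison, useful for seeing why keywords such as "MATRIX" and "Matrix" are treated alike. `demo_equal?` is a hypothetical name chosen for this example.

```ruby
# Nil-safe, case-insensitive comparison in the spirit of the helper above.
def demo_equal?(str1, str2)
  return false if str1.nil? || str2.nil?
  str1.downcase == str2.downcase
end

puts demo_equal?("Matrix", "MATRIX") # same keyword, different case
puts demo_equal?(nil, "Matrix")      # nil never matches
puts demo_equal?("NTax", "NChar")    # different keywords
```

The sketch uses its own name because `equal?` is also the name of Ruby's built-in object identity test, which the private two-argument helper replaces inside the parser class.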
Makes a NexusMatrix out of token and the tokens remaining in Array ary. Used by the process_token_for_X_block methods for blocks that contain data in matrix form. Column 0 contains names. This will shift tokens from ary until a token containing the delimiter ";" has been consumed.
Arguments:
- (required) token: String
- (required) ary: Array
- (required) size: Integer
- (optional) scan_token: true or false
Returns:
- NexusMatrix
  # File lib/bio/db/nexus.rb, line 647
  def make_matrix( token, ary, size, scan_token = false )
    matrix = NexusMatrix.new
    col = -1
    row = 0
    done = false
    while ( !done )
      if ( col == -1 )
        # name
        col = 0
        matrix.set_value( row, col, token ) # name is in col 0
      else
        # values
        col = add_token_to_matrix( token, scan_token, matrix, row, col )
        if ( col == size.to_i )
          col = -1
          row += 1
        end
      end
      token = ary.shift
      if ( token.index( DELIMITER ) != nil )
        col = add_token_to_matrix( token.chomp( ";" ), scan_token, matrix, row, col )
        done = true
      end
    end # while
    matrix
  end
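The row and column bookkeeping is easier to follow on a tiny input. The sketch below re-creates the loop under stated assumptions: a Hash keyed by [row, col] stands in for NexusMatrix, `size` is already an Integer, and `demo_make_matrix` is a hypothetical name. It is an illustration of the technique, not the library code.

```ruby
# Builds a {[row, col] => value} Hash; column 0 of each row holds the name.
def demo_make_matrix(token, tokens, size, scan_token = false)
  matrix = {}
  row = 0
  col = -1 # -1 means "the next token is a row name"

  # Adds one token's values to the current row, advancing col.
  add = lambda do |tok|
    (scan_token ? tok.chars : [tok]).each do |value|
      col += 1
      matrix[[row, col]] = value
    end
  end

  loop do
    if col == -1
      col = 0
      matrix[[row, col]] = token
    else
      add.call(token)
      if col == size # row is full, so the next token starts a new row
        col = -1
        row += 1
      end
    end
    token = tokens.shift
    if token.include?(";") # the delimiter marks the last value
      add.call(token.chomp(";"))
      break
    end
  end
  matrix
end

tokens = ["fish", "0.0", "1.0", "frog", "1.0", "0.0;"]
matrix = demo_make_matrix(tokens.shift, tokens, 2)
puts matrix[[1, 0]] # name of the second row
```

As in the original, the token that carries the ";" is always treated as a value, and the input is expected to contain that delimiter before the tokens run out.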
The master method for parsing. Stores the resulting blocks in the array @blocks.
Arguments:
- (required) str: String - nexus formatted data
  # File lib/bio/db/nexus.rb, line 297
  def parse( str )
    str = str.chop if str[-1..-1] == ';'
    ary = str.split(/[\s+=]/)
    ary.collect! { |x| x.strip!; x.empty? ? nil : x }
    ary.compact!
    in_comment = false
    comment_level = 0

    # Main loop
    while token = ary.shift
      # Quotes:
      if ( token.index( SINGLE_QUOTE ) == 0 || token.index( DOUBLE_QUOTE ) == 0 )
        token << "_" << ary.shift
        token = token.chop if token[-1..-1] == ';'
        token = token.slice( 1, token.length - 2 )
      end
      # Comments:
      open = token.count( BEGIN_COMMENT )
      close = token.count( END_COMMENT )
      comment = comment_level > 0
      comment_level = comment_level + open - close
      if ( open > 0 && open == close )
        next
      elsif comment_level > 0 || comment
        next
      elsif equal?( token, END_BLOCK )
        end_block()
      elsif equal?( token, BEGIN_BLOCK )
        begin_block()
        @current_block_name = token = ary.shift
        @current_block_name.downcase!
        @current_block = create_block()
        @blocks.push( @current_block )
      elsif ( @current_block_name != nil )
        process_token( token.chomp( DELIMITER ), ary )
      end
    end # main loop
    @blocks.compact!
  end
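Two steps of the method above are worth isolating: the split into tokens and the skipping of bracketed comments. The sketch below covers only those two steps, assuming "[" and "]" are the comment markers (the page does not show the values of BEGIN_COMMENT and END_COMMENT). `demo_tokenize` and `demo_strip_comments` are hypothetical helper names.

```ruby
# Same splitting rule as in parse: break on whitespace, "+", or "=",
# then drop the empty strings left by consecutive separators.
def demo_tokenize(str)
  str.split(/[\s+=]/).reject(&:empty?)
end

# Skips comment tokens with the same open/close counting as parse.
def demo_strip_comments(tokens)
  level = 0
  tokens.reject do |tok|
    opened = tok.count("[")
    closed = tok.count("]")
    was_in_comment = level > 0
    level += opened - closed
    (opened > 0 && opened == closed) || level > 0 || was_in_comment
  end
end

tokens = demo_tokenize("Dimensions NTax=2;\n  TaxLabels fish [a comment] frog;")
p tokens
p demo_strip_comments(tokens)
```

Note that splitting on "=" is what turns `NTax=2;` into a keyword token followed by a value token, which the block-specific handlers rely on.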
This calls the various process_token_for_<name>_block methods depending on the state of @current_block_name.
Arguments:
- (required) token: String
- (required) ary: Array
  # File lib/bio/db/nexus.rb, line 365
  def process_token( token, ary )
    case @current_block_name
      when TAXA_BLOCK.downcase
        process_token_for_taxa_block( token )
      when CHARACTERS_BLOCK.downcase
        process_token_for_character_block( token, ary )
      when DATA_BLOCK.downcase
        process_token_for_data_block( token, ary )
      when DISTANCES_BLOCK.downcase
        process_token_for_distances_block( token, ary )
      when TREES_BLOCK.downcase
        process_token_for_trees_block( token, ary )
      else
        process_token_for_generic_block( token )
    end
  end
This processes the tokens (between Begin Characters; and End;) for a characters block. Example of a currently parseable characters block:
  Begin Characters;
   Dimensions NChar=20 NTax=4;
   Format DataType=DNA Missing=x Gap=- MatchChar=.;
   Matrix
    fish  ACATA GAGGG TACCT CTAAG
    frog  ACTTA GAGGC TACCT CTAGC
    snake ACTCA CTGGG TACCT TTGCG
    mouse ACTCA GACGG TACCT TTGCG;
  End;
Arguments:
- (required) token: String
- (required) ary: Array
  # File lib/bio/db/nexus.rb, line 458
  def process_token_for_character_block( token, ary )
    if ( equal?( token, DIMENSIONS ) )
      @current_cmd = DIMENSIONS
      @current_subcmd = nil
    elsif ( equal?( token, FORMAT ) )
      @current_cmd = FORMAT
      @current_subcmd = nil
    elsif ( equal?( token, MATRIX ) )
      @current_cmd = MATRIX
      @current_subcmd = nil
    elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
      @current_subcmd = NTAX
    elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) )
      @current_subcmd = NCHAR
    elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) )
      @current_subcmd = DATATYPE
    elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MISSING ) )
      @current_subcmd = CharactersBlock::MISSING
    elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::GAP ) )
      @current_subcmd = CharactersBlock::GAP
    elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MATCHCHAR ) )
      @current_subcmd = CharactersBlock::MATCHCHAR
    elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
      @current_block.set_number_of_taxa( token )
    elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) )
      @current_block.set_number_of_characters( token )
    elsif ( cmds_equal_to?( FORMAT, DATATYPE ) )
      @current_block.set_datatype( token )
    elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MISSING ) )
      @current_block.set_missing( token )
    elsif ( cmds_equal_to?( FORMAT, CharactersBlock::GAP ) )
      @current_block.set_gap_character( token )
    elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MATCHCHAR ) )
      @current_block.set_match_character( token )
    elsif ( cmds_equal_to?( MATRIX, nil ) )
      @current_block.set_matrix( make_matrix( token, ary, @current_block.get_number_of_characters, true ) )
    end
  end
This processes the tokens (between Begin Data; and End;) for a data block. Example of a currently parseable data block:
  Begin Data;
   Dimensions ntax=5 nchar=14;
   Format Datatype=RNA gap=# MISSING=x MatchChar=^;
   TaxLabels ciona cow [comment] ape 'purple urchin' "green lizard";
   Matrix
    taxon_1 A- CCGTCGA-GTTA
    taxon_2 T- CCG-CGA-GATA
    taxon_3 A- C-GTCGA-GATA
    taxon_4 A- CCTCGA--GTTA
    taxon_5 T- CGGTCGT-CTTA;
  End;
Arguments:
- (required) token: String
- (required) ary: Array
  # File lib/bio/db/nexus.rb, line 591
  def process_token_for_data_block( token, ary )
    if ( equal?( token, DIMENSIONS ) )
      @current_cmd = DIMENSIONS
      @current_subcmd = nil
    elsif ( equal?( token, FORMAT ) )
      @current_cmd = FORMAT
      @current_subcmd = nil
    elsif ( equal?( token, TAXLABELS ) )
      @current_cmd = TAXLABELS
      @current_subcmd = nil
    elsif ( equal?( token, MATRIX ) )
      @current_cmd = MATRIX
      @current_subcmd = nil
    elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
      @current_subcmd = NTAX
    elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) )
      @current_subcmd = NCHAR
    elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) )
      @current_subcmd = DATATYPE
    elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MISSING ) )
      @current_subcmd = CharactersBlock::MISSING
    elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::GAP ) )
      @current_subcmd = CharactersBlock::GAP
    elsif ( @current_cmd == FORMAT && equal?( token, CharactersBlock::MATCHCHAR ) )
      @current_subcmd = CharactersBlock::MATCHCHAR
    elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
      @current_block.set_number_of_taxa( token )
    elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) )
      @current_block.set_number_of_characters( token )
    elsif ( cmds_equal_to?( FORMAT, DATATYPE ) )
      @current_block.set_datatype( token )
    elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MISSING ) )
      @current_block.set_missing( token )
    elsif ( cmds_equal_to?( FORMAT, CharactersBlock::GAP ) )
      @current_block.set_gap_character( token )
    elsif ( cmds_equal_to?( FORMAT, CharactersBlock::MATCHCHAR ) )
      @current_block.set_match_character( token )
    elsif ( cmds_equal_to?( TAXLABELS, nil ) )
      @current_block.add_taxon( token )
    elsif ( cmds_equal_to?( MATRIX, nil ) )
      @current_block.set_matrix( make_matrix( token, ary, @current_block.get_number_of_characters, true ) )
    end
  end
This processes the tokens (between Begin Distances; and End;) for a distances block. Example of a currently parseable distances block:
  Begin Distances;
   Dimensions nchar=20 ntax=5;
   Format Triangle=Upper;
   Matrix
    taxon_1 0.0 1.0 2.0 4.0 7.0
    taxon_2 1.0 0.0 3.0 5.0 8.0
    taxon_3 3.0 4.0 0.0 6.0 9.0
    taxon_4 7.0 3.0 1.0 0.0 9.5
    taxon_5 1.2 1.3 1.4 1.5 0.0;
  End;
Arguments:
- (required) token: String
- (required) ary: Array
  # File lib/bio/db/nexus.rb, line 542
  def process_token_for_distances_block( token, ary )
    if ( equal?( token, DIMENSIONS ) )
      @current_cmd = DIMENSIONS
      @current_subcmd = nil
    elsif ( equal?( token, FORMAT ) )
      @current_cmd = FORMAT
      @current_subcmd = nil
    elsif ( equal?( token, MATRIX ) )
      @current_cmd = MATRIX
      @current_subcmd = nil
    elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
      @current_subcmd = NTAX
    elsif ( @current_cmd == DIMENSIONS && equal?( token, NCHAR ) )
      @current_subcmd = NCHAR
    elsif ( @current_cmd == FORMAT && equal?( token, DATATYPE ) )
      @current_subcmd = DATATYPE
    elsif ( @current_cmd == FORMAT && equal?( token, DistancesBlock::TRIANGLE ) )
      @current_subcmd = DistancesBlock::TRIANGLE
    elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
      @current_block.set_number_of_taxa( token )
    elsif ( cmds_equal_to?( DIMENSIONS, NCHAR ) )
      @current_block.set_number_of_characters( token )
    elsif ( cmds_equal_to?( FORMAT, DistancesBlock::TRIANGLE ) )
      @current_block.set_triangle( token )
    elsif ( cmds_equal_to?( MATRIX, nil ) )
      @current_block.set_matrix( make_matrix( token, ary, @current_block.get_number_of_taxa, false ) )
    end
  end
This processes the tokens (between Begin <block name>; and End;) for a block for which a specific parser is not available. Every token is stored unchanged. Example of a currently parseable generic block:
  Begin Taxa;
   token1 token2 token3 ...
  End;
Arguments:
- (required) token: String
  # File lib/bio/db/nexus.rb, line 709
  def process_token_for_generic_block( token )
    @current_block.add_token( token )
  end
This processes the tokens (between Begin Taxa; and End;) for a taxa block. Example of a currently parseable taxa block:
  Begin Taxa;
   Dimensions NTax=4;
   TaxLabels fish [comment] 'african frog' "rat snake" 'red mouse';
  End;
Arguments:
- (required) token: String
  # File lib/bio/db/nexus.rb, line 422
  def process_token_for_taxa_block( token )
    if ( equal?( token, DIMENSIONS ) )
      @current_cmd = DIMENSIONS
      @current_subcmd = nil
    elsif ( equal?( token, TAXLABELS ) )
      @current_cmd = TAXLABELS
      @current_subcmd = nil
    elsif ( @current_cmd == DIMENSIONS && equal?( token, NTAX ) )
      @current_subcmd = NTAX
    elsif ( cmds_equal_to?( DIMENSIONS, NTAX ) )
      @current_block.set_number_of_taxa( token )
    elsif ( cmds_equal_to?( TAXLABELS, nil ) )
      @current_block.add_taxon( token )
    end
  end
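All of the block handlers follow the same command and subcommand state machine, and the taxa block is the smallest instance of it. The sketch below reproduces that pattern in isolation. `DemoTaxaReader` is hypothetical, and the keyword spellings "Dimensions", "TaxLabels" and "NTax" are assumptions taken from the example input rather than from the constant definitions, which this page does not show.

```ruby
# Hypothetical mini state machine for taxa tokens; keyword values are assumed.
class DemoTaxaReader
  attr_reader :ntax, :taxa

  def initialize
    @cmd = nil
    @subcmd = nil
    @ntax = nil
    @taxa = []
  end

  def process(token)
    if same?(token, "Dimensions")
      @cmd = "Dimensions"
      @subcmd = nil
    elsif same?(token, "TaxLabels")
      @cmd = "TaxLabels"
      @subcmd = nil
    elsif @cmd == "Dimensions" && same?(token, "NTax")
      @subcmd = "NTax"
    elsif @cmd == "Dimensions" && @subcmd == "NTax"
      @ntax = token # stored as received, i.e. still a String
    elsif @cmd == "TaxLabels" && @subcmd.nil?
      @taxa << token
    end
  end

  private

  # Case-insensitive keyword match.
  def same?(a, b)
    a.downcase == b.downcase
  end
end

reader = DemoTaxaReader.new
%w[Dimensions NTax 4 TaxLabels fish frog snake mouse].each { |t| reader.process(t) }
p reader.ntax
p reader.taxa
```

Keywords are checked before values, so a taxon that happens to be named like a keyword would be read as a command in this sketch.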
This processes the tokens (between Begin Trees; and End;) for a trees block. Example of a currently parseable trees block:
  Begin Trees;
   Tree best=(fish,(frog,(snake, mouse)));
   Tree other=(snake,(frog,( fish, mouse)));
  End;
Arguments:
- (required) token: String
- (required) ary: Array
  # File lib/bio/db/nexus.rb, line 509
  def process_token_for_trees_block( token, ary )
    if ( equal?( token, TreesBlock::TREE ) )
      @current_cmd = TreesBlock::TREE
      @current_subcmd = nil
    elsif ( cmds_equal_to?( TreesBlock::TREE, nil ) )
      @current_block.add_tree_name( token )
      tree_string = ary.shift
      while ( tree_string.index( ";" ) == nil )
        tree_string << ary.shift
      end
      @current_block.add_tree( tree_string )
      @current_cmd = nil
    end
  end
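Because the tokenizer splits on whitespace, a tree description containing spaces arrives as several tokens that have to be glued back together. The sketch below shows that step alone; `demo_collect_tree` is a hypothetical helper, not a library method.

```ruby
# Joins tokens until the accumulated string contains ";".
def demo_collect_tree(tokens)
  tree = tokens.shift.dup
  tree << tokens.shift until tree.include?(";")
  tree
end

# "Tree best=(fish,(frog,(snake, mouse)));" after the name has been consumed:
tokens = ["(fish,(frog,(snake,", "mouse)));", "Tree", "other"]
puts demo_collect_tree(tokens)
p tokens # the tokens of the next tree statement are left untouched
```

Like the loop it illustrates, the sketch assumes a terminating ";" exists; if the tokens run out first, appending nil raises an error.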
Resets @current_cmd and @current_subcmd to nil.
  # File lib/bio/db/nexus.rb, line 385
  def reset_command_state()
    @current_cmd = nil
    @current_subcmd = nil
  end