class Bio::Tree
This is the class for phylogenetic tree. It stores a phylogenetic tree.
Internally, it is based on Bio::Pathway class. However, users cannot handle Bio::Pathway object directly.
This is alpha version. Incompatible changes may be made frequently.
Constants
- DEFAULT_OPTIONS
default options
Attributes
tree options; mainly used for tree output
root node of this tree (even if unrooted tree, it is used by some methods)
Public Class Methods
Creates a new phylogenetic tree. When no arguments are given, it creates a new empty tree. When a Tree object is given, it copies the tree. Note that the new tree shares Node and Edge objects with the given tree.
# File lib/bio/tree.rb, line 258 def initialize(tree = nil) # creates an undirected adjacency list graph @pathway = Bio::Pathway.new([], true) @root = nil @options = {} _init_cache self.concat(tree) if tree end
Public Instance Methods
Adds a new edge to the tree. Returns the newly added edge. If the edge already exists, it is overwritten with new one.
# File lib/bio/tree.rb, line 380 def add_edge(source, target, edge = Edge.new) _clear_cache @pathway.append(Bio::Relation.new(source, target, edge)) edge end
Adds a node to the tree. Returns self. If the node already exists, it does nothing.
# File lib/bio/tree.rb, line 402 def add_node(node) _clear_cache @pathway.graph[node] ||= {} self end
Shows the adjacency matrix representation of the tree. It shows matrix only for given nodes. If nodes is nil or is ommitted, it acts the same as tree.adjacency_matrix(tree.nodes). If a block is given, for each edge, it yields source, target, and edge, and uses the returned value of the block. Without blocks, it uses edge. Returns a matrix object.
# File lib/bio/tree.rb, line 822 def adjacency_matrix(nodes = nil, default_value = nil, diagonal_value = nil) #:yields: source, target, edge nodes ||= self.nodes size = nodes.size hash = {} nodes.each_with_index { |x, i| hash[x] = i } # prepares an matrix matrix = Array.new(size, nil) matrix.collect! { |x| Array.new(size, default_value) } (0...size).each { |i| matrix[i][i] = diagonal_value } # fills the matrix from each edge self.each_edge do |source, target, edge| i_source = hash[source] i_target = hash[target] if i_source and i_target then val = block_given? ? (yield source, target, edge) : edge matrix[i_source][i_target] = val matrix[i_target][i_source] = val end end Matrix.rows(matrix, false) end
Returns an array of adjacent nodes of the given node.
# File lib/bio/tree.rb, line 331 def adjacent_nodes(node) h = @pathway.graph[node] h ? h.keys : [] end
Gets all ancestral nodes of the node. If root isn't
specified or root is nil
, @root is used. Returns an
array of Node
s. The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 757 def ancestors(node, root = nil) root ||= @root (self.path(root, node) - [ node ]).reverse end
Gets the adjacent children nodes of the node. If root
isn't specified or root is nil
, @root is used.
Returns an array of Node
s. The result is unspecified for
cyclic trees.
# File lib/bio/tree.rb, line 701 def children(node, root = nil) root ||= @root c = self.adjacent_nodes(node) c.delete(self.parent(node, root)) c end
Clears all nodes and edges. Returns self. Note that options and root are also cleared.
# File lib/bio/tree.rb, line 289 def clear initialize self end
Removes all edges connected with the node. Returns self. If the node does not exist, raises IndexError.
# File lib/bio/tree.rb, line 417 def clear_node(node) unless self.include?(node) raise IndexError, 'the node does not exist' end _clear_cache @pathway.relations.delete_if do |rel| rel.node.include?(node) end @pathway.graph[node].each_key do |k| @pathway.graph[k].delete(node) end @pathway.graph[node].clear self end
Replaces each edge by each block's return value. Returns self.
# File lib/bio/tree.rb, line 527 def collect_edge! #:yields: source, target, edge _clear_cache @pathway.relations.each do |rel| newedge = yield rel.node[0], rel.node[1], rel.relation rel.edge = newedge @pathway.append(rel, false) end self end
Replaces each node by each block's return value. Returns self.
# File lib/bio/tree.rb, line 506 def collect_node! #:yields: node _clear_cache tr = {} self.each_node do |node| tr[node] = yield node end # replaces nodes in @pathway.relations @pathway.relations.each do |rel| rel.node.collect! { |node| tr[node] } end # re-generates @pathway from relations @pathway.to_list # adds orphan nodes tr.each_value do |newnode| @pathway.graph[newnode] ||= {} end self end
Concatenates the other tree. If the same edge exists, the edge in other is used. Returns self. The result is unspecified if other isn't a Tree object. Note that the Node and Edge objects in the other tree are shared in the concatinated tree.
# File lib/bio/tree.rb, line 595 def concat(other) #raise TypeError unless other.kind_of?(self.class) _clear_cache other.each_node do |node| self.add_node(node) end other.each_edge do |node1, node2, edge| self.add_edge(node1, node2, edge) end self end
Gets all descendent nodes of the node. If root isn't
specified or root is nil
, @root is used. Returns an
array of Node
s. The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 712 def descendents(node, root = nil) root ||= @root distance, route = @pathway.breadth_first_search(root) d = distance[node] result = [] distance.each do |key, val| if val > d then x = key while x = route[x] if x == node then result << key break end break if distance[x] <= d end end end result end
Returns distance between node1 and node2. It would raise error if the edges didn't contain distance values. The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 640 def distance(node1, node2) distance = 0 self.each_edge_in_path(node1, node2) do |source, target, edge| distance += get_edge_distance(edge) end distance end
Calculates distance matrix of given nodes. If nodes is nil, or is ommited, it acts the same as tree.distance_matrix(tree.leaves). Returns a matrix object. The result is unspecified for cyclic trees. Note 1: The diagonal values of the matrix are 0. Note 2: If the distance cannot be calculated, nil will be set.
# File lib/bio/tree.rb, line 793 def distance_matrix(nodes = nil) nodes ||= self.leaves matrix = [] nodes.each_index do |i| row = [] nodes.each_index do |j| if i == j then distance = 0 elsif r = matrix[j] and val = r[i] then distance = val else distance = (self.distance(nodes[i], nodes[j]) rescue nil) end row << distance end matrix << row end Matrix.rows(matrix, false) end
Iterates over each edges of this tree.
# File lib/bio/tree.rb, line 311 def each_edge #:yields: source, target, edge @pathway.relations.each do |rel| yield rel.node[0], rel.node[1], rel.relation end self end
Iterates over each edge from node1 to node2. The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 626 def each_edge_in_path(node1, node2) path = self.path(node1, node2) source = path.shift path.each do |target| edge = self.get_edge(source, target) yield source, target, edge source = target end self end
Iterates over each node of this tree.
# File lib/bio/tree.rb, line 305 def each_node(&x) #:yields: node @pathway.graph.each_key(&x) self end
Iterates over each connected edges of the given node. Returns self.
The reason why the method name is “each_out_edge” is that it comes from the Boost Graph Library.
# File lib/bio/tree.rb, line 355 def each_out_edge(source) #:yields: source, target, edge h = @pathway.graph[source] h.each { |key, val| yield source, key, val } if h self end
Returns all edges an array of [ node0, node1, edge ]
# File lib/bio/tree.rb, line 319 def edges @pathway.relations.collect do |rel| [ rel.node[0], rel.node[1], rel.relation ] end end
Returns an edge from source to target. If source and target are not adjacent nodes, returns nil.
# File lib/bio/tree.rb, line 372 def get_edge(source, target) h = @pathway.graph[source] h ? h[target] : nil end
Gets distance value from the given edge. Returns float or any other numeric value or nil.
# File lib/bio/tree.rb, line 100 def get_edge_distance(edge) begin dist = edge.distance rescue NoMethodError dist = edge end dist end
Gets distance string from the given edge. Returns a string or nil.
# File lib/bio/tree.rb, line 111 def get_edge_distance_string(edge) begin dist = edge.distance_string rescue NoMethodError dist = (edge ? edge.to_s : nil) end dist end
Returns edge1 + edge2
# File lib/bio/tree.rb, line 121 def get_edge_merged(edge1, edge2) dist1 = get_edge_distance(edge1) dist2 = get_edge_distance(edge2) if dist1 and dist2 then Edge.new(dist1 + dist2) elsif dist1 then Edge.new(dist1) elsif dist2 then Edge.new(dist2) else Edge.new end end
# File lib/bio/tree.rb, line 237 def get_node_bootstrap(node) begin node.bootstrap rescue NoMethodError nil end end
# File lib/bio/tree.rb, line 245 def get_node_bootstrap_string(node) begin node.bootstrap_string rescue NoMethodError nil end end
Finds a node in the tree by given name and returns the node. If the node does not found, returns nil. If multiple nodes with the same name exist, the result would be one of those (unspecified).
# File lib/bio/tree.rb, line 390 def get_node_by_name(str) self.each_node do |node| if get_node_name(node) == str return node end end nil end
Gets node name
# File lib/bio/tree.rb, line 229 def get_node_name(node) begin node.name rescue NoMethodError node.to_s end end
If the node exists, returns true. Otherwise, returns false.
# File lib/bio/tree.rb, line 410 def include?(node) @pathway.graph[node] ? true : false end
Insert a new node between adjacent nodes node1 and node2. The old edge between node1 and node2 are changed to the edge between new_node and node2. The edge between node1 and new_node is newly created.
If new_distance is specified, the distance between node1 and new_node is
set to new_distance, and distance between new_node and node2 is set to
tree.get_edge(node1, node2).distance - new_distance
.
Returns self. If node1 and node2 are not adjacent, raises IndexError.
If new_node already exists in the tree, the tree would become circular. In addition, if the edge between new_node and node1 (or node2) already exists, it will be erased.
# File lib/bio/tree.rb, line 890 def insert_node(node1, node2, new_node, new_distance = nil) unless edge = self.get_edge(node1, node2) then raise IndexError, 'nodes not found or two nodes are not adjacent' end _clear_cache new_edge = Edge.new(new_distance) self.remove_edge(node1, node2) self.add_edge(node1, new_node, new_edge) if new_distance and old_distance = get_edge_distance(edge) then old_distance -= new_distance begin edge.distance = old_distance rescue NoMethodError edge = old_distance end end self.add_edge(new_node, node2, edge) self end
If node is nil, returns an array of all leaves (nodes connected
with one edge). Otherwise, gets all descendent leaf nodes of the
node. If root isn't specified or root is
nil
, @root is used. Returns an array of Node
s.
The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 738 def leaves(node = nil, root = nil) unless node then nodes = [] self.each_node do |x| nodes << x if self.out_degree(x) == 1 end return nodes else root ||= @root self.descendents(node, root).find_all do |x| self.adjacent_nodes(x).size == 1 end end end
Gets the lowest common ancestor of the two nodes. If root
isn't specified or root is nil
, @root is used.
Returns a Node
object or nil. The result is unspecified for
cyclic trees.
# File lib/bio/tree.rb, line 766 def lowest_common_ancestor(node1, node2, root = nil) root ||= @root distance, route = @pathway.breadth_first_search(root) x = node1; r1 = [] begin; r1 << x; end while x = route[x] x = node2; r2 = [] begin; r2 << x; end while x = route[x] return (r1 & r2).first end
Returns all nodes as an array.
# File lib/bio/tree.rb, line 295 def nodes @pathway.graph.keys end
Returns number of edges in the tree.
# File lib/bio/tree.rb, line 326 def number_of_edges @pathway.relations.size end
Number of nodes.
# File lib/bio/tree.rb, line 300 def number_of_nodes @pathway.nodes end
Returns number of edges in the given node.
The reason why the method name is “out_degree” is that it comes from the Boost Graph Library.
# File lib/bio/tree.rb, line 365 def out_degree(source) h = @pathway.graph[source] h ? h.size : 0 end
Returns all connected edges with adjacent nodes. Returns an array of the array [ source, target, edge ].
The reason why the method name is “out_edges” is that it comes from the Boost Graph Library.
# File lib/bio/tree.rb, line 341 def out_edges(source) h = @pathway.graph[source] if h h.collect { |key, val| [ source, key, val ] } else [] end end
Returns formatted text (or something) of the tree Currently supported format is: :newick, :nhx
# File lib/bio/tree/output.rb, line 230 def output(format, *arg, &block) case format when :newick output_newick(*arg, &block) when :nhx output_nhx(*arg, &block) when :phylip_distance_matrix output_phylip_distance_matrix(*arg, &block) else raise 'Unknown format' end end
Returns a newick formatted string. If block is given, the order of the node is sorted (as the same manner as Enumerable#sort).
Available options:
:indent
-
indent string; set false to disable (default: ' ')
:bootstrap_style
-
:disabled
disables bootstrap representations.:traditional
for traditional style.:molphy
for Molphy style (default).
# File lib/bio/tree/output.rb, line 198 def output_newick(options = {}, &block) #:yields: node1, node2 root = @root root ||= self.nodes.first return '();' unless root __to_newick([], root, 0, :__to_newick_format_leaf, options, &block) + __to_newick_format_leaf(root, Edge.new, options) + ";\n" end
Returns a NHX (New Hampshire eXtended) formatted string. If block is given, the order of the node is sorted (as the same manner as Enumerable#sort).
Available options:
:indent
-
indent string; set false to disable (default: ' ')
# File lib/bio/tree/output.rb, line 218 def output_nhx(options = {}, &block) #:yields: node1, node2 root = @root root ||= self.nodes.first return '();' unless root __to_newick([], root, 0, :__to_newick_format_leaf_NHX, options, &block) + __to_newick_format_leaf_NHX(root, Edge.new, options) + ";\n" end
Generates phylip-style distance matrix as a string. if nodes is not given, all leaves in the tree are used. If the names of some of the given (or default) nodes are not defined or are empty, the names are automatically generated.
# File lib/bio/tree/output.rb, line 251 def output_phylip_distance_matrix(nodes = nil, options = {}) nodes = self.leaves unless nodes names = nodes.collect do |x| y = get_node_name(x) y = sprintf("%x", x.__id__.abs) if y.empty? y end m = self.distance_matrix(nodes) Bio::Phylip::DistanceMatrix.generate(m, names, options) end
Gets the parent node of the node. If root isn't
specified or root is nil
, @root is used. Returns an
Node
object or nil. The result is unspecified for cyclic
trees.
# File lib/bio/tree.rb, line 687 def parent(node, root = nil) root ||= @root raise IndexError, 'can not get parent for unrooted tree' unless root unless ret = _get_cached_parent(node, root) then ret = self.path(root, node)[-2] _cache_parent(node, ret, root) end ret end
Gets path from node1 to node2. Retruns an array of nodes, including node1 and node2. If node1 and/or node2 do not exist, IndexError is raised. If node1 and node2 are not connected, NoPathError is raised. The result is unspecified for cyclic trees.
# File lib/bio/tree.rb, line 612 def path(node1, node2) raise IndexError, 'node1 not found' unless @pathway.graph[node1] raise IndexError, 'node2 not found' unless @pathway.graph[node2] return [ node1 ] if node1 == node2 return [ node1, node2 ] if @pathway.graph[node1][node2] step, path = @pathway.bfs_shortest_path(node1, node2) unless path[0] == node1 and path[-1] == node2 then raise NoPathError, 'node1 and node2 are not connected' end path end
# Removes an edge between source and target. # Returns self. # If the edge does not exist, raises IndexError. +
# File lib/bio/tree.rb, line 465 def remove_edge(source, target) unless self.get_edge(source, target) then raise IndexError, 'edge not found' end _clear_cache fwd = [ source, target ] rev = [ target, source ] @pathway.relations.delete_if do |rel| rel.node == fwd or rel.node == rev end h = @pathway.graph[source] h.delete(target) if h h = @pathway.graph[target] h.delete(source) if h self end
Removes each edge if the block returns not nil. Returns self.
# File lib/bio/tree.rb, line 484 def remove_edge_if #:yields: source, target, edge _clear_cache removed_rel = [] @pathway.relations.delete_if do |rel| if yield rel.node[0], rel.node[1], rel.edge then removed_rel << rel true end end removed_rel.each do |rel| source = rel.node[0] target = rel.node[1] h = @pathway.graph[source] h.delete(target) if h h = @pathway.graph[target] h.delete(source) if h end self end
Removes the given node from the tree. All edges connected with the node are also removed. Returns self. If the node does not exist, raises IndexError.
# File lib/bio/tree.rb, line 436 def remove_node(node) #_clear_cache #done in clear_node(node) self.clear_node(node) @pathway.graph.delete(node) self end
Removes each node if the block returns not nil. All edges connected with the removed nodes are also removed. Returns self.
# File lib/bio/tree.rb, line 446 def remove_node_if #_clear_cache #done in clear_node(node) all = self.nodes all.each do |node| if yield node then self.clear_node(node) @pathway.graph.delete(node) end end self end
Removes all nodes that are not branches nor leaves. That is, removes nodes connected with exactly two edges. For each removed node, two adjacent edges are merged and a new edge are created. Returns removed nodes. Note that orphan nodes are still kept unchanged.
# File lib/bio/tree.rb, line 852 def remove_nonsense_nodes _clear_cache hash = {} self.each_node do |node| hash[node] = true if @pathway.graph[node].size == 2 end hash.each_key do |node| adjs = @pathway.graph[node].keys edges = @pathway.graph[node].values new_edge = get_edge_merged(edges[0], edges[1]) @pathway.graph[adjs[0]].delete(node) @pathway.graph[adjs[1]].delete(node) @pathway.graph.delete(node) @pathway.append(Bio::Relation.new(adjs[0], adjs[1], new_edge)) end #@pathway.to_relations @pathway.relations.reject! do |rel| hash[rel.node[0]] or hash[rel.node[1]] end return hash.keys end
Gets the sub-tree consisted of given nodes. nodes must be an array of nodes. Nodes that do not exist in the original tree are ignored. Returns a Tree object. Note that the sub-tree shares Node and Edge objects with the original tree.
# File lib/bio/tree.rb, line 543 def subtree(nodes) nodes = nodes.find_all do |x| @pathway.graph[x] end return self.class.new if nodes.empty? # creates subtree new_tree = self.class.new nodes.each do |x| new_tree.add_node(x) end self.each_edge do |node1, node2, edge| if new_tree.include?(node1) and new_tree.include?(node2) then new_tree.add_edge(node1, node2, edge) end end return new_tree end
Gets the sub-tree consisted of given nodes and all internal nodes connected between given nodes. nodes must be an array of nodes. Nodes that do not exist in the original tree are ignored. Returns a Tree object. The result is unspecified for cyclic trees. Note that the sub-tree shares Node and Edge objects with the original tree.
# File lib/bio/tree.rb, line 569 def subtree_with_all_paths(nodes) hash = {} nodes.each { |x| hash[x] = true } nodes.each_index do |i| node1 = nodes[i] (0...i).each do |j| node2 = nodes[j] unless node1 == node2 then begin path = self.path(node1, node2) rescue IndexError, NoPathError path = [] end path.each { |x| hash[x] = true } end end end self.subtree(hash.keys) end
Returns total distance of all edges. It would raise error if some edges didn't contain distance values.
# File lib/bio/tree.rb, line 778 def total_distance distance = 0 self.each_edge do |source, target, edge| distance += get_edge_distance(edge) end distance end
Private Instance Methods
# File lib/bio/tree/output.rb, line 30 def __get_option(key, options) if (r = options[key]) != nil then r elsif @options && (r = @options[key]) != nil then r else DEFAULT_OPTIONS[key] end end
# File lib/bio/tree/output.rb, line 146 def __to_newick(parents, source, depth, format_leaf, options, &block) result = [] if indent_string = __get_option(:indent, options) then indent0 = indent_string * depth indent = indent_string * (depth + 1) newline = "\n" else indent0 = indent = newline = '' end out_edges = self.out_edges(source) if block_given? then out_edges.sort! { |edge1, edge2| yield(edge1[1], edge2[1]) } else out_edges.sort! do |edge1, edge2| o1 = edge1[1].order_number o2 = edge2[1].order_number if o1 and o2 then o1 <=> o2 else edge1[1].name.to_s <=> edge2[1].name.to_s end end end out_edges.each do |src, tgt, edge| if parents.include?(tgt) then ;; elsif self.out_degree(tgt) == 1 then result << indent + __send__(format_leaf, tgt, edge, options) else result << __to_newick([ src ].concat(parents), tgt, depth + 1, format_leaf, options) + __send__(format_leaf, tgt, edge, options) end end indent0 + "(" + newline + result.join(',' + newline) + (result.size > 0 ? newline : '') + indent0 + ')' end
formats Newick label (unquoted_label or quoted_label)
# File lib/bio/tree/output.rb, line 43 def __to_newick_format_label(str, options) if __get_option(:parser, options) == :naive then return str.to_s end str = str.to_s if /([\(\)\,\:\[\]\_\\x00-\x1f\x7f])/ =~ str then # quoted_label return "\'" + str.gsub(/\/, "\'\'") + "\'" end # unquoted_label return str.gsub(/ /, '_') end
formats leaf
# File lib/bio/tree/output.rb, line 58 def __to_newick_format_leaf(node, edge, options) label = __to_newick_format_label(get_node_name(node), options) dist = get_edge_distance_string(edge) bs = get_node_bootstrap_string(node) if __get_option(:branch_length_style, options) == :disabled dist = nil end case __get_option(:bootstrap_style, options) when :disabled label + (dist ? ":#{dist}" : '') when :molphy label + (dist ? ":#{dist}" : '') + (bs ? "[#{bs}]" : '') when :traditional label + (bs ? bs : '') + (dist ? ":#{dist}" : '') else # default: same as molphy style label + (dist ? ":#{dist}" : '') + (bs ? "[#{bs}]" : '') end end
formats leaf for NHX
# File lib/bio/tree/output.rb, line 85 def __to_newick_format_leaf_NHX(node, edge, options) label = __to_newick_format_label(get_node_name(node), options) dist = get_edge_distance_string(edge) bs = get_node_bootstrap_string(node) if __get_option(:branch_length_style, options) == :disabled dist = nil end nhx = {} # bootstrap nhx[:B] = bs if bs and !(bs.empty?) # EC number nhx[:E] = node.ec_number if node.instance_eval { defined?(@ec_number) && self.ec_number } # scientific name nhx[:S] = node.scientific_name if node.instance_eval { defined?(@scientific_name) && self.scientific_name } # taxonomy id nhx[:T] = node.taxonomy_id if node.instance_eval { defined?(@taxonomy_id) && self.taxonomy_id } # :D (gene duplication or speciation) if node.instance_eval { defined?(@events) && !(self.events.empty?) } then if node.events.include?(:gene_duplication) nhx[:D] = 'Y' elsif node.events.include?(:speciation) nhx[:D] = 'N' end end # log likelihood nhx[:L] = edge.log_likelihood if edge.instance_eval { defined?(@log_likelihood) && self.log_likelihood } # width nhx[:W] = edge.width if edge.instance_eval { defined?(@width) && self.width } # merges other parameters flag = node.instance_eval { defined? @nhx_parameters } nhx.merge!(node.nhx_parameters) if flag flag = edge.instance_eval { defined? @nhx_parameters } nhx.merge!(edge.nhx_parameters) if flag nhx_string = nhx.keys.sort{ |a,b| a.to_s <=> b.to_s }.collect do |key| "#{key.to_s}=#{nhx[key].to_s}" end.join(':') nhx_string = "[&&NHX:" + nhx_string + "]" unless nhx_string.empty? label + (dist ? ":#{dist}" : '') + nhx_string end
(private) set parent cache
# File lib/bio/tree.rb, line 673 def _cache_parent(node, parent, root) return unless parent cache = @cache_parent[root] cache[node] = parent self.adjacent_nodes(node).each do |n| cache[n] ||= node if n != parent end end
(private) clear internal cache
# File lib/bio/tree.rb, line 274 def _clear_cache @cache_parent.clear end
(private) get parent only by using cache
# File lib/bio/tree.rb, line 649 def _get_cached_parent(node, root) @cache_parent[root] ||= Hash.new cache = @cache_parent[root] if node == root then unless cache.has_key?(root) then self.adjacent_nodes(root).each do |n| cache[n] ||= root if n != root end cache[root] = nil end parent = nil else unless parent = cache[node] then parent = self.adjacent_nodes(node).find { |n| (m = cache[n]) && (m != node) } _cache_parent(node, parent, root) if parent end parent end end
(private) clear internal cache
# File lib/bio/tree.rb, line 268 def _init_cache @cache_parent = {} end