module Ai4r::Data::Statistics

This module provides basic statistical functions that operate on data set attributes.

Public Class Methods

max(data_set, attribute)

Get the maximum value of an attribute in the data set. Returns -Infinity if the data set has no items.

# File lib/ai4r/data/statistics.rb, line 62
def self.max(data_set, attribute)
  index = data_set.get_index(attribute)
  item = data_set.data_items.max {|x,y| x[index] <=> y[index]}
  return (item) ? item[index] : (-1.0/0)
end
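A minimal sketch of the same lookup, assuming a hypothetical `StandInDataSet` that mimics only the two methods this module calls on a data set (`get_index` and `data_items`). It uses `max_by`, which is equivalent here to the two-argument comparator block, and shows the `-Infinity` sentinel for an empty data set. `attribute_max` is an illustrative name, not part of ai4r.

```ruby
# Hypothetical stand-in for a data set: it mimics only the two methods
# Statistics relies on, get_index(attribute) and data_items.
StandInDataSet = Struct.new(:labels, :data_items) do
  def get_index(attribute)
    labels.index(attribute)
  end
end

# Sketch of the maximum lookup; max_by compares rows by the attribute column.
def attribute_max(data_set, attribute)
  index = data_set.get_index(attribute)
  item = data_set.data_items.max_by { |row| row[index] }
  # No rows means no maximum, so fall back to negative infinity
  item ? item[index] : -Float::INFINITY
end

people = StandInDataSet.new(%w[name age], [["Ann", 34], ["Bob", 27], ["Cid", 41]])
puts attribute_max(people, "age")                          # prints 41
puts attribute_max(StandInDataSet.new(%w[age], []), "age") # prints -Infinity
```

The infinite sentinel means the result can still be compared with `>` against other maxima without a special nil check.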
mean(data_set, attribute)

Get the sample (arithmetic) mean of an attribute. Returns NaN if the data set has no items, since 0.0 / 0 is NaN.

# File lib/ai4r/data/statistics.rb, line 20
def self.mean(data_set, attribute)
  index = data_set.get_index(attribute)
  sum = 0.0
  data_set.data_items.each { |item| sum += item[index] }
  return sum / data_set.data_items.length
end
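The computation is a plain sum divided by the item count. A minimal sketch on an array of attribute values, using a hypothetical `sample_mean` helper (not part of ai4r):

```ruby
# Hypothetical helper: the same sum-then-divide computation on a plain array.
def sample_mean(values)
  # Start the sum at 0.0 so the division is floating point
  values.sum(0.0) / values.length
end

puts sample_mean([2, 4, 4, 4, 5, 5, 7, 9]) # prints 5.0
puts sample_mean([]).nan?                   # prints true: 0.0 / 0 is NaN
```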
min(data_set, attribute)

Get the minimum value of an attribute in the data set. Returns Infinity if the data set has no items.

# File lib/ai4r/data/statistics.rb, line 69
def self.min(data_set, attribute)
  index = data_set.get_index(attribute)
  item = data_set.data_items.min {|x,y| x[index] <=> y[index]}
  return (item) ? item[index] : (1.0/0)
end
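Because the comparison uses `<=>`, the lookup works for any comparable attribute, not only numbers. A minimal sketch, again assuming a hypothetical `StandInDataSet` that provides just `get_index` and `data_items`; `attribute_min` is an illustrative name, not part of ai4r.

```ruby
# Hypothetical stand-in exposing only get_index(attribute) and data_items.
StandInDataSet = Struct.new(:labels, :data_items) do
  def get_index(attribute)
    labels.index(attribute)
  end
end

# Sketch of the minimum lookup; min_by compares rows by the attribute column.
def attribute_min(data_set, attribute)
  index = data_set.get_index(attribute)
  item = data_set.data_items.min_by { |row| row[index] }
  # No rows means no minimum, so fall back to positive infinity
  item ? item[index] : Float::INFINITY
end

people = StandInDataSet.new(%w[name age], [["Cid", 41], ["Ann", 34], ["Bob", 27]])
puts attribute_min(people, "age")                          # prints 27
puts attribute_min(people, "name")                         # prints Ann (strings compare with <=>)
puts attribute_min(StandInDataSet.new(%w[age], []), "age") # prints Infinity
```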
mode(data_set, attribute)

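Get the sample mode, i.e. the most frequent value of the attribute. On ties, the value that first reaches the highest count wins; an empty data set yields nil.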

# File lib/ai4r/data/statistics.rb, line 45
def self.mode(data_set, attribute)
  index = data_set.get_index(attribute)
  count = Hash.new {0}
  max_count = 0
  mode = nil
  data_set.data_items.each do |data_item| 
    attr_value = data_item[index]
    attr_count = (count[attr_value] += 1)
    if attr_count > max_count
      mode = attr_value
      max_count = attr_count
    end
  end
  return mode
end
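The single-pass loop updates the mode only when a count strictly exceeds the previous maximum, which is what decides ties. A minimal sketch of the same loop on a plain array, using a hypothetical `sample_mode` helper (not part of ai4r):

```ruby
# Hypothetical helper: the same single-pass counting loop on a plain array.
def sample_mode(values)
  counts = Hash.new(0) # missing keys count as 0
  mode = nil
  max_count = 0
  values.each do |value|
    counts[value] += 1
    # Strictly greater: a later value must overtake, not merely tie
    if counts[value] > max_count
      mode = value
      max_count = counts[value]
    end
  end
  mode
end

p sample_mode(%w[sunny rainy sunny overcast]) # prints "sunny"
p sample_mode(%w[a b b a])                    # prints "b" (first to reach a count of 2)
p sample_mode([])                             # prints nil
```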
standard_deviation(data_set, attribute, variance = nil)

Get the sample standard deviation, i.e. the square root of the sample variance. You can pass the variance if you already have it, to avoid recomputing it.

# File lib/ai4r/data/statistics.rb, line 39
def self.standard_deviation(data_set, attribute, variance = nil)
  variance ||= variance(data_set, attribute)
  Math.sqrt(variance)
end
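A worked example on plain arrays: the values below have mean 5 and squared deviations summing to 32, so the sample variance is 32 / 7 and the standard deviation is its square root. Computing the variance once and reusing it mirrors the optional `variance` argument; the variable names are illustrative only.

```ruby
values = [2, 4, 4, 4, 5, 5, 7, 9]
mean = values.sum(0.0) / values.length # 5.0

# Sample variance: squared deviations divided by n - 1, here 32 / 7
variance = values.sum(0.0) { |v| (v - mean)**2 } / (values.length - 1)

# Reuse the variance already computed instead of deriving it again
std_dev = Math.sqrt(variance)
puts std_dev.round(4) # prints 2.1381
```

With ai4r itself, the equivalent reuse would be passing a previously computed `Ai4r::Data::Statistics.variance(data_set, attribute)` as the third argument of `standard_deviation`.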
variance(data_set, attribute, mean = nil)

Get the sample variance: the sum of squared deviations from the mean, divided by n - 1. You can pass the mean if you already have it, to avoid recomputing it.

# File lib/ai4r/data/statistics.rb, line 29
def self.variance(data_set, attribute, mean = nil)
  index = data_set.get_index(attribute)
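  # Reuse the caller-supplied mean; compute it only when none was given
  mean ||= mean(data_set, attribute)
  sum = 0.0
  data_set.data_items.each { |item| sum += (item[index]-mean)**2 }
  # Divide by n - 1 (Bessel's correction) to get the sample variance
  return sum / (data_set.data_items.length-1)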
end
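A minimal sketch of the same formula on a plain array, using a hypothetical `sample_variance` helper (not part of ai4r). It shows that a supplied mean is reused rather than recomputed, how the n - 1 denominator differs from the population variance (divide by n), and that a single value gives NaN because n - 1 is zero.

```ruby
# Hypothetical helper mirroring the sample-variance formula on a plain array.
def sample_variance(values, mean = nil)
  mean ||= values.sum(0.0) / values.length # compute the mean only if not supplied
  values.sum(0.0) { |v| (v - mean)**2 } / (values.length - 1)
end

values = [2, 4, 4, 4, 5, 5, 7, 9]
puts sample_variance(values)      # 32 / 7, about 4.5714
puts sample_variance(values, 5.0) # same result, mean not recomputed

# Population variance divides by n instead of n - 1
population = values.sum(0.0) { |v| (v - 5.0)**2 } / values.length
puts population                   # prints 4.0

puts sample_variance([3]).nan?    # prints true: n - 1 = 0
```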