Classifier
This is an implementation of the ID3 algorithm (Quinlan). Given a set of preclassified examples, it builds a decision tree by top-down induction, guided by the information gain and entropy measures.
DATA_LABELS = [ 'city', 'age_range', 'gender', 'marketing_target' ]

DATA_ITEMS = [
  ['New York', '<30',     'M', 'Y'],
  ['Chicago',  '<30',     'M', 'Y'],
  ['Chicago',  '<30',     'F', 'Y'],
  ['New York', '<30',     'M', 'Y'],
  ['New York', '<30',     'M', 'Y'],
  ['Chicago',  '[30-50)', 'M', 'Y'],
  ['New York', '[30-50)', 'F', 'N'],
  ['Chicago',  '[30-50)', 'F', 'Y'],
  ['New York', '[30-50)', 'F', 'N'],
  ['Chicago',  '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'],
  ['New York', '[50-80]', 'M', 'N'],
  ['Chicago',  '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'],
  ['Chicago',  '>80',     'F', 'Y']
]

# Pass DATA_ITEMS here (the constant defined above)
data_set = DataSet.new(:data_items=>DATA_ITEMS, :data_labels=>DATA_LABELS)
id3 = Ai4r::Classifiers::ID3.new.build(data_set)

id3.get_rules
  # => if age_range=='<30' then marketing_target='Y'
  #    elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
  #    elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
  #    elsif age_range=='[50-80]' then marketing_target='N'
  #    elsif age_range=='>80' then marketing_target='Y'
  #    else raise 'There was not enough information during training to do a proper induction for this data element' end

id3.eval(['New York', '<30', 'M'])
  # => 'Y'
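The rules above test age_range first. The following sketch suggests why, by computing the entropy of the class column and the information gain of each attribute on the same 15 examples. It is an illustration only, not the code used inside Ai4r::Classifiers::ID3; the helper names `entropy` and `information_gain` are made up for this example.

```ruby
# Illustrative sketch; helper names are hypothetical, not part of ai4r.
labels = %w[city age_range gender marketing_target]
items = [
  ['New York', '<30', 'M', 'Y'], ['Chicago', '<30', 'M', 'Y'],
  ['Chicago', '<30', 'F', 'Y'], ['New York', '<30', 'M', 'Y'],
  ['New York', '<30', 'M', 'Y'], ['Chicago', '[30-50)', 'M', 'Y'],
  ['New York', '[30-50)', 'F', 'N'], ['Chicago', '[30-50)', 'F', 'Y'],
  ['New York', '[30-50)', 'F', 'N'], ['Chicago', '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'], ['New York', '[50-80]', 'M', 'N'],
  ['Chicago', '[50-80]', 'M', 'N'], ['New York', '[50-80]', 'F', 'N'],
  ['Chicago', '>80', 'F', 'Y']
]

# Shannon entropy (in bits) of the class, taken to be the last attribute.
def entropy(items)
  items.group_by(&:last).values.sum(0.0) { |group|
    prob = group.size.to_f / items.size
    -prob * Math.log2(prob)
  }
end

# Entropy before the split minus the size-weighted entropy of each partition.
def information_gain(items, index)
  partitions = items.group_by { |item| item[index] }.values
  remainder = partitions.sum(0.0) { |part|
    part.size.to_f / items.size * entropy(part)
  }
  entropy(items) - remainder
end

# Gain of every non-class attribute, keyed by its label.
gains = labels[0..-2].each_with_index.map { |label, index|
  [label, information_gain(items, index)]
}.to_h
best = gains.max_by { |_label, gain| gain }.first

gains.each { |label, gain| puts format('%-10s %.3f', label, gain) }
puts "best split: #{best}"   # prints "best split: age_range"
```

Splitting on age_range leaves only the [30-50) group mixed, which is why the generated rules need the city test for that group alone.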
In real life you will use many more training examples, with more attributes. Consider moving your data to an external CSV (comma-separated values) file.
data_file = "#{File.dirname(__FILE__)}/data_set.csv"
data_set = DataSet.load_csv_with_labels data_file
id3 = Ai4r::Classifiers::ID3.new.build(data_set)
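The layout of data_set.csv is not shown here. Assuming the first row holds the attribute labels and every following row is one training example with the class in the last column, the file could look like the text below. The sketch parses it with Ruby's standard CSV library rather than with the ai4r loader, only to show how labels and items would separate.

```ruby
require 'csv'

# Hypothetical file contents: a header row of labels, then one example per row.
csv_text = <<~CSV_TEXT
  city,age_range,gender,marketing_target
  New York,<30,M,Y
  Chicago,[30-50),F,Y
  New York,[50-80],F,N
CSV_TEXT

rows = CSV.parse(csv_text)   # array of rows, each an array of strings
data_labels = rows.first     # assumed: labels come from the first row
data_items = rows.drop(1)    # the remaining rows are the training examples

puts data_labels.inspect
puts data_items.length
```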
id3 = Ai4r::Classifiers::ID3.new.build(data_set)

age_range = '<30'
marketing_target = nil
eval id3.get_rules
puts marketing_target
  # => 'Y'
Author: Sergio Fierens
License: MPL 1.1
Url:
Builds a new ID3 classifier. You must provide a DataSet instance as a parameter. The last attribute of each item is taken as the item's class.
# File lib/ai4r/classifiers/id3.rb, line 99
def build(data_set)
  data_set.check_not_empty
  @data_set = data_set
  preprocess_data(@data_set.data_items)
  return self
end
You can evaluate new data to predict its category, e.g.:
id3.eval(['New York', '<30', 'F']) # => 'Y'
# File lib/ai4r/classifiers/id3.rb, line 109
def eval(data)
  @tree.value(data) if @tree
end
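The method delegates to the induced tree. As a rough picture of what such a lookup involves, the sketch below hand-writes the tree implied by the sample rules and walks it for one item. The nested-array layout and the `classify` helper are assumptions made for this example; the library's own node classes are not shown on this page and may behave differently, in particular for attribute values never seen in training.

```ruby
# Hypothetical tree layout: an inner node is [attribute_index, branches],
# where branches maps an attribute value to a subtree; a leaf is the class.
tree = [1, {                                                   # 1 = age_range
  '<30'     => 'Y',
  '[30-50)' => [0, { 'Chicago' => 'Y', 'New York' => 'N' }],   # 0 = city
  '[50-80]' => 'N',
  '>80'     => 'Y'
}]

# Follow the branch matching the item's value at each node until a leaf.
def classify(node, item)
  until node.is_a?(String)
    index, branches = node
    node = branches.fetch(item[index]) {
      # Raising here is a choice of this sketch, mirroring the rules' else branch.
      raise ArgumentError, "no branch for #{item[index].inspect}"
    }
  end
  node
end

puts classify(tree, ['New York', '<30', 'F'])       # prints "Y"
puts classify(tree, ['New York', '[30-50)', 'F'])   # prints "N"
```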
This method returns the generated rules as Ruby code, e.g.:
id3.get_rules
  # => if age_range=='<30' then marketing_target='Y'
  #    elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
  #    elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
  #    elsif age_range=='[50-80]' then marketing_target='N'
  #    elsif age_range=='>80' then marketing_target='Y'
  #    else raise 'There was not enough information during training to do a proper induction for this data element' end
It is a nice way to inspect induction results, and also to execute them:
age_range = '<30'
marketing_target = nil
eval id3.get_rules
puts marketing_target
  # => 'Y'
# File lib/ai4r/classifiers/id3.rb, line 130
def get_rules
  #return "Empty ID3 tree" if !@tree
  rules = @tree.get_rules
  rules = rules.collect do |rule|
    "#{rule[0..-2].join(' and ')} then #{rule.last}"
  end
  return "if #{rules.join("\nelsif ")}\nelse raise 'There was not enough information during training to do a proper induction for this data element' end"
end
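To make the string assembly concrete, the sketch below applies the same join logic to a hand-written rule list and then evaluates the result. The rule arrays are an assumption: their shape (condition strings followed by one conclusion string) is inferred from `rule[0..-2]` and `rule.last`, and their contents are copied from the sample output. The `apply_rules` helper and the shortened error message are made up for this example.

```ruby
# Assumed shape of the tree's rules: conditions first, conclusion last.
rules = [
  ["age_range=='<30'", "marketing_target='Y'"],
  ["age_range=='[30-50)'", "city=='Chicago'", "marketing_target='Y'"],
  ["age_range=='[30-50)'", "city=='New York'", "marketing_target='N'"],
  ["age_range=='[50-80]'", "marketing_target='N'"],
  ["age_range=='>80'", "marketing_target='Y'"]
]

# Same assembly as get_rules: "cond and cond then conclusion", chained by elsif.
clauses = rules.map { |rule| "#{rule[0..-2].join(' and ')} then #{rule.last}" }
code = "if #{clauses.join("\nelsif ")}\nelse raise 'no matching rule' end"

# Hypothetical helper: the eval'd code reads the locals city and age_range
# and assigns marketing_target, which must exist before eval is called.
def apply_rules(code, city, age_range)
  marketing_target = nil
  eval(code)
  marketing_target
end

puts code
puts apply_rules(code, 'Chicago', '[30-50)')   # prints "Y"
```

Because the rules are evaluated as code, only rule strings produced by a classifier you trained yourself should be passed to eval.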