The two most trivial problems with a set of rules are that they match either fewer or more instances than we would like them to. Constraints are a way to remedy the second problem: they serve as a tool to filter out some result instances based on rules. A typical example:
ensure_presence_of_ancestor_pattern. Consider this model:
<book>
  <author>...</author>
  <title>...</title>
</book>
If I attach the ensure_presence_of_ancestor_pattern to the pattern 'book' with values 'author' and 'title', only those books will be matched which have an author and a title (i.e. the child patterns author and title must extract something). This is a way to say 'a book MUST have an author and a title'.
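The filtering effect described above can be sketched in plain Ruby, without the library itself. This is only an illustration under stated assumptions: the hashes stand in for extracted 'book' results, the sample records are made up, and the helper names `extracted?` and `filter_by_required_patterns` are hypothetical, not part of the Constraint API.

```ruby
# Hypothetical stand-in for extracted results: each book is a hash mapping
# a child pattern name to whatever that child pattern extracted.
# The records below are made-up sample data.
books = [
  { 'author' => 'Herman Melville', 'title' => 'Moby-Dick' },
  { 'author' => nil,               'title' => 'Untitled draft' },
  { 'author' => 'Jane Austen',     'title' => '' }
]

# Assumption: a child pattern "extracts something" when its value is
# neither nil nor blank.
def extracted?(value)
  !(value.nil? || value.to_s.strip.empty?)
end

# Keep only the records in which every required child pattern extracted
# something, i.e. "a book MUST have an author and a title".
def filter_by_required_patterns(records, required)
  records.select { |record| required.all? { |name| extracted?(record[name]) } }
end

complete_books = filter_by_required_patterns(books, %w[author title])
puts complete_books.map { |book| book['title'] }.inspect
```

Only the first record survives here, because the other two are missing an author or a title.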
If this type of constraint is added to a pattern, the HTML node extracted by the pattern must NOT have an HTML ancestor node called 'node_name' with the attribute set 'attributes'.
"attributes" is an array of hashes.
If several values have to be checked for the same key (e.g. 'class' => 'small' and 'class' => 'wide'), they have to be written as [{'class' => ['small','wide']}].
"attributes" can be empty; in this case just the 'node_name' is checked.
# File lib/scrubyt/core/scraping/constraint.rb, line 89
def self.add_ensure_absence_of_ancestor_node(node_name, attributes)
  Constraint.new([node_name, attributes], CONSTRAINT_TYPE_ENSURE_ABSENCE_OF_ANCESTOR_NODE)
end
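To make the shape of the "attributes" argument concrete, here is a minimal sketch of an ancestor check. It is not the library's implementation: the `Node` struct and the helpers `attributes_match?` and `ancestor_node_present?` are hypothetical, and treating an array value as "any of the listed values" is an assumed interpretation of the same-key case.

```ruby
# Hypothetical minimal node type; the real code works on parsed HTML nodes.
Node = Struct.new(:name, :attributes, :parent)

# True if the node satisfies every key in every hash of `wanted`.
# Assumption: an array value means "the attribute equals any listed value".
def attributes_match?(node, wanted)
  wanted.all? do |hash|
    hash.all? { |key, expected| Array(expected).include?(node.attributes[key]) }
  end
end

# Walk up the parent chain looking for a matching ancestor.
# With an empty `wanted`, only the node name is checked.
def ancestor_node_present?(node, node_name, wanted = [])
  ancestor = node.parent
  while ancestor
    return true if ancestor.name == node_name && attributes_match?(ancestor, wanted)
    ancestor = ancestor.parent
  end
  false
end

page    = Node.new('div', { 'class' => 'wide' }, nil)
sidebar = Node.new('div', { 'class' => 'ads' }, page)
link    = Node.new('a', { 'href' => '/buy' }, sidebar)

# An "ensure absence" constraint passes when no such ancestor exists.
puts !ancestor_node_present?(link, 'div', [{ 'class' => %w[small wide] }])
puts !ancestor_node_present?(link, 'table')
```

In this sketch the first check fails (a 'div' with class 'wide' is an ancestor of the link) and the second passes (there is no 'table' ancestor).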
If this type of constraint is added to a pattern, the HTML node it targets must NOT have an attribute named "attribute_name" with the value "attribute_value".
# File lib/scrubyt/core/scraping/constraint.rb, line 64
def self.add_ensure_absence_of_attribute(attribute_hash)
  Constraint.new(attribute_hash, CONSTRAINT_TYPE_ENSURE_ABSENCE_OF_ATTRIBUTE)
end
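A rough sketch of what such an attribute check could look like is given below. It assumes that attribute_hash maps an attribute name to the value to look for and that a node's attributes are available as a plain hash; the helper names are hypothetical and the library's own check may differ.

```ruby
# node_attributes: attributes of the targeted HTML node as a plain hash.
# attribute_hash:  attribute name => value to look for (assumed shape).
def attribute_present?(node_attributes, attribute_hash)
  attribute_hash.all? { |name, value| node_attributes[name] == value }
end

# The "absence" constraint is simply the negation of the presence check.
def passes_absence_of_attribute?(node_attributes, attribute_hash)
  !attribute_present?(node_attributes, attribute_hash)
end

row = { 'class' => 'sponsored', 'id' => 'r1' }
puts passes_absence_of_attribute?(row, { 'class' => 'sponsored' }) # prints false
puts passes_absence_of_attribute?(row, { 'class' => 'organic' })   # prints true
```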
If this type of constraint is added to a pattern, the HTML node extracted by the pattern must have an HTML ancestor node called 'node_name' with the attribute set 'attributes'.
"attributes" is an array of hashes.
If several values have to be checked for the same key (e.g. 'class' => 'small' and 'class' => 'wide'), they have to be written as [{'class' => ['small','wide']}].
"attributes" can be empty; in this case just the 'node_name' is checked.
# File lib/scrubyt/core/scraping/constraint.rb, line 105
def self.add_ensure_presence_of_ancestor_node(node_name, attributes)
  Constraint.new([node_name, attributes], CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_ANCESTOR_NODE)
end
If this type of constraint is added to a pattern, the HTML node it targets must have an attribute named "attribute_name" with the value "attribute_value".
# File lib/scrubyt/core/scraping/constraint.rb, line 73
def self.add_ensure_presence_of_attribute(attribute_hash)
  Constraint.new(attribute_hash, CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_ATTRIBUTE)
end
If this type of constraint is added to a pattern, it must have an ancestor pattern (a child pattern, or a child pattern of a child pattern, etc.) denoted by "ancestor". 'Has an ancestor pattern' means that the ancestor pattern actually extracts something (just by looking at the wrapper model, the ancestor pattern is always present). Note that there is no 'ensure_absence' version of this type of constraint, since I could not think of a use case for it.
# File lib/scrubyt/core/scraping/constraint.rb, line 56
def self.add_ensure_presence_of_pattern(ancestor)
  Constraint.new(ancestor, CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_PATTERN)
end
Evaluate the constraint. If this function returns true, the constraint passed, i.e. its filter will be added to the extracted content of the pattern.
# File lib/scrubyt/core/scraping/constraint.rb, line 113
def check(result)
  case @type
  # checked after evaluation, so here always return true
  when CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_PATTERN
    return true
  when CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_ATTRIBUTE
    attribute_present(result)
  when CONSTRAINT_TYPE_ENSURE_ABSENCE_OF_ATTRIBUTE
    !attribute_present(result)
  when CONSTRAINT_TYPE_ENSURE_PRESENCE_OF_ANCESTOR_NODE
    ancestor_node_present(result)
  when CONSTRAINT_TYPE_ENSURE_ABSENCE_OF_ANCESTOR_NODE
    !ancestor_node_present(result)
  end
end
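How a caller might use the boolean returned by check can be sketched as follows. This is an assumption about the surrounding pipeline, not code from the library: `StandInConstraint` and `apply_constraints` are hypothetical names, and the sketch assumes a result is kept only when every attached constraint passes.

```ruby
# Hypothetical stand-in for a constraint: anything that responds to
# check(result) and returns true when the result passes.
StandInConstraint = Struct.new(:predicate) do
  def check(result)
    predicate.call(result)
  end
end

# Assumption: a result is kept only if all constraints pass.
def apply_constraints(results, constraints)
  results.select { |result| constraints.all? { |c| c.check(result) } }
end

# Made-up results, each represented by its attribute hash.
results = [
  { 'class' => 'price', 'id' => 'p1' },
  { 'class' => 'price', 'id' => 'hidden' },
  { 'class' => 'label', 'id' => 'p2' }
]

constraints = [
  StandInConstraint.new(->(r) { r['class'] == 'price' }), # presence-style check
  StandInConstraint.new(->(r) { r['id'] != 'hidden' })    # absence-style check
]

puts apply_constraints(results, constraints).map { |r| r['id'] }.inspect
```

Only the result with id 'p1' satisfies both stand-in constraints in this example.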