Before the document is passed to Hpricot for parsing, it may need some
processing that would be clumsy, inappropriate, or impossible to do once
the document has been loaded.
Public Class Methods
br_to_newline(doc)
Replace <br> tags (<br>, <br/> and <br />) with \r\n newlines.
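# File lib/scrubyt/core/scraping/pre_filter_document.rb, line 9
def self.br_to_newline(doc)
  # gsub is non-destructive, so keep its result (case-insensitive to catch <BR>)
  doc = doc.gsub(/<br[ \/]*>/i, "\r\n")
  # Assumed intent: turn non-breaking spaces (U+00A0) into ordinary spaces
  doc.tr("\u00A0", " ")
end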
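To make the filter's effect concrete, here is a minimal, self-contained sketch run on a small HTML fragment. The module name PreFilterSketch is hypothetical and stands in for scrubyt's own class. The body assumes the filter is meant to match <br> tags in any case and to normalize non-breaking spaces (U+00A0), and that the input string is UTF-8.

```ruby
# Hypothetical stand-in for scrubyt's pre-filter class, for illustration only.
module PreFilterSketch
  # Replace <br>, <br/> and <br /> (any letter case) with CRLF, then turn
  # non-breaking spaces (U+00A0) into ordinary spaces. Assumes UTF-8 input.
  def self.br_to_newline(doc)
    doc = doc.gsub(/<br[ \/]*>/i, "\r\n")
    doc.tr("\u00A0", " ")
  end
end

html = "Name:\u00A0Foo<br/>Price:\u00A010<BR>"
puts PreFilterSketch.br_to_newline(html).inspect
# prints "Name: Foo\r\nPrice: 10\r\n"
```

Because the conversion works on the raw string, the line breaks and plain spaces are already in place by the time Hpricot builds its tree, so later text extraction does not need to special-case <br> elements.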