class EmailReplyParser::Email

An Email instance represents a parsed body String.

Constants

EMPTY
SIGNATURE
SIG_REGEX

Attributes

fragments[R]

Emails have an Array of Fragments.

Public Class Methods

new()
# File lib/email_reply_parser.rb, line 60
def initialize
  @fragments = []
end

Public Instance Methods

read(text)

Splits the given text into a list of Fragments. The text is reversed and parsed from the bottom to the top, so that 'On <date>, <author> wrote:' lines can be detected above quoted blocks.

text - A String email body.

Returns this same Email instance.
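
The bottom-to-top scan described above can be illustrated with a minimal sketch. It assumes only Ruby's standard StringScanner; the helper name reversed_lines is hypothetical and is not part of the library.

```ruby
require 'strscan'

# Hypothetical helper, not part of the library: returns the lines of the
# reversed text, i.e. the email's lines from last to first, each one
# spelled backwards.
def reversed_lines(text)
  lines = []
  scanner = StringScanner.new(text.reverse)
  while (line = scanner.scan_until(/\n/))
    lines << line.chomp("\n")
  end
  # The final chunk has no trailing newline, so pick it up separately.
  rest = scanner.rest
  lines << rest unless rest.empty?
  lines
end

body = "Hi there\n\nOn Monday, Bob wrote:\n> Hello"
reversed = reversed_lines(body)

# Reversing each line again gives the original lines in bottom-up order.
restored = reversed.map(&:reverse)
```

Because the quoted block is seen before the header line above it, a parser working this way already knows it is inside a quote when it reaches the 'wrote:' line.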

# File lib/email_reply_parser.rb, line 78
def read(text)
  # in 1.9 we want to operate on the raw bytes
  text = text.dup.force_encoding('binary') if text.respond_to?(:force_encoding)

  # Normalize line endings.
  text.gsub!("\r\n", "\n")

  # Check for multi-line reply headers. Some clients break up
  # the "On DATE, NAME <EMAIL> wrote:" line into multiple lines.
  if text =~ /^(?!On.*On\s.+?wrote:)(On\s(.+?)wrote:)$/nm
    # Remove all new lines from the reply header.
    text.gsub! $1, $1.gsub("\n", " ")
  end

  # Some users may reply directly above a line of underscores.
  # In order to ensure that these fragments are split correctly,
  # make sure that all lines of underscores are preceded by
  # at least two newline characters.
  text.gsub!(/([^\n])(?=\n_{7}_+)$/m, "\\1\n")

  # The text is reversed initially due to the way we check for hidden
  # fragments.
  text = text.reverse

  # This determines if any 'visible' Fragment has been found.  Once any
  # visible Fragment is found, stop looking for hidden ones.
  @found_visible = false

  # This instance variable points to the current Fragment.  If the matched
  # line fits, it should be added to this Fragment.  Otherwise, finish it
  # and start a new Fragment.
  @fragment = nil

  # Use the StringScanner to pull out each line of the email content.
  @scanner = StringScanner.new(text)
  while line = @scanner.scan_until(/\n/n)
    scan_line(line)
  end

  # Be sure to parse the last line of the email.
  if (last_line = @scanner.rest.to_s).size > 0
    scan_line(last_line)
  end

  # Finish up the final fragment.  Finishing a fragment will detect any
  # attributes (hidden, signature, reply), and join each line into a
  # string.
  finish_fragment

  @scanner = @fragment = nil

  # Now that parsing is done, reverse the order.
  @fragments.reverse!
  self
end

visible_text()

Public: Gets the combined text of the visible fragments of the email body.

Returns a String.
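
As a rough illustration of how the visible fragments are combined, the sketch below uses DemoFragment, a hypothetical stand-in for the library's Fragment class that only provides hidden? and to_s. The fragment texts are made up for the example.

```ruby
# DemoFragment is a hypothetical stand-in for the library's Fragment class.
DemoFragment = Struct.new(:text, :hidden) do
  def hidden?
    hidden
  end

  def to_s
    text
  end
end

fragments = [
  DemoFragment.new("some original text\n", false),
  DemoFragment.new("> do you have any two's?\n", false),
  DemoFragment.new("Go fish!\n", false),
  DemoFragment.new("> --\n> Player 1\n", true),
  DemoFragment.new("--\nPlayer 2\n", true)
]

# Same chain as visible_text: drop hidden fragments, join, trim the end.
visible = fragments.reject(&:hidden?).map(&:to_s).join("\n").rstrip
```

Only the first three fragments contribute to the result; the trailing quoted block and signature are left out.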

# File lib/email_reply_parser.rb, line 67
def visible_text
  fragments.select{|f| !f.hidden?}.map{|f| f.to_s}.join("\n").rstrip
end

Private Instance Methods

finish_fragment()

Builds the fragment string and reverses it, after all lines have been added. It also checks to see if this Fragment is hidden. The hidden Fragment check reads from the bottom to the top.

Quoted Fragments, signature Fragments, and empty Fragments are marked hidden if they are below every visible Fragment. Visible Fragments are expected to contain original content by the author. A quoted Fragment that sits above a visible Fragment stays visible, because it gives context to the reply.

some original text (visible)

> do you have any two's? (quoted, visible)

Go fish! (visible)

> --
> Player 1 (quoted, hidden)

--
Player 2 (signature, hidden)
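
The bottom-to-top hidden check illustrated above can be sketched as follows. MarkedFragment and mark_hidden are hypothetical names used only for illustration, and the sketch assumes that an empty Fragment is one whose stripped text is the empty string.

```ruby
# MarkedFragment is a hypothetical stand-in for the library's Fragment class.
MarkedFragment = Struct.new(:text, :quoted, :signature, :hidden)

# Walks the fragments from the bottom of the email upwards and hides
# quoted, signature, and empty fragments until the first visible one.
def mark_hidden(fragments_bottom_up)
  found_visible = false
  fragments_bottom_up.each do |fragment|
    break if found_visible
    if fragment.quoted || fragment.signature || fragment.text.strip.empty?
      fragment.hidden = true
    else
      found_visible = true
    end
  end
  fragments_bottom_up
end

bottom_up = [
  MarkedFragment.new("--\nPlayer 2", false, true, false),
  MarkedFragment.new("> --\n> Player 1", true, false, false),
  MarkedFragment.new("Go fish!", false, false, false),
  MarkedFragment.new("> do you have any two's?", true, false, false),
  MarkedFragment.new("some original text", false, false, false)
]

hidden_flags = mark_hidden(bottom_up).map(&:hidden)
```

Once "Go fish!" is reached, nothing above it is hidden, so the quoted question stays visible.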
# File lib/email_reply_parser.rb, line 217
def finish_fragment
  if @fragment
    @fragment.finish
    if !@found_visible
      if @fragment.quoted? || @fragment.signature? ||
          @fragment.to_s.strip == EMPTY
        @fragment.hidden = true
      else
        @found_visible = true
      end
    end
    @fragments << @fragment
  end
  @fragment = nil
end

quote_header?(line)

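Detects whether a given line is a header above a quoted area. It is only checked for lines preceding quoted regions. The line is expected in reversed form, as produced by read, which is why the pattern is written backwards.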

line - A String line of text from the email.

Returns the Integer match position if the line is a valid header, or nil otherwise.
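
A short sketch of the check, assuming the caller passes a reversed line as read does. The constant name REVERSED_HEADER and the sample strings are made up for the example; the pattern is the one from the source, without the /n flag.

```ruby
# Same pattern as in quote_header?, written backwards because the
# parser works on reversed text.
REVERSED_HEADER = /^:etorw.*nO$/

header = "On Tue, Jan 3, 2012, Bob wrote:"
plain  = "Sounds good to me."

# =~ returns the match position (truthy) or nil (falsy).
header_match   = header.reverse =~ REVERSED_HEADER
plain_match    = plain.reverse =~ REVERSED_HEADER
unreversed_try = header =~ REVERSED_HEADER
```

The unreversed header does not match, so the check is only meaningful on lines taken from the reversed text.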

# File lib/email_reply_parser.rb, line 191
def quote_header?(line)
  line =~ /^:etorw.*nO$/n
end

scan_line(line)

Scans the given line of text and figures out which fragment it belongs to.

line - A String line of text from the email.

Returns nothing.
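
The grouping decision can be sketched in simplified form. group_lines is a hypothetical helper, not the library's method: it leaves out signature detection and whitespace stripping and represents each fragment as a plain Hash.

```ruby
# Hypothetical, simplified stand-in for scan_line. Expects reversed lines
# in bottom-up order and groups them by whether they are quoted.
def group_lines(reversed_lines)
  groups = []
  reversed_lines.each do |line|
    # In reversed text, leading `>` characters sit at the end of the line.
    quoted = !!(line =~ />+$/)
    header = !!(line =~ /^:etorw.*nO$/)
    current = groups.last
    if current && (current[:quoted] == quoted ||
                   (current[:quoted] && (header || line.empty?)))
      current[:lines] << line
    else
      groups << { quoted: quoted, lines: [line] }
    end
  end
  groups
end

email = "Thanks!\n\nOn Monday, Bob wrote:\n> Hello\n> World"
groups = group_lines(email.reverse.split("\n"))
```

Here the reply header and the blank line above it are absorbed into the quoted group, leaving "Thanks!" as its own unquoted group.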

# File lib/email_reply_parser.rb, line 153
def scan_line(line)
  line.chomp!("\n")
  line.lstrip! unless SIG_REGEX.match(line)

  # We're looking for leading `>`'s to see if this line is part of a
  # quoted Fragment.  The line is reversed here, so they appear at its end.
  is_quoted = !!(line =~ /(>+)$/n)

  # Mark the current Fragment as a signature if the current line is empty
  # and the Fragment starts with a common signature indicator.
  if @fragment && line == EMPTY
    if SIG_REGEX.match @fragment.lines.last
      @fragment.signature = true
      finish_fragment
    end
  end

  # If the line matches the current fragment, add it.  Note that a common
  # reply header also counts as part of the quoted Fragment, even though
  # it doesn't start with `>`.
  if @fragment &&
      ((@fragment.quoted? == is_quoted) ||
       (@fragment.quoted? && (quote_header?(line) || line == EMPTY)))
    @fragment.lines << line

  # Otherwise, finish the fragment and start a new one.
  else
    finish_fragment
    @fragment = Fragment.new(is_quoted, line)
  end
end