PDF::Reader::ObjectHash

Provides low-level access to the objects in a PDF file via a hash-like object.

A PDF file can be viewed as a large hash map. It is a series of objects stored at precise byte offsets, and a table that maps object IDs to byte offsets. Given an object ID, looking up an object is an O(1) operation.
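
The offset-table idea can be sketched in a few lines. This is a toy model, not real PDF syntax and not the library's XRef class: ToyObjectFile is a hypothetical name, and the table here also stores each object's length so the example stays short.

```ruby
require "stringio"

# Toy stand-in for a PDF file: object bodies laid end to end in one buffer,
# plus a table mapping object IDs to byte offsets. Not real PDF syntax.
class ToyObjectFile
  def initialize(bodies)
    @io   = StringIO.new
    @xref = {}                       # object ID => [byte offset, byte length]
    bodies.each do |id, body|
      @xref[id] = [@io.pos, body.bytesize]
      @io.write(body)
    end
  end

  # One hash access plus one seek, however large the buffer is.
  def [](id)
    offset, length = @xref.fetch(id)
    @io.seek(offset)
    @io.read(length)
  end
end

toy = ToyObjectFile.new({ 1 => "catalog", 2 => "page tree", 3 => "first page" })
toy[3]   # reads only the bytes belonging to object 3
```

The lookup never scans the buffer: the cost is a table access and a seek, which is where the O(1) claim comes from.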

Each PDF object can be mapped to a Ruby object, so passing an object ID to the [] method returns a Ruby representation of that object.

The class behaves much like a standard Ruby hash, including the use of the Enumerable mixin. The key difference is that there is no []= method: the hash is read-only.
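
A minimal sketch of that pattern, assuming nothing about the library's internals: ReadOnlyTable is a hypothetical stand-in that exposes [] and each, mixes in Enumerable, and deliberately defines no []=.

```ruby
# Hypothetical stand-in, not the real class: a read-only, hash-like wrapper.
class ReadOnlyTable
  include Enumerable

  def initialize(data)
    @data = data.dup.freeze
  end

  def [](key)
    @data[key]
  end

  # Enumerable only needs #each; map, select, count and friends come for free.
  def each(&block)
    @data.each(&block)
  end
end

table = ReadOnlyTable.new({ 1 => :Catalog, 2 => :Pages })
table.map { |id, _type| id }   # Enumerable method built on #each
table.respond_to?(:[]=)        # false, so callers cannot assign
```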

Basic Usage

h = PDF::Reader::ObjectHash.new("somefile.pdf")
h[1]
=> 3469

h[PDF::Reader::Reference.new(1,0)]
=> 3469

Attributes

default[RW]
pdf_version[R]
sec_handler[R]
trailer[R]

Public Class Methods

new(input, opts = {})

Creates a new ObjectHash object. Input can be a string with a valid filename or an IO-like object.

Valid options:

:password - the user password to decrypt the source PDF
# File lib/pdf/reader/object_hash.rb, line 41
def initialize(input, opts = {})
  @io          = extract_io_from(input)
  @xref        = PDF::Reader::XRef.new(@io)
  @pdf_version = read_version
  @trailer     = @xref.trailer
  @cache       = opts[:cache] || PDF::Reader::ObjectCache.new
  @sec_handler = build_security_handler(opts)
end
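
The source of extract_io_from is not shown here. As a rough sketch only, accepting "a filename or an IO-like object" might look like the following; io_from is a hypothetical helper and its duck-typing checks are assumptions, not the library's behaviour.

```ruby
require "stringio"

# Hypothetical helper, not the library's extract_io_from: accept either an
# IO-like object or a filename and always hand back something seekable.
def io_from(input)
  if input.respond_to?(:seek) && input.respond_to?(:read)
    input                                      # already IO-like, use as is
  elsif File.file?(input.to_s)
    StringIO.new(File.binread(input.to_s))     # load the named file into memory
  else
    raise ArgumentError, "input must be an IO-like object or a filename"
  end
end

io_from(StringIO.new("%PDF-1.4"))   # returned unchanged
```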

Public Instance Methods

[](key)

Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.

If an int is used, the object with that ID and a generation number of 0 will be returned.

If a PDF::Reader::Reference object is used, the exact ID and generation number can be specified.

# File lib/pdf/reader/object_hash.rb, line 71
def [](key)
  return default if key.to_i <= 0

  unless key.is_a?(PDF::Reader::Reference)
    key = PDF::Reader::Reference.new(key.to_i, 0)
  end

  if @cache.has_key?(key)
    @cache[key]
  elsif xref[key].is_a?(Fixnum)
    buf = new_buffer(xref[key])
    @cache[key] = decrypt(key, Parser.new(buf, self).object(key.id, key.gen))
  elsif xref[key].is_a?(PDF::Reader::Reference)
    container_key = xref[key]
    object_streams[container_key] ||= PDF::Reader::ObjectStream.new(object(container_key))
    @cache[key] = object_streams[container_key][key.id]
  end
rescue InvalidObjectError
  return default
end
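
The key handling at the top of [] can be tried in isolation. In this sketch ToyRef and normalize_key are hypothetical stand-ins for PDF::Reader::Reference and for the first few lines of the method; nothing else from the library is modelled.

```ruby
# Hypothetical stand-in for a reference: an object ID plus a generation number.
ToyRef = Struct.new(:id, :gen) do
  def to_i
    id
  end
end

# Mirror the key handling in []: non-positive keys are rejected,
# and anything that is not already a reference gets generation 0.
def normalize_key(key)
  return nil if key.to_i <= 0
  key.is_a?(ToyRef) ? key : ToyRef.new(key.to_i, 0)
end

normalize_key(7)                  # equal to ToyRef.new(7, 0)
normalize_key(ToyRef.new(7, 2))   # returned unchanged, generation kept
normalize_key(0)                  # nil, where [] would return its default
```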

deref(key)
Alias for: object

deref!(key)

Recursively dereferences the object referred to by key. If key is not a PDF::Reader::Reference, it is treated as the object itself, and any references nested inside it are still resolved.

# File lib/pdf/reader/object_hash.rb, line 103
def deref!(key)
  case object = deref(key)
  when Hash
    {}.tap { |hash|
      object.each do |k, value|
        hash[k] = deref!(value)
      end
    }
  when PDF::Reader::Stream
    object.hash = deref!(object.hash)
    object
  when Array
    object.map { |value| deref!(value) }
  else
    object
  end
end
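
The recursion is easier to see on a small object graph. The sketch below uses hypothetical stand-ins (Pointer for a reference, a plain Hash for the object store) and leaves out the Stream case. It also assumes the graph has no cycles; a reference loop would recurse forever in this simplified version.

```ruby
# Hypothetical stand-ins: Pointer plays the role of a reference,
# and a plain Hash plays the role of the object store.
Pointer = Struct.new(:id)

def resolve_all(store, value)
  value = store[value.id] if value.is_a?(Pointer)   # follow a single reference
  case value
  when Hash  then value.transform_values { |v| resolve_all(store, v) }
  when Array then value.map { |v| resolve_all(store, v) }
  else value
  end
end

DEMO_STORE = {
  1 => { Type: :Page, MediaBox: Pointer.new(2), Contents: [Pointer.new(3)] },
  2 => [0, 0, 612, 792],
  3 => "stream data"
}.freeze

page = resolve_all(DEMO_STORE, Pointer.new(1))
page[:MediaBox]   # the array stored as object 2, no longer a Pointer
page[:Contents]   # an array holding the resolved object 3
```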

each(&block)

Iterates over each key/value pair, just like a Ruby hash.

# File lib/pdf/reader/object_hash.rb, line 146
def each(&block)
  @xref.each do |ref|
    yield ref, self[ref]
  end
end
Also aliased as: each_pair

each_key(&block)

Iterates over each key, just like a Ruby hash.

# File lib/pdf/reader/object_hash.rb, line 155
def each_key(&block)
  each do |id, obj|
    yield id
  end
end

each_pair(&block)
Alias for: each

each_value(&block)

Iterates over each value, just like a Ruby hash.

# File lib/pdf/reader/object_hash.rb, line 163
def each_value(&block)
  each do |id, obj|
    yield obj
  end
end

empty?()

Returns true if there are no objects in this file.

# File lib/pdf/reader/object_hash.rb, line 178
def empty?
  size == 0 ? true : false
end

encrypted?()
# File lib/pdf/reader/object_hash.rb, line 258
def encrypted?
  trailer.has_key?(:Encrypt)
end

fetch(key, local_default = nil)

Access an object from the PDF. key can be an int or a PDF::Reader::Reference object.

If an int is used, the object with that ID and a generation number of 0 will be returned.

If a PDF::Reader::Reference object is used, the exact ID and generation number can be specified.

local_default is the object that will be returned if the requested key doesn’t exist. If nothing is found and no local_default is given, an IndexError is raised when key.to_i is zero or negative; for any other key the method returns nil.

# File lib/pdf/reader/object_hash.rb, line 133
def fetch(key, local_default = nil)
  obj = self[key]
  if obj
    return obj
  elsif local_default
    return local_default
  else
    raise IndexError, "#{key} is invalid" if key.to_i <= 0
  end
end
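
The three possible outcomes of that control flow can be checked with a small model. ToyFetcher is a hypothetical stand-in that copies the branches of fetch over a plain Hash; it does not touch a PDF.

```ruby
# Hypothetical stand-in that copies the control flow of fetch over a plain Hash.
class ToyFetcher
  def initialize(objects)
    @objects = objects
  end

  def fetch(key, local_default = nil)
    obj = key.to_i > 0 ? @objects[key.to_i] : nil
    return obj if obj
    return local_default if local_default
    raise IndexError, "#{key} is invalid" if key.to_i <= 0
    nil   # positive but unknown key, and no default supplied
  end
end

toy = ToyFetcher.new({ 1 => :Catalog })
toy.fetch(1)           # the stored object
toy.fetch(99, :none)   # the supplied default
toy.fetch(99)          # nil
```

Note the difference from Ruby's own Hash#fetch, which raises KeyError for any missing key when no default is given.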

has_key?(check_key)

Returns true if the specified key exists in the file. check_key can be an int or a PDF::Reader::Reference.

# File lib/pdf/reader/object_hash.rb, line 185
def has_key?(check_key)
  # TODO update from O(n) to O(1)
  each_key do |key|
    if check_key.kind_of?(PDF::Reader::Reference)
      return true if check_key == key
    else
      return true if check_key.to_i == key.id
    end
  end
  return false
end
Also aliased as: include?, key?, member?, value?

has_value?(value)

Returns true if the specified value exists in the file.

# File lib/pdf/reader/object_hash.rb, line 202
def has_value?(value)
  # TODO update from O(n) to O(1)
  each_value do |obj|
    return true if obj == value
  end
  return false
end

include?(check_key)
Alias for: has_key?

key?(check_key)
Alias for: has_key?

keys()

Returns an array of all keys in the file.

# File lib/pdf/reader/object_hash.rb, line 217
def keys
  ret = []
  each_key { |k| ret << k }
  ret
end

length()
Alias for: size

member?(check_key)
Alias for: has_key?

obj_type(ref)

Returns the type of object a ref points to, as a symbol naming its Ruby class. Returns nil if the lookup raises an error.

# File lib/pdf/reader/object_hash.rb, line 51
def obj_type(ref)
  self[ref].class.to_s.to_sym
rescue
  nil
end

object(key)

If key is a PDF::Reader::Reference object, look up the corresponding object in the PDF and return it. Otherwise return key untouched.

# File lib/pdf/reader/object_hash.rb, line 95
def object(key)
  key.is_a?(PDF::Reader::Reference) ? self[key] : key
end
Also aliased as: deref

page_references()

Returns an array of PDF::Reader::References. Each reference in the array points to a Page object, one for each page in the PDF. The first reference is page 1, the second reference is page 2, etc.

Useful for apps that want to extract data from specific pages.

# File lib/pdf/reader/object_hash.rb, line 253
def page_references
  root  = fetch(trailer[:Root])
  @page_references ||= get_page_objects(root[:Pages]).flatten
end
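
The source of get_page_objects is not shown here. As a sketch only, and assuming the usual PDF page tree layout (Pages nodes holding a Kids array, Page nodes as leaves), collecting page references in document order could look like this; PageRef and collect_page_refs are hypothetical names.

```ruby
# Hypothetical stand-ins: PageRef for a reference, a plain Hash for the objects.
PageRef = Struct.new(:id)

# Walk a Pages tree depth-first, collecting references to the leaf Page nodes
# in document order.
def collect_page_refs(objects, ref)
  node = objects.fetch(ref.id)
  return [ref] if node[:Type] == :Page
  node[:Kids].flat_map { |kid| collect_page_refs(objects, kid) }
end

DEMO_PAGE_TREE = {
  1 => { Type: :Pages, Kids: [PageRef.new(2), PageRef.new(3)] },
  2 => { Type: :Page },
  3 => { Type: :Pages, Kids: [PageRef.new(4), PageRef.new(5)] },
  4 => { Type: :Page },
  5 => { Type: :Page }
}.freeze

collect_page_refs(DEMO_PAGE_TREE, PageRef.new(1)).map(&:id)   # [2, 4, 5]
```

Nested Pages nodes are why a flattening step is needed: the tree can be several levels deep, while callers want a flat, ordered list.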

sec_handler?()
# File lib/pdf/reader/object_hash.rb, line 262
def sec_handler?
  !!sec_handler
end

size()

Returns the number of objects in the file. An object with multiple generations is counted once.

# File lib/pdf/reader/object_hash.rb, line 171
def size
  xref.size
end
Also aliased as: length

stream?(ref)

Returns true if the supplied reference points to an object with a stream.

# File lib/pdf/reader/object_hash.rb, line 58
def stream?(ref)
  self.has_key?(ref) && self[ref].is_a?(PDF::Reader::Stream)
end

to_a()

Returns an array of arrays. Each sub-array contains a key/value pair.

# File lib/pdf/reader/object_hash.rb, line 239
def to_a
  ret = []
  each do |id, obj|
    ret << [id, obj]
  end
  ret
end

to_s()
# File lib/pdf/reader/object_hash.rb, line 211
def to_s
  "<PDF::Reader::ObjectHash size: #{self.size}>"
end

value?(check_key)
Alias for: has_key?

values()

Returns an array of all values in the file.

# File lib/pdf/reader/object_hash.rb, line 225
def values
  ret = []
  each_value { |v| ret << v }
  ret
end

values_at(*ids)

Returns an array of the values for the specified keys.

# File lib/pdf/reader/object_hash.rb, line 233
def values_at(*ids)
  ids.map { |id| self[id] }
end
