class Bio::PubMed

Description

The Bio::PubMed class provides several ways to retrieve bibliographic information from the PubMed database at

http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed

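Two types of queries are possible:

  • searching for PubMed IDs given a query string: esearch, search

  • retrieving the MEDLINE text for a given PubMed ID: efetch, query, pmfetch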

The different methods within the same group are interchangeable and should return the same result.

Additional information about the MEDLINE format and PubMed programmable APIs can be found on the following websites:

Usage

require 'bio'

# If you don't know the PubMed ID, search for it by keyword:
Bio::PubMed.esearch("(genome AND analysis) OR bioinformatics").each do |x|
  p x
end

Bio::PubMed.search("(genome AND analysis) OR bioinformatics").each do |x|
  p x
end

# To retrieve the MEDLINE entry for a given PubMed ID:
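# efetch takes the IDs as its first argument (a single ID or an array)
# and an options hash as its second, so several IDs go in one array.
puts Bio::PubMed.efetch(["10592173", "14693808"])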
puts Bio::PubMed.query("10592173")
puts Bio::PubMed.pmfetch("10592173")

# This can be converted into a Bio::MEDLINE object:
manuscript = Bio::PubMed.query("10592173")
medline = Bio::MEDLINE.new(manuscript)

Public Class Methods

efetch(*args)
# File lib/bio/io/pubmed.rb, line 204
def self.efetch(*args)
  self.new.efetch(*args)
end
esearch(*args)
# File lib/bio/io/pubmed.rb, line 200
def self.esearch(*args)
  self.new.esearch(*args)
end
pmfetch(*args)
# File lib/bio/io/pubmed.rb, line 216
def self.pmfetch(*args)
  self.new.pmfetch(*args)
end
query(*args)
# File lib/bio/io/pubmed.rb, line 212
def self.query(*args)
  self.new.query(*args)
end
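
Each class method above builds a fresh instance and forwards its arguments to the instance method of the same name. A minimal sketch of that delegation pattern follows. It uses a hypothetical Greeter class (not part of BioRuby) so that it runs without the library or network access:

```ruby
# Hypothetical class illustrating the delegation pattern used by the
# Bio::PubMed class methods; Greeter is not part of BioRuby.
class Greeter
  # Class-level convenience wrapper: create an instance, forward all arguments.
  def self.greet(*args)
    new.greet(*args)
  end

  # Instance method that does the actual work.
  def greet(name, options = {})
    prefix = options.fetch("prefix", "Hello")
    "#{prefix}, #{name}"
  end
end

# Both call styles go through the same instance method.
puts Greeter.greet("PubMed")
puts Greeter.new.greet("PubMed", { "prefix" => "Hi" })
```

Under this pattern the class-level and instance-level calls should behave identically, which matches how Bio::PubMed.efetch and Bio::PubMed.new.efetch relate in the listings above.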

Public Instance Methods

efetch(ids, hash = {})

Retrieves PubMed entries by PMID using Entrez efetch and returns them as MEDLINE-formatted strings. Multiple PubMed IDs can be provided:

Bio::PubMed.efetch(123)
Bio::PubMed.efetch([123,456,789])

Arguments:

  • ids: list of PubMed IDs (required)

  • hash: hash of E-Utils options

    • retmode: “xml”, “html”, …

    • rettype: “medline”, …

    • retmax: integer (default 100)

    • retstart: integer

    • field

    • reldate

    • mindate

    • maxdate

    • datetype

Returns

Array of MEDLINE-formatted Strings when retmode is unset or “text”; otherwise the unsplit result of Bio::NCBI::REST#efetch

Calls superclass method Bio::NCBI::REST#efetch
# File lib/bio/io/pubmed.rb, line 117
def efetch(ids, hash = {})
  opts = { "db" => "pubmed", "rettype"  => "medline" }
  opts.update(hash)
  result = super(ids, opts)
  if !opts["retmode"] or opts["retmode"] == "text"
    result = result.split(/\n\n+/)
  end
  result
end
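
The last step of efetch above splits a plain-text response into one string per record, using runs of blank lines as the separator. A minimal offline sketch of that step follows. The name split_medline_records is a hypothetical helper, and the sample text is made-up placeholder data, not real PubMed output:

```ruby
# Hypothetical helper mirroring the split step in efetch above.
def split_medline_records(text, retmode = nil)
  # Only text-mode responses are split; other modes are returned untouched.
  if retmode.nil? || retmode == "text"
    text.split(/\n\n+/)
  else
    text
  end
end

# Placeholder records (invented for illustration only).
SAMPLE_MEDLINE = "PMID- 111\nTI  - First placeholder title\n\n\n" \
                 "PMID- 222\nTI  - Second placeholder title\n"

split_medline_records(SAMPLE_MEDLINE).each do |record|
  puts record
  puts "--"
end
```

Because the separator is two or more consecutive newlines, a single newline inside a record does not split it.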
esearch(str, hash = {})

Searches the PubMed database for the given keywords using E-Utils and returns an array of PubMed IDs.

For information on the possible arguments, see eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html#PubMed


Arguments:

  • str: query string (required)

  • hash: hash of E-Utils options

    • retmode: “xml”, “html”, …

    • rettype: “medline”, …

    • retmax: integer (default 100)

    • retstart: integer

    • field

    • reldate

    • mindate

    • maxdate

    • datetype

Returns

Array of PubMed IDs, or the number of results

Calls superclass method Bio::NCBI::REST#esearch
# File lib/bio/io/pubmed.rb, line 93
def esearch(str, hash = {})
  opts = { "db" => "pubmed" }
  opts.update(hash)
  super(str, opts)
end
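
esearch starts from { "db" => "pubmed" } and lets the caller's hash add to or override that default through Hash#update. A small sketch of the merge follows. build_esearch_params is a hypothetical helper, and placing the query string under a "term" key is an assumption made for illustration; the real request is assembled inside Bio::NCBI::REST#esearch, which is not shown here. URI.encode_www_form is used only to show how such options could be serialized:

```ruby
require "uri"

# Hypothetical helper: merge caller options over the PubMed default, as
# esearch does above, then add the query string under "term" (assumed key).
def build_esearch_params(str, hash = {})
  opts = { "db" => "pubmed" }
  opts.update(hash)            # caller-supplied keys win over the default
  opts.merge("term" => str)
end

params = build_esearch_params("bioinformatics", { "retmax" => 5 })
puts URI.encode_www_form(params)
```

Since the caller's hash is applied after the default, passing a different "db" value would replace "pubmed" in this sketch.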
pmfetch(id)

Retrieves a PubMed entry by PMID using Entrez pmfetch and returns it as a MEDLINE-formatted string.


Arguments:

  • id: PubMed ID (required)

Returns

MEDLINE formatted String

# File lib/bio/io/pubmed.rb, line 183
def pmfetch(id)
  host = "www.ncbi.nlm.nih.gov"
  path = "/entrez/utils/pmfetch.fcgi?tool=bioruby&mode=text&report=medline&db=PubMed&id="

  ncbi_access_wait

  http = Bio::Command.new_http(host)
  response = http.get(path + CGI.escape(id.to_s))
  result = response.body
  if result =~ /#{id}\s+Error/
    raise( result )
  else
    result = result.gsub("\r", "\n").squeeze("\n").gsub(/<\/?pre>/, '')
    return result
  end
end
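
The clean-up chain at the end of pmfetch normalizes line endings, collapses repeated newlines, and strips the <pre> wrapper. A minimal offline sketch of that chain follows. clean_pmfetch_body is a hypothetical helper name, and the response body is invented placeholder text:

```ruby
# Hypothetical helper reproducing the clean-up chain from pmfetch above.
def clean_pmfetch_body(body)
  body.gsub("\r", "\n")       # turn carriage returns into newlines
      .squeeze("\n")          # collapse runs of newlines into one
      .gsub(/<\/?pre>/, "")   # drop the <pre> and </pre> wrapper tags
end

# Placeholder response body (invented for illustration only).
RAW_BODY = "<pre>\r\nPMID- 111\r\nTI  - Placeholder title\r\n</pre>"

puts clean_pmfetch_body(RAW_BODY)
```

One consequence of squeeze("\n") is that blank lines disappear entirely, so the cleaned text contains no empty separator lines.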
query(*ids)

Retrieves PubMed entries by PMID using an Entrez query and returns them as MEDLINE-formatted strings.


Arguments:

  • ids: list of PubMed IDs (required)

Returns

MEDLINE-formatted String for a single ID, or an Array of such Strings when several IDs are given

# File lib/bio/io/pubmed.rb, line 153
def query(*ids)
  host = "www.ncbi.nlm.nih.gov"
  path = "/sites/entrez?tool=bioruby&cmd=Text&dopt=MEDLINE&db=PubMed&uid="
  list = ids.collect { |x| CGI.escape(x.to_s) }.join(",")

  ncbi_access_wait

  http = Bio::Command.new_http(host)
  response = http.get(path + list)
  result = response.body
  result = result.scan(/<pre>\s*(.*?)<\/pre>/m).flatten

  if result =~ /id:.*Error occurred/
    # id: xxxxx Error occurred: Article does not exist
    raise( result )
  else
    if ids.size > 1
      return result
    else
      return result.first
    end
  end
end
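
query pulls each record out of the <pre> blocks of the returned HTML and returns a single string for one ID or an array for several. In the listing above, result is already an Array when it is matched with =~, and Array does not define =~, so that check does not detect error messages as written (older Rubies return nil, Ruby 3.2 and later raise NoMethodError). The sketch below shows one way the same steps could be written with the check applied to each record. It is an illustrative variant, not the library's code; extract_records and the sample HTML are hypothetical:

```ruby
# Hypothetical variant of the extraction logic in query above.
def extract_records(html, id_count)
  # Capture the contents of every <pre>...</pre> block, across newlines.
  records = html.scan(/<pre>\s*(.*?)<\/pre>/m).flatten

  # Check each record individually for the error marker.
  failed = records.find { |record| record =~ /id:.*Error occurred/ }
  raise failed if failed

  # One ID yields a single String, several IDs yield an Array.
  id_count > 1 ? records : records.first
end

# Placeholder HTML (invented for illustration only).
SAMPLE_HTML = "<html><pre>\nPMID- 111\n</pre><pre>\nPMID- 222\n</pre></html>"

p extract_records(SAMPLE_HTML, 2)
```

Raising with the matching record as the message keeps the server's own error text visible to the caller.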