java.lang.Object
  org.biojavax.bio.seq.io.RichSequenceFormat.BasicFormat
    org.biojavax.bio.seq.io.INSDseqFormat
public class INSDseqFormat extends RichSequenceFormat.BasicFormat
Format reader for INSDseq files. This version of INSDseq format will generate and write RichSequence objects. Loosely based on code from the old, deprecated org.biojava.bio.seq.io.GenbankXmlFormat object.

Understands http://www.ebi.ac.uk/embl/Documentation/DTD/INSDC_V1.4.dtd.txt

Does NOT understand the "sites" keyword in INSDReference_position; it interprets this as an empty location instead, because there is no obvious way of representing the "sites" keyword in BioSQL.

Note also that the INSDInterval tags and their associated elements are not read, as they duplicate the information in the INSDFeature_location tag, which is already fully parsed. They are, however, written on output, although there is no guarantee that the INSDInterval tags will exactly match the INSDFeature_location tag, as they cannot exactly reflect its contents.
Nested Class Summary | |
---|---|
static class |
INSDseqFormat.Terms
Implements some INSDseq-specific terms. |
Nested classes/interfaces inherited from interface org.biojavax.bio.seq.io.RichSequenceFormat |
---|
RichSequenceFormat.BasicFormat, RichSequenceFormat.HeaderlessFormat |
Constructor Summary | |
---|---|
INSDseqFormat()
|
Method Summary | |
---|---|
void |
beginWriting()
Informs the writer that we want to start writing. |
boolean |
canRead(BufferedInputStream stream)
Check to see if a given stream is in our format. A stream is in INSDseq format if the second XML line contains the phrase "http://www.ebi.ac.uk/dtd/INSD_INSDSeq.dtd". |
boolean |
canRead(File file)
Check to see if a given file is in our format. Some formats may be able to determine this by filename, whilst others may have to open the file and read it to see what format it is in. A file is in INSDseq format if the second XML line contains the phrase "http://www.ebi.ac.uk/dtd/INSD_INSDSeq.dtd". |
void |
finishWriting()
Informs the writer that we are done writing. |
String |
getDefaultFormat()
getDefaultFormat returns the String identifier for
the default sub-format written by a SequenceFormat
implementation. |
SymbolTokenization |
guessSymbolTokenization(BufferedInputStream stream)
On the assumption that the stream is readable by this format (not checked), attempt to guess which symbol tokenization we should use to read it. Always returns a DNA tokenizer. |
SymbolTokenization |
guessSymbolTokenization(File file)
On the assumption that the file is readable by this format (not checked), attempt to guess which symbol tokenization we should use to read it. For formats that only accept one tokenization, just return it without checking the file. For formats that accept multiple tokenizations, it's up to you how you do it. Always returns a DNA tokenizer. |
boolean |
readRichSequence(BufferedReader reader,
SymbolTokenization symParser,
RichSeqIOListener rlistener,
Namespace ns)
Reads a sequence from the given buffered reader using the given tokenizer to parse sequence symbols. |
boolean |
readSequence(BufferedReader reader,
SymbolTokenization symParser,
SeqIOListener listener)
Read a sequence and pass data on to a SeqIOListener. |
void |
writeSequence(Sequence seq,
Namespace ns)
Writes a sequence out to the outputstream given by beginWriting() using the default format of the implementing class. Namespace is ignored as INSDseq has no concept of it. |
void |
writeSequence(Sequence seq,
PrintStream os)
writeSequence writes a sequence to the specified
PrintStream, using the default format. |
void |
writeSequence(Sequence seq,
String format,
PrintStream os)
writeSequence writes a sequence to the specified
PrintStream, using the specified format. |
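As a rough illustration of the second-line check that both canRead variants describe, the sketch below peeks at the start of a stream and looks for the DTD phrase. It uses only the JDK and is not BioJava's implementation: the class name InsdSeqSniffSketch, the PEEK_LIMIT look-ahead size, and the sample documents in main are assumptions made for the example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of BioJava.
final class InsdSeqSniffSketch {
    /** Phrase quoted in the canRead descriptions above. */
    static final String DTD_PHRASE = "http://www.ebi.ac.uk/dtd/INSD_INSDSeq.dtd";

    /** Assumed look-ahead size; it only needs to cover the first two lines. */
    static final int PEEK_LIMIT = 2000;

    /** Returns true if the second line contains DTD_PHRASE; the stream is reset afterwards. */
    static boolean looksLikeInsdSeq(BufferedInputStream stream) throws IOException {
        stream.mark(PEEK_LIMIT);
        try {
            byte[] head = new byte[PEEK_LIMIT];
            int total = 0;
            // Read until the buffer is full or the stream ends.
            while (total < head.length) {
                int n = stream.read(head, total, head.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
            // ISO-8859-1 maps bytes one-to-one, so a character cut at the limit cannot break decoding.
            String text = new String(head, 0, total, StandardCharsets.ISO_8859_1);
            String[] lines = text.split("\r?\n", 3);
            return lines.length >= 2 && lines[1].contains(DTD_PHRASE);
        } finally {
            // Rewind so a real parser could still read from the beginning.
            stream.reset();
        }
    }

    /** Convenience overload for in-memory text, used by the demo below. */
    static boolean looksLikeInsdSeq(String document) {
        try {
            byte[] bytes = document.getBytes(StandardCharsets.UTF_8);
            return looksLikeInsdSeq(new BufferedInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative input only; real files may use a different DOCTYPE form.
        String sample = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE INSDSet SYSTEM \"" + DTD_PHRASE + "\">\n"
                + "<INSDSet/>\n";
        System.out.println(looksLikeInsdSeq(sample));                   // true
        System.out.println(looksLikeInsdSeq("LOCUS       AB000001\n")); // false
    }
}
```

Using mark and reset keeps the check non-destructive, which is presumably why the stream variant takes a BufferedInputStream rather than a plain InputStream.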
Methods inherited from class org.biojavax.bio.seq.io.RichSequenceFormat.BasicFormat |
---|
getElideComments, getElideFeatures, getElideReferences, getElideSymbols, getLineWidth, getPrintStream, setElideComments, setElideFeatures, setElideReferences, setElideSymbols, setLineWidth, setPrintStream |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String INSDSEQ_FORMAT
protected static final String INSDSEQS_GROUP_TAG
protected static final String INSDSEQ_TAG
protected static final String LOCUS_TAG
protected static final String LENGTH_TAG
protected static final String TOPOLOGY_TAG
protected static final String STRANDED_TAG
protected static final String MOLTYPE_TAG
protected static final String DIVISION_TAG
protected static final String UPDATE_DATE_TAG
protected static final String CREATE_DATE_TAG
protected static final String UPDATE_REL_TAG
protected static final String CREATE_REL_TAG
protected static final String DEFINITION_TAG
protected static final String DATABASE_XREF_TAG
protected static final String XREF_TAG
protected static final String ACCESSION_TAG
protected static final String ACC_VERSION_TAG
protected static final String SECONDARY_ACCESSIONS_GROUP_TAG
protected static final String SECONDARY_ACCESSION_TAG
protected static final String OTHER_SEQIDS_GROUP_TAG
protected static final String OTHER_SEQID_TAG
protected static final String KEYWORDS_GROUP_TAG
protected static final String KEYWORD_TAG
protected static final String SOURCE_TAG
protected static final String ORGANISM_TAG
protected static final String TAXONOMY_TAG
protected static final String REFERENCES_GROUP_TAG
protected static final String REFERENCE_TAG
protected static final String REFERENCE_LOCATION_TAG
protected static final String REFERENCE_POSITION_TAG
protected static final String TITLE_TAG
protected static final String JOURNAL_TAG
protected static final String PUBMED_TAG
protected static final String XREF_DBNAME_TAG
protected static final String XREF_ID_TAG
protected static final String REMARK_TAG
protected static final String AUTHORS_GROUP_TAG
protected static final String AUTHOR_TAG
protected static final String CONSORTIUM_TAG
protected static final String COMMENT_TAG
protected static final String FEATURES_GROUP_TAG
protected static final String FEATURE_TAG
protected static final String FEATURE_KEY_TAG
protected static final String FEATURE_LOC_TAG
protected static final String FEATURE_INTERVALS_GROUP_TAG
protected static final String FEATURE_INTERVAL_TAG
protected static final String FEATURE_FROM_TAG
protected static final String FEATURE_TO_TAG
protected static final String FEATURE_POINT_TAG
protected static final String FEATURE_ISCOMP_TAG
protected static final String FEATURE_INTERBP_TAG
protected static final String FEATURE_ACCESSION_TAG
protected static final String FEATURE_OPERATOR_TAG
protected static final String FEATURE_PARTIAL5_TAG
protected static final String FEATURE_PARTIAL3_TAG
protected static final String FEATUREQUALS_GROUP_TAG
protected static final String FEATUREQUAL_TAG
protected static final String FEATUREQUAL_NAME_TAG
protected static final String FEATUREQUAL_VALUE_TAG
protected static final String SEQUENCE_TAG
protected static final String CONTIG_TAG
protected static final Pattern dbxp
protected static final Pattern xmlSchema
Constructor Detail |
---|
public INSDseqFormat()
Method Detail |
---|
public boolean canRead(File file) throws IOException
canRead in interface RichSequenceFormat
canRead in class RichSequenceFormat.BasicFormat
Parameters:
file - the File to check.
Throws:
IOException - in case the file is inaccessible.

public SymbolTokenization guessSymbolTokenization(File file) throws IOException
guessSymbolTokenization in interface RichSequenceFormat
guessSymbolTokenization in class RichSequenceFormat.BasicFormat
Parameters:
file - the File object to guess the format of.
Returns:
a SymbolTokenization to read the file with.
Throws:
IOException - if the file is unrecognisable or inaccessible.

public boolean canRead(BufferedInputStream stream) throws IOException
Parameters:
stream - the BufferedInputStream to check.
Throws:
IOException - in case the stream is inaccessible.

public SymbolTokenization guessSymbolTokenization(BufferedInputStream stream) throws IOException
Parameters:
stream - the BufferedInputStream object to guess the format of.
Returns:
a SymbolTokenization to read the stream with.
Throws:
IOException - if the stream is unrecognisable or inaccessible.

public boolean readSequence(BufferedReader reader, SymbolTokenization symParser, SeqIOListener listener) throws IllegalSymbolException, IOException, ParseException
Parameters:
reader - The stream of data to parse.
symParser - A SymbolParser defining a mapping from character data to Symbols.
listener - A listener to notify when data is extracted from the stream.
Throws:
IllegalSymbolException - if it is not possible to translate character data from the stream into valid BioJava symbols.
IOException - if an error occurs while reading from the stream.
ParseException

public boolean readRichSequence(BufferedReader reader, SymbolTokenization symParser, RichSeqIOListener rlistener, Namespace ns) throws IllegalSymbolException, IOException, ParseException
Parameters:
reader - the input source
symParser - the tokenizer which understands the sequence being read
rlistener - the listener to send sequence events to
ns - the namespace to read sequences into.
Throws:
IllegalSymbolException - if the tokenizer couldn't understand one of the sequence symbols in the file.
IOException - if there was a read error.
ParseException

public void beginWriting() throws IOException
Throws:
IOException - if writing fails.

public void finishWriting() throws IOException
Throws:
IOException - if writing fails.

public void writeSequence(Sequence seq, PrintStream os) throws IOException
writeSequence writes a sequence to the specified PrintStream, using the default format.
Parameters:
seq - the sequence to write out.
os - the printstream to write to.
Throws:
IOException

public void writeSequence(Sequence seq, String format, PrintStream os) throws IOException
writeSequence writes a sequence to the specified PrintStream, using the specified format.
Parameters:
seq - a Sequence to write out.
format - a String indicating which sub-format of those available from a particular SequenceFormat implementation to use when writing.
os - a PrintStream object.
Throws:
IOException - if an error occurs.

public void writeSequence(Sequence seq, Namespace ns) throws IOException
Parameters:
seq - the sequence to write
ns - the namespace to write it with
Throws:
IOException - in case it couldn't write something

public String getDefaultFormat()
getDefaultFormat returns the String identifier for the default sub-format written by a SequenceFormat implementation.
Returns:
a String.
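
The beginWriting(), writeSequence(Sequence, Namespace) and finishWriting() methods described above imply a bracketed protocol: start, write any number of records, then finish. One plausible reason, assumed here and not confirmed by this page, is that an XML format needs a single enclosing element around all records. The JDK-only sketch below shows that pattern; BracketedWriterSketch, writeRecord and render are hypothetical names, and the element names come from the INSDSeq DTD, not from the documented constants, whose values this page does not list.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical stand-in for a begin/write/finish style writer; not BioJava code.
final class BracketedWriterSketch {
    // Assumed element names, taken from the INSDSeq DTD.
    static final String GROUP_TAG = "INSDSet";
    static final String RECORD_TAG = "INSDSeq";
    static final String LOCUS_TAG = "INSDSeq_locus";

    private final PrintStream out;
    private boolean writing;

    BracketedWriterSketch(PrintStream out) {
        this.out = out;
    }

    /** Opens the enclosing group element. */
    void beginWriting() {
        if (writing) {
            throw new IllegalStateException("beginWriting() called twice");
        }
        out.print("<" + GROUP_TAG + ">\n");
        writing = true;
    }

    /** Writes one minimal record; only valid between beginWriting() and finishWriting(). */
    void writeRecord(String locus) {
        if (!writing) {
            throw new IllegalStateException("call beginWriting() first");
        }
        out.print("  <" + RECORD_TAG + "><" + LOCUS_TAG + ">" + escape(locus)
                + "</" + LOCUS_TAG + "></" + RECORD_TAG + ">\n");
    }

    /** Closes the enclosing group element and flushes the stream. */
    void finishWriting() {
        if (!writing) {
            throw new IllegalStateException("call beginWriting() first");
        }
        out.print("</" + GROUP_TAG + ">\n");
        out.flush();
        writing = false;
    }

    /** Escapes the three characters that would break element content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Runs the full protocol into memory and returns the text produced. */
    static String render(String... loci) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        BracketedWriterSketch writer = new BracketedWriterSketch(new PrintStream(buffer));
        writer.beginWriting();
        for (String locus : loci) {
            writer.writeRecord(locus);
        }
        writer.finishWriting();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(render("AB000001", "AB000002"));
    }
}
```

With the real class, the equivalent calls would presumably be setPrintStream(...) followed by beginWriting(), one writeSequence(seq, ns) per sequence, and finishWriting(); the two-argument PrintStream overloads listed above take the stream directly instead.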