transeq

Function

Description

Transeq translates nucleic acid sequences into the corresponding peptide sequences.

It can translate in any one of the three forward or three reverse sense frames, in all three forward or all three reverse frames, or in all six frames.

It can also translate only specified regions of your sequences, such as their coding regions.

It can translate using the standard ('Universal') genetic code and also with a selection of non-standard codes.

Termination (STOP) codons are translated as the character '*'.

The output peptide sequence is always in the standard one-letter IUPAC code.
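As an illustration of translation with the standard code, the Python sketch below builds the standard genetic code (NCBI translation table 1) and translates frame 1, writing stop codons as '*'. It is a minimal sketch, not transeq's implementation: the function name translate_frame1 is hypothetical, an incomplete trailing codon is simply dropped, and any codon containing a base other than A, C, G or T (U) becomes 'X'.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frame1(dna: str) -> str:
    """Hypothetical helper: translate complete codons from the first base.

    Stop codons become '*'; codons with unknown bases become 'X';
    an incomplete trailing codon is dropped (a simplification).
    """
    seq = dna.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(STANDARD_CODE.get(codon, "X") for codon in codons)


print(translate_frame1("ATGGCCTAA"))  # MA*
```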

Usage

Command line arguments


Input file format

transeq reads one or more nucleic acid sequence USAs (Uniform Sequence Addresses).

Output file format

One or more peptide sequences are written out.

The names of the resulting protein sequences are formed by appending '_' and the translation frame number to the name of the input nucleic acid sequence. Thus a nucleic acid sequence named 'XYZ' translated in all six frames would produce protein sequences named 'XYZ_1', 'XYZ_2', 'XYZ_3', 'XYZ_4', 'XYZ_5' and 'XYZ_6'.

If regions are specified, they are translated in frame 1, so the output name would be 'XYZ_1'.
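The naming rule can be sketched in Python as below. The helper output_names is hypothetical, and the mapping of the reverse frames -1, -2 and -3 to the suffixes 4, 5 and 6 is an assumption; the text above only states that six-frame translation yields the suffixes 1 to 6.

```python
def output_names(name: str, frames: list[int]) -> list[str]:
    """Hypothetical helper: build output names as <input name>_<frame number>.

    Assumed mapping: frames 1, 2, 3 -> 1, 2, 3 and frames -1, -2, -3 -> 4, 5, 6.
    """
    suffix = {1: 1, 2: 2, 3: 3, -1: 4, -2: 5, -3: 6}
    return [f"{name}_{suffix[frame]}" for frame in frames]


print(output_names("XYZ", [1, 2, 3, -1, -2, -3]))
# ['XYZ_1', 'XYZ_2', 'XYZ_3', 'XYZ_4', 'XYZ_5', 'XYZ_6']
```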

Data files

Notes

The reverse frame '-1' is defined as the translation of the reverse complement of the sequence, using the same codon phase as frame '1'.

Thus the sequence ACTGG in frame 1 is the translation of the codons ACT and GG; frame -1 uses these same codons, reverse complemented:

  forward sense          ACT GG
  reverse sense          TGA CC

  reverse-complement     CC AGT
  frame -1 translation       S

Frame -1 is the translation of CCAGT (the reverse complement of ACTGG) using the codon 'AGT' (the first two bases, 'CC', are ignored). The result is the peptide 'S'.

Similarly, frame -2 uses the phase of frame 2: 'CAG T' (the first base, 'C', is ignored). The last base on its own cannot be translated and is output as the unknown residue 'X'. The result is the peptide 'QX'.

Frame -3 uses the phase of frame 3: 'CCA GT'. The last two bases translate to 'V' because the missing third base makes no difference (GTA, GTC, GTG and GTT all code for 'V'). The result is the peptide 'PV'.
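The Python sketch below reproduces the three worked examples above. It is not transeq's source: the function names are hypothetical, and the rule for a trailing partial codon (translate it only if every possible completion codes for the same residue, otherwise emit 'X') is an assumption generalized from the 'V' and 'X' cases above.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_codon(codon: str) -> str:
    """Translate one codon; a 1- or 2-base tail counts only if every completion agrees."""
    if len(codon) == 3:
        return CODE.get(codon, "X")
    residues = {CODE.get(codon + "".join(tail), "X")
                for tail in product(BASES, repeat=3 - len(codon))}
    return residues.pop() if len(residues) == 1 else "X"


def reverse_frame(dna: str, k: int) -> str:
    """Frame -k: translate the reverse complement in the codon phase of forward frame k."""
    seq = dna.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    # Skip the bases that form forward frame k's incomplete trailing codon.
    body = rc[(len(seq) - (k - 1)) % 3:]
    return "".join(translate_codon(body[i:i + 3]) for i in range(0, len(body), 3))


print([reverse_frame("ACTGG", k) for k in (1, 2, 3)])  # ['S', 'QX', 'PV']
```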

An alternative convention, used by some people, defines frame -1 as frame 1 of the reverse complement. Under that convention there is no fixed correspondence between the codons used in frames 1 and -1, 2 and -2, or 3 and -3; which codons pair up depends on the sequence length modulo 3.
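The dependence on length can be made concrete with the short Python sketch below, which compares where each reverse frame starts in the reverse complement under the two conventions. The offset formula for this program's convention is derived from the worked ACTGG example above rather than taken from transeq's source, and the function names are hypothetical.

```python
def transeq_offset(length: int, k: int) -> int:
    """Start of frame -k in the reverse complement under this program's convention.

    The first (length - (k - 1)) % 3 bases of the reverse complement are the
    incomplete trailing codon of forward frame k, so they are skipped.
    """
    return (length - (k - 1)) % 3


def alternative_offset(k: int) -> int:
    """Start of frame -k if it is simply frame k of the reverse complement."""
    return k - 1


def agreeing_frames(length: int) -> list[int]:
    """Reverse frames that read the same codons under both conventions."""
    return [-k for k in (1, 2, 3) if transeq_offset(length, k) == alternative_offset(k)]


for length in (6, 7, 8):
    print(length, length % 3, agreeing_frames(length))
# 6 0 [-1]
# 7 1 [-3]
# 8 2 [-2]
```

Only one reverse frame lines up between the two conventions, and which one depends on the length modulo 3; for ACTGG (length 5, remainder 2) both conventions give 'QX' for frame -2.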

There does not appear to be a convention on which definition to use.
The Staden package uses the same convention as this program.
The GCG package sneakily avoids the problem by naming the frames with letters (a, b, c, d, e, f).

If you really need frame -1 to be the frame obtained by reverse complementing the sequence and then translating from the first base of the result, use the '-alternative' qualifier.

References

None.

Warnings

When translating with a non-standard genetic code table, always check the table carefully for deviations from your particular organism's code.
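To see what such deviations look like, the Python sketch below compares the standard code with the vertebrate mitochondrial code (NCBI translation table 2), which reassigns four codons. The tables are written out by hand here purely as an illustration; they are not taken from transeq's own data, and the helper name deviations is hypothetical.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Vertebrate mitochondrial code (NCBI translation table 2): four reassigned codons.
VERTEBRATE_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def deviations(table: dict[str, str],
               reference: dict[str, str] = STANDARD) -> dict[str, tuple[str, str]]:
    """Codons whose residue differs from the reference: codon -> (reference, table)."""
    return {c: (reference[c], table[c]) for c in sorted(table) if table[c] != reference[c]}


for codon, (std, alt) in deviations(VERTEBRATE_MITO).items():
    print(f"{codon}: {std} -> {alt}")
# AGA: R -> *
# AGG: R -> *
# ATA: I -> M
# TGA: * -> W
```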

When using the '-regions' option, you should always leave the '-frames' option at the default of frame '1'. If you change the frame while specifying a region to translate, then the regions will be offset by 1 or 2 bases, which is not what you want.
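The effect described in this warning can be illustrated with a small Python sketch. It assumes, as the warning implies, that a non-default frame skips 1 or 2 bases at the start of the selected region; region_codons is a hypothetical helper, and positions are 1-based and inclusive.

```python
def region_codons(dna: str, start: int, end: int, frame: int) -> list[str]:
    """Hypothetical helper: complete codons read from a 1-based inclusive region.

    Assumption: frame 2 or 3 skips 1 or 2 bases at the start of the region.
    """
    region = dna[start - 1:end]
    shifted = region[frame - 1:]
    return [shifted[i:i + 3] for i in range(0, len(shifted) - 2, 3)]


seq = "GGATGAAACCCTAG"
print(region_codons(seq, 3, 14, 1))  # ['ATG', 'AAA', 'CCC', 'TAG']
print(region_codons(seq, 3, 14, 2))  # ['TGA', 'AAC', 'CCT']
```

With frame 1 the region reads ATG AAA CCC TAG; with frame 2 every codon is shifted by one base, so the intended coding region is no longer translated.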

Diagnostic Error Messages

Several warning messages about malformed region specifications:

Exit status

It exits with status 0 unless a region is badly constructed.

Known bugs

When using the '-regions' option, you should always leave the '-frames' option at the default of frame '1'. If you change the frame while specifying a region to translate, then the regions will be offset by 1 or 2 bases, which is not what you want.

Author(s)

History

Target users

Comments