The EMBOSS distribution comes loaded with a set of codon usage tables. These tables are calculated from the public codon usage files (see ftp://ftp.ebi.ac.uk/pub/databases/codonusage/README), with a few additions whose exact derivation cannot easily be determined. Many people would therefore prefer to create their own tables from the public CUTG data.
To build your own tables, run cutgextract on the CUTG database from ftp://ftp.ebi.ac.uk/pub/databases/cutg. Download all the required *.codon files from CUTG, and uncompress any that are compressed before running cutgextract on them.
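The uncompress step can be scripted. The sketch below is a minimal example that assumes the downloaded files are gzip-compressed and named `*.codon.gz`; files compressed with another tool (for example Unix `compress`, giving `.Z` files) would need a different decompressor. The function name `uncompress_codon_files` and the `cutg_download` directory are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def uncompress_codon_files(directory: str) -> list[Path]:
    """Decompress every *.codon.gz file in `directory`, keeping the originals.

    Assumes gzip compression. Returns the paths of the *.codon files written.
    """
    written = []
    for gz_path in sorted(Path(directory).glob("*.codon.gz")):
        out_path = gz_path.with_suffix("")  # strip ".gz" -> "<name>.codon"
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written


if __name__ == "__main__":
    # Hypothetical directory holding the files downloaded from CUTG.
    for path in uncompress_codon_files("cutg_download"):
        print("uncompressed", path)
```

The resulting *.codon files sit next to the compressed originals and can then be passed to cutgextract.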
Downloading the CUTG database and running cutgextract to create the codon usage table files is normally done only once: when the EMBOSS package is installed, or when a new version of the CUTG database is released.
Note that CUTG has one drawback: it provides a single table for each organism, without distinguishing between different gene populations within that organism.
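To show why a single per-organism table can hide differences, the sketch below pools codon counts from two gene populations and compares the resulting codon fraction with each population's own value. The counts for the two lysine codons (AAA, AAG) are hypothetical toy numbers, not CUTG data, and the helper names are illustrative.

```python
def codon_fraction(counts: dict[str, int], codon: str, synonyms: list[str]) -> float:
    """Fraction of `codon` among its synonymous codons in `counts`."""
    total = sum(counts.get(c, 0) for c in synonyms)
    return counts.get(codon, 0) / total


def pool(*tables: dict[str, int]) -> dict[str, int]:
    """Sum codon counts across tables, as one table per organism does."""
    pooled: dict[str, int] = {}
    for table in tables:
        for codon, n in table.items():
            pooled[codon] = pooled.get(codon, 0) + n
    return pooled


# Hypothetical toy counts for the two lysine codons in two gene populations.
LYS = ["AAA", "AAG"]
population_a = {"AAA": 90, "AAG": 10}
population_b = {"AAA": 40, "AAG": 60}

print(codon_fraction(population_a, "AAA", LYS))                      # 0.9
print(codon_fraction(population_b, "AAA", LYS))                      # 0.4
print(codon_fraction(pool(population_a, population_b), "AAA", LYS))  # 0.65
```

The pooled value (0.65) matches neither population, which is the information a single per-organism table cannot recover.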
cutgextract parses the codon usage data from these *.codon files and writes one file per species into the EMBOSS data/CODONS directory. The file names are derived from the species names in the CUTG files, so they are long but descriptive.
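The "one file per species" step can be sketched as follows. This is not cutgextract's implementation: the input is assumed to be already-parsed `(species_name, {codon: count})` pairs, and the naming scheme (`species_filename`, lowercase with underscores and a `.cut` suffix) and the output layout (one `CODON COUNT` line per codon) are hypothetical simplifications, not the EMBOSS codon usage table format.

```python
import re
from collections import Counter, defaultdict
from pathlib import Path


def species_filename(species: str) -> str:
    """Hypothetical scheme: lowercase, non-alphanumeric runs -> '_', '.cut' suffix."""
    slug = re.sub(r"[^a-z0-9]+", "_", species.lower()).strip("_")
    return slug + ".cut"


def write_species_tables(records, out_dir: str) -> dict[str, Path]:
    """Sum codon counts per species and write one simplified table per species.

    `records` is an iterable of (species_name, {codon: count}) pairs,
    assumed to have been parsed from the *.codon files already.
    """
    totals: dict[str, Counter] = defaultdict(Counter)
    for species, counts in records:
        totals[species].update(counts)  # Counter.update adds counts

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = {}
    for species, counts in totals.items():
        path = out / species_filename(species)
        with open(path, "w") as fh:
            fh.write(f"# {species}\n")
            for codon in sorted(counts):
                fh.write(f"{codon} {counts[codon]}\n")
        written[species] = path
    return written
```

For example, records for "Escherichia coli K-12" would be summed and written to `escherichia_coli_k_12.cut` under this hypothetical scheme. Deriving names from the full species string keeps them descriptive at the cost of length, as described above.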