This is a simple measure that quantifies how far the codon usage of a gene departs from equal usage of synonymous codons. This measure of synonymous codon usage bias, the 'effective number of codons used in a gene', Nc, can be easily calculated from codon usage data alone, and is independent of gene length and amino acid (aa) composition. Nc can take values from 20, in the case of extreme bias where one codon is exclusively used for each aa, to 61 when the use of alternative synonymous codons is equally likely. Nc thus provides an intuitively meaningful measure of the extent of codon preference in a gene.
The Nc statistic is unreliable for very short sequences (20 amino acids or fewer), a problem that has yet to be fully resolved. The difficulty arises from the need to account for amino acids that are missing from the sequence.
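The calculation can be sketched in a few lines of Python. This is a minimal illustration of the formula as it is commonly stated for the standard genetic code (Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where Fk is the mean codon homozygosity of the amino acids with k synonymous codons), and is not the chips source code. The function name, the choice to skip amino acids seen fewer than twice or with non-positive homozygosity, the fallback for a missing isoleucine class, and the cap at 61 are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effective_number_of_codons(cds):
    """Return Nc for an in-frame coding sequence, or None if it cannot be computed.

    Hypothetical helper for illustration only; chips may treat edge cases differently.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    # Count sense codons only; stop codons and ambiguous codons are ignored.
    counts = Counter(c for c in codons if CODON_TO_AA.get(c, "*") != "*")

    # Collect the codon counts belonging to each amino acid.
    by_aa = {}
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":
            by_aa.setdefault(aa, []).append(counts[codon])

    # Homozygosity F = (n * sum(p_i^2) - 1) / (n - 1), grouped by degeneracy class.
    f_by_class = {2: [], 3: [], 4: [], 6: []}
    for ns in by_aa.values():
        k, n = len(ns), sum(ns)
        if k == 1 or n < 2:
            continue  # Met/Trp carry no information; F is undefined for n < 2
        f = (n * sum((x / n) ** 2 for x in ns) - 1) / (n - 1)
        if f > 0:  # assumption: drop values that would make 1/F undefined
            f_by_class[k].append(f)

    mean = {k: (sum(v) / len(v) if v else None) for k, v in f_by_class.items()}
    # Assumption: if isoleucine (the only 3-fold class) is unusable,
    # estimate F3 as the average of F2 and F4.
    if mean[3] is None and mean[2] is not None and mean[4] is not None:
        mean[3] = (mean[2] + mean[4]) / 2
    if any(m is None for m in mean.values()):
        return None  # a whole degeneracy class is missing: sequence too short

    nc = 2 + 9 / mean[2] + 1 / mean[3] + 5 / mean[4] + 3 / mean[6]
    return min(nc, 61.0)  # assumption: cap at the theoretical maximum
```

A sequence that uses exactly one codon per amino acid gives 20 with this sketch, and a very short sequence returns None rather than a number, which reflects the short-sequence problem described above.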
This calculation was originally in the EGCG package as "codfish" (codon usage for fission yeast). As Frank Wright is a vegan, we chose a meat-free name, "chips", for the EMBOSS version. The official explanation is "Codon Heterozygosity (Inverse of) in a Protein-coding Sequence".
If the sequence extends beyond the coding region, the start and/or end positions of the CDS must be provided, because chips analyses only protein-coding regions.
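As an illustration of what restricting the analysis to the CDS means, the following Python sketch trims a longer nucleotide sequence to a coding region before any codon counting. The function name and the convention that positions are 1-based and inclusive are assumptions made for this example; it does not reproduce how chips itself handles sequence ranges.

```python
def extract_cds(sequence, start, end):
    """Return the region from start to end (assumed 1-based, inclusive).

    Hypothetical helper for illustration; not part of EMBOSS.
    """
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("CDS positions must lie within the sequence")
    # Convert the 1-based inclusive range to a 0-based half-open Python slice.
    return sequence[start - 1:end]


# Example: five bases of leader, a nine-base CDS, then a three-base trailer.
full = "GGGGG" + "ATGGCCTAA" + "TTT"
cds = extract_cds(full, 6, 14)
```

Counting codons from position 1 of the untrimmed sequence would read the leader in the wrong frame, which is why the CDS limits matter.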
By default, the codon usage table is the file "CODONS/Ehum.cut" in the EMBOSS distribution directory.