The similarity is calculated by moving a window of a specified length along the aligned sequences. Within the window, the similarity of any one position is taken to be the average of all the possible pairwise scores of the bases or residues at that position. The pairwise scores are taken from the specified similarity matrix. The average of the position similarities within the window is plotted.
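The windowing described above can be illustrated with a minimal sketch. It is not plotcon's implementation: the function names, the toy identity scoring (1 for a match, 0 otherwise) and the example alignment are all assumptions made for illustration, and sequence weighting is ignored.

```python
from itertools import combinations

# Toy scoring function (an assumption): 1 for identical residues, 0 otherwise.
# A real run would look scores up in the chosen similarity matrix instead.
def identity_score(a, b):
    return 1.0 if a == b else 0.0

def position_similarity(column, score=identity_score):
    """Average of all pairwise scores among the residues in one column.

    Needs at least two sequences, otherwise there are no pairs to average.
    """
    pairs = list(combinations(column, 2))
    return sum(score(a, b) for a, b in pairs) / len(pairs)

def windowed_similarity(alignment, window, score=identity_score):
    """Average position similarity inside each window as it slides along."""
    length = len(alignment[0])
    per_position = [
        position_similarity([seq[i] for seq in alignment], score)
        for i in range(length)
    ]
    # One value per window start position.
    return [
        sum(per_position[start:start + window]) / window
        for start in range(length - window + 1)
    ]

# Hypothetical three-sequence alignment of equal-length sequences.
alignment = ["ACGTAC", "ACGTTC", "ACCTAC"]
print(windowed_similarity(alignment, window=2))
```

Windows that cover only fully conserved columns give the highest value; a window containing a mismatched column scores lower.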
The program is useful for determining where the quality of alignments is good or bad.
The average similarity is calculated by:
Av. Sim. = sum( Mij*wi + Mji*wj ) / ( (Nseq*Wsize) * ((Nseq-1)*Wsize) )
sum - over column*window size
w - sequence weighting
M - matrix comparison table
i,j - with respect to residue i or j
Nseq - number of sequences in the alignment
Wsize - window size
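As a rough check on the formula, the sketch below evaluates it for a single window. This is one reading of the formula as printed, not plotcon's source code: it assumes the sum runs over every pair of sequences in every column of the window, and the function name, the toy identity matrix and the unit weights are hypothetical.

```python
from itertools import combinations

def average_similarity(window_columns, matrix, weights):
    """Evaluate the printed formula for one window.

    window_columns: list of columns, each a list with one residue per sequence
    matrix: dict mapping (residue, residue) to a score
    weights: one weight per sequence
    """
    nseq = len(weights)
    wsize = len(window_columns)
    total = 0.0
    for column in window_columns:
        # Assumed reading: sum over each unordered pair of sequences i, j.
        for i, j in combinations(range(nseq), 2):
            a, b = column[i], column[j]
            total += matrix[(a, b)] * weights[i] + matrix[(b, a)] * weights[j]
    return total / ((nseq * wsize) * ((nseq - 1) * wsize))

# Toy matrix (an assumption): 1 on the diagonal, 0 elsewhere.
residues = "ACGT"
toy_matrix = {(a, b): (1.0 if a == b else 0.0) for a in residues for b in residues}
weights = [1.0, 1.0, 1.0]

# A perfectly conserved column repeated across windows of different sizes.
conserved_column = ["A", "A", "A"]
for wsize in (4, 100):
    print(wsize, average_similarity([conserved_column] * wsize, toy_matrix, weights))
```

Under these assumptions a fully conserved window scores 1/Wsize, because Wsize appears twice in the denominator but the sum grows only once with the window. This is one way to see why the score units depend on the window size.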
This program is useful for gaining a qualitative insight into where there are regions of conservation in a group of aligned sequences.
Note that you should only compare the results of two runs of plotcon if you use the same window size in each. This is because the 'similarity score' units that are output are very sensitive to the size of the window. A large window (e.g. 100) gives a smooth curve with very low 'similarity score' values, whereas a small window (e.g. 4) gives a spiky, noisy plot with 'similarity score' values of around 1.00.