profit

Function

Description

profit takes a simple frequency matrix produced by prophecy and uses it to search for matches in one or more input sequences.

Scores for the matches are calculated from the simple frequency matrix. The score of a match is the sum of the matrix values found at each position of the matrix.

A 'simple frequency matrix' is simply a count of the number of times any particular amino acid occurs at each position in the alignment used to create it. Simple frequency matrices are created using the program prophecy with the option '-type F' to create the correct type of matrix. The alignment should not have gaps in it.
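The counting described above can be sketched in Python. This is only an illustration of the idea, not the code prophecy uses; the function name build_frequency_matrix and the list-of-counters representation are assumptions made for this sketch, and the file format prophecy writes is not reproduced here.

```python
from collections import Counter


def build_frequency_matrix(aligned):
    """Count how often each residue occurs in each alignment column.

    Hypothetical helper for illustration: returns one Counter per column.
    """
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    if any("-" in seq for seq in aligned):
        raise ValueError("the alignment should not contain gaps")
    # zip(*aligned) yields the alignment one column at a time.
    return [Counter(column) for column in zip(*aligned)]


# The ungapped alignment from the Usage section (m.seq).
alignment = [
    "DEVGGEALGRLLVVYPWTQR",
    "DEVGREALGRLLVVYPWTQR",
    "DEVGGEALGRILVVYPWTQR",
    "DEVGGEAAGRVLVVYPWTQR",
]
matrix = build_frequency_matrix(alignment)
print(len(matrix))      # number of columns in the matrix
print(dict(matrix[4]))  # residue counts in the fifth column
```

Each column keeps only the residues actually seen, so a residue that never occurs in a column counts as zero.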

The resulting matrix is moved along the sequence(s) you are searching. At each position in the sequence, the frequencies of the amino acids or bases covered by the length of the matrix are read from the matrix. The sum of these frequencies over all positions of the matrix is the score for that position of the sequence. If this score is above the threshold percentage of the maximum possible score for that matrix, then a hit is reported.
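The search described above can be sketched in Python as a sliding window. This is a minimal illustration under stated assumptions, not profit's actual implementation: the function name scan is hypothetical, start positions are assumed to be 1-based, and a score exactly equal to the threshold is assumed to count as a hit.

```python
from collections import Counter


def scan(matrix, sequence, threshold=75.0):
    """Slide the matrix along the sequence and collect hits.

    matrix is a list of Counters, one per matrix position.
    Returns (start, percentage) pairs; start is 1-based (assumption).
    """
    width = len(matrix)
    # Maximum possible score: the highest count at each matrix position.
    max_score = sum(max(column.values()) for column in matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # A Counter returns 0 for residues never seen in that column.
        score = sum(column[res] for column, res in zip(matrix, window))
        percentage = 100.0 * score / max_score
        if percentage >= threshold:  # inclusive boundary is an assumption
            hits.append((start + 1, percentage))
    return hits


# Matrix built from the four aligned sequences in the Usage section.
alignment = [
    "DEVGGEALGRLLVVYPWTQR",
    "DEVGREALGRLLVVYPWTQR",
    "DEVGGEALGRILVVYPWTQR",
    "DEVGGEAAGRVLVVYPWTQR",
]
matrix = [Counter(column) for column in zip(*alignment)]

# A made-up test sequence: the first aligned sequence with flanking residues.
hits = scan(matrix, "AAA" + alignment[0] + "CCC")
print(hits)
```

The percentage is taken relative to the best score the matrix can give, so a window that carries the most frequent residue at every position scores 100.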

Usage

Before running the example, we need to make a simple frequency matrix using prophecy.

This is the ungapped aligned set of sequences used to make the matrix:

% more m.seq
>one
DEVGGEALGRLLVVYPWTQR
>two
DEVGREALGRLLVVYPWTQR
>three
DEVGGEALGRILVVYPWTQR
>four
DEVGGEAAGRVLVVYPWTQR

% prophecy
Creates matrices/profiles from multiple alignments
Input sequence set: m.seq
Profile type
         F : Frequency
         G : Gribskov
         H : Henikoff
Select type [F]:
Enter a name for the profile [mymatrix]:
Enter threshold reporting percentage [75]:
Output file [outfile.prophecy]:

Command line arguments


Input file format

profit reads a simple frequency matrix produced by prophecy and uses it to search one or more protein or nucleic acid sequence USAs.

Output file format

The output is a list of three columns.

The first column is the name of the matching sequence found.
The second is the start position in the sequence of the match.
The third column (after the word 'Percentage:') is the percentage of the maximum possible score (sum of the highest value at each position in the frequency matrix).
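A report with these three columns could be read back with a few lines of Python. The sketch below assumes the columns are separated by whitespace and that the percentage is the field directly after the word 'Percentage:'. The function name parse_hit and the example line are made up for illustration and are not real profit output.

```python
def parse_hit(line):
    """Split one report line into (name, start, percentage).

    Assumes whitespace-separated columns, with the percentage following
    the literal word 'Percentage:'.
    """
    fields = line.split()
    marker = fields.index("Percentage:")
    name = fields[0]
    start = int(fields[1])
    percentage = float(fields[marker + 1])
    return name, start, percentage


# Hypothetical line in the layout described above, not real output.
example = "seq1 4 Percentage: 100"
print(parse_hit(example))
```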

Data files

None.

Notes

None.

References

None.

Warnings

The aligned set of sequences used to make the simple frequency matrix should not have gaps in it. profit will let you use a matrix made from a gapped alignment, but the results will probably not be sensible.

Diagnostic Error Messages

None.

Exit status

It always exits with a status of 0.

Known bugs

None.

Author(s)

History

Target users

Comments