seqmatchall

Function

Description

seqmatchall takes a set of sequences and performs an all-against-all pairwise comparison of words (fixed-size fragments of the sequences), finding regions of identity between any two sequences.

The larger the specified word size, the faster the comparison will proceed, but regions of identity shorter than the word size will be missed. You should therefore choose a word size small enough to find the regions of similarity you are interested in, while still completing within a reasonable time.
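The word-based comparison described above can be illustrated with a minimal Python sketch. It is not the seqmatchall implementation: the function names (index_words, identical_regions, match_all) are hypothetical, positions are 0-based, and the approach shown (index the words of one sequence, look up the words of the other, then extend each hit) is an assumption about how such a comparison could be done.

```python
from collections import defaultdict
from itertools import combinations


def index_words(seq, word_size):
    """Map each word of length word_size to its start positions in seq."""
    index = defaultdict(list)
    for i in range(len(seq) - word_size + 1):
        index[seq[i:i + word_size]].append(i)
    return index


def identical_regions(a, b, word_size):
    """Return sorted (length, start_a, start_b) tuples for maximal regions
    of identity that are at least word_size long (0-based positions)."""
    index_b = index_words(b, word_size)
    regions = set()
    for i in range(len(a) - word_size + 1):
        for j in index_b.get(a[i:i + word_size], ()):
            # A hit preceded by a matching character lies inside a region
            # that starts earlier; that region is reported from its start.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the word hit to the right while characters agree.
            length = word_size
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            regions.add((length, i, j))
    return sorted(regions)


def match_all(seqs, word_size):
    """Compare every pair in seqs (a dict of name -> sequence)."""
    return {
        (name_a, name_b): identical_regions(a, b, word_size)
        for (name_a, a), (name_b, b) in combinations(seqs.items(), 2)
    }
```

With the made-up sequences "ACGTACGTTT" and "GGACGTAC", a word size of 4 reports a 6-character and a 4-character region of identity, while a word size of 7 reports nothing, because no identical stretch is that long. Larger words also mean fewer index entries and fewer hits to extend, which is where the speed gain comes from.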

Usage

Command line arguments


Input file format

seqmatchall reads a set of sequence USAs.

The sequences must be either all protein or all nucleic acid.

Output file format

ECLAC (the complete E. coli lac operon) matches ECLACI, ECLACZ, ECLACY and ECLACA (the individual genes), and there is a short overlap between ECLACY and the flanking genes ECLACZ and ECLACA.

The output is a list of regions of identity in pairs of sequences, each consisting of one line with 7 columns of data separated by TABs or space characters.
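Because the separators may be TABs or spaces, a script reading this output is probably safest splitting on any run of whitespace. The sketch below assumes only what is stated above (7 whitespace-separated columns per line); the function name split_match_line is hypothetical, and the fields are returned as strings without interpreting them.

```python
def split_match_line(line):
    """Split one output line into its 7 columns.

    str.split() with no argument treats any run of TABs or spaces as a
    single separator and ignores the trailing newline.
    """
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, found {len(fields)}")
    return fields
```

Lines that do not have exactly 7 columns (for example blank lines or header text) raise ValueError, so a caller can decide whether to skip or report them.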

The columns of data consist of:

Data files

None.

Notes

The larger the word size, the faster the comparisons will proceed, but regions of identity smaller than the word size will not be reported.

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It exits with a status of 0.

Known bugs

None.

polydot will give a graphical view of the same matches.

Author(s)

History

Target users

Comments