sigscan documentation


 

CONTENTS

1.0 SUMMARY
2.0 INPUTS & OUTPUTS
3.0 INPUT FILE FORMAT
4.0 OUTPUT FILE FORMAT
5.0 DATA FILES
6.0 USAGE
7.0 KNOWN BUGS & WARNINGS
8.0 NOTES
9.0 DESCRIPTION
10.0 ALGORITHM
11.0 RELATED APPLICATIONS
12.0 DIAGNOSTIC ERROR MESSAGES
13.0 AUTHORS
14.0 REFERENCES



1.0 SUMMARY

Generates a DHF file (domain hits file) of hits (sequences) by scanning a signature against a sequence database.


2.0 INPUTS & OUTPUTS

SIGSCAN reads a signature from a protein signature file, scans the signature against a protein sequence database, and generates a DHF file (domain hits file) of hits to database sequences and a DAF file (domain alignment file) of the corresponding signature-sequence alignments. The names of the signature file, DHF file and DAF file are provided by the user, who also specifies the maximum number of high-scoring hits to report.
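Only the highest-scoring hits, up to the user's limit, are reported. The Python sketch below illustrates that selection step alone; it is not SIGSCAN's own code. The `Hit` class and `top_hits` function are hypothetical names, and the score is assumed to be a single number where higher is better, as in the Score column of the example DHF output.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one signature-sequence hit."""
    accession: str
    start: int
    end: int
    score: float


def top_hits(hits, max_hits):
    """Return at most max_hits hits, highest score first.

    heapq.nlargest is documented as equivalent to
    sorted(hits, key=..., reverse=True)[:max_hits], so hits with equal
    scores keep their input order.
    """
    return heapq.nlargest(max_hits, hits, key=lambda h: h.score)
```

Using a bounded selection rather than sorting everything keeps memory proportional to the number of hits kept, which matters when scanning a large database.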


3.0 INPUT FILE FORMAT

The format of the signature file is described in the SIGGEN documentation.
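Judging only from the example signature file below, a signature is a set of two-letter line codes: header records (TY, CL, FO, SF, FA, SI, NP and so on) followed by one NN block per signature position, each holding an IN line and AA (residue ; count) and GA (gap ; count) lines, with XX separators and a closing '//'. The following Python sketch parses that layout. Reading AA and GA as value/count pairs is an assumption, and `parse_signature` and `SigPosition` are hypothetical names; SIGGEN remains the authority on the format.

```python
from dataclasses import dataclass, field


@dataclass
class SigPosition:
    """One signature position (an NN block); field meanings are assumed."""
    number: int
    info: dict = field(default_factory=dict)      # IN line, e.g. {"NRES": 1, ...}
    residues: list = field(default_factory=list)  # (residue, count) from AA lines
    gaps: list = field(default_factory=list)      # (gap, count) from GA lines


def parse_signature(text):
    """Return (header dict, list of SigPosition) for a signature file."""
    header = {}
    positions = []
    for line in text.splitlines():
        code, _, value = line.partition(" ")
        value = value.strip()
        if code in ("", "XX"):          # blank lines and separators
            continue
        if code == "//":                # end of the signature
            break
        if code == "NN":                # start of a new position, e.g. "[1]"
            positions.append(SigPosition(number=int(value.strip("[]"))))
        elif code == "IN":              # "NRES 1 ; NGAP 1 ; WSIZ 0"
            for item in value.split(";"):
                key, num = item.split()
                positions[-1].info[key] = int(num)
        elif code == "AA":              # "H ; 2"
            res, count = (s.strip() for s in value.split(";"))
            positions[-1].residues.append((res, int(count)))
        elif code == "GA":              # "12 ; 2"
            gap, count = (s.strip() for s in value.split(";"))
            positions[-1].gaps.append((int(gap), int(count)))
        else:                           # classification and other header records
            header[code] = value
    return header, positions
```

For a complete signature file, the number of NN blocks would be expected to match the NP value.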

Input files for usage example

File: ../siggen-keep/54894.sig

TY   SCOP
XX
TS   1D
XX
CL   Alpha and beta proteins (a+b)
XX
FO   Ferredoxin-like
XX
SF   Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
XX
FA   Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
XX
SI   54894
XX
NP   15
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   H ; 2
XX
GA   12 ; 2
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   P ; 2
XX
GA   1 ; 2
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   P ; 2
XX
GA   26 ; 2
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 2
XX
GA   16 ; 2
XX
NN   [5]
XX


  [Part of this file has been deleted for brevity]

XX
GA   4 ; 2
XX
NN   [10]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 2
XX
GA   2 ; 2
XX
NN   [11]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   N ; 2
XX
GA   0 ; 2
XX
NN   [12]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   Y ; 2
XX
GA   0 ; 2
XX
NN   [13]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   G ; 2
XX
GA   3 ; 2
XX
NN   [14]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   P ; 2
XX
GA   3 ; 2
XX
NN   [15]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   P ; 2
XX
GA   2 ; 2
//

File: swsmall

> Q9WVI4
DDVTMLFSDIVGFTAICAQCTPMQVISMLNELYTRFDHQCGFLDIYKVETIGDAYCVASG
LHRKSLCHAKPIALMALKMMELSEEVLTPDGRPIQMRIGIHSGSVLAGVVGVRMPRYCLF
GNNVTLASKFESGSHPRRINISPTTYQLL
> Q9ERL9
VTMLFSDIVGFTAICSQCSPLQVITMLNALYTRFDQQCGELDVYKVETIGDAYCVAGGLH
RESDTHAVQIALMALKMMELSNEVMSPHGEPIKMRIGLHSGSVFAGVVGVKMPRYCLFGN
NVTLANKFESCSVPRKINVSPTTYRLLKDCPG
> Q9DGG6
EQVSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEDTKCEKISTLGDCYYCVAG
CPEPRADHAYCCIEMGLGMIKAIEQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVW
SNDVNLANLMEQLGVAGKVHISEATAKYLDDRYEMEDGKVTERVGQSAVADQLKGLKTYL
I
> Q99396
KELADPVTLIFTDIESSTAQWATQPELMPDAVATHHSMVRSLIENYDCYEVKTVGDSFMI
ACKSPFAAVQLAQELQLRFLRLDWGTTVFDEFYREFEERHAEEGDGKYKPPTARLDPEVY
RQLWNGLRVRVGIHTGLCDIRYDEVTKGYDYYGQTANTAARTESVGNGGQVLMTCETYHS
LSTAERSQFDVTPLGGVPLRGVSEPVEVYQLN
> Q99280
NDSAPKEPTGPVTLIFTDIESSTALWAAHPDLMPDAVATHHRLIRSLITRYECYEVKTVG
DSFMIASKSPFAAVQLAQELQLRFLRLDWETNALDESYREFEEQRAEGECEYTPPTAHMD
PEVYSRLWNGLRVRVGIHTGLCDIRYDEVTKGYDYYGRTSNMAARTESVANGGQVLMTHA
AYMSLSGEDRNQLDVTTLGATVLRGVPEPVRMYQLN
> Q99279
NNNRAPKEPTDPVTLIFTDIESSTALWAAHPDLMPDAVAAHHRMVRSLIGRYKCYEVKTV
GDSFMIASKSPFAAVQLAQELQLCFLHHDWGTNALDDSYREFEEQRAEGECEYTPPTAHM
DPEVYSRLWNGLRVRVGIHTGLCDIIRHDEVTKGYDYYGRTPNMAARTESVANGGQVLMT
HAAYMSLSAEDRKQIDVTALGDVALRGVSDPVKMYQLN
> Q91WF3
VCVLFASVPDFKEFYSESNINHEGLECLRLLNEIIADFDELLSKPKFSGVEKIKTIGSTY
MAATGLNATSGQDTQQDSERSCSHLGTMVEFAVALGSKLGVINKHSFNNFRLRVGLNHGP
VVAGVIGAQKPQYDIWGNTVNVASRMESTGVLGKIQVTEETARAL
> Q91WF3
FHSLYVKRHQGVSVLYADIVGFTRLASECSPKELVLMLNELFGKFDQIAKEHECMRIKIL
GDCYYCVSGLPLSLPDHAINCVRMGLDMCRAIRKLRVATGVDINMRVGVHSGSVLCGVIG
LQKWQYDVWSHDVTLANHMEAGGVPGRVHITGATLALL
> Q8VHH7
NNFMLRIGMNKGGVLAGVIGARKPHYDIWGNTVNVASRMESTGVMGNIQVVEET
> Q8VHH7
FNTMYMYRHENVSILFADIVGFTQLSSACSAQELVKLLNELFARFDKLAAKYHQLRIKIL
GDCYYCICGLPDYREDHAVCSILMGLAMVEAISYVREKTKTGVDMRVGVHTGTVLGGVLG
QKRWQYDVWSTDVTVANKMEAGGIPGRVHISQSTMDCLKGEFDVEPGDGGSRCDYLDEKG
IETYLI
> Q8NFM4
VCVLFASVPDFKEFYSESNINHEGLECLRLLNEIIADFDELLSKPKFSGVEKIKTIGSTY
MAATGLNATSGQDAQQDAERSCSHLGTMVEFAVALGSKLDVINKHSFNNFRLRVGLNHGP
VVAGVIGAQKPQYDIWGNTVNVASRMESTGVLGKIQVTEET
> Q8NFM4
FHSLYVKRHQGVSVLYADIVGFTRLASECSPKELVLMLNELFGKFDQIAKEHECMRIKIL
GDCYYCVSGLPLSLPDHAINCVRMGLDMCRAIRKLRAATGVDINMRVGVHSGSVLCGVIG


  [Part of this file has been deleted for brevity]

> Q83IL8
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSE
EQVDQLALYAPQATVNRIDNYEVVGKSRPSLP
> Q7P144
VEALKQGTVIDHIPAGEGVKILRLFKLTETGERVTVGLNLVSRHMGSKDLIKVENVALTE
EQANELALFAPKATVNVIDNFEVVKKHKLTLP
> Q7MZ14
VEAIRCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSNRLGKKDLIKIENTFLTE
QQANQLAMYAPNATVNCIENYEVVKKLPINLP
> Q7MX57
VAAIRNGIVIDHIPPTKLFKVATLLQLDDLDKRITIGNNLRSRSHGSKGVIKIEDKTFEE
EELNRIALIAPNVRLNIIRDYEVVEKRQVEVP
> Q7MHF0
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINE
EQASKLALYAPHATVNQIEDYQVVKKLALELP
> Q58801
VKKITNGTVIDHIDAGKALMVFKVLNVPKETSVMIAINVPSKKKGKKDILKIEGIELKKE
DVDKISLISPDVTINIIRNGKVVEKLKPQIP
> P96175
VEAICNGYVIDHIPSGQGVKILRLFSLTDTKQRVTVGFNLPSHDGTTKDLIKVENTEITK
SQANQLALLAPNATVNIIENFKVTDKHSLALP
> P96111
GIKPIENGTVIDHIAKGKTPEEIYSTILKIRKILRLYDVDSADGIFRSSDGSFKGYISLP
DRYLSKKEIKKLSAISPNTTVNIIKNSTVVEKYRIKLP
> P77919
VSAIKEGTVIDHIPAGKGLKVIEILKLGKLTNGGAVLLAMNVPSKKLGRKDIVKVEGRFL
SEEEVNKIALVAPNATVNIIRDYKVVEKFKVEVP
> P74766
VSKIKNGTVIDHIPAGRAFAVLNVLGIKGHEGFRIALVINVDSKKMGKKDIVKIEDKEIS
DTEANLITLIAPTATINIVREYEVVKKTKLEVP
> P57451
VEAIKSGSVIDHIPEYIGFKLLSLFRFTETEKRITIGLNLPSKKLGRKDIIKIENTFLSD
EQINQLAIYAPHATVNYINEYNLVRKVFPTLP
> P19936
VEAIKCGTVIDHIPAQIGFKLLTLFKLTATDQRITIGLNLPSNELGRKDLIKIENTFLTE
QQANQLAMYAPKATVNRIDNYEVVRKLTLSLP
> P08421
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTE
EQVNQLALYAPQATVNRIDNYDVVGKSRPSLP
> P00478
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSE
DQVDQLALYAPQATVNRIDNYEVVGKSRPSLP
> O58452
VSAIKEGTVIDHIPAGKGLKVIEILGLSKLSNGGSVLLAMNVPSKKLGRKDIVKVEGKFL
SEEEVNKIALVAPTATVNIIRNYKVVEKFKVEVP
> O30129
VSKIKEGTVIDHINAGKALLVLKILKIQPGTDLTVSMAMNVPSSKMGKKDIVKVEGMFIR
DEELNKIALISPNATINLIRDYEIERKFKVSPP
> O26938
VKPIKNGTVIDHITANRSLNVLNILGLPDGRSKVTVAMNMDSSQLGSKDIVKIENRELKP
SEVDQIALIAPRATINIVRDYKIVEKAKVRL




4.0 OUTPUT FILE FORMAT

DHF file (domain hits file)
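The format of the DHF file (domain hits file) of hit sequences generated by SIGSCAN (Figure 1) is described fully in the SEQSEARCH documentation and is only summarised here. The file contains two lines per hit: the first describes the hit in 17 text tokens delimited by '^', and the second contains the protein sequence. The first 4 tokens refer to the hit (sequence) itself: the accession number, the Swiss-Prot identifier ('.' if not known), and the start and end residue positions of the hit.
The next 9 tokens refer to the domain family, superfamily etc. from which the signature was derived: the domain type (SCOP or CATH), a domain identifier ('.' in the example), the unique identifier of the node (SI), the class, the architecture and topology (CATH only; '.' for SCOP domains), the fold, the superfamily and the family.
The final 4 tokens describe the search result: the type of model used (SPARSE for a sparse signature), the score, the p-value and the E-value.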
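A minimal Python sketch of reading DHF records follows, assuming the token order seen in the example output below, where every header line holds 17 '^'-delimited tokens. The names in `DHF_FIELDS` are illustrative labels chosen here (for instance `domain_id` for the sixth token, which is '.' in the example), and `parse_dhf` is a hypothetical helper, not an EMBOSS function.

```python
# Labels inferred from the example output, not official EMBOSS field names.
DHF_FIELDS = (
    "accession", "swissprot", "start", "end",             # the hit itself
    "type", "domain_id", "si", "class", "architecture",   # node the signature
    "topology", "fold", "superfamily", "family",          # was derived from
    "model", "score", "pvalue", "evalue",                 # the search result
)


def parse_dhf(text):
    """Yield one dict per hit: a '>' header line followed by sequence line(s)."""
    record = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record is not None:
                yield record
            tokens = line[1:].strip().split("^")
            if len(tokens) != len(DHF_FIELDS):
                raise ValueError(
                    f"expected {len(DHF_FIELDS)} tokens, got {len(tokens)}")
            record = dict(zip(DHF_FIELDS, tokens))
            record["score"] = float(record["score"])  # e.g. "4.67" or "-0.07"
            record["sequence"] = ""
        elif record is not None:
            record["sequence"] += line  # join wrapped sequence lines
    if record is not None:
        yield record
```

Start and end are left as strings because a DHF file may use '.' for values that are not known.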

DAF file (domain alignment file)
The format of the DAF file (domain alignment file, Figure 2) generated by SIGSCAN is described fully in the DOMAINALIGN documentation and is only summarised here.
It conforms to the EMBOSS "simple" multiple sequence alignment format and includes domain classification records (in comment lines beginning with '#') for the node from which the signature was generated. The classification records are TY (domain type, either SCOP or CATH), CL (class), FO (fold), SF (superfamily) and FA (family). For CATH domains, AR (architecture) and TP (topology) may also be given. A unique identifier for the node is given after SI.
The alignment is written in blocks, each containing an accession number, start and end positions, and a section of aligned sequence. The accession number identifies the hit, the positions are the start and end residue positions of that section of sequence, and '-' is used as the gap character. Beneath each sequence line is a 'SIGNATURE' markup line in which signature positions are marked with '*'.
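The SIGNATURE markup can be read back column by column. The Python sketch below assumes, as in the example output, that each sequence line holds an identifier, a start position and the aligned text separated by whitespace, and that the '*' characters of the markup line sit in the same columns as the characters they mark. `marked_residues` is a hypothetical helper, not part of EMBOSS.

```python
import re

# Identifier, start position and aligned text: the first three
# whitespace-separated fields of a sequence line in an alignment block.
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)")


def marked_residues(seq_line, markup_line):
    """Return (position, character) pairs for each '*' in the markup line.

    Positions are counted from the start value printed on the sequence line,
    so they are alignment positions; gap characters ('-') are counted too.
    """
    m = LINE_RE.match(seq_line)
    if m is None:
        raise ValueError("not an alignment line: " + seq_line)
    start, seq = m.group(2), m.group(3)
    col = m.start(3)                          # column where the aligned text begins
    markup = markup_line[col:col + len(seq)]  # markup for the same columns
    if not start.isdigit():
        return []
    first = int(start)
    return [(first + i, seq[i]) for i, mark in enumerate(markup) if mark == "*"]
```

Empty blocks, which hold a single '.' in the example output, simply yield no marked positions.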

Output files for usage example

File: SIGSCAN.dhf

> P00478^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^4.67^0.000e+00^0.000e+00
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSEDQVDQLALYAPQATVNRIDNYEVVGKSRPSLP
> Q83IL8^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^4.40^0.000e+00^0.000e+00
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSEEQVDQLALYAPQATVNRIDNYEVVGKSRPSLP
> Q8Z130^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^4.07^0.000e+00^0.000e+00
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTDEQVNQLALYAPQATVNRIDNYDVVGKSRPSLP
> P08421^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^4.07^0.000e+00^0.000e+00
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTEEQVNQLALYAPQATVNRIDNYDVVGKSRPSLP
> Q8K9H8^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^3.67^0.000e+00^0.000e+00
VEAIKSGSVIDHIPAHIGFKLLSLFRFTETEKRITIGLNLPSQKLDKKDIIKIENTFLSDDQINQLAIYAPCATVNYIEKYNLVGKIFPSLP
> Q9HKM3^.^11^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^3.20^0.000e+00^0.000e+00
ISKIRDGTVIDHVPSGKGIRVIGVLGVHEDVNYTVSLAIHVPSNKMGFKDVIKIENRFLDRNELDMISLIAPNATISIIKNYEISEKFQVELP
> Q9HHN3^.^11^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.87^0.000e+00^0.000e+00
VSKIQAGTVIDHIPAGQALQVLQILGTNGASDDQITVGMNVTSERHHRKDIVKIEGRELSQDEVDVLSLIAPDATINIVRDYEVDEKRRVDRP
> P57451^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.80^0.000e+00^0.000e+00
VEAIKSGSVIDHIPEYIGFKLLSLFRFTETEKRITIGLNLPSKKLGRKDIIKIENTFLSDEQINQLAIYAPHATVNYINEYNLVRKVFPTLP
> Q8ZB38^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.73^0.000e+00^0.000e+00
VEAIKCGTVIDHIPAQIGFKLLSLFKLTATDQRITIGLNLPSKRSGRKDLIKIENTFLTEQQANQLAMYAPDATVNRIDNYEVVKKLTLSLP
> P19936^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.73^0.000e+00^0.000e+00
VEAIKCGTVIDHIPAQIGFKLLTLFKLTATDQRITIGLNLPSNELGRKDLIKIENTFLTEQQANQLAMYAPKATVNRIDNYEVVRKLTLSLP
> Q97B28^.^11^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.60^0.000e+00^0.000e+00
ISKIKDGTVIDHIPSGKALRVLSILGIRDDVDYTVSVGMHVPSSKMEYKDVIKIENRSLDKNELDMISLTAPNATISIIKNYEISEKFKVELP
> Q87LF7^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.53^0.000e+00^0.000e+00
VEAIKNGTVIDHIPAQIGIKVLKLFDMHNSSQRVTIGLNLPSSALGHKDLLKIENVFINEEQASKLALYAPHATVNQIENYEVVKKLALELP
> Q7MZ14^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.47^0.000e+00^0.000e+00
VEAIRCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSNRLGKKDLIKIENTFLTEQQANQLAMYAPNATVNCIENYEVVKKLPINLP
> Q8ZTG2^.^11^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.33^0.000e+00^0.000e+00
VSKIENGTVIDHIPAGRALTVLRILGISGKEGLRVALVMNVESKKLGKKDIVKIEGRELTPEEVNIISAVAPTATINIIRNFAVVKKFKVTPP
> Q9KP65^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.33^0.000e+00^0.000e+00
VEAIKNGTVIDHIPAKVGIKVLKLFDMHNSAQRVTIGLNLPSSALGSKDLLKIENVFISEAQANKLALYAPHATVNQIENYEVVKKLALQLP
> O30129^.^11^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.27^0.000e+00^0.000e+00
VSKIKEGTVIDHINAGKALLVLKILKIQPGTDLTVSMAMNVPSSKMGKKDIVKVEGMFIRDEELNKIALISPNATINLIRDYEIERKFKVSPP
> Q8D1W6^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.27^0.000e+00^0.000e+00
VEAIFGGTVIDHIPAQVGLKLLSLFKWLHTKERITMGLNLPSNQQKKKDLIKLENVLLNEDQANQLSIYAPLATVNQIKNYIVIKKQKLKLP
> Q7MHF0^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.20^0.000e+00^0.000e+00
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINEEQASKLALYAPHATVNQIEDYQVVKKLALELP
> Q8DCF7^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.20^0.000e+00^0.000e+00
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINEEQASKLALYAPHATVNQIEDYQVVKKLALELP
> Q9UX07^.^11^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.07^0.000e+00^0.000e+00
VSKIRNGTVIDHIPAGRALAVLRILGIRGSEGYRVALVMNVESKKIGRKDIVKIEDRVIDEKEASLITLIAPSATINIIRDYVVTEKRHLEVP
> Q9K1K9^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.93^0.000e+00^0.000e+00
VEAIEKGTVIDHIPAGRGLTILRQFKLLHYGNAVTVGFNLPSKTQGSKDIIKIKGVCLDDKAADRLALFAPEAVVNTIDNFKVVQKRHLNLP
> O58452^.^12^93^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.93^0.000e+00^0.000e+00
VSAIKEGTVIDHIPAGKGLKVIEILGLSKLSNGGSVLLAMNVPSKKLGRKDIVKVEGKFLSEEEVNKIALVAPTATVNIIRNYKVVEKFKVEVP
> P74766^.^11^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.93^0.000e+00^0.000e+00
VSKIKNGTVIDHIPAGRAFAVLNVLGIKGHEGFRIALVINVDSKKMGKKDIVKIEDKEISDTEANLITLIAPTATINIVREYEVVKKTKLEVP
> Q7P144^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.80^0.000e+00^0.000e+00
VEALKQGTVIDHIPAGEGVKILRLFKLTETGERVTVGLNLVSRHMGSKDLIKVENVALTEEQANELALFAPKATVNVIDNFEVVKKHKLTLP
> Q9JWY6^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.60^0.000e+00^0.000e+00
VEAIEKGTVIDHIPAGRGLTILRQFKLLHYGNAVTVGFNLPSKTQGSKDIIKIKGVCLDDKAADRLALFAPEAVVNTIDHFKVVQKRHLNLP


  [Part of this file has been deleted for brevity]

VQAEAFDSVTIYFSDIVGFTALSAESTPMQVVTLLNDLYTCFDAVIDNFDVYKVETIGDAYMVVSGLPVRNGRLHACEVARMALALLDAVRSFRIRHRPQEQLRLRIGIHTGPVCAGVVGLKMPRYCLFGDTVNTASRMESNGEALKIHLSSETKAVLEEFGGFELEL
> Q891I9^.^9^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.07^0.000e+00^0.000e+00
ITSIKDGIVIDHIKSGYGIKIFNYLNLKNVEYSVALIMNVFSSKLGKKDIIKIANKEIDIDFTVLGLIDPTITINIIEDEKIKEKLNLELP
> P18293^.^38^119^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.07^0.000e+00^0.000e+00
VQAEAFDSVTIYFSDIVGFTALSAESTPMQVVTLLNDLYTCFDAVIDNFDVYKVETIGDAYMVVSGLPVRNGQLHAREVARMALALLDAVRSFRIRHRPQEQLRLRIGIHTGPVCAGVVGLKMPRYCLFGDTVNTASRMESNGEALRIHLSSETKAVLEEFDGFELEL
> O02740^.^32^113^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.07^0.000e+00^0.000e+00
DLVTLYFSDIVGFTTISAMSEPIEVVDLLNDLYTLFDAIIGSHDVYKVETIGDAYMVASGLPKRNGMRHAAEIANMSLDILSSVGTFKMRHMPEVPVRIRIGLHSGPVVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVSHSTVTILRTLGEGYEVE
> P51841^.^32^113^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.07^0.000e+00^0.000e+00
DLVTLYFSDIVGFTTISAMSEPIEVVDLLNDLYTLFDAIIGSHDVYKVETIGDAYMVASGLPKRNGSRHAAEIANMSLDILSSVGTFKMRHMPEVPVRIRIGLHSGPVVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVSLSTVTILQNLSEGYEVE
> Q8NFM4^.^49^130^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.07^0.000e+00^0.000e+00
VCVLFASVPDFKEFYSESNINHEGLECLRLLNEIIADFDELLSKPKFSGVEKIKTIGSTYMAATGLNATSGQDAQQDAERSCSHLGTMVEFAVALGSKLDVINKHSFNNFRLRVGLNHGPVVAGVIGAQKPQYDIWGNTVNVASRMESTGVLGKIQVTEET
> P46197^.^38^119^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.07^0.000e+00^0.000e+00
VQAEAFDSVTIYFSDIVGFTALSAESTPMQVVTLLNDLYTCFDAIIDNFDVYKVETIGDAYMVVSGLPGRNGQRHAPEIARMALALLDAVSSFRIRHRPHDQLRLRIGVHTGPVCAGVVGLKMPRYCLFGDTVNTASRMESNGQALKIHVSSTTKDALDELGCFQLEL
> P19686^.^65^146^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
VQAKKFNEVTMLFSDIVGFTAICSQCSPLQVITMLNALYTRFDQQCGELDVYKVETIGDAYCVAGGLHRESDTHAVQIALMALKMMELSNEVMSPHGEPIKMRIGLHSGSVFAGVVGVKMPRYCLFGNNVTLANKFESCSVPRKINVSPTTYRLLKDCPG
> P19687^.^66^147^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
AVQAKRFGNVTMLFSDIVGFTAICSQCSPLQVITMLNALYTRFDRQCGELDVYKVETIGDAYCVAGGLHKESDTHAVQIALMALKMMELSHEVVSPHGEPIKMRIGLHSGSVFAGVVGVKMPRYCLFGNNVTLANKFESCSVPRKINVSPTTYRLLKDCPG
> O60503^.^60^141^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
VSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEETKCEKISTLGDCYYCVAGCPEPRADHAYCCIEMGLGMIKAIEQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEATAKYLDDRYEMEDGKVIERLGQSVVADQLKGLKTYLI
> P97490^.^75^156^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
DAVGVMFASIPGFADFYSQTEMNNQGVECLRLLNEIIADFDELLGEDRFQDIEKIKTIGSTYMAVSGLSPEKQQCEDKWGHLCALADFSLALTESIQEINKHSFNNFELRIGISHGSVVAGVIGAKKPQYDIWGKTVNLASRMDSTGVSGRIQVPEETYLIL
> P51830^.^60^141^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
VSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEQTKCEKISTLGDCYYCVAGCPEPRADHAYCCIEMGLGMIKAIEQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEATAKYLDDRYEMEDGRVIERLGQSVVADQLKGLKTYLI
> Q02108^.^65^146^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
VQAKKFSNVTMLFSDIVGFTAICSQCSPLQVITMLNALYTRFDQQCGELDVYKVETIGDAYCVAGGLHKESDTHAVQIALMALKMMELSDEVMSPHGEPIKMRIGLHSGSVFAGVVGVKMPRYCLFGNNVTLANKFESCSVPRKINVSPTTYRLLKDCPG
> Q9DGG6^.^62^143^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
EQVSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEDTKCEKISTLGDCYYCVAGCPEPRADHAYCCIEMGLGMIKAIEQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEATAKYLDDRYEMEDGKVTERVGQSAVADQLKGLKTYLI
> Q9ERL9^.^57^138^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
VTMLFSDIVGFTAICSQCSPLQVITMLNALYTRFDQQCGELDVYKVETIGDAYCVAGGLHRESDTHAVQIALMALKMMELSNEVMSPHGEPIKMRIGLHSGSVFAGVVGVKMPRYCLFGNNVTLANKFESCSVPRKINVSPTTYRLLKDCPG
> P30803^.^0^81^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
VAVMFASIANFSEFYVELEANNEGVECLRVLNEIIADFDEIISEDRFRQLEKIKTIGSTYMAASGLNDSTYDKVGKTHIKALADFAMKLMDQMKYINEHSFNNFQMKIGLNIGPVVAGVIGARKPQYDIWGNTVNVASRMDSTGVPDRIQVTTDMYQVL
> P40145^.^75^156^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
DAVGVMFASIPGFADFYSQTEMNNQGVECLRLLNEIIADFDELLGEDRFQDIEKIKTIGSTYMAVSGLSPEKQQCEDKWGHLCALADFSLALTESIQEINKHSFNNFELRIGISHGSVVAGVIGAKKPQYDIWGKTVNLASRMDSTGVSGRIQVPEETYLIL
> O19179^.^30^111^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
VTLYFSDIVGFTTISAMSEPIEVVDLLNDLYTLFDAIIGSHDVYKVETIGDAYMVASGLPQRNGQRHAAEIANMALDILSAVGSFRMRHMPEVPVRIRIGLHSGPCVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVNMSTVRILHALDEGFQTEV
> P51840^.^30^111^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
VTLYFSDIVGFTTISAMSEPIEVVDLLNDLYTLFDAIIGSHDVYKVETIGDAYMVASGLPQRNGQRHAAEIANMSLDILSAVGSFRMRHMPEVPVRIRIGLHSGPCVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVNMSTVRIL
> P40146^.^75^156^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
DAVGVMFASIPGFADFYSQTEMNNQGVECLRLLNEIIADFDELLGEDRFQDIEKIKTIGSTYMAVSGLSPEKQQCEDKWGHLCALADFSLALTESIQEINKHSFNNFELRIGISHGSVVAGVIGAKKPQYDIWGKTVNLASRMDSTGVSGRIQVPEETYLIL
> P52785^.^30^111^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
VTLYFSDIVGFTTISAMSEPIEVVDLLNDLYTLFDAIIGAHDVYKVETIGDAYMVASGLPQRNGQRHAAEIANMSLDILSAVGSFRMRHMPEVPVRIRIGLHSGPCVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVNMSTVRILRSLDQGFQME
> P98999^.^62^143^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^0.000e+00
EQVSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEETKCEKISTLGDCYYCVAGCPEPRPDHAYCCIEMGLGMIEAIDQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEKTARYLD
> P30804^.^0^81^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^-0.07^0.000e+00^0.000e+00
VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEERFRQLEKIKTIGSTYMAASGLNASTYDQAGRSHITALADYAMRLMEQMKHINEHSFNNFQMKIGLNMGPVVAGVIGARKPQYDIWGNTVNVSSRMDSTGVPDRIQVTTDLYQVL
> Q01341^.^0^81^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^-0.07^0.000e+00^0.000e+00
VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEERFRQLEKIKTIGSTYMAASGLNASTYDQVGRSHITALADYAMRLMEQMKHINEHSFNNFQMKIGLNMGPVVAGVIGARKPQYDIWGNTVNVSSRMDSTGVPDRIQVTTDLYQVL
> Q03343^.^0^81^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^-0.07^0.000e+00^0.000e+00
VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEERFRQLEKIKTIGSTYMAASGLNASTYDQVGRSHITALADYAMRLMEQMKHINEHSFNNFQMKIGLNMGPVVAGVIGARKPQYDIWGNTVNVSSRMDSTGVPDRIQVTTDLYQVL
> O95622^.^0^81^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^-0.07^0.000e+00^0.000e+00
VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEDRFRQLEKIKTIGSTYMAASGLNDSTYDKVGKTHIKALADFAMKLMDQMKYINEHSFNNFQMKIGLNIGPVVAGVIGARKPQYDIWGNTVNVASRMDSTGVPDRIQVTTDMYQVL

File: SIGSCAN.aln

# DE   Results of signature search
# XX
# TY   SCOP
# XX
# CL   Alpha and beta proteins (a+b)
# XX
# FO   Ferredoxin-like
# XX
# SF   Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
# XX
# FA   Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
# XX
# SI   54894
# XX
P00478    1      VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKI 53
SIGNATURE -      ----------*-*--------------------------*-------------
P00478    54     ENTFLSEDQVDQLALYAPQATVNRIDNYEVVGKSRPSLP               106
SIGNATURE -      ---*---*--*----*-*----*--***---*---*--*              
P00478    107    .                                                     159
SIGNATURE -      .                                                    
P00478    160    .                                                     212
SIGNATURE -      .                                                    
P00478    213    .                                                     265
SIGNATURE -      .                                                    
# XX
Q83IL8    1      VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKI 53
SIGNATURE -      ----------*-*--------------------------*-------------
Q83IL8    54     ENTFLSEEQVDQLALYAPQATVNRIDNYEVVGKSRPSLP               106
SIGNATURE -      ---*---*--*----*-*----*--***---*---*--*              
Q83IL8    107    .                                                     159
SIGNATURE -      .                                                    
Q83IL8    160    .                                                     212
SIGNATURE -      .                                                    
Q83IL8    213    .                                                     265
SIGNATURE -      .                                                    
# XX
Q8Z130    1      VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKI 53
SIGNATURE -      ----------*-*--------------------------*-------------
Q8Z130    54     ENTFLTDEQVNQLALYAPQATVNRIDNYDVVGKSRPSLP               106
SIGNATURE -      ---*---*--*----*-*----*--***---*---*--*              
Q8Z130    107    .                                                     159
SIGNATURE -      .                                                    
Q8Z130    160    .                                                     212
SIGNATURE -      .                                                    
Q8Z130    213    .                                                     265
SIGNATURE -      .                                                    
# XX
P08421    1      VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKI 53
SIGNATURE -      ----------*-*--------------------------*-------------
P08421    54     ENTFLTEEQVNQLALYAPQATVNRIDNYDVVGKSRPSLP               106


  [Part of this file has been deleted for brevity]

SIGNATURE -      ---------*-*--------------------------*--------------
P98999    107    CGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEKTARYLD          159
SIGNATURE -      --*---*--*----*-*----*--***---*---*--*------         
P98999    160    .                                                     212
SIGNATURE -      .                                                    
P98999    213    .                                                     265
SIGNATURE -      .                                                    
# XX
P30804    1      VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEERFRQLEKI 53
SIGNATURE -      *-*--------------------------*----------------*---*--
P30804    54     KTIGSTYMAASGLNASTYDQAGRSHITALADYAMRLMEQMKHINEHSFNNFQM 106
SIGNATURE -      *----*-*----*--***---*---*--*------------------------
P30804    107    KIGLNMGPVVAGVIGARKPQYDIWGNTVNVSSRMDSTGVPDRIQVTTDLYQVL 159
SIGNATURE -      -----------------------------------------------------
P30804    160    .                                                     212
SIGNATURE -      .                                                    
P30804    213    .                                                     265
SIGNATURE -      .                                                    
# XX
Q01341    1      VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEERFRQLEKI 53
SIGNATURE -      *-*--------------------------*----------------*---*--
Q01341    54     KTIGSTYMAASGLNASTYDQVGRSHITALADYAMRLMEQMKHINEHSFNNFQM 106
SIGNATURE -      *----*-*----*--***---*---*--*------------------------
Q01341    107    KIGLNMGPVVAGVIGARKPQYDIWGNTVNVSSRMDSTGVPDRIQVTTDLYQVL 159
SIGNATURE -      -----------------------------------------------------
Q01341    160    .                                                     212
SIGNATURE -      .                                                    
Q01341    213    .                                                     265
SIGNATURE -      .                                                    
# XX
Q03343    1      VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEERFRQLEKI 53
SIGNATURE -      *-*--------------------------*----------------*---*--
Q03343    54     KTIGSTYMAASGLNASTYDQVGRSHITALADYAMRLMEQMKHINEHSFNNFQM 106
SIGNATURE -      *----*-*----*--***---*---*--*------------------------
Q03343    107    KIGLNMGPVVAGVIGARKPQYDIWGNTVNVSSRMDSTGVPDRIQVTTDLYQVL 159
SIGNATURE -      -----------------------------------------------------
Q03343    160    .                                                     212
SIGNATURE -      .                                                    
Q03343    213    .                                                     265
SIGNATURE -      .                                                    
# XX
O95622    1      VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEDRFRQLEKI 53
SIGNATURE -      *-*--------------------------*----------------*---*--
O95622    54     KTIGSTYMAASGLNDSTYDKVGKTHIKALADFAMKLMDQMKYINEHSFNNFQM 106
SIGNATURE -      *----*-*----*--***---*---*--*------------------------
O95622    107    KIGLNIGPVVAGVIGARKPQYDIWGNTVNVASRMDSTGVPDRIQVTTDMYQVL 159
SIGNATURE -      -----------------------------------------------------
O95622    160    .                                                     212
SIGNATURE -      .                                                    
O95622    213    .                                                     265
SIGNATURE -      .                                                    




5.0 DATA FILES

SIGSCAN requires a residue substitution matrix (EBLOSUM62 by default), which is read from the EMBOSS data path (see the -sub qualifier).
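For readers who want to inspect a matrix outside EMBOSS, the following is a minimal Python sketch of reading one. It assumes the layout used by EBLOSUM62: comment lines beginning with '#', a header line of residue codes, and then one line per residue giving its code followed by integer scores. The function name parse_emboss_matrix is hypothetical and is not part of EMBOSS.

```python
def parse_emboss_matrix(text):
    """Parse an EMBOSS-style substitution matrix into {(row, column): score}.

    Assumes '#' comment lines, one header line of residue codes, then one
    line per residue: its code followed by integer scores.
    """
    scores = {}
    columns = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if columns is None:
            columns = fields  # header: residue code for each column
            continue
        row = fields[0]
        for column, value in zip(columns, fields[1:]):
            scores[(row, column)] = int(value)
    return scores


# A two-residue excerpt in the EBLOSUM62 layout (values from BLOSUM62).
excerpt = """# excerpt for illustration
   A  R
A  4 -1
R -1  5
"""
matrix = parse_emboss_matrix(excerpt)
print(matrix[("A", "R")])  # -1
```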


6.0 USAGE

   Standard (Mandatory) qualifiers:
  [-siginfile]         infile     This option specifies the name of the
                                  signature file (input). A 'signature file'
                                  contains a sparse sequence signature
                                  suitable for use with the SIGSCAN and
                                  SIGSCANLIG programs. The files are generated
                                  by using SIGGEN and SIGGENLIG.
  [-dbsequence]        seqall     This option specifies the name of the
                                  database to search.
   -sub                matrixf    This option specifies the residue
                                  substitution matrix.
   -gapo               float      This option specifies the gap insertion
                                  penalty. The gap insertion penalty is the
                                  score taken away when a gap is created. The
                                  best value depends on the choice of
                                  comparison matrix. The default value assumes
                                  you are using the EBLOSUM62 matrix for
                                  protein sequences, and the EDNAMAT matrix
                                  for nucleotide sequences.
   -gape               float      This option specifies the gap extension
                                  penalty. The gap extension penalty is added
                                  to the standard gap penalty for each base or
                                  residue in the gap. This is how long gaps
                                  are penalized. Usually you will expect a few
                                  long gaps rather than many short gaps, so
                                  the gap extension penalty should be lower
                                  than the gap penalty.
   -nterm              menu       This option specifies the N-terminal
                                  matching option. This determines how the
                                  first signature position is aligned to a
                                  sequence from the database.
   -nhits              integer    This option specifies the maximum number of
                                  hits to output.
  [-hitsfile]          outfile    This option specifies the name of the DHF
                                  file (domain hits file) (output). A 'domain
                                  hits file' contains database hits
                                  (sequences) with domain classification
                                  information, in the DHF format (FASTA-like).
                                  The hits are relatives of a SCOP or CATH
                                  family (or other node in the structural
                                  hierarchies) and are found from a search of
                                  a sequence database, in this case by using
                                  SIGSCAN. Files of hits retrieved by PSIBLAST
                                  are generated by using SEQSEARCH, and files
                                  of hits retrieved by various types of HMM
                                  and profile are generated by using LIBSCAN.
  [-alignfile]         outfile    This option specifies the name of the SAF
                                  file (signature alignment file) for
                                  signature-sequence alignments (output). A
                                  'signature alignment file' contains one or
                                  more signature-sequence alignments. The
                                  file is in DAF format (CLUSTAL-like) and is
                                  annotated with bibliographic information,
                                  either the domain family classification (for
                                  SIGSCAN output) or ligand classification
                                  (for SIGSCANLIG output). The files generated
                                  by SIGSCAN will contain a
                                  signature-sequence alignment for a single
                                  signature against a library of one or more
                                  sequences. The files generated by using
                                  SIGSCANLIG will contain a signature-sequence
                                  alignment for a single query sequence
                                  against a library of one or more signatures.

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-dbsequence" associated qualifiers
   -sbegin2            integer    Start of each sequence to be used
   -send2              integer    End of each sequence to be used
   -sreverse2          boolean    Reverse (if DNA)
   -sask2              boolean    Ask for begin/end/reverse
   -snucleotide2       boolean    Sequence is nucleotide
   -sprotein2          boolean    Sequence is protein
   -slower2            boolean    Make lower case
   -supper2            boolean    Make upper case
   -sformat2           string     Input sequence format
   -sdbname2           string     Database name
   -sid2               string     Entryname
   -ufo2               string     UFO features
   -fformat2           string     Features format
   -fopenfile2         string     Features file name

   "-hitsfile" associated qualifiers
   -odirectory3        string     Output directory

   "-alignfile" associated qualifiers
   -odirectory4        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report deaths

6.1 COMMAND LINE ARGUMENTS

Standard (Mandatory) qualifiers

[-siginfile] (Parameter 1)
   This option specifies the name of the signature file (input). A 'signature file' contains a sparse sequence signature suitable for use with the SIGSCAN and SIGSCANLIG programs. The files are generated by using SIGGEN and SIGGENLIG.
   Allowed values: Input file
   Default: Required

[-dbsequence] (Parameter 2)
   This option specifies the name of the database to search.
   Allowed values: Readable sequence(s)
   Default: Required

-sub
   This option specifies the residue substitution matrix.
   Allowed values: Comparison matrix file in EMBOSS data path
   Default: EBLOSUM62

-gapo
   This option specifies the gap insertion penalty. The gap insertion penalty is the score taken away when a gap is created. The best value depends on the choice of comparison matrix. The default value assumes you are using the EBLOSUM62 matrix for protein sequences, and the EDNAMAT matrix for nucleotide sequences.
   Allowed values: Floating point number from 1.0 to 100.0
   Default: 10.0 for any sequence

-gape
   This option specifies the gap extension penalty. The gap extension penalty is added to the standard gap penalty for each base or residue in the gap. This is how long gaps are penalized. Usually you will expect a few long gaps rather than many short gaps, so the gap extension penalty should be lower than the gap penalty.
   Allowed values: Floating point number from 0.0 to 10.0
   Default: 0.5 for any sequence

-nterm
   This option specifies the N-terminal matching option. This determines how the first signature position is aligned to a sequence from the database.
   Allowed values:
      1 (Align anywhere and allow only complete signature-sequence fit)
      2 (Align anywhere and allow partial signature-sequence fit)
      3 (Use empirical gaps only)
   Default: 1

-nhits
   This option specifies the maximum number of hits to output.
   Allowed values: Any integer value
   Default: 100

[-hitsfile] (Parameter 3)
   This option specifies the name of the DHF file (domain hits file) (output). A 'domain hits file' contains database hits (sequences) with domain classification information, in the DHF format (FASTA-like). The hits are relatives of a SCOP or CATH family (or other node in the structural hierarchies) and are found from a search of a sequence database, in this case by using SIGSCAN. Files of hits retrieved by PSIBLAST are generated by using SEQSEARCH, and files of hits retrieved by various types of HMM and profile are generated by using LIBSCAN.
   Allowed values: Output file
   Default: SIGSCAN.dhf

[-alignfile] (Parameter 4)
   This option specifies the name of the SAF file (signature alignment file) for signature-sequence alignments (output). A 'signature alignment file' contains one or more signature-sequence alignments. The file is in DAF format (CLUSTAL-like) and is annotated with bibliographic information, either the domain family classification (for SIGSCAN output) or ligand classification (for SIGSCANLIG output). The files generated by SIGSCAN will contain a signature-sequence alignment for a single signature against a library of one or more sequences. The files generated by using SIGSCANLIG will contain a signature-sequence alignment for a single query sequence against a library of one or more signatures.
   Allowed values: Output file
   Default: SIGSCAN.aln

Additional (Optional) qualifiers: (none)
Advanced (Unprompted) qualifiers: (none)

6.2 EXAMPLE SESSION

Here is a sample interactive session with sigscan:


% sigscan 
Generate hits (DHF file) from a signature search.
Name of signature file (input): ../siggen-keep/54894.sig
Name of database to search.: swsmall
Residue substitution matrix [EBLOSUM62]: 
Gap insertion penalty [10]: 
Gap extension penalty [0.5]: 
N-terminal matching options
         1 : Align anywhere and allow only complete signature-sequence fit
         2 : Align anywhere and allow partial signature-sequence fit
         3 : Use empirical gaps only
Select number [1]: 
Max. number of hits to output [100]: 
Name of DHF file (domain hits file) (output) [SIGSCAN.dhf]: 
Name of SAF file (signature alignment file) for signature-sequence alignments (output) help: 
"This option specifies the name of the SAF (signature alignment file) (output).A 'signature alignment file' contains one or more signnature-sequence alignments. The file is in DAF format (CLUSTAL-like) and is annotated with bibliographic information, either the domain family classification (for SIGSCAN output) or ligand classification (for SIGSCANLIG output). The files generated by SIGSCAN will contain a signature-sequence alignment for a single signature against a library of one or more sequences. The files generated by using SIGSCANLIG will contain a signature-sequence alignment for a single query sequence against a library of one or more signatures. [SIGSCAN.aln]: 


Signature file read ok
Signature compiled ok
Signature aligned to db ok
Hits file written ok
Alignments file written ok
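The same run can be made without prompts by giving every value on the command line and adding the -auto qualifier. The following Python sketch assembles such a command from the qualifiers listed in section 6.0. The helper name sigscan_command is hypothetical. Its defaults simply mirror the documented defaults.

```python
import shlex


def sigscan_command(siginfile, dbsequence, hitsfile="SIGSCAN.dhf",
                    alignfile="SIGSCAN.aln", sub="EBLOSUM62", gapo=10.0,
                    gape=0.5, nterm=1, nhits=100):
    """Build a non-interactive sigscan command line from the documented qualifiers."""
    return [
        "sigscan",
        "-siginfile", siginfile,
        "-dbsequence", dbsequence,
        "-sub", sub,
        "-gapo", str(gapo),
        "-gape", str(gape),
        "-nterm", str(nterm),
        "-nhits", str(nhits),
        "-hitsfile", hitsfile,
        "-alignfile", alignfile,
        "-auto",  # turn off prompts
    ]


if __name__ == "__main__":
    cmd = sigscan_command("../siggen-keep/54894.sig", "swsmall")
    print(shlex.join(cmd))
    # To run it (requires EMBOSS sigscan on PATH): subprocess.run(cmd, check=True)
```

Returning an argument list rather than a single string avoids shell quoting problems when file names contain spaces.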





7.0 KNOWN BUGS & WARNINGS

None.


8.0 NOTES

SIGSCAN does not generate p-values or E-values. DHF files of hits for which p-values or E-values are calculated can be generated by using LIBSCAN, which supports searches with sparse protein signatures as well as various types of hidden Markov model and other profiles.

If a signature file is generated by hand, the gap data must be listed in order of increasing gap size (see the SIGGEN documentation).
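A hand-edited file can be checked before it is passed to SIGSCAN. In the SIG format shown in section 3.0, each signature position opens with an NN line and lists its gaps on GA lines of the form 'GA   gap ; count'. Reading the first number as the gap size is an assumption based on that example. The Python sketch below flags any position where a gap size is smaller than the one listed before it. The function name gap_order_problems is hypothetical.

```python
import re


def gap_order_problems(sig_text):
    """Return signature positions (NN numbers) whose GA lines are out of order.

    Assumes the layout shown in section 3.0: an 'NN   [k]' line opens each
    position and every 'GA   gap ; count' line that follows belongs to it.
    """
    problems = []
    position = None
    last_gap = None
    for line in sig_text.splitlines():
        if line.startswith("NN"):
            match = re.search(r"\[(\d+)\]", line)
            position = int(match.group(1)) if match else None
            last_gap = None  # new position: restart the ordering check
        elif line.startswith("GA"):
            gap = int(line[2:].split(";")[0])  # gap size before the ';'
            if last_gap is not None and gap < last_gap and position not in problems:
                problems.append(position)
            last_gap = gap
    return problems
```

An empty result means every position lists its gaps in increasing order.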

8.1 GLOSSARY OF FILE TYPES

Domain hits file
   Format: DHF format (FASTA-like).
   Description: Database hits (sequences) with domain classification information. The hits are relatives of a SCOP or CATH family (or other node in the structural hierarchies) and are found from a search of a discriminating element (e.g. a protein signature, hidden Markov model, simple frequency matrix, Gribskov profile or Henikoff profile) against a sequence database.
   Created by: SEQSEARCH (hits retrieved by PSIBLAST), SIGSCAN (hits retrieved by sparse protein signature) and LIBSCAN (hits retrieved by various types of HMM and profile).
   See also: N.A.

Domain alignment file
   Format: DAF format (CLUSTAL-like).
   Description: Sequence alignment of domains belonging to the same SCOP or CATH family (or other node in the structural hierarchies). The file is annotated with domain family classification information.
   Created by: DOMAINALIGN (structure-based sequence alignment of domains of known structure).
   See also: DOMAINALIGN alignments can be extended with sequence relatives (of unknown structure) to the family in question by using SEQALIGN.

Hits file
   Format: Text file of classified hits.
   Description: A list of hits (e.g. from a prediction method) that are classified and rank-ordered on the basis of score, p-value, E-value etc.
   Created by: ROCON and LIBSCAN (hits from searches of a discriminating element (hidden Markov model, profile or signature) against a sequence database).
   See also: ROCPLOT is run on the files to perform Receiver Operating Characteristic (ROC) analysis on the hits.

Signature file
   Format: SIG format.
   Description: Contains a sparse sequence signature suitable for use with the SIGSCAN program.
   Created by: SIGGEN, SIGGENLIG, LIBGEN.
   See also: The files are generated by using SIGGEN.
None


9.0 DESCRIPTION

See Blades et al. (2005), Ison et al. (2000) and Daniel et al. (1999) for a description of protein signatures and their application.


10.0 ALGORITHM

The algorithm is based on the approach first described by Daniel et al. (1999), which was later applied to the definition of protein families (Ison et al., 2000) and to automatically generated signatures (Blades et al., 2005).
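The exact scoring used by SIGSCAN is given in Daniel et al. (1999) and is not reproduced here. As a rough illustration only, the Python sketch below places a sparse signature on a sequence by dynamic programming. It lets the first position fall anywhere and requires every position to be placed, in the spirit of -nterm option 1. However, it allows only the empirically observed gaps between positions and applies no gap penalties, so -gapo and -gape have no counterpart here. The function name, the (residue, allowed_gaps) representation and the identity scoring function, which stands in for EBLOSUM62, are all hypothetical.

```python
def align_sparse_signature(sequence, signature, score):
    """Best placement of a sparse signature on a sequence using empirical gaps only.

    signature: list of (residue, allowed_gaps); the gaps of the first position
    are ignored, so the signature may start anywhere (simplifying assumption).
    score: function (signature residue, sequence residue) -> number.
    Returns (best_score, 1-based positions), or None if no complete fit exists.
    """
    n, m = len(signature), len(sequence)
    if n == 0 or m == 0:
        return None
    NEG = float("-inf")
    best = [[NEG] * m for _ in range(n)]   # best[i][j]: position i placed at residue j
    back = [[None] * m for _ in range(n)]  # residue index used for position i - 1
    for j in range(m):
        best[0][j] = score(signature[0][0], sequence[j])
    for i in range(1, n):
        residue, gaps = signature[i]
        for j in range(m):
            for g in gaps:
                k = j - g - 1  # residue index of the previous signature position
                if k >= 0 and best[i - 1][k] > NEG:
                    cand = best[i - 1][k] + score(residue, sequence[j])
                    if cand > best[i][j]:
                        best[i][j] = cand
                        back[i][j] = k
    end = max(range(m), key=lambda j: best[n - 1][j])
    if best[n - 1][end] == NEG:
        return None
    positions = [end]
    for i in range(n - 1, 0, -1):  # trace back from the last position
        positions.append(back[i][positions[-1]])
    positions.reverse()
    return best[n - 1][end], [p + 1 for p in positions]


identity = lambda a, b: 1 if a == b else 0  # stand-in for a substitution matrix
print(align_sparse_signature("GHAPLLPP", [("H", [0]), ("P", [1]), ("P", [2, 3])], identity))
```

Restricting each step to the observed gaps keeps the search small: the work grows with the signature length, the sequence length and the number of gaps per position, rather than with every possible spacing.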


11.0 RELATED APPLICATIONS

See also

Program name    Description
contactcount    Count specific versus non-specific contacts
contacts        Generate intra-chain CON files from CCF files
domainalign     Generate alignments (DAF file) for nodes in a DCF file
domainrep       Reorder DCF file to identify representative structures
domainreso      Remove low resolution domains from a DCF file
interface       Generate inter-chain CON files from CCF files
libgen          Generate discriminating elements from alignments
matgen3d        Generate a 3D-1D scoring matrix from CCF files
psiphi          Phi and psi torsion angles from protein coordinates
rocon           Generates a hits file from comparing two DHF files
rocplot         Performs ROC analysis on hits files
scorecmapdir    Contact scores for cleaned protein chain contact files
seqalign        Extend alignments (DAF file) with sequences (DHF file)
seqfraggle      Removes fragment sequences from DHF files
seqsearch       Generate PSI-BLAST hits (DHF file) from a DAF file
seqsort         Remove ambiguous classified sequences from DHF files
seqwords        Generates DHF files from keyword search of UniProt
siggen          Generates a sparse protein signature from an alignment
siggenlig       Generate ligand-binding signatures from a CON file
sigscanlig      Search ligand-signature library & write hits (LHF file)



12.0 DIAGNOSTIC ERROR MESSAGES

None.


13.0 AUTHORS

Jon Ison (jison@rfcgr.mrc.ac.uk)
MRC Rosalind Franklin Centre for Genomics Research, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK


14.0 REFERENCES

Please cite the authors and EMBOSS.

Rice P, Longden I and Bleasby A (2000) "EMBOSS: The European Molecular Biology Open Software Suite" Trends in Genetics, 16:276-277.

See also http://emboss.sourceforge.net/

Blades MJ, Ison JC, Ranasinghe R and Findlay JBC (2005) "Automatic generation and evaluation of sparse protein signatures for families of protein structural domains" Protein Science (accepted).

Ison JC, Bleasby AJ, Blades MJ, Daniel SC, Parish JH and Findlay JBC (2000) "A key residues approach to the definition of protein families and analysis of sparse family signatures" Proteins: Structure, Function and Genetics, 40:330-341.

Daniel SC, Parish JH, Ison JC, Blades MJ and Findlay JBC (1999) "Alignment of a sparse protein signature with protein sequences: application to fold prediction for three small globulins" FEBS Letters, 459:349-352.

14.1 Other useful references