/usr/share/doc/texlive-doc/latex/texshade/AQP1.phd.gz

From phd@EMBL-Heidelberg.de Wed Nov 25 10:24:25 1998
Date: Tue, 24 Nov 1998 17:45:25 +0100
From: Protein Prediction <phd@EMBL-Heidelberg.de>
To: eric.beitz@uni-tuebingen.de
Subject: PredictProtein

The following information has been received by the server:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________

reference predict_h25873 (Tue Nov 24 17:43:21 MET 1998)
from eric.beitz@uni-tuebingen.de
password(###)
resp MAIL
orig HTML
prediction of: -secondary structure   (PHDsec)-solvent accessibility (PHDacc)-
return msf format
# no description
MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALGFNYPLERNQTLVQDNVKVSLAFGLSIATLAQSVGHISGAHSNPAVT
LGLLLSCQISILRAVMYIIAQCVGAIVASAILSGITSSLLENSLGRNDLARGVNSGQGLGIEIIGTLQLVLCVLATTDRR
RRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVGPFIGSALAVLIYDFILAPRSSDFTD
RMKVWTSGQVEEYDLDADDINSRVEMKPK

________________________________________________________________________________

Result of PROSITE search (Amos Bairoch):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

please quote: A Bairoch, P Bucher & K Hofmann: The PROSITE database,
its status in 1997. Nucl. Acids Res., 1997, 25, 217-221.

________________________________________________________________________________

Pattern-ID: ASN_GLYCOSYLATION PS00001 PDOC00001
Pattern-DE: N-glycosylation site
Pattern:    N[^P][ST][^P]
   42       NQTL
   205      NFSN

Pattern-ID: GLYCOSAMINOGLYCAN PS00002 PDOC00002
Pattern-DE: Glycosaminoglycan attachment site
Pattern:    SG.G
   135      SGQG

Pattern-ID: PKC_PHOSPHO_SITE PS00005 PDOC00005
Pattern-DE: Protein kinase C phosphorylation site
Pattern:    [ST].[RK]
   157      TDR
   239      TDR

Pattern-ID: CK2_PHOSPHO_SITE PS00006 PDOC00006
Pattern-DE: Casein kinase II phosphorylation site
Pattern:    [ST].{2}[DE]
   118      SLLE
   262      SRVE

Pattern-ID: MYRISTYL PS00008 PDOC00008
Pattern-DE: N-myristoylation site
Pattern:    G[^EDRKHPFYW].{2}[STAGCN][^P]
   30       GSALGF
   57       GLSIAT
   82       GLLLSC
   104      GAIVAS
   114      GITSSL
   132      GVNSGQ
   173      GLSVAL
   190      GINPAR
   219      GSALAV

Pattern-ID: PROKAR_LIPOPROTEIN PS00013 PDOC00013
Pattern-DE: Prokaryotic membrane lipoprotein lipid attachment site
Pattern:    [^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C
   77       PAVTLGLLLSC

Pattern-ID: MIP PS00221 PDOC00193
Pattern-DE: MIP family signature
Pattern:    [HNQA].NP[STA][LIVMF][ST][LIVMF][GSTAFY]
   74       HSNPAVTLG
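
The PROSITE patterns in this section are printed in a regular-expression-like
syntax, so the reported sites can be cross-checked with an ordinary regex scan.
The block below is a minimal sketch, not the server's own code: the names
QUERY, PATTERNS and find_sites are illustrative, and only three of the patterns
are included. Positions are 1-based offsets into the 269-residue query, and a
lookahead is used so that overlapping matches are not skipped.

```python
import re

# Query sequence as echoed at the top of this report (269 residues).
QUERY = (
    "MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALGFNYPLERNQTLVQDNVKVSLAFGLSIATLAQSVGHISGAHSNPAVT"
    "LGLLLSCQISILRAVMYIIAQCVGAIVASAILSGITSSLLENSLGRNDLARGVNSGQGLGIEIIGTLQLVLCVLATTDRR"
    "RRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVGPFIGSALAVLIYDFILAPRSSDFTD"
    "RMKVWTSGQVEEYDLDADDINSRVEMKPK"
)

# Pattern strings copied from the "Pattern:" lines of the report.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",
    "MIP": r"[HNQA].NP[STA][LIVMF][ST][LIVMF][GSTAFY]",
}


def find_sites(pattern, sequence):
    """Return (1-based start, matched text) for every match, overlaps included."""
    scanner = re.compile("(?=(" + pattern + "))")
    return [(m.start() + 1, m.group(1)) for m in scanner.finditer(sequence)]


for name, pattern in PATTERNS.items():
    for start, text in find_sites(pattern, QUERY):
        print(f"{name:20s} {start:5d}  {text}")
```

Because every start position is an index into the query itself, no reported
site can lie beyond residue 269.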

________________________________________________________________________________

Result of ProDom domain search (Corpet, Gouzy, Kahn):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- please quote: ELL Sonnhammer & D Kahn, Prot. Sci., 1994, 3, 482-492

________________________________________________________________________________

--- ------------------------------------------------------------
--- Results from running BLAST against PRODOM domains
---
--- PLEASE quote:
---       F Corpet, J Gouzy, D Kahn (1998).  The ProDom database
---       of protein domain families. Nucleic Ac Res 26:323-326.
---
--- BEGIN of BLASTP output
BLASTP 1.4.7 [16-Oct-94] [Build 17:06:52 Oct 31 1994]

Reference:  Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers,
and David J. Lipman (1990).  Basic local alignment search tool.  J. Mol. Biol.
215:403-10.

Query=  prot (#) ppOld, no description /home/phd/server/work/predict_h25873
        (269 letters)

Database:  /home/phd/ut/prodom/prodom_34_2
           53,597 sequences; 6,740,067 total letters.
Searching..................................................done

                                                                     Smallest
                                                                       Sum
                                                              High  Probability
Sequences producing High-scoring Segment Pairs:              Score  P(N)      N

 390 p34.2 (45) MIP(6) AQP1(4) GLPF(4)  // PROTEIN INTRIN...   270  2.0e-32   1
 45663 p34.2 (1) AQPZ_ECOLI // AQUAPORIN Z.                     90  3.2e-13   2
 45611 p34.2 (1) AQP2_HUMAN // AQUAPORIN-CD (AQP-CD) (WAT...   136  6.0e-13   1
 304 p34.2 (61) AQP2(10) GLPF(6) MIP(5)  // PROTEIN CHANN...   121  9.2e-11   1
 45607 p34.2 (1) PMIP_NICAL // POLLEN-SPECIFIC MEMBRANE I...    80  1.2e-07   2
 45606 p34.2 (1) BIB_DROME // NEUROGENIC PROTEIN BIG BRAIN.     80  1.2e-05   2
 2027 p34.2 (15) GLPF(9) AQP3(2)  // PROTEIN FACILITATOR ...    60  3.4e-05   2
 45615 p34.2 (1) GLPF_STRPN // GLYCEROL UPTAKE FACILITATO...    63  0.024     1
 45638 p34.2 (1) AQP5_HUMAN // AQUAPORIN 5.                     61  0.044     1

>390 p34.2 (45) MIP(6) AQP1(4) GLPF(4)  // PROTEIN INTRINSIC CHANNEL WATER
  AQUAPORIN TONOPLAST MEMBRANE FOR PLASMA LENS
  Length = 88

 Score = 270 (125.3 bits), Expect = 2.0e-32, P = 2.0e-32
 Identities = 47/67 (70%), Positives = 56/67 (83%)

Query:   156 TTDRRRRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVG 215
             T D+RR  +GGSAPL IG SVALGHL+ I YTGCG+NPARSFG AV+T NF+NHW++WVG
Sbjct:    22 TDDKRRGSVGGSAPLPIGFSVALGHLIGIPYTGCGMNPARSFGPAVVTGNFTNHWVYWVG 81

Query:   216 PFIGSAL 222
             P IG+ L
Sbjct:    82 PIIGAVL 88

 Score = 95 (44.1 bits), Expect = 2.3e-06, P = 2.3e-06
 Identities = 20/33 (60%), Positives = 23/33 (69%)

Query:   136 GQGLGIEIIGTLQLVLCVLATTDRRRRDLGGSA 168
             GQ L +EIIGT QLV CV ATTD +RR   G +
Sbjct:     1 GQNLVVEIIGTFQLVYCVFATTDDKRRGSVGGS 33

>45663 p34.2 (1) AQPZ_ECOLI // AQUAPORIN Z.
  Length = 96

 Score = 90 (41.8 bits), Expect = 3.2e-13, Sum P(2) = 3.2e-13
 Identities = 18/36 (50%), Positives = 25/36 (69%)

Query:   166 GSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAV 201
             G AP+AIGL++ L HL++I  T   +NPARS   A+
Sbjct:    25 GFAPIAIGLALTLIHLISIPVTNTSVNPARSTAVAI 60

 Score = 63 (29.2 bits), Expect = 3.2e-13, Sum P(2) = 3.2e-13
 Identities = 11/25 (44%), Positives = 14/25 (56%)

Query:   210 WIFWVGPFIGSALAVLIYDFILAPR 234
             W FWV P +G  +  LIY  +L  R
Sbjct:    71 WFFWVVPIVGGIIGGLIYRTLLEKR 95

>45611 p34.2 (1) AQP2_HUMAN // AQUAPORIN-CD (AQP-CD) (WATER CHANNEL PROTEIN FOR
  RENAL COLLECTING DUCT) (ADH WATER CHANNEL) (AQUAPORIN 2) (COLLECTING DUCT
  WATER CHANNEL PROTEIN) (WCH-CD).
  Length = 49

 Score = 136 (63.1 bits), Expect = 6.0e-13, P = 6.0e-13
 Identities = 23/42 (54%), Positives = 34/42 (80%)

Query:    50 VKVSLAFGLSIATLAQSVGHISGAHSNPAVTLGLLLSCQISI 91
             +++++AFGL I TL Q++GHISGAH NPAVT+  L+ C +S+
Sbjct:     8 LQIAMAFGLGIGTLVQALGHISGAHINPAVTVACLVGCHVSV 49

>304 p34.2 (61) AQP2(10) GLPF(6) MIP(5)  // PROTEIN CHANNEL WATER AQUAPORIN
  INTRINSIC DUCT COLLECTING FOR TONOPLAST WCH-CD
  Length = 43

 Score = 121 (56.1 bits), Expect = 9.2e-11, P = 9.2e-11
 Identities = 24/43 (55%), Positives = 31/43 (72%)

Query:    70 ISGAHSNPAVTLGLLLSCQISILRAVMYIIAQCVGAIVASAIL 112
             ISG H NPAVT+GLL+  +   LRAV YI AQ +GA+  +A+L
Sbjct:     1 ISGGHINPAVTIGLLIGGRFPFLRAVFYIAAQLLGAVAGAALL 43

>45607 p34.2 (1) PMIP_NICAL // POLLEN-SPECIFIC MEMBRANE INTEGRAL PROTEIN.
  Length = 69

 Score = 80 (37.1 bits), Expect = 1.2e-07, Sum P(2) = 1.2e-07
 Identities = 17/54 (31%), Positives = 32/54 (59%)

Query:   149 LVLCVLATTDRRRRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAVL 202
             L++ V++      R +G  A +A+G+++ L   +A   +G  +NPARS G A++
Sbjct:    13 LLMFVISGVATDDRAIGQVAGIAVGMTITLNVFVAGPISGASMNPARSIGPAIV 66

 Score = 34 (15.8 bits), Expect = 1.2e-07, Sum P(2) = 1.2e-07
 Identities = 8/18 (44%), Positives = 11/18 (61%)

Query:   136 GQGLGIEIIGTLQLVLCV 153
             GQ L IEII +  L+  +
Sbjct:     1 GQSLAIEIIISFLLMFVI 18

>45606 p34.2 (1) BIB_DROME // NEUROGENIC PROTEIN BIG BRAIN.
  Length = 119

 Score = 80 (37.1 bits), Expect = 1.2e-05, Sum P(2) = 1.2e-05
 Identities = 15/34 (44%), Positives = 24/34 (70%)

Query:     1 MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALG 34
             M +EI+   FWR++++E LA  ++VFI  G+A G
Sbjct:    55 MQAEIRTLEFWRSIISECLASFMYVFIVCGAAAG 88

 Score = 39 (18.1 bits), Expect = 1.2e-05, Sum P(2) = 1.2e-05
 Identities = 9/17 (52%), Positives = 12/17 (70%)

Query:    53 SLAFGLSIATLAQSVGH 69
             +LA GL++ATL Q   H
Sbjct:   103 ALASGLAMATLTQCFLH 119

>2027 p34.2 (15) GLPF(9) AQP3(2)  // PROTEIN FACILITATOR GLYCEROL UPTAKE
  AQUAPORIN DIFFUSION UPTAKE/EFFLUX PEPX 5'REGION ORF1
  Length = 55

 Score = 60 (27.8 bits), Expect = 3.4e-05, Sum P(2) = 3.4e-05
 Identities = 17/46 (36%), Positives = 20/46 (43%)

Query:   156 TTDRRRRDLGGSAPLAIGLSVALGHLLAIDYTGCGINPARSFGSAV 201
             T D      GG  PL +G  V    +     TG  INPAR FG  +
Sbjct:    10 TDDGNNVPSGGLHPLMVGFLVMGIGMSLGGTTGYAINPARDFGPRI 55

 Score = 37 (17.2 bits), Expect = 3.4e-05, Sum P(2) = 3.4e-05
 Identities = 7/10 (70%), Positives = 8/10 (80%)

Query:   149 LVLCVLATTD 158
             L+ CVLA TD
Sbjct:     2 LIACVLALTD 11

>45615 p34.2 (1) GLPF_STRPN // GLYCEROL UPTAKE FACILITATOR PROTEIN.
  Length = 26

 Score = 63 (29.2 bits), Expect = 0.025, P = 0.024
 Identities = 13/23 (56%), Positives = 18/23 (78%)

Query:   205 NFSNHWIFWVGPFIGSALAVLIY 227
             ++S  WI  VGP IG+ALAVL++
Sbjct:     1 DWSYAWIPVVGPVIGAALAVLVF 23

>45638 p34.2 (1) AQP5_HUMAN // AQUAPORIN 5.
  Length = 27

 Score = 61 (28.3 bits), Expect = 0.045, P = 0.044
 Identities = 11/19 (57%), Positives = 18/19 (94%)

Query:    50 VKVSLAFGLSIATLAQSVG 68
             ++++LAFGL+I TLAQ++G
Sbjct:     8 LQIALAFGLAIGTLAQALG 26

Parameters:
  E=0.1
  B=500

  V=500
  -ctxfactor=1.00

  Query                        -----  As Used  -----    -----  Computed  ----
  Frame  MatID Matrix name     Lambda    K       H      Lambda    K       H
   +0      0   BLOSUM62        0.322   0.138   0.394    same    same    same

  Query
  Frame  MatID  Length  Eff.Length   E    S W   T  X     E2  S2
   +0      0      269       269     0.10 69 3  11 22    0.22 33

Statistics:
  Query          Expected         Observed           HSPs       HSPs
  Frame  MatID  High Score       High Score       Reportable  Reported
   +0      0    59 (27.4 bits)  270 (125.3 bits)       14         14

  Query         Neighborhd  Word      Excluded    Failed   Successful  Overlaps
  Frame  MatID   Words      Hits        Hits    Extensions Extensions  Excluded
   +0      0      5349     3124825      609708     2510548     4569         2

  Database:  /home/phd/ut/prodom/prodom_34_2
    Release date:  unknown
    Posted date:  12:24 PM MET DST May 06, 1998
  # of letters in database:  6,740,067
  # of sequences in database:  53,597
  # of database sequences satisfying E:  9
  No. of states in DFA:  564 (111 KB)
  Total size of DFA:  226 KB (256 KB)
  Time to generate neighborhood:  0.03u 0.00s 0.03t  Real: 00:00:00
  Time to search database:  9.80u 0.03s 9.83t  Real: 00:00:10
  Total cpu time:  9.90u 0.06s 9.96t  Real: 00:00:10
--- END of BLASTP output
--- ------------------------------------------------------------
---
--- Again: these results were obtained from the domain database
--- collected by Daniel Kahn and his coworkers in Toulouse.
---
--- The general WWW page is on:
----      ---------------------------------------
---       http://www.toulouse.inra.fr/prodom.html
----      ---------------------------------------
---
--- For WWW graphic interfaces to PRODOM, in particular for your
--- protein family, use the links listed here (each line is ONE
--- single link for your protein):
---
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=390 ==> multiple alignment, consensus, PDB and PROSITE links of domain 390
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=390 ==> graphical output of all proteins having domain 390
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45663 ==> multiple alignment, consensus, PDB and PROSITE links of domain 45663
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45663 ==> graphical output of all proteins having domain 45663
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45611 ==> multiple alignment, consensus, PDB and PROSITE links of domain 45611
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45611 ==> graphical output of all proteins having domain 45611
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=304 ==> multiple alignment, consensus, PDB and PROSITE links of domain 304
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=304 ==> graphical output of all proteins having domain 304
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45607 ==> multiple alignment, consensus, PDB and PROSITE links of domain 45607
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45607 ==> graphical output of all proteins having domain 45607
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45606 ==> multiple alignment, consensus, PDB and PROSITE links of domain 45606
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45606 ==> graphical output of all proteins having domain 45606
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=2027 ==> multiple alignment, consensus, PDB and PROSITE links of domain 2027
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=2027 ==> graphical output of all proteins having domain 2027
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45615 ==> multiple alignment, consensus, PDB and PROSITE links of domain 45615
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45615 ==> graphical output of all proteins having domain 45615
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom1=45638 ==> multiple alignment, consensus, PDB and PROSITE links of domain 45638
http://www.toulouse.inra.fr/prodom/cgi-bin/ReqProdomII.pl?id_dom2=45638 ==> graphical output of all proteins having domain 45638
---
--- NOTE: if you want to use the link, make sure the entire line
---       is pasted as URL into your browser!
---
--- END of PRODOM
--- ------------------------------------------------------------
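
The bit scores and P values in the BLASTP output above can be reproduced
approximately from the raw scores. The block below is a sketch under two
assumptions that the report itself does not state: that bits are computed as
Lambda * S / ln 2 with the BLOSUM62 Lambda = 0.322 from the Parameters block
(the printed Lambda is rounded, so the last digit can differ), and that P
follows from the expectation E through the Poisson relation P = 1 - exp(-E).
The function names are illustrative.

```python
import math

LAMBDA = 0.322  # "As Used" Lambda for BLOSUM62 in the Parameters block (rounded)


def raw_to_bits(raw_score, lam=LAMBDA):
    """Convert a raw HSP score to bits, assuming bits = lambda * S / ln 2."""
    return lam * raw_score / math.log(2)


def expect_to_p(expect):
    """Probability of at least one chance hit, given the expected number E.

    -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    """
    return -math.expm1(-expect)


# Raw score and reported bits for a few HSPs from the listing above.
for raw, reported in [(270, 125.3), (95, 44.1), (60, 27.8)]:
    print(f"S={raw:4d}  computed={raw_to_bits(raw):6.1f}  reported={reported}")

print(f"E=0.045 -> P={expect_to_p(0.045):.3f}")
```

For very small E the two quantities are numerically the same, which is why most
hits above show identical Expect and P values.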

________________________________________________________________________________

--- Database used for sequence comparison:
--- SEQBASE    RELEASE 34.0 OF EMBL/SWISS-PROT WITH  59021 SEQUENCES

The alignment that has been used as input to the network is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________

--- ------------------------------------------------------------
--- MAXHOM multiple sequence alignment
--- ------------------------------------------------------------
---
--- MAXHOM ALIGNMENT HEADER: ABBREVIATIONS FOR SUMMARY
--- ID           : identifier of aligned (homologous) protein
--- STRID        : PDB identifier (only for known structures)
--- IDE          : percentage of pairwise sequence identity
--- WSIM         : percentage of weighted similarity
--- LALI         : number of residues aligned
--- NGAP         : number of insertions and deletions (indels)
--- LGAP         : number of residues in all indels
--- LEN2         : length of aligned sequence
--- ACCNUM       : SwissProt accession number
--- NAME         : one-line description of aligned protein
---
--- MAXHOM ALIGNMENT HEADER: SUMMARY
ID         STRID  IDE WSIM LALI NGAP LGAP LEN2 ACCNUM NAME
aqp1_rat          100  100  269    0    0  269 P29975 PROXIMAL TUBULE) (AQUAPOR
aqp1_mouse         98   99  269    0    0  269 Q02013 PROXIMAL TUBULE) (AQUAPOR
aqp1_human         93   97  269    0    0  269 P29972 PROXIMAL TUBULE) (AQUAPOR
aqp1_bovin         90   95  269    1    2  271 P47865 PROXIMAL TUBULE) (AQUAPOR
aqp1_sheep         90   94  269    2    3  272 P56401 PROXIMAL TUBULE) (AQUAPOR
aqpa_ranes         78   89  268    2    5  272 P50501 AQUAPORIN FA-CHIP.
aqp2_dasno         49   73  109    1    7  109 P79164 PROTEIN) (WCH-CD) (FRAGME
aqp2_bovin         49   73  109    1    7  109 P79099 PROTEIN) (WCH-CD) (FRAGME
aqp2_canfa         48   72  109    1    7  109 P79144 PROTEIN) (WCH-CD) (FRAGME
aqp2_rabit         48   73  109    1    7  109 P79213 PROTEIN) (WCH-CD) (FRAGME
aqp2_elema         47   72  109    1    7  109 P79168 PROTEIN) (WCH-CD) (FRAGME
aqp2_horse         47   72  109    1    7  109 P79165 PROTEIN) (WCH-CD) (FRAGME
aqp2_proha         47   73  109    1    7  109 P79229 PROTEIN) (WCH-CD) (FRAGME
mip_rat            46   73  259    1    7  261 P09011 LENS FIBER MAJOR INTRINSI
aqp2_oryaf         46   72  109    1    7  109 P79200 PROTEIN) (WCH-CD) (FRAGME
mip_mouse          46   73  261    1    7  263 P51180 LENS FIBER MAJOR INTRINSI
mip_ranpi          45   73  261    1    7  263 Q06019 LENS FIBER MAJOR INTRINSI
mip_bovin          45   73  261    1    7  263 P06624 LENS FIBER MAJOR INTRINSI
mip_human          45   73  261    1    7  263 P30301 LENS FIBER MAJOR INTRINSI
mip_chick          45   72  110    1    1  112 P28238 LENS FIBER MAJOR INTRINSI
aqp5_rat           44   71  262    2    8  265 P47864 AQUAPORIN 5.
aqp5_human         44   71  262    2    8  265 P55064 AQUAPORIN 5.
aqp2_human         44   72  261    2    8  271 P41181 PROTEIN) (WCH-CD).
aqp4_human         43   70  266    2    5  323 P55087 AQUAPORIN 4 (WCH4) (MERCU
aqp4_rat           43   70  266    2    5  323 P47863 AQUAPORIN 4 (WCH4) (MERCU
aqp4_mouse         43   69  265    3    6  322 P55088 AQUAPORIN 4 (WCH4) (MERCU
aqp2_rat           42   71  261    2    8  271 P34080 PROTEIN) (WCH-CD).
aqp2_mouse         42   71  261    2    8  271 P56402 PROTEIN) (WCH-CD).
wc2a_arath         42   67  248    4   12  287 P43286 PLASMA MEMBRANE INTRINSIC
aqp6_human         42   68  260    2    9  282 Q13520 AQUAPORIN 6 (AQUAPORIN-2
wc2c_arath         41   66  248    4   12  285 P30302 INTRINSIC PROTEIN) (WSI-T
wc2b_arath         41   66  248    4   12  285 P43287 PLASMA MEMBRANE INTRINSIC
wc1c_arath         41   65  238    4   10  286 Q08733 (TMP-B).
wc1b_arath         41   65  238    4   10  286 Q06611 (TMP-A).
tipw_lyces         40   65  237    4   10  286 Q08451 (RIPENING-ASSOCIATED MEMB
wc1a_arath         40   64  238    4   10  286 P43285 PLASMA MEMBRANE INTRINSIC
tipw_pea           40   64  237    4   11  289 P25794 RESPONSIVE PROTEIN 7A).
tipa_arath         38   64  250    3    9  268 P26587 TONOPLAST INTRINSIC PROTE
aqua_atrca         38   64  246    4   10  282 P42767 AQUAPORIN.
dip_antma          38   65  242    2    4  250 P33560 PROBABLE TONOPLAST INTRIN
aqpz_ecoli         37   59  220    4   17  231 P48838 AQUAPORIN Z (BACTERIAL NO
tip2_tobac         37   64  242    2    4  250 P24422 TONOPLAST INTRINSIC PROTE
tip1_tobac         37   64  242    2    4  250 P21653 TONOPLAST INTRINSIC PROTE
tipg_arath         33   62  241    2    4  251 P25818 TONOPLAST INTRINSIC PROTE
bib_drome          33   60  260    4   10  700 P23645 NEUROGENIC PROTEIN BIG BR
tipr_arath         33   62  243    2    4  253 P21652 TONOPLAST INTRINSIC PROTE
tipa_phavu         33   62  246    2    4  256 P23958 TONOPLAST INTRINSIC PROTE
tipg_orysa         32   62  240    2    5  250 P50156 TONOPLAST INTRINSIC PROTE
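
The summary rows above can be turned into records for filtering or sorting.
The block below is a minimal sketch, not part of MAXHOM: the names Hit and
parse_summary_line are illustrative. It assumes whitespace-separated fields in
the order of the header line, and that a STRID value, when present, is not
purely numeric (the column is blank for every row in this report).

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    ident: str            # ID column
    strid: Optional[str]  # PDB identifier, None when the column is blank
    ide: int              # percentage of pairwise sequence identity
    wsim: int             # percentage of weighted similarity
    lali: int             # number of residues aligned
    ngap: int             # number of indels
    lgap: int             # number of residues in all indels
    len2: int             # length of aligned sequence
    accnum: str           # SwissProt accession number
    name: str             # one-line description (truncated in the report)


def parse_summary_line(line: str) -> Hit:
    tokens = line.split()
    # With a blank STRID the second token is already the identity percentage.
    if tokens[1].isdigit():
        strid, rest = None, tokens[1:]
    else:
        strid, rest = tokens[1], tokens[2:]
    ide, wsim, lali, ngap, lgap, len2 = (int(t) for t in rest[:6])
    return Hit(tokens[0], strid, ide, wsim, lali, ngap, lgap, len2,
               rest[6], " ".join(rest[7:]))


rows = [
    "aqp1_bovin         90   95  269    1    2  271 P47865 PROXIMAL TUBULE) (AQUAPOR",
    "aqpz_ecoli         37   59  220    4   17  231 P48838 AQUAPORIN Z (BACTERIAL NO",
]
for hit in map(parse_summary_line, rows):
    print(hit.ident, hit.ide, hit.lali, hit.accnum)
```

Splitting on whitespace collapses repeated spaces inside the NAME field; fixed
column offsets would preserve them but depend on the exact layout.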
---
--- MAXHOM ALIGNMENT: IN MSF FORMAT
MSF of: /home/phd/server/work/predict_h25873-22040.hssp from:    1 to:  269
 /home/phd/server/work/predict_h25873-22040.msfRet  MSF:  269  Type: P 24-Nov-98  17:44:5  Check: 3448  ..

 Name: predict_h258    Len:   269  Check: 8331  Weight:  1.00
 Name: aqp1_rat        Len:   269  Check: 8331  Weight:  1.00
 Name: aqp1_mouse      Len:   269  Check: 7552  Weight:  1.00
 Name: aqp1_human      Len:   269  Check: 6501  Weight:  1.00
 Name: aqp1_bovin      Len:   269  Check: 7067  Weight:  1.00
 Name: aqp1_sheep      Len:   269  Check: 7582  Weight:  1.00
 Name: aqpa_ranes      Len:   269  Check: 4844  Weight:  1.00
 Name: aqp2_dasno      Len:   269  Check: 8933  Weight:  1.00
 Name: aqp2_bovin      Len:   269  Check: 9649  Weight:  1.00
 Name: aqp2_canfa      Len:   269  Check: 8990  Weight:  1.00
 Name: aqp2_rabit      Len:   269  Check: 8787  Weight:  1.00
 Name: aqp2_elema      Len:   269  Check: 9381  Weight:  1.00
 Name: aqp2_horse      Len:   269  Check: 8993  Weight:  1.00
 Name: aqp2_proha      Len:   269  Check: 8855  Weight:  1.00
 Name: mip_rat         Len:   269  Check: 9773  Weight:  1.00
 Name: aqp2_oryaf      Len:   269  Check: 8554  Weight:  1.00
 Name: mip_mouse       Len:   269  Check: 9723  Weight:  1.00
 Name: mip_ranpi       Len:   269  Check: 5937  Weight:  1.00
 Name: mip_bovin       Len:   269  Check: 1430  Weight:  1.00
 Name: mip_human       Len:   269  Check:  372  Weight:  1.00
 Name: mip_chick       Len:   269  Check: 4658  Weight:  1.00
 Name: aqp5_rat        Len:   269  Check: 9033  Weight:  1.00
 Name: aqp5_human      Len:   269  Check: 6547  Weight:  1.00
 Name: aqp2_human      Len:   269  Check: 6209  Weight:  1.00
 Name: aqp4_human      Len:   269  Check: 2589  Weight:  1.00
 Name: aqp4_rat        Len:   269  Check: 4412  Weight:  1.00
 Name: aqp4_mouse      Len:   269  Check: 2845  Weight:  1.00
 Name: aqp2_rat        Len:   269  Check: 5748  Weight:  1.00
 Name: aqp2_mouse      Len:   269  Check: 6526  Weight:  1.00
 Name: wc2a_arath      Len:   269  Check: 4866  Weight:  1.00
 Name: aqp6_human      Len:   269  Check: 9404  Weight:  1.00
 Name: wc2c_arath      Len:   269  Check: 6187  Weight:  1.00
 Name: wc2b_arath      Len:   269  Check: 7328  Weight:  1.00
 Name: wc1c_arath      Len:   269  Check: 8575  Weight:  1.00
 Name: wc1b_arath      Len:   269  Check: 9544  Weight:  1.00
 Name: tipw_lyces      Len:   269  Check: 9283  Weight:  1.00
 Name: wc1a_arath      Len:   269  Check:  598  Weight:  1.00
 Name: tipw_pea        Len:   269  Check: 9253  Weight:  1.00
 Name: tipa_arath      Len:   269  Check: 6544  Weight:  1.00
 Name: aqua_atrca      Len:   269  Check: 2848  Weight:  1.00
 Name: dip_antma       Len:   269  Check: 9619  Weight:  1.00
 Name: aqpz_ecoli      Len:   269  Check: 5641  Weight:  1.00
 Name: tip2_tobac      Len:   269  Check:  490  Weight:  1.00
 Name: tip1_tobac      Len:   269  Check:  622  Weight:  1.00
 Name: tipg_arath      Len:   269  Check: 3231  Weight:  1.00
 Name: bib_drome       Len:   269  Check: 7687  Weight:  1.00
 Name: tipr_arath      Len:   269  Check: 4476  Weight:  1.00
 Name: tipa_phavu      Len:   269  Check: 5563  Weight:  1.00
 Name: tipg_orysa      Len:   269  Check: 3537  Weight:  1.00

//

              1                                                   50
predict_h258  MASEIKKKLF WRAVVAEFLA MTLFVFISIG SALGFNYPLE RNQTLVQDNV
aqp1_rat      MASEIKKKLF WRAVVAEFLA MTLFVFISIG SALGFNYPLE RNQTLVQDNV
aqp1_mouse    MASEIKKKLF WRAVVAEFLA MTLFVFISIG SALGFNYPLE RNQTLVQDNV
aqp1_human    MASEFKKKLF WRAVVAEFLA TTLFVFISIG SALGFKYPVG NNQTAVQDNV
aqp1_bovin    MASEFKKKLF WRAVVAEFLA MILFIFISIG SALGFHYPIK SNQTtvQDNV
aqp1_sheep    MASEFKKKLF WRAVVAEFLA MILFIFISIG SALGFHYPIK SNQTtvQDNV
aqpa_ranes    MASEFKKKAF WRAVIAEFLA MILFVFISIG AALGFNFPIE EKANQtqDIV
aqp2_dasno    ......SVAF SRAVLAEFLA TLIFVFFGLG SALSWPQALP S.......VL
aqp2_bovin    ......SIAF SRAVLAEFLA TLLFVFFGLG SALNWPQALP S.......VL
aqp2_canfa    ......SVAF SRAVFAEFLA TLLFVFFGLG SALNWPQALP S.......VL
aqp2_rabit    ......SIAF SRAVFAEFLA TLLFVFFGLG SALNWPSALP S.......TL
aqp2_elema    ......SIAF SRAVFSEFLA TLLFVFFGLG SALNWPQALP S.......VL
aqp2_horse    ......SIAF SRAVLAEFLA TLLFVFFGLG SALNWPQAMP S.......VL
aqp2_proha    ......SIAF SRAVLSEFLA TLLFVFFGLG SALNWPQALP S.......VL
mip_rat       ...ELRSASF WRAIFAEFFA TLFYVFFGLG SSLRWA.... ...PGPLHVL
aqp2_oryaf    ......SIAF SKAVFSEFLA TLLFVFFGLG SALNWPQALP S.......GL
mip_mouse     .MWELRSASF WRAIFAEFFA TLFYVFFGLG ASLRWA.... ...PGPLHVL
mip_ranpi     .MWEFRSFSF WRAVFAEFFG TMFYVFFGLG ASLKWAAGPA .......NVL
mip_bovin     .MWELRSASF WRAICAEFFA SLFYVFFGLG ASLRWA.... ...PGPLHVL
mip_human     .MWELRSASF WRAIFAEFFA TLFYVFFGLG SSLRWA.... ...PGPLHVL
mip_chick     .......... .......... .......... .......... ..........
aqp5_rat      MKKEVCSLAF FKAVFAEFLA TLIFVFFGLG SALKWPSALP T.......IL
aqp5_human    MKKEVCSVAF LKAVFAEFLA TLIFVFFGLG SALKWPSALP T.......IL
aqp2_human    .MWELRSIAF SRAVFAEFLA TLLFVFFGLG SALNWPQALP S.......VL
aqp4_human    AFKGVWTQAF WKAVTAEFLA MLIFVLLSLG STINWG...G TEKPLPVDMV
aqp4_rat      AFKGVWTQAF WKAVTAEFLA MLIFVLLSVG STINWG...G SENPLPVDMV
aqp4_mouse    AFKGVWTQAF WKAVSAEFLA TLIFVL.GVG STINWG...G SENPLPVDMV
aqp2_rat      .MWELRSIAF SRAVLAEFLA TLLFVFFGLG SALQWASSPP S.......VL
aqp2_mouse    .MWELRSIAY CRAVLAEFLA TLLFVFFGLG SALQWASSPP S.......VL
wc2a_arath    DGAELKKWSF YRAVIAEFVA TLLFLYITVL TVIGYKIQSD TDAGGVdgIL
aqp6_human    MLACRLWKAI SRALFAEFLA TGLYVFFGVG SVMRWPTALP S.......VL
wc2c_arath    DAEELTKWSL YRAVIAEFVA TLLFLYVTVL TVIGYKIQSD TKAGGVdgIL
wc2b_arath    DADELTKWSL YRAVIAEFVA TLLFLYITVL TVIGYKIQSD TKAGGVdgIL
wc1c_arath    EPGELSSWSF YRAGIAEFIA TFLFLYITVL TVMGVKRA.. PNMCASVGIQ
wc1b_arath    EPGELASWSF WRAGIAEFIA TFLFLYITVL TVMGVKR..S PNMCASVGIQ
tipw_lyces    EPGELSSWSF YRAGIAEFMA TFLFLYITIL TVMGLKRSDS LCSSV..GIQ
wc1a_arath    EPGELSSWSF WRAGIAEFIA TFLFLYITVL TVMGVKR..S PNMCASVGIQ
tipw_pea      EPSELTSWSF YRAGIAEFIA TFLFLYITVL TVMGVVRESS KCKTV..GIQ
tipa_arath    RADEATHPDS IRATLAEFLS TFVFVFAAEG SILSLDKLYW EHAAHAGTni
aqua_atrca    DMGELKLWSF WRAAIAEFIA TLLFLYITVA TVIGYKKETD PCASVGL..L
dip_antma     SIGDSFSVAS IKAYVAEFIA TLLFVFAGVG SAIAYNKLTS DAALDPAGLV
aqpz_ecoli    .........M FRKLAAECFG TFWLVFGGCG SAVLAAGFPE ....LGIGFA
tip2_tobac    SIGDSFSVGS LKAYVAEFIA TLLFVFAGVG SAIAYNKLTA DAALDPAGLV
tip1_tobac    SIGDSFSVGS LKAYVAEFIA TLLFVFAGVG SAIAYNKLTA DAALDPAGLV
tipg_arath    RPDEATRPDA LKAALAEFIS TLIFVVAGSG SGMAFNKLTE NGATTPSGLV
bib_drome     MQAEIRTLEF WRSIISECLA SFMYVFIVCG AAAGVGVGAS VSSVL....L
tipr_arath    RPDEATRPDA LKAALAEFIS TLIFVVAGSG SGMAFNKLTE NGATTPSGLV
tipa_phavu    RTDEATHPDS MRASLAEFAS TFIFVFAGEG SGLALVKIYQ DSAFSAGELL
tipg_orysa    SHQEVYHPGA LKAALAEFIS TLIFVFAGQG SGMAFSKLTG GGATTPAGLI

              51                                                 100
predict_h258  KVSLAFGLSI ATLAQSVGHI SGAHSNPAVT LGLLLSCQIS ILRAVMYIIA
aqp1_rat      KVSLAFGLSI ATLAQSVGHI SGAHSNPAVT LGLLLSCQIS ILRAVMYIIA
aqp1_mouse    KVSLAFGLSI ATLAQSVGHI SGAHLNPAVT LGLLLSCQIS ILRAVMYIIA
aqp1_human    KVSLAFGLSI ATLAQSVGHI SGAHLNPAVT LGLLLSCQIS IFRALMYIIA
aqp1_bovin    KVSLAFGLSI ATLAQSVGHI SGAHLNPAVT LGLLLSCQIS VLRAIMYIIA
aqp1_sheep    KVSLAFGLSI ATLAQSVGHI SGAHLNPAVT LGLLLSCQIS ILRAIMYIIA
aqpa_ranes    KVSLAFGISI ATMAQSVGHV SGAHLNPAVT LGCLLSCQIS ILKAVMYIIA
aqp2_dasno    QIALAFGLAI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_bovin    QIAMAFGLAI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAVFYVAA
aqp2_canfa    QIAMAFGLGI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_rabit    QIAMAFGLGI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_elema    QIAMAFGLAI GTLVQTLGHI SGAHINPAVT VACLVGCHVS FLRATFYLAA
aqp2_horse    QIAMAFGLAI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_proha    QIAMAFGLAI GTLVQTLGHI SGAHINPAVT IACLVGCHVS FLRALFYLAA
mip_rat       QVALAFGLAL ATLVQTVGHI SGAHVNPAVT FAFLVGSQMS LLRAFCYIAA
aqp2_oryaf    QIAMAFGLAI GTLVQTLGHI SGAHINPAVT VACLVGCHVS FLRAIFYVAA
mip_mouse     QVALAFGLAL ATLVQTVGHI SGAHVNPAVT FAFLVGSQMS LLRAFCYIAA
mip_ranpi     VIALAFGLVL ATMVQSIGHV SGAHINPAVT FAFLIGSQMS LFRAIFYIAA
mip_bovin     QVALAFGLAL ATLVQAVGHI SGAHVNPAVT FAFLVGSQMS LLRAICYMVA
mip_human     QVAMAFGLAL ATLVQSVGHI SGAHVNPAVT FAFLVGSQMS LLRAFCYMAA
mip_chick     .......... .......... .......... .......... ..........
aqp5_rat      QISIAFGLAI GTLAQALGPV SGGHINPAIT LALLIGNQIS LLRAVFYVAA
aqp5_human    QIALAFGLAI GTLAQALGPV SGGHINPAIT LALLVGNQIS LLRAFFYVAA
aqp2_human    QIAMAFGLGI GTLVQALGHI SGAHINPAVT VACLVGCHVS VLRAAFYVAA
aqp4_human    LISLCFGLSI ATMVQCFGHI SGGHINPAVT VAMVCTRKIS IAKSVFYIAA
aqp4_rat      LISLCFGLSI ATMVQCFGHI SGGHINPAVT VAMVCTRKIS IAKSVFYITA
aqp4_mouse    LISLCFGLSI ATMVQCLGHI SGGHINPAVT VAMVCTRKIS IAKSVFYIIA
aqp2_rat      QIAVAFGLGI GILVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
aqp2_mouse    QIAVAFGLGI GTLVQALGHV SGAHINPAVT VACLVGCHVS FLRAAFYVAA
wc2a_arath    GIAWAFGGMI FILVYCTAGI SGGHINPAVT FGLFLARKVS LPRALLYIIA
aqp6_human    QIAITFNLVT AMAVQVTWKT SGAHANPAVT LAFLVGSHIS LPRAVAYVAA
wc2c_arath    GIAWAFGGMI FILVYCTAGI SGGHINPAVT FGLFLARKVS LIRAVLYMVA
wc2b_arath    GIAWAFGGMI FILVYCTAGI SGGHINPAVT FGLFLARKVS LIRAVLYMVA
wc1c_arath    GIAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRAVFYIVM
wc1b_arath    GIAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRAVYYIVM
tipw_lyces    GVAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRAVFYMVM
wc1a_arath    GIAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRALYYIVM
tipw_pea      GIAWAFGGMI FALVYCTAGI SGGHINPAVT FGLFLARKLS LTRAIFYMVM
tipa_arath    LVALAHAFAL FAAVSAAINV SGGHVNPAVT FGALVGGRVT AIRAIYYWIA
aqua_atrca    GIAWSFGGMI FVLVYCTAGI SGGHINPAVT FGLFLARKVS LLRALVYMIA
dip_antma     AVAVAHAFAL FVGVSMAANV SGGHLNPAVT LGLAVGGNIT ILTGLFYWIA
aqpz_ecoli    GVALAFGLTV LTMAFAVGHI SGGHFNPAVT IGLWAGGRFP AKEVVGYVIA
tip2_tobac    AVAVAHAFAL FVGVSIAANI SGGHLNPAVT LGLAVGGNIT ILTGFFYWIA
tip1_tobac    AVAVAHAFAL FVGVSIAANI SGGHLNPAVT LGLAVGGNIT ILTGFFYWIA
tipg_arath    AAAVAHAFGL FVAVSVGANI SGGHVNPAVT FGAFIGGNIT LLRGILYWIA
bib_drome     ATALASGLAM ATLTQCFLHI SGAHINPAVT LALCVVRSIS PIRAAMYITA
tipr_arath    AAAVAHAFGL FVAVSVGANI SGGHVNPAVT FGAFIGGNIT LLRGILYWIA
tipa_phavu    ALALAHAFAL FAAVSASMHV SGGHVNPAVS FGALIGGRIS VIRAVYYWIA
tipg_orysa    AAAVAHAFAL FVAVSVGANI SGGHVNPAVT FGAFVGGNIT LFRGLLYWIA

              101                                                150
predict_h258  QCVGAIVASA ILSGITSSLL ENSLGRNDLA RGVNSGQGLG IEIIGTLQLV
aqp1_rat      QCVGAIVASA ILSGITSSLL ENSLGRNDLA RGVNSGQGLG IEIIGTLQLV
aqp1_mouse    QCVGAIVATA ILSGITSSLV DNSLGRNDLA HGVNSGQGLG IEIIGTLQLV
aqp1_human    QCVGAIVATA ILSGITSSLT GNSLGRNDLA DGVNSGQGLG IEIIGTLQLV
aqp1_bovin    QCVGAIVATA ILSGITSSLP DNSLGLNALA PGVNSGQGLG IEIIGTLQLV
aqp1_sheep    QCVGAIVATV ILSGITSSLP DNSLGLNALA PGVNSGQGLG IEIIGTLQLV
aqpa_ranes    QCLGAVVATA ILSGITSGLE NNSLGLNGLS PGVSAGQGLG VEILVTFQLV
aqp2_dasno    QLLGAVAGAA ILHEITPPDV RG........ .......... ..........
aqp2_bovin    QLLGAVAGAA LLHEITPPAI RG........ .......... ..........
aqp2_canfa    QLLGAVAGAA LLHEITPPHV RG........ .......... ..........
aqp2_rabit    QLLGAVAGAA LLHEITPAEV RG........ .......... ..........
aqp2_elema    QLLGAVAGAA LLHELTPPDI RG........ .......... ..........
aqp2_horse    QLLGAVAGAA LLHEITPPDI RR........ .......... ..........
aqp2_proha    QLLGAVAGAA LLHELTPPDI RG........ .......... ..........
mip_rat       QLLGAVAGAA VLYSVTPPAV RGNLALNTLH AGVSVGQATT VEIFLTLQFV
aqp2_oryaf    QLLGAVAGAA LLHELTPPDI RG........ .......... ..........
mip_mouse     QLLGAVAGAA VLYSVTPPAV RGNLALNTLH TGVSVGQATT VEIFLTLQFV
mip_ranpi     QLLGAVAGAA VLYGVTPAAI RGNLALNTLH PGVSLGQATT VEIFLTLQFV
mip_bovin     QLLGAVAGAA VLYSVTPPAV RGNLALNTLH PGVSVGQATI VEIFLTLQFV
mip_human     QLLGAVAGAA VLYSVTPPAV RGNLALNTLH PAVSVGQATT VEIFLTLQFV
mip_chick     .......... .......... .......... .......... ..........
aqp5_rat      QLVGAIAGAG ILYWLAPLNA RGNLAVNALN NNTTPGKAMV VELILTFQLA
aqp5_human    QLVGAIAGAG ILYGVAPLNA RGNLAVNALN NNTTQGQAMV VELILTFQLA
aqp2_human    QLLGAVAGAA LLHEITPADI RGDLAVNALS NSTTAGQAVT VELFLTLQLV
aqp4_human    QCLGAIIGAG ILYLVTPPSV VGGLGVTMVH GNLTAGHGLL VELIITFQLV
aqp4_rat      QCLGAIIGAG ILYLVTPPSV VGGLGVTTVH GNLTAGHGLL VELIITFQLV
aqp4_mouse    QCLGAIIGAG ILYLVTPPSV VGGLGVTTVH GNLTAGHGLL VELIITFQLV
aqp2_rat      QLLGAVAGAA ILHEITPVEI RGDLAVNALH NNATAGQAVT VELFLTMQLV
aqp2_mouse    QLLGAVAGAA ILHEITPVEI RGDLAVNALH NNATAGQAVT VELFLTMQLV
wc2a_arath    QCLGAICGVG FVKAFQSSYY TRYGGgnSLA DGYSTGTGLA AEIIGTFVLV
aqp6_human    QLVGATVGAA LLYGVMPGDI RETLGINVVR NSVSTGQAVA VELLLTLQLV
wc2c_arath    QCLGAICGVG FVKAFQSSHY VNYGGgnFLA DGYNTGTGLA AEIIGTFVLV
wc2b_arath    QCLGAICGVG FRQSFQSSYY DRYGGgnSLA DGYNTGTGLA AEIIGTFVLV
wc1c_arath    QCLGAICGAG VVKGFQPNPY QtgGGANTVA HGYTKGSGLG AEIIGTFVLV
wc1b_arath    QCLGAICGAG VVKGFQPKQY QagGGANTIA HGYTKGSGLG AEIIGTFVLV
tipw_lyces    QCLGAICGAG VVKGFMVGPY QrgGGANVVN PGYTKGDGLG AEIIGTFVLV
wc1a_arath    QCLGAICGAG VVKGFQPKQY QagGGANTVA HGYTKGSGLG AEIIGTFVLV
tipw_pea      QVLGAICGAG VVKGFEGKQR FGDLNgnFVA PGYTKGDGLG AEIVGTFILV
tipa_arath    QLLGAILACL LLRLTTNGMR PVGFR...LA SGVGAVNGLV LEIILTFGLV
aqua_atrca    QCAGAICGVG LVKAFMKGPY NqgGGANSVA LGYNKGTAFG AELIGTFVLV
dip_antma     QCLGSTVACL LLKFVTNGL. ..SVPTHGVA AGMDAIQGVV MEIIITFALV
aqpz_ecoli    QVVGGIVAAA LLYLIASGKT GFDAAASGFA sgYSMLSALV VELVLSAGFL
tip2_tobac    QLLGSTVACL LLKYVTNGL. ..AVPTHGVA AGLNGFQGVV MEIIITFALV
tip1_tobac    QLLGSTVACL LLKYVTNGL. ..AVPTHGVA AGLNGLQGVV MEIIITFALV
tipg_arath    QLLGSVVACL ILKFATGGLA VPAFG...LS AGVGVLNAFV FEIVMTFGLV
bib_drome     QCGGGIAGAA LLYGVTVPGY QGNLQAasHS AALAAWERFG VEFILTSLVV
tipr_arath    QLLGSVVACL ILKFATGGLA VPPFG...LS AGVGVLNAFV FEIVMTFGLV
tipa_phavu    QLLGSIVAAL VLRLVTNNMR PSGF...HVS PGVGVGHMFI LEVVMTFGLM
tipg_orysa    QLLGSTVACF LLRFSTGGLA TGTFGL.... TGVSVWEALV LEIVMTFGLV

              151                                                200
predict_h258  LCVLATTDRR RRDLGGSAPL AIGLSVALGH LLAIDYTGCG INPARSFGSA
aqp1_rat      LCVLATTDRR RRDLGGSAPL AIGLSVALGH LLAIDYTGCG INPARSFGSA
aqp1_mouse    LCVLATTDRR RRDLGGSAPL AIGLSVALGH LLAIDYTGCG INPARSFGSA
aqp1_human    LCVLATTDRR RRDLGGSAPL AIGLSVALGH LLAIDYTGCG INPARSFGSA
aqp1_bovin    LCVLATTDRR RRDLGGSGPL AIGFSVALGH LLAIDYTGCG INPARSFGSS
aqp1_sheep    LCVLATTDRR RrdLGDSGPL AIGFSVALGH LLAIDYTGCG INPARSFGSS
aqpa_ranes    LCVVAVTDRR RHDVSGSVPL AIGLSVALGH LIAIDYTGCG MNPARSFGSA
aqp2_dasno    .......... .......... .......... .......... ..........
aqp2_bovin    .......... .......... .......... .......... ..........
aqp2_canfa    .......... .......... .......... .......... ..........
aqp2_rabit    .......... .......... .......... .......... ..........
aqp2_elema    .......... .......... .......... .......... ..........
aqp2_horse    .......... .......... .......... .......... ..........
aqp2_proha    .......... .......... .......... .......... ..........
mip_rat       LCIFATYDER RNGRMGSVAL AVGFSLTLGH LFGMYYTGAG MNPARSFAPA
aqp2_oryaf    .......... .......... .......... .......... ..........
mip_mouse     LCIFATYDER RNGRMGSVAL AVGFSLTLGH LFGMYYTGAG MNPARSFAPA
mip_ranpi     LCIFATYDER RNGRLGSVSL AIGFSLTLGH LFGLYYTGAS MNPARSFAPA
mip_bovin     LCIFATYDER RNGRLGSVAL AVGFSLTLGH LFGMYYTGAG MNPARSFAPA
mip_human     LCIFATYDER RNGQLGSVAL AVGFSLALGH LFGMYYTGAG MNPARSFAPA
mip_chick     ........DR HDGRPGSAAL PVGFSLALGH LFGIPFTGAG MNPARSFAPA
aqp5_rat      LCIFSSTDSR RTSPVGSPAL SIGLSVTLGH LVGIYFTGCS MNPARSFGPA
aqp5_human    LCIFASTDSR RTSPVGSPAL SIGLSVTLGH LVGIYFTGCS MNPARSFGPA
aqp2_human    LCIFASTDER RGENPGTPAL SIGFSVALGH LLGIHYTGCS MNPARSLAPA
aqp4_human    FTIFASCDSK RTDVTGSIAL AIGFSVAIGH LFAINYTGAS MNPARSFGPA
aqp4_rat      FTIFASCDSK RTDVTGSVAL AIGFSVAIGH LFAINYTGAS MNPARSFGPA
aqp4_mouse    FTVFASCDSK RTDVTGSIAL AIGFSVAIGH LFAINYTGAS MNPARSFGPA
aqp2_rat      LCIFASTDER RGDNLGSPAL SIGFSVTLGH LLGIYFTGCS MNPARSLAPA
aqp2_mouse    LCIFASTDER RSDNLGSPAL SIGFSVTLGH LLGIYFTGCS MNPARSLAPA
wc2a_arath    YTVFSATDPK RSavPVLAPL PIGFAVFMVH LATIPITGTG INPARSFGAA
aqp6_human    LCVFASTDSR QTS..GSPAT MIGISWALGH LIGILFTGCS MNPARSFGPA
wc2c_arath    YTVFSATDPK RNavPVLAPL PIGFAVFMVH LATIPITGTG INPARSFGAA
wc2b_arath    YTVFSATDPK RNavPVLAPL PIGFAVFMVH LATIPITGTG INPARSFGAS
wc1c_arath    YTVFSATDAK RSavPILAPL PIGFAVFLVH LATIPITGTG INPARSLGAA
wc1b_arath    YTVFSATDAK RNavPILAPL PIGFAVFLVH LATIPITGTG INPARSLGAA
tipw_lyces    YTVFSATDAK RNavPILAPL PIGFAVFLVH LATIPITGTG INPARSLGAA
wc1a_arath    YTVFSATDAK RNavPILAPL PIGFAVFLVH LATIPITATG INPARSLGAA
tipw_pea      YTVFSATDAK RSavPILAPL PIGFAVFLVH LATIPITGTG INPARSLGAA
tipa_arath    YVVYStiDPK RGSLGIIAPL AIGLIVGANI LVGGPFSGAS MNPARAFGPA
aqua_atrca    YTVFSATDPK RSavPILAPL PIGFAVFMVH LATIPITGTG INPARSFGAA
dip_antma     YTVYAtaDPK KGSLGVIAPI AIGFIVGANI LAAGPFSGGS MNPARSFGPA
aqpz_ecoli    LVIHGATDKF APA..GFAPI AIGLALTLIH LISIPVTNTS VNPARSTAVA
tip2_tobac    YTVYAtaDPK KGSLGTIAPI AIGFIVGANI LAAGPFSGGS MNPARSFGPA
tip1_tobac    YTVYAtaDPK KGSLGTIAPI AIGFIVGANI LAAGPFSGGS MNPARSFGPA
tipg_arath    YTVYAtiDPK NGSLGTIAPI AIGFIVGANI LAGGAFSGAS MNPAVAFGPA
bib_drome     LCYFVSTDPM KKFMGNS.AA SIGCAYSACC FVSMPYLN.. ..PARSLGPS
tipr_arath    YTVYAtiDPK NGSLGTIAPI AIGFIVGANI LAGGAFSGAS MNPAVAFGPA
tipa_phavu    YTVYGtiDPK RGAVSYIAPL AIGLIVGANI LVGGPFDGAC MNPALAFGPS
tipg_orysa    YTVYAtvDPK KGSLGTIAPI AIGFIVGANI LVGGAFDGAS MNPAVSFGPA

              201                                                250
predict_h258  VLTRNFSNHW IFWVGPFIGS ALAVLIYDFI LAPRSSDFTD RMKVWTSGQV
aqp1_rat      VLTRNFSNHW IFWVGPFIGS ALAVLIYDFI LAPRSSDFTD RMKVWTSGQV
aqp1_mouse    VLTRNFSNHW IFWVGPFIGG ALAVLIYDFI LAPRSSDFTD RMKVWTSGQV
aqp1_human    VITHNFSNHW IFWVGPFIGG ALAVLIYDFI LAPRSSDLTD RVKVWTSGQV
aqp1_bovin    VITHNFQDHW IFWVGPFIGA ALAVLIYDFI LAPRSSDLTD RVKVWTSGQV
aqp1_sheep    VITHNFQDHW IFWVGPFIGA ALAVLIYDFI LAPRSSDLTD RVKVWTSGQV
aqpa_ranes    VLTKNFTYHW IFWVGPMIGG AAAAIIYDFI LAPRTSDLTD RMKVWTNGQV
aqp2_dasno    .......... .......... .......... .......... ..........
aqp2_bovin    .......... .......... .......... .......... ..........
aqp2_canfa    .......... .......... .......... .......... ..........
aqp2_rabit    .......... .......... .......... .......... ..........
aqp2_elema    .......... .......... .......... .......... ..........
aqp2_horse    .......... .......... .......... .......... ..........
aqp2_proha    .......... .......... .......... .......... ..........
mip_rat       ILTRNFSNHW VYWVGPIIGG GLGSLLYDFL LFPRLKSVSE RLSILKGARP
aqp2_oryaf    .......... .......... .......... .......... ..........
mip_mouse     ILTRNFSNHW VYWVGPIIGG GLGSLLYDFL LFPRLKSVSE RLSILKGARP
mip_ranpi     VLTRNFTNHW VYWVGPIIGG ALGGLVYDFI LFPRMRGLSE RLSILKGARP
mip_bovin     ILTRNFTNHW VYWVGPVIGA GLGSLLYDFL LFPRLKSVSE RLSILKGSRP
mip_human     ILTGNFTNHW VYWVGPIIGG GLGSLLYDFL LFPRLKSISE RLSVLKGAKP
mip_chick     VITRNFTNHW VFWAGPLLGA ALAALLYELA LCPRARSMAE RLAV.LRGEP
aqp5_rat      VVMNRFssHW VFWVGPIVGA MLAAILYFYL LFPSSLSLHD RVAVVKGTYE
aqp5_human    VVMNRFsaHW VFWVGPIVGA VLAAILYFYL LFPNSLSLSE RVAIIKGTYE
aqp2_human    VVTGKFDDHW VFWIGPLVGA ILGSLLYNYV LFPPAKSLSE RLAVLKGLEp
aqp4_human    VIMGNWENHW IYWVGPIIGA VLAGGLYEYV FCPDVEFKRR FKEAFSKaqT
aqp4_rat      VIMGNWENHW IYWVGPIIGA VLAGALYEYV FCPDVELKRR LKEAFSKaqT
aqp4_mouse    VIMGNWANHW IYWVGPIMGA VLAGALYEYV FCPDVELKRR LKEAFSKaqT
aqp2_rat      VVTGKFDDHW VFWIGPLVGA IIGSLLYNYL LFPSAKSLQE RLAVLKGLEp
aqp2_mouse    VVTGKFDDHW VFWIGPLVGA IIGSLLYNYL LFPSTKSLQE RLAVLKGLEp
wc2a_arath    VIYnpWDDHW IFWVGPFIGA AIAAFYHQFV LRASGSKSLG SFRSAANV..
aqp6_human    IIIGKFTVHW VFWVGPLMGA LLASLIYNFV LFPDTKTLAQ RLAILTGTVE
wc2c_arath    VIFnpWDDHW IFWVGPFIGA TIAAFYHQFV LRASGSKSLG SFRSAANV..
wc2b_arath    VIYnpWDDHW IFWVGPFIGA AIAAFYHQFV LRASGSKSLG SFRSAANV..
wc1c_arath    IIYnaWDDHW IFWVGPFIGA ALAALYHQLV IRAIPFKSRS ..........
wc1b_arath    IIFnaWDDHW VFWVGPFIGA ALAALYHVIV IRAIPFKSRS ..........
tipw_lyces    IIYnaWNDHW IFWVGPMIGA ALAAIYHQII IRAMPFHRS. ..........
wc1a_arath    IIYnsWDDHW VFWVGPFIGA ALAALYHVVV IRAIPFKSRS ..........
tipw_pea      IVFngWNDHW IFWVGPFIGA ALAALYHQVV IRAIPFKSK. ..........
tipa_arath    LVGWRWHDHW IYWVGPFIGS ALAALIYEYM VIPTEPPTHH AHGVHQPLAP
aqua_atrca    VIyrVWDDHW IFWVGPFVGA LAAAAYHQYV LRAAAIKALG SFRSNPTN..
dip_antma     VASGDFSQNW IYWAGPLIGG ALAGFIYGDV FITAHAPLPT SEDYA.....
aqpz_ecoli    IFQgaLEQLW FFWVVPIVGG IIGGLIYRTL LEKRD..... ..........
tip2_tobac    VVAGDFSQNW IYWAGPLIGG GLAGFIYGDV FIGCHTPLPT SEDYA.....
tip1_tobac    VVAGDFSQNW IYWAGPLIGG GLAGFIYGDV FIGCHTPLPT SEDYA.....
tipg_arath    VVSWTWTNHW VYWAGPLVGG GIAGLIYEVF FINTTHEQLP TTDY......
bib_drome     FVLNKWDSHW VYWFGPLVGG MASGLVYEYI FNSRNRNLRH NKGSIDNDSS
tipr_arath    VVSWTWTNHW VYWAGPLVGG GIAGLIYEVF FINTTHTSSS NHRLLN....
tipa_phavu    LVGWQWHQHW IFWVGPLLGA ALAALVYEYA VIPIEPPPHH HQPLATEDY.
tipg_orysa    LVSWSWESQW VYWVGPLIGG GLAGVIYEVL FISHTHEQLP TTDY......

              251              269
predict_h258  EEYDLDADDI NSRVEMKPK
aqp1_rat      EEYDLDADDI NSRVEMKPK
aqp1_mouse    EEYDLDADDI NSRVEMKPK
aqp1_human    EEYDLDADDI NSRVEMKPK
aqp1_bovin    EEYDLDADDI NSRVEMKPK
aqp1_sheep    EEYDLDADDI NSRVEMKPK
aqpa_ranes    EEYELDGDD. NTRVEMKPK
aqp2_dasno    .......... .........
aqp2_bovin    .......... .........
aqp2_canfa    .......... .........
aqp2_rabit    .......... .........
aqp2_elema    .......... .........
aqp2_horse    .......... .........
aqp2_proha    .......... .........
mip_rat       SDSNGQPEGT GEPVELKTQ
aqp2_oryaf    .......... .........
mip_mouse     SDSNGQPEGT GEPVELKTQ
mip_ranpi     AEPEGQQEAT GEPIELKTQ
mip_bovin     SESNGQPEVT GEPVELKTQ
mip_human     DVSNGQPEVT GEPVELNTQ
mip_chick     PAAAPPPEPP AEPLELKTQ
aqp5_rat      PEEDWEDHRE ERKKTIELT
aqp5_human    PDEDWEEQRE ERKKTMELT
aqp2_human    tDWEEREVRR RQSVELHSP
aqp4_human    KGSYMEVEDN RSQVETDDL
aqp4_rat      KGSYMEVEDN RSQVETEDL
aqp4_mouse    KGSYMEVEDN RSQVETEDL
aqp2_rat      tDWEEREVRR RQSVELHSP
aqp2_mouse    tDWEEREVRR RQSVELHSP
wc2a_arath    .......... .........
aqp6_human    VGTGARAGAE PLKKESQPG
wc2c_arath    .......... .........
wc2b_arath    .......... .........
wc1c_arath    .......... .........
wc1b_arath    .......... .........
tipw_lyces    .......... .........
wc1a_arath    .......... .........
tipw_pea      .......... .........
tipa_arath    EDY....... .........
aqua_atrca    .......... .........
dip_antma     .......... .........
aqpz_ecoli    .......... .........
tip2_tobac    .......... .........
tip1_tobac    .......... .........
tipg_arath    .......... .........
bib_drome     SIHSEDELNY DMDMEKPNK
tipr_arath    .......... .........
tipa_phavu    .......... .........
tipg_orysa    .......... .........

________________________________________________________________________________

   Prediction of:

    - secondary structure,          by PHDsec
    - solvent accessibility,        by PHDacc
    - and helical transmembrane regions,    by PHDhtm

   PHD: Profile fed neural network systems from HeiDelberg
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   Author:             Burkhard Rost
                       EMBL, Heidelberg, FRG
                       Meyerhofstrasse 1, 69 117 Heidelberg
                       Internet: Predict-Help@EMBL-Heidelberg.DE

   All rights reserved.

   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Secondary structure prediction by PHDsec:
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   Author:             Burkhard Rost
                       EMBL, Heidelberg, FRG
                       Meyerhofstrasse 1, 69 117 Heidelberg
                       Internet: Rost@EMBL-Heidelberg.DE

   All rights reserved.

About the network method
~~~~~~~~~~~~~~~~~~~~~~~

The network procedure is described in detail in:
1) Rost, Burkhard; Sander, Chris:
  Prediction of protein structure at better than 70% accuracy.
  J. Mol. Biol., 1993, 232, 584-599.

A brief description is given in:
  Rost, Burkhard; Sander, Chris:
  Improved prediction of protein secondary structure by use of se-
  quence profiles and neural networks.
  Proc. Natl. Acad. Sci. U.S.A., 1993, 90, 7558-7562.

The PHD mail server is described in:
2) Rost, Burkhard; Sander, Chris; Schneider, Reinhard:
  PHD - an automatic mail server for protein secondary structure
  prediction.
  CABIOS, 1994, 10, 53-60.

The latest improvement steps (up to 72%) are explained in:
3) Rost, Burkhard; Sander, Chris:
  Combining evolutionary information and neural networks to predict
  protein secondary structure.
  Proteins, 1994,  19, 55-72.

To be quoted for publications of PHD output:
  Papers 1-3 for the prediction of secondary structure and the pre-
  diction server.

About the input to the network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The prediction is performed by a system of neural networks.
The input is a multiple sequence alignment. It is taken from an HSSP
file (produced by the program MaxHom):
  Sander, Chris & Schneider, Reinhard: Database of Homology-Derived
  Structures and the Structural Meaning of Sequence Alignment.
  Proteins, 1991, 9, 56-68.

For optimal results the alignment should contain sequences with varying
degrees of sequence similarity relative to the input protein.
The following is an ideal situation:

+-----------------+----------------------+
|   sequence:     |  sequence identity   |
+-----------------+----------------------+
| target sequence |  100 %               |
| aligned seq. 1  |   90 %               |
| aligned seq. 2  |   80 %               |
|      ...        |   ...                |
| aligned seq. 7  |   30 %               |
+-----------------+----------------------+

Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A careful cross validation test on some 250 protein chains (in total
about 55,000 residues) with less than 25% pairwise sequence identity
gave the following results:

++================++-----------------------------------------+
|| Qtotal = 72.1% ||      ("overall three state accuracy")   |
++================++-----------------------------------------+

+----------------------------+-----------------------------+
| Qhelix (% of observed)=70% | Qhelix (% of predicted)=77% |
| Qstrand(% of observed)=62% | Qstrand(% of predicted)=64% |
| Qloop  (% of observed)=79% | Qloop  (% of predicted)=72% |
+----------------------------+-----------------------------+
..........................................................................

These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

|                    number of correctly predicted residues
|Qtotal =            ---------------------------------------      (*100)
|                          number of all residues
|
|                    no of res correctly predicted to be in helix
|Qhelix (% of obs) = -------------------------------------------- (*100)
|                    no of all res observed to be in helix
|
|
|                    no of res correctly predicted to be in helix
|Qhelix (% of pred)= -------------------------------------------- (*100)
|                    no of all residues predicted to be in helix

..........................................................................

Averaging over single chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most reasonable way to compute the overall accuracies is the above
quoted percentage of correctly predicted residues.  However, since the
user is mainly interested in the expected performance of the prediction
for a particular protein, the mean value when averaging over protein
chains might be of help as well.  Computing first the three state
accuracy for each protein chain, and then averaging over 250 chains
yields the following average:

+-------------------------------====--+
| Qtotal/averaged over chains = 72.2% |
+-------------------------------====--+
| standard deviation          =  9.3% |
+-------------------------------------+
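
The difference between the per-residue and the per-chain average can be
illustrated with a short Python sketch.  The three chains below are
hypothetical (correct, length) pairs chosen for illustration; they are
not data from the PHD test set.

```python
# Hypothetical chains: (correctly predicted residues, chain length).
# These numbers are made up for illustration only.
chains = [(90, 100), (120, 200), (35, 50)]

# Per-residue accuracy: pool all residues, then take the percentage.
per_residue = 100.0 * sum(c for c, _ in chains) / sum(n for _, n in chains)

# Per-chain accuracy: compute the percentage per chain, then average.
per_chain = sum(100.0 * c / n for c, n in chains) / len(chains)

print(f"per residue: {per_residue:.1f}%   per chain: {per_chain:.1f}%")
```

Long chains dominate the per-residue value, whereas every chain counts
equally in the per-chain average.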

..........................................................................

Further measures of performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Matthews correlation coefficient:

+---------------------------------------------+
| Chelix = 0.63, Cstrand = 0.53, Cloop = 0.52 |
+---------------------------------------------+
..........................................................................
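
As a cross-check, the correlation coefficients can be recomputed from
the accuracy matrix given in this report.  The sketch below assumes the
usual one-state-versus-rest definition of the Matthews coefficient; the
function name is chosen here for illustration.

```python
import math

# Accuracy matrix copied from this report.
# Rows: observed state (DSSP); columns: state predicted by the network.
STATES = ("helix", "strand", "loop")
COUNTS = [
    [12447, 1255, 3990],    # observed helix
    [949, 7493, 3750],      # observed strand
    [2604, 2875, 19962],    # observed loop
]

def matthews(counts, k):
    """Matthews correlation for state k against all other states."""
    total = sum(sum(row) for row in counts)
    tp = counts[k][k]
    fn = sum(counts[k]) - tp                   # observed k, predicted other
    fp = sum(row[k] for row in counts) - tp    # predicted k, observed other
    tn = total - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom

for k, name in enumerate(STATES):
    print(f"C{name} = {matthews(COUNTS, k):.2f}")
```

With these counts the three values should round to the coefficients
quoted in the box above.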

Average length of predicted secondary structure segments:

.           +------------+----------+
.           |  predicted | observed |
+-----------+------------+----------+
| Lhelix  = |    10.3    |    9.3   |
| Lstrand = |     5.0    |    5.3   |
| Lloop   = |     7.2    |    5.9   |
+-----------+------------+----------+
..........................................................................

The accuracy matrix in detail:

+---------------------------------------+
|    number of residues with H, E, L    |
+---------+------+------+------+--------+
|         |net H |net E |net L |sum obs |
+---------+------+------+------+--------+
| obs H   |12447 | 1255 | 3990 |  17692 |
| obs E   |  949 | 7493 | 3750 |  12192 |
| obs L   | 2604 | 2875 |19962 |  25441 |
+---------+------+------+------+--------+
| sum Net |16000 |11623 |27702 |  55325 |
+---------+------+------+------+--------+

Note: This table is to be read in the following manner:
     Of the 16000 residues predicted to be in helix, 12447 were
     observed to be in helix, 949 belong to observed strands, and 2604
     to observed loop regions.  The term "observed" refers to the DSSP
     assignment of secondary structure calculated from 3D coordinates
     of experimentally determined structures (Dictionary of Secondary
     Structure  of Proteins: Kabsch & Sander (1983) Biopolymers, 22,
     2577-2637).
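
The Q measures defined earlier can be recomputed from this matrix.  The
following Python sketch is only an illustration; the function names are
chosen here and are not part of PHD.

```python
# Accuracy matrix copied from this report.
# Rows: observed state (DSSP); columns: state predicted by the network.
STATES = ("helix", "strand", "loop")
COUNTS = [
    [12447, 1255, 3990],    # observed helix
    [949, 7493, 3750],      # observed strand
    [2604, 2875, 19962],    # observed loop
]

def q_total(counts):
    """Percentage of all residues predicted in the correct state."""
    correct = sum(counts[k][k] for k in range(len(counts)))
    return 100.0 * correct / sum(sum(row) for row in counts)

def q_observed(counts, k):
    """Correct predictions of state k as % of residues observed in k."""
    return 100.0 * counts[k][k] / sum(counts[k])

def q_predicted(counts, k):
    """Correct predictions of state k as % of residues predicted in k."""
    return 100.0 * counts[k][k] / sum(row[k] for row in counts)

print(f"Qtotal = {q_total(COUNTS):.1f}%")
for k, name in enumerate(STATES):
    print(f"Q{name}: {q_observed(COUNTS, k):.1f}% of observed, "
          f"{q_predicted(COUNTS, k):.1f}% of predicted")
```

With these counts Qtotal comes out at 72.1%, and the helix and strand
values agree with the index 0 column of the cumulative reliability
table in this report (70.4/77.8 and 61.5/64.5).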

Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The network predicts the three secondary structure types using real
numbers from the output units. The prediction is assigned by choosing
the maximal unit ("winner takes all").  However, the real numbers
contain additional information.
E.g. the difference between the maximal and the second largest output
unit can be used to derive a "reliability index".  This index is given
for each residue along with the prediction.  The index is scaled to
have values between 0 (lowest reliability), and 9 (highest).
The accuracies (Qtot) to be expected for residues with values above a
particular value of the index are given below as well as the fraction
of such residues (%res):

+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
| %res |100.0| 99.2| 90.4| 80.9| 71.6| 62.5| 52.8| 42.3| 29.8| 14.1|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      |     |     |     |     |     |     |     |     |     |     |
| Qtot | 72.1| 72.3| 74.8| 77.7| 80.3| 82.9| 85.7| 88.5| 91.1| 94.2|
|      |     |     |     |     |     |     |     |     |     |     |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 70.4| 70.6| 73.7| 77.1| 80.1| 83.1| 86.0| 89.3| 92.5| 96.4|
| E%obs| 61.5| 61.7| 63.7| 66.6| 69.1| 71.7| 74.6| 77.0| 77.8| 68.1|
|      |     |     |     |     |     |     |     |     |     |     |
| H%prd| 77.8| 78.0| 80.0| 82.6| 84.7| 86.9| 89.2| 91.3| 93.1| 95.4|
| E%prd| 64.5| 64.7| 67.8| 71.0| 74.2| 77.6| 81.4| 85.1| 89.8| 93.5|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

The above table gives the cumulative results, e.g. 62.5% of all
residues have a reliability of at least 5.  The overall three-state
accuracy for this subset of almost two thirds of all residues is 82.9%.
For this subset, e.g., 83.1% of the observed helices are correctly
predicted, and 86.9% of all residues predicted to be in helix are
correct.
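
A minimal Python sketch of how such a reliability index could be
derived from the three output units.  The scaling used here (ten times
the difference, truncated to an integer and capped at 9) is an
assumption made for illustration; this report only states that the
index ranges from 0 to 9.

```python
def predict_with_reliability(outputs):
    """Return (winning state, reliability index) for one residue.

    outputs maps each state ("H", "E", "L") to the real-valued output
    of its unit, assumed here to lie between 0 and 1.
    """
    ranked = sorted(outputs.items(), key=lambda item: item[1], reverse=True)
    (best_state, best), (_, second) = ranked[0], ranked[1]
    # Assumed scaling: 10 * (largest - second largest), truncated, max 9.
    index = min(9, int(10 * (best - second)))
    return best_state, index

state, ri = predict_with_reliability({"H": 0.75, "E": 0.0625, "L": 0.125})
print(state, ri)
```

A residue whose two largest outputs are nearly equal gets an index near
0, however large the winning output itself is.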

..........................................................................

The following table gives the non-cumulative quantities, i.e. the
values per reliability index range.  These numbers answer the question:
how reliable is the prediction for all residues labeled with the
particular index i.

+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index|  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
| %res |  8.8|  9.5|  9.3|  9.1|  9.7| 10.5| 12.5| 15.7| 14.1|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      |     |     |     |     |     |     |     |     |     |
| Qtot | 46.6| 50.6| 57.7| 62.6| 67.9| 74.2| 82.2| 88.3| 94.2|
|      |     |     |     |     |     |     |     |     |     |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 36.8| 42.3| 49.5| 55.2| 61.7| 69.9| 78.8| 87.4| 96.4|
| E%obs| 44.7| 44.5| 52.1| 55.4| 60.9| 68.0| 75.9| 81.0| 68.1|
|      |     |     |     |     |     |     |     |     |     |
| H%prd| 49.9| 52.5| 60.3| 64.2| 69.2| 77.5| 85.4| 89.9| 95.4|
| E%prd| 41.7| 47.1| 53.6| 57.0| 64.0| 71.6| 78.8| 88.8| 93.5|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+

For example, for residues with Relindex = 5, 64% of all predicted
beta-strand residues are correctly identified.
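
The %res rows of the two tables are related by simple differences: the
fraction of residues with exactly index i is the cumulative fraction at
i minus the cumulative fraction at i+1.  A short Python check, using the
cumulative values quoted in this report:

```python
# Cumulative %res (residues with reliability index >= i), i = 0..9,
# copied from the cumulative table in this report.
CUMULATIVE = [100.0, 99.2, 90.4, 80.9, 71.6, 62.5, 52.8, 42.3, 29.8, 14.1]

# Fraction of residues with exactly index i; nothing lies above index 9.
per_index = [
    round(CUMULATIVE[i] - (CUMULATIVE[i + 1] if i < 9 else 0.0), 1)
    for i in range(10)
]

for i, share in enumerate(per_index):
    print(f"index {i}: {share:4.1f}% of residues")
```

The values for indices 1 to 9 reproduce the %res row of the
non-cumulative table; index 0, which that table omits, accounts for the
remaining 0.8%.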

   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Solvent accessibility prediction by PHDacc:
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   Author:             Burkhard Rost
                       EMBL, Heidelberg, FRG
                       Meyerhofstrasse 1, 69 117 Heidelberg
                       Internet: Rost@EMBL-Heidelberg.DE

   All rights reserved.

About the network method
~~~~~~~~~~~~~~~~~~~~~~~

The network for prediction of secondary structure is described in
detail in:
  Rost, Burkhard; Sander, Chris:
  Prediction of protein structure at better than 70% accuracy.
  J. Mol. Biol., 1993, 232, 584-599.

The analysis of the prediction of solvent exposure is given in:
  Rost, Burkhard; Sander, Chris:
  Conservation and prediction of solvent accessibility in protein
  families.  Proteins, 1994, 20, 216-226.

To be quoted for publications of PHD exposure prediction:
  Both papers quoted above.

Definition of accessibility
~~~~~~~~~~~~~~~~~~~~~~~~~~

For training the residue solvent accessibility the DSSP (Dictionary of
Secondary Structure of Proteins; Kabsch & Sander (1983) Biopolymers, 22,
2577-2637) values of accessible surface area have been used.  The
prediction provides values for the relative solvent accessibility.  The
normalisation is the following:

|                           ACCESSIBILITY (from DSSP in Angstrom^2)
|RELATIVE_ACCESSIBILITY =   ------------------------------------- * 100
|                               MAXIMAL_ACC (amino acid type i)

where MAXIMAL_ACC (i) is the maximal accessibility of amino acid type i.
The maximal values are:

+----+----+----+----+----+----+----+----+----+----+----+----+
|  A |  B |  C |  D |  E |  F |  G |  H |  I |  K |  L |  M |
| 106| 160| 135| 163| 194| 197|  84| 184| 169| 205| 164| 188|
+----+----+----+----+----+----+----+----+----+----+----+----+
|  N |  P |  Q |  R |  S |  T |  V |  W |  X |  Y |  Z |
| 157| 136| 198| 248| 130| 142| 142| 227| 180| 222| 196|
+----+----+----+----+----+----+----+----+----+----+----+

Notation: one letter code for amino acid, B stands for D or N; Z stands
  for E or Q; and X stands for undetermined.

The accessibility (the absolute DSSP value in Angstrom^2, not the
relative one) can be used to estimate the number of water molecules (W)
in contact with the residue:

W = ACCESSIBILITY /10
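
The normalisation and the water estimate can be written out as a small
Python sketch.  The maximal values are copied from the table above; the
function names are chosen here for illustration.

```python
# Maximal accessibility per amino acid type, copied from the table above.
MAX_ACC = {
    "A": 106, "B": 160, "C": 135, "D": 163, "E": 194, "F": 197,
    "G": 84, "H": 184, "I": 169, "K": 205, "L": 164, "M": 188,
    "N": 157, "P": 136, "Q": 198, "R": 248, "S": 130, "T": 142,
    "V": 142, "W": 227, "X": 180, "Y": 222, "Z": 196,
}

def relative_accessibility(residue, accessibility):
    """Relative accessibility in % for a DSSP accessibility value."""
    return 100.0 * accessibility / MAX_ACC[residue]

def water_molecules(accessibility):
    """Estimated number of water molecules in contact: W = ACC / 10."""
    return accessibility / 10.0

# Example: a glycine with a DSSP accessibility of 42.
print(relative_accessibility("G", 42), water_molecules(42))
```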

The prediction is given in 10 states for relative accessibility, with

RELATIVE_ACCESSIBILITY = (PREDICTED_ACC * PREDICTED_ACC)

where PREDICTED_ACC = 0 - 9.
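
In Python the decoding of a predicted state is a one-liner.  The
inverse mapping shown below (from a relative accessibility to a state)
is an assumption made for illustration: it takes the largest state
whose square does not exceed the value, which is consistent with
state 9 covering all values of 81% and above.

```python
import math

def state_to_relative(state):
    """Relative accessibility in % encoded by a predicted state 0..9."""
    return state * state

def relative_to_state(relative):
    """Assumed inverse: largest state i with i*i <= relative, at most 9."""
    return min(9, int(math.sqrt(relative)))

for state in range(10):
    print(state, "->", state_to_relative(state), "%")
```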

Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A careful cross validation test on some 238 protein chains (in total
about 62,000 residues) with less than 25% pairwise sequence identity
gave the following results:

Correlation
...........

The correlation between observed and predicted solvent accessibility
is:

-----------
corr = 0.53
-----------

This value ought to be compared to the worst and best case prediction
scenario: random prediction (corr = 0.0) and homology modelling
(corr = 0.66).  (Note: homology modelling yields a relatively accurate
prediction in 3D if, and only if, a significantly identical sequence
has a known 3D structure.)

3-state accuracy
................

Often the relative accessibility is projected onto, e.g., 3 states:
  b  = buried       (here defined as < 9% relative accessibility),
  i  = intermediate ( 9% <= rel. acc. < 36% ),
  e  = exposed      ( rel. acc. >= 36% ).

A projection onto 3 states or 2 states (buried/exposed) enables the
compilation of a 3- and 2-state prediction accuracy.  PHD reaches an
overall 3-state accuracy of:
  Q3 = 57.5%
(compared to 35% for random prediction and 70% for homology modelling).
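
The projection onto three states, using the thresholds defined above,
can be sketched in Python as follows (the function name is chosen here
for illustration):

```python
def three_state(relative):
    """Project a relative accessibility in % onto b, i or e."""
    if relative < 9:
        return "b"      # buried
    if relative < 36:
        return "i"      # intermediate
    return "e"          # exposed

# Each of the ten predicted states i encodes i*i % relative accessibility.
projection = "".join(three_state(state * state) for state in range(10))
print(projection)
```

Under this projection the predicted states 0-2 count as buried, 3-5 as
intermediate and 6-9 as exposed.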

In detail:

+-----------------------------------+-------------------------+
| Qburied       (% of observed)=77% | Qb (% of predicted)=60% |
| Qintermediate (% of observed)= 9% | Qi (% of predicted)=44% |
| Qexposed      (% of observed)=78% | Qe (% of predicted)=56% |
+-----------------------------------+-------------------------+

10-state accuracy
.................

The network predicts relative solvent accessibility in 10 states, with
state i (i = 0-9) corresponding to a relative solvent accessibility of
i*i %.  The 10-state accuracy of the network is:

  Q10 = 24.5%

..........................................................................

These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

|                     number of correctly predicted residues
|Q3               = ---------------------------------------      (*100)
|                           number of all residues
|
|                     no of res. correctly predicted to be buried
|Qburied (% of obs) = ------------------------------------------- (*100)
|                     no of all res. observed to be buried
|
|
|                     no of res. correctly predicted to be buried
|Qburied (% of pred)= ------------------------------------------- (*100)
|                     no of all residues predicted to be buried

..........................................................................

Averaging over single chains
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The most reasonable way to compute the overall accuracies is the above
quoted percentage of correctly predicted residues.  However, since the
user is mainly interested in the expected performance of the prediction
for a particular protein, the mean value when averaging over protein
chains might be of help as well.  Computing first the correlation
between observed and predicted accessibility for each protein chain, and
then averaging over all 238 chains yields the following average:

+-------------------------------====--+
| corr/averaged over chains   = 0.53  |
+-------------------------------====--+
| standard deviation          = 0.11  |
+-------------------------------------+

..........................................................................

Further details of performance accuracy
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The accuracy matrix in detail:
..............................

-------+----------------------------------------------------+-----------
\ PHD  |    0    1   2   3    4    5     6     7    8    9  |  SUM  %obs
-------+----------------------------------------------------+-----------
OBS  0 | 8611  140   8  44   82  169   772   334   27    0  | 10187 16.6
OBS  1 | 4367  164   0  50  106  231   738   346   44    3  |  6049  9.8
OBS  2 | 3194  168   1  68  125  303   951   513   42    7  |  5372  8.7
OBS  3 | 2760  159   8  80  136  327  1246   746   58   19  |  5539  9.0
OBS  4 | 2312  144   2  72  166  396  1615  1245  124   19  |  6095  9.9
OBS  5 | 1873   96   3  84  138  425  1979  1834  187   27  |  6646 10.8
OBS  6 | 1387   67   1  60   80  278  2237  2627  231   51  |  7019 11.4
OBS  7 | 1082   35   0  32   56  225  1871  3107  302   60  |  6770 11.0
OBS  8 |  660   25   0  27   43  136  1206  2374  325   87  |  4883  7.9
OBS  9 |  325   20   2  27   29   74   648  1159  366  214  |  2864  4.7
-------+----------------------------------------------------+-----------
SUM    |26571 1018  25 544  961 2564 13263 14285 1706  487  |
-------+----------------------------------------------------+-----------

Note: This table is to be read in the following manner:
     8611 of all residues predicted to be exposed by 0% were
     observed with 0% relative accessibility.  However, 325 of all
     residues predicted to have 0% are observed as completely exposed
     (obs = 9 -> rel. acc. >= 81%).  The term "observed" refers to the
     DSSP compilation of area of solvent accessibility calculated from
     3D coordinates of experimentally determined structures (Diction-
     ary of Secondary Structure  of Proteins: Kabsch & Sander (1983)
     Biopolymers, 22, 2577-2637).
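
The 3-state accuracy can be recovered from this 10-state matrix by
grouping the states as buried (0-2), intermediate (3-5) and exposed
(6-9).  The Python sketch below does this with the counts copied from
the table; the helper name is chosen here for illustration.

```python
# 10-state accuracy matrix copied from this report.
# Rows: observed state 0..9; columns: state predicted by PHD 0..9.
MATRIX = [
    [8611, 140, 8, 44, 82, 169, 772, 334, 27, 0],
    [4367, 164, 0, 50, 106, 231, 738, 346, 44, 3],
    [3194, 168, 1, 68, 125, 303, 951, 513, 42, 7],
    [2760, 159, 8, 80, 136, 327, 1246, 746, 58, 19],
    [2312, 144, 2, 72, 166, 396, 1615, 1245, 124, 19],
    [1873, 96, 3, 84, 138, 425, 1979, 1834, 187, 27],
    [1387, 67, 1, 60, 80, 278, 2237, 2627, 231, 51],
    [1082, 35, 0, 32, 56, 225, 1871, 3107, 302, 60],
    [660, 25, 0, 27, 43, 136, 1206, 2374, 325, 87],
    [325, 20, 2, 27, 29, 74, 648, 1159, 366, 214],
]

def group(state):
    """0 = buried (states 0-2), 1 = intermediate (3-5), 2 = exposed (6-9)."""
    return 0 if state < 3 else 1 if state < 6 else 2

total = sum(sum(row) for row in MATRIX)
# A residue counts as correct if observed and predicted fall in one group.
correct = sum(
    MATRIX[obs][pred]
    for obs in range(10)
    for pred in range(10)
    if group(obs) == group(pred)
)
q3 = 100.0 * correct / total
print(f"Q3 = {q3:.1f}%")
```

With these counts the result rounds to the Q3 value of 57.5% quoted
earlier in this report.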

Accuracy for each amino acid:
.............................

+---+------------------------------+-----+-------+------+
|AA |   Q3 b%o b%p i%o i%p e%o e%p | Q10 |  corr |    N |
+---+------------------------------+-----+-------+------+
| A | 59.0  87  60   2  38  66  57 |  31 | 0.530 | 5054 |
| C | 62.0  91  67   5  39  25  21 |  34 | 0.244 |  893 |
| D | 56.5  21  45   6  49  94  57 |  20 | 0.321 | 3536 |
| E | 60.8   9  40   3  41  98  61 |  21 | 0.347 | 3743 |
| F | 63.3  94  67   9  46  29  37 |  27 | 0.366 | 2436 |
| G | 52.1  75  51   1  31  67  53 |  22 | 0.405 | 4787 |
| H | 50.9  63  53  23  45  71  50 |  18 | 0.442 | 1366 |
| I | 64.9  95  68   6  41  30  38 |  34 | 0.360 | 3437 |
| K | 66.6   2  11   2  37  98  67 |  23 | 0.267 | 3652 |
| L | 61.6  93  65   8  44  31  40 |  31 | 0.368 | 5016 |
| M | 60.1  92  64   5  39  45  44 |  29 | 0.452 | 1371 |
| N | 55.5  45  45   8  38  87  59 |  17 | 0.410 | 2923 |
| P | 53.0  48  48   9  39  83  56 |  18 | 0.364 | 2920 |
| Q | 54.3  27  44   7  44  92  56 |  20 | 0.344 | 2225 |
| R | 49.9  15  47  36  47  76  51 |  18 | 0.372 | 2765 |
| S | 55.6  69  53   3  51  81  56 |  22 | 0.464 | 3981 |
| T | 51.8  61  51   8  38  78  53 |  21 | 0.432 | 3740 |
| V | 61.1  93  65   5  40  39  42 |  34 | 0.418 | 4156 |
| W | 56.2  85  62  20  49  29  27 |  21 | 0.318 |  891 |
| Y | 49.7  73  52  33  49  36  38 |  19 | 0.359 | 2301 |
+---+------------------------------+-----+-------+------+

Abbreviations:

AA:   amino acid in one-letter code
b%o, i%o, e%o:   = Qburied, Qintermediate, Qexposed (% of observed),
     i.e. percentage of correct prediction in each state, see above
b%p, i%p, e%p:   = Qburied, Qintermediate, Qexposed (% of predicted),
     i.e. probability of correct prediction in each state, see above
Q10:  percentage of correctly predicted residues in each of the 10
     states of predicted relative accessibility.
corr: correlation between predicted and observed rel. acc.
N:    number of residues in data set

Accuracy for different secondary structure:
...........................................

+--------+------------------------------+----+-------+-------+
| type   |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |     N |
+--------+------------------------------+----+-------+-------+
| helix  | 59.5  79  64   8  44  80  56 | 27 | 0.574 | 20100 |
| strand | 61.3  84  73   9  46  69  37 | 35 | 0.524 | 13356 |
| loop   | 54.4  64  43  11  44  78  61 | 18 | 0.442 | 27968 |
+--------+------------------------------+----+-------+-------+

Abbreviations as before.

Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The network predicts the 10 states for relative accessibility using real
numbers from the output units. The prediction is assigned by choosing
the maximal unit ("winner takes all").  However, the real numbers
contain additional information.
E.g. the difference between the maximal and the second largest output
unit (with the constraint that the second largest output is chosen
among all units at least 2 positions off the maximal unit) can be used
to derive a "reliability index".  This index is given for each residue
along with the prediction.  The index is scaled to have values between
0 (lowest reliability), and 9 (highest).
The accuracies (Q3, corr, etc.) to be expected for residues with values
above a particular value of the index are given below as well as the
fraction of such residues (%res):

+---+------------------------------+----+-------+-------+
|RI |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |  %res |
+---+------------------------------+----+-------+-------+
| 0 | 57.5  77  60   9  44  78  56 | 24 | 0.535 | 100.0 |
| 1 | 59.1  76  63   9  45  82  57 | 25 | 0.560 |  91.2 |
| 2 | 61.7  79  66   4  47  87  58 | 27 | 0.594 |  77.1 |
| 3 | 66.6  87  70   1  51  89  63 | 30 | 0.650 |  57.1 |
| 4 | 70.0  89  72   0  83  91  67 | 32 | 0.686 |  45.8 |
| 5 | 72.9  92  75   0   0  93  70 | 34 | 0.722 |  35.6 |
| 6 | 76.3  95  77   0   0  93  75 | 36 | 0.769 |  24.7 |
| 7 | 79.0  97  79   0   0  93  78 | 39 | 0.803 |  16.0 |
| 8 | 80.9  98  80   0   0  91  81 | 43 | 0.824 |   9.6 |
| 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |
+---+------------------------------+----+-------+-------+

Abbreviations as before.

The above table gives the cumulative results, e.g. 45.8% of all
residues have a reliability of at least 4.  The correlation for this
most reliably predicted half of the residues is 0.686, i.e. a value
comparable to what could be expected if homology modelling were
possible.  For this subset of 45.8% of all residues, 89% of the buried
residues are correctly predicted, and 72% of all residues predicted to
be buried are correct.
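
The reliability index described at the start of this section can be
sketched as follows.  This is only an illustration: the function name,
the default offset of 2 and, in particular, the scaling of the difference
onto 0-9 are assumptions, since the exact scaling used by PHDacc is not
given here.

```python
def reliability_index(outputs, min_offset=2):
    """Return (winner, RI) for one residue from its output-unit values.

    outputs    : list of real numbers, one per accessibility state
    min_offset : the runner-up must be at least this many positions
                 away from the winning unit
    """
    # "Winner takes all": the predicted state is the maximal unit.
    winner = max(range(len(outputs)), key=lambda i: outputs[i])
    # Second largest output, restricted to units far enough from the winner.
    rivals = [v for i, v in enumerate(outputs) if abs(i - winner) >= min_offset]
    diff = outputs[winner] - max(rivals)
    # Assumed scaling: map a difference in [0, 1] onto the integers 0..9.
    ri = min(9, max(0, int(diff * 10)))
    return winner, ri


winner, ri = reliability_index(
    [0.0, 0.125, 0.875, 0.75, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
)
```

In this made-up example unit 3 (0.75) is ignored as a runner-up because
it is a direct neighbour of the winning unit 2; the difference is taken
against unit 4 instead.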

..........................................................................

The following table gives the non-cumulative quantities, i.e. the
values per reliability index range.  These numbers answer the question:
how reliable is the prediction for all residues labeled with the
particular index i?

+---+------------------------------+----+-------+-------+
|RI |   Q3 b%o b%p i%o i%p e%o e%p |Q10 |  corr |  %res |
+---+------------------------------+----+-------+-------+
| 0 | 40.9  79  40  16  41  21  40 | 14 | 0.175 |   8.8 |
| 1 | 45.4  61  46  28  44  48  44 | 17 | 0.278 |  14.1 |
| 2 | 47.4  53  52  10  46  80  44 | 19 | 0.343 |  19.9 |
| 3 | 52.9  75  59   4  50  77  47 | 23 | 0.439 |  11.4 |
| 4 | 60.0  81  63   0  83  84  56 | 25 | 0.547 |  10.1 |
| 5 | 65.2  82  70   0   0  93  62 | 28 | 0.607 |  10.9 |
| 6 | 71.3  90  72   0   0  94  70 | 31 | 0.692 |   8.8 |
| 7 | 76.0  94  76   0   0  95  75 | 34 | 0.762 |   6.3 |
| 8 | 80.5  97  81   0   0  94  79 | 39 | 0.808 |   3.8 |
| 9 | 81.2  99  80   0   0  88  83 | 45 | 0.828 |   5.9 |
+---+------------------------------+----+-------+-------+

For example, for residues with RI = 4, 83% of all predicted intermediate
residues are correctly predicted as such.
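
The relation between the cumulative and the non-cumulative %res columns
can be checked with a few lines of Python.  The sketch below (names are
illustrative) subtracts neighbouring cumulative values; the results agree
with the non-cumulative table to within the 0.1 rounding of the printed
numbers.

```python
# %res column of the cumulative PHDacc table (residues with RI >= i).
CUMULATIVE_PCT_RES = [100.0, 91.2, 77.1, 57.1, 45.8, 35.6, 24.7, 16.0, 9.6, 5.9]

# %res column of the non-cumulative PHDacc table (residues with RI == i).
TABLE_PCT_RES = [8.8, 14.1, 19.9, 11.4, 10.1, 10.9, 8.8, 6.3, 3.8, 5.9]


def per_index_fraction(cumulative):
    """Percentage of residues with RI exactly i, from the 'RI >= i' values."""
    shifted = cumulative[1:] + [0.0]
    return [round(a - b, 1) for a, b in zip(cumulative, shifted)]


per_index = per_index_fraction(CUMULATIVE_PCT_RES)
largest_deviation = max(abs(a - b) for a, b in zip(per_index, TABLE_PCT_RES))
```

The other columns (Q3, corr, ...) cannot be converted this way, because
they are averages that would have to be weighted by residue counts.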

   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Prediction of helical transmembrane segments by PHDhtm:
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   Author:             Burkhard Rost
                       EMBL, Heidelberg, FRG
                       Meyerhofstrasse 1, 69 117 Heidelberg
                       Internet: Rost@EMBL-Heidelberg.DE

   All rights reserved.

About the network method
~~~~~~~~~~~~~~~~~~~~~~~

The PHD mail server is described in:
  Rost, Burkhard; Sander, Chris; Schneider, Reinhard:
  PHD - an automatic mail server for protein secondary structure
  prediction.
  CABIOS, 1994, 10, 53-60.

To be quoted for publications of PHDhtm output:
  Rost, Burkhard; Casadio, Rita; Fariselli, Piero; Sander, Chris:
  Prediction of helical transmembrane segments at 95% accuracy.
  Protein Science, 1995, 4, 521-533.

Estimated Accuracy of Prediction
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A cross validation test on 69 helical trans-membrane  proteins (in total
about 30,000 residues) with less than 25% pairwise sequence identity
gave the following results:

++================++-----------------------------------------+
|| Qtotal = 94.7% ||      ("overall two state accuracy")     |
++================++-----------------------------------------+

+----------------------------+-----------------------------+
| Qhelix (% of observed)=92% | Qhelix (% of predicted)=83% |
| Qloop  (% of observed)=96% | Qloop  (% of predicted)=97% |
+----------------------------+-----------------------------+

..........................................................................

These percentages are defined by:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

|                    number of correctly predicted residues
|Qtotal =            ---------------------------------------      (*100)
|                          number of all residues
|
|                    no of res correctly predicted to be in helix
|Qhelix (% of obs) = -------------------------------------------- (*100)
|                    no of all res observed to be in helix
|
|
|                    no of res correctly predicted to be in helix
|Qhelix (% of pred)= -------------------------------------------- (*100)
|                    no of all residues predicted to be in helix

..........................................................................

Further measures of performance
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Matthews correlation coefficient:

+---------------------------------------------+
| Chelix = 0.84, Cloop = 0.84                 |
+---------------------------------------------+
..........................................................................

Average length of predicted secondary structure segments:

|           +------------+----------+
|           |  predicted | observed |
+-----------+------------+----------+
| Lhelix  = |    24.6    |   22.2   |
+-----------+------------+----------+
..........................................................................

The accuracy matrix in detail:

+---------------------------------+
|    number of residues with H, L |
+---------+------+-------+--------+
|         |net H | net L |sum obs |
+---------+------+-------+--------+
| obs H   | 5214 |   492 |   5706 |
| obs L   | 1050 | 22423 |  23473 |
+---------+------+-------+--------+
| sum Net | 6264 | 22915 |  29179 |
+---------+------+-------+--------+

Note: This table is to be read in the following manner:
     5214 of all residues predicted to be in a helical trans-membrane
     region were observed to be in the lipid bilayer; 1050, however,
     were observed either inside or outside of the protein, i.e. in
     loop (or non-membrane) regions. The term "observed" refers to DSSP
     assignment of secondary structure calculated from 3D coordinates
     of experimentally determined structures (Dictionary of Secondary
     Structure  of Proteins: Kabsch & Sander (1983) Biopolymers, 22,
     2577-2637) where these were available.  For all other proteins,
     the assignment of trans-membrane segments has been taken from the
     Swissprot data bank (Bairoch, A.; Boeckmann, B.: The SWISS-PROT
     protein sequence data bank. Nucl. Acids Res. 20: 2019-2022, 1992).
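
The percentages defined earlier can be recomputed from the four counts
of the accuracy matrix.  The following Python sketch is an illustration
only; the function and key names are made up, the Qloop values are
formed by analogy with the Qhelix definitions, and the correlation uses
the standard two-state Matthews formula.  With these counts the overall
accuracy comes out at about 94.7% and the correlation at about 0.84;
the per-class values may differ slightly from the rounded figures
quoted in the tables.

```python
import math

# Counts from the accuracy matrix (rows: observed, columns: network).
TP = 5214    # observed H, predicted H
FN = 492     # observed H, predicted L
FP = 1050    # observed L, predicted H
TN = 22423   # observed L, predicted L


def two_state_measures(tp, fn, fp, tn):
    """Per-residue accuracy measures for a two-state (H/L) prediction."""
    total = tp + fn + fp + tn
    return {
        "Qtotal": 100.0 * (tp + tn) / total,
        "Qhelix_obs": 100.0 * tp / (tp + fn),    # % of observed helix
        "Qhelix_pred": 100.0 * tp / (tp + fp),   # % of predicted helix
        "Qloop_obs": 100.0 * tn / (tn + fp),     # % of observed loop
        "Qloop_pred": 100.0 * tn / (tn + fn),    # % of predicted loop
        # Matthews correlation coefficient for the helix state.
        "C": (tp * tn - fp * fn)
             / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


measures = two_state_measures(TP, FN, FP, TN)
```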

..........................................................................

Overlap between predicted and observed segments:

+-----------------+---------------+----------------+
| segment overlap | % of observed | % of predicted |
|   Sov helix     |      95.6%    |      95.5%     |
|   Sov loop      |      83.6%    |      97.2%     |
+-----------------+---------------+----------------+
|   Sov total     |      86.0%    |      96.8%     |
+-----------------+---------------+----------------+

     Definition of Sov in: Rost et al., JMB, 1994, 235, 13-26.

     As helical trans-membrane segments are longer than globular
     helices, correctly predicted segments can easily be made out.
     PHDhtm misses 5 out of 258 observed segments, predicts 6 where
     none is observed, and 3 times the predicted helical segment
     overlaps two observed regions.  Thus, in total more than 95% of
     all segments are correctly predicted.

..........................................................................

Entropy of prediction (information measure):

+-----------------+
| I = 0.64        |
+-----------------+

     (For comparison: homology modelling of globular proteins in three
     states: I=0.62.)

Position-specific reliability index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The network predicts two states, helical trans-membrane region and rest,
using two output units.  The prediction is assigned by choosing the
maximal unit ("winner takes all").  However, the real numbers of the
output units contain additional information.
For example, the difference between the two output units can be used to
derive a "reliability index".  This index is given for each residue
along with the prediction.  The index is scaled to have values between
0 (lowest reliability) and 9 (highest).
The accuracies (Qtot) to be expected for residues with values at or
above a particular value of the index are given below, as well as the
fraction of such residues (%res):

+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| index|  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |
| %res |100.0| 98.8| 97.3| 95.9| 94.1| 92.3| 89.9| 86.2| 75.0| 66.8|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|      |     |     |     |     |     |     |     |     |     |     |
| Qtot | 94.7| 95.2| 95.6| 96.2| 96.7| 97.2| 97.7| 98.4| 99.4| 99.8|
|      |     |     |     |     |     |     |     |     |     |     |
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| H%obs| 91.8| 92.9| 93.8| 94.4| 95.0| 95.7| 96.2| 96.8| 95.5| 78.7|
| L%obs| 95.3| 95.7| 96.1| 96.6| 97.0| 97.5| 98.1| 98.8| 99.7|100.0|
|      |     |     |     |     |     |     |     |     |     |     |
| H%prd| 82.7| 83.8| 85.0| 86.7| 88.1| 89.7| 91.4| 93.8| 96.3| 97.1|
| L%prd| 97.9| 98.3| 98.5| 98.7| 98.8| 99.0| 99.2| 99.4| 99.7| 99.9|
+------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+

The above table gives the cumulative results, e.g. 92.3% of all
residues have a reliability of at least 5.  The overall two-state
accuracy for this subset is 97.2%.  For this subset, e.g., 95.7% of
the observed helical trans-membrane residues are correctly predicted,
and 89.7% of all residues predicted to be in a helical trans-membrane
segment are correct.
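
One way to use the cumulative table is to pick the lowest reliability
cut-off that reaches a desired expected accuracy.  The sketch below
copies the %res and Qtot rows from the table; the function name is
hypothetical and the lookup is only an illustration of how the table
can be read.

```python
# Rows copied from the cumulative PHDhtm table; list position = index.
PCT_RES = [100.0, 98.8, 97.3, 95.9, 94.1, 92.3, 89.9, 86.2, 75.0, 66.8]
QTOT = [94.7, 95.2, 95.6, 96.2, 96.7, 97.2, 97.7, 98.4, 99.4, 99.8]


def lowest_index_for_accuracy(target_qtot):
    """Smallest reliability cut-off whose expected Qtot reaches the target.

    Returns (index, %res, Qtot), or None if no cut-off is sufficient.
    """
    for index, qtot in enumerate(QTOT):
        if qtot >= target_qtot:
            return index, PCT_RES[index], qtot
    return None


choice = lowest_index_for_accuracy(97.0)
```

For a target of 97.0 this returns index 5, i.e. the case discussed in
the paragraph above (92.3% of the residues, expected Qtot 97.2%).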

The resulting network (PHD) prediction is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

________________________________________________________________________________

 PHD: Profile fed neural network systems from HeiDelberg
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Prediction of:
    secondary structure,               by PHDsec
    solvent accessibility,             by PHDacc
    and helical transmembrane regions,     by PHDhtm

 Author:
    Burkhard Rost
    EMBL, 69012 Heidelberg, Germany
    Internet: Rost@EMBL-Heidelberg.DE

 All rights reserved.

 The network systems are described in:

 PHDsec:    B Rost & C Sander: JMB, 1993, 232, 584-599.
            B Rost & C Sander: Proteins, 1994, 19, 55-72.
 PHDacc:    B Rost & C Sander: Proteins, 1994, 20, 216-226.
 PHDhtm:    B Rost et al.:     Prot. Science, 1995, 4, 521-533.

 Some statistics
 ~~~~~~~~~~~~~~~

 Percentage of amino acids:
 +--------------+--------+--------+--------+--------+--------+
 | AA:          |    L   |    A   |    S   |    G   |    I   |
 | % of AA:     |   13.0 |   10.0 |    9.7 |    8.9 |    8.6 |
 +--------------+--------+--------+--------+--------+--------+
 | AA:          |    V   |    R   |    T   |    F   |    D   |
 | % of AA:     |    7.8 |    5.2 |    4.5 |    4.5 |    4.5 |
 +--------------+--------+--------+--------+--------+--------+
 | AA:          |    N   |    Q   |    E   |    P   |    K   |
 | % of AA:     |    4.1 |    3.0 |    3.0 |    2.6 |    2.6 |
 +--------------+--------+--------+--------+--------+--------+
 | AA:          |    Y   |    M   |    W   |    H   |    C   |
 | % of AA:     |    1.9 |    1.9 |    1.5 |    1.5 |    1.5 |
 +--------------+--------+--------+--------+--------+--------+
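
 A composition table of this kind can be reproduced from the submitted
 sequence with a short Python sketch.  The function name is illustrative,
 and rounding to one decimal is an assumption based on the printed
 values.

```python
from collections import Counter


def composition(sequence):
    """Percentage of each amino acid, most frequent first."""
    counts = Counter(sequence)
    n = len(sequence)
    return {aa: round(100.0 * c / n, 1) for aa, c in counts.most_common()}


toy = composition("AAGL")
```

 Here a four-residue toy string is used; passing the 269-residue
 sequence from the header of this report instead yields one percentage
 per amino acid type.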

 Percentage of secondary structure predicted:
 +--------------+--------+--------+--------+
 | SecStr:      |    H   |    E   |    L   |
 | % Predicted: |   43.9 |   16.7 |   39.4 |
 +--------------+--------+--------+--------+

 According to the following classes:
    all-alpha:   %H>45 and %E< 5; all-beta : %H<5 and %E>45
    alpha-beta : %H>30 and %E>20; mixed:    rest,
 this means that the predicted class is:           mixed class
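
 The class rules above translate directly into a small function.  This
 is a sketch with a made-up function name; the order of the tests does
 not matter, since the three named classes cannot overlap.

```python
def secondary_structure_class(pct_h, pct_e):
    """Assign the structural class from the predicted %H and %E."""
    if pct_h > 45 and pct_e < 5:
        return "all-alpha"
    if pct_h < 5 and pct_e > 45:
        return "all-beta"
    if pct_h > 30 and pct_e > 20:
        return "alpha-beta"
    return "mixed"


predicted_class = secondary_structure_class(43.9, 16.7)
```

 With the percentages of this prediction (%H = 43.9, %E = 16.7) none of
 the three named classes applies, so the result is "mixed".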

 PHD output for your protein
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Tue Nov 24 17:44:57 1998
 Jury on:       10    different architectures (version   5.94_317 ).
 Note: differently trained architectures, i.e., different versions can
 result in different predictions.

 About the protein
 ~~~~~~~~~~~~~~~~~

 HEADER     /home/phd/server/work/predict_h25873-220
 COMPND
 SOURCE
 AUTHOR
 SEQLENGTH   269
 NCHAIN        1 chain(s) in predict_h25873-22040 data set
 NALIGN       48
 (=number of aligned sequences in HSSP file)

 Abbreviations: PHDsec
 ~~~~~~~~~~~~~~~~~~~~~

 sequence:
    AA : amino acid sequence
 secondary structure:
    HEL: H=helix, E=extended (sheet), blank=other (loop)
    PHD: Profile network prediction HeiDelberg
    Rel: Reliability index of prediction (0-9)
 detail:
    prH: 'probability' for assigning helix
    prE: 'probability' for assigning strand
    prL: 'probability' for assigning loop
         note: the 'probabilities' are scaled to the interval 0-9, e.g.,
               prH=5 means that the first output node is 0.5-0.6
 subset:
    SUB: a subset of the prediction, for all residues with an expected
         average accuracy > 82% (tables in header)
         note: for this subset the following symbols are used:
      L: is loop (for which above " " is used)
    ".": means that no prediction is made for this residue, as the
         reliability is:  Rel < 5
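
 The SUB line can be derived from the prediction line and the Rel line.
 The following Python sketch uses hypothetical names; it keeps the
 predicted symbol where Rel reaches the cut-off, writes the given symbol
 for a blank, and writes "." otherwise.  The defaults follow the PHDsec
 rule (Rel < 5 gives ".", blank becomes L); for PHDacc the cut-off is 4
 and the blank stands for "I".

```python
def subset_line(prediction, reliability, min_rel=5, blank_symbol="L"):
    """Build the SUB line from a prediction line and its Rel line."""
    out = []
    for state, rel in zip(prediction, reliability):
        if int(rel) < min_rel:
            out.append(".")            # not reliable enough: no prediction
        elif state == " ":
            out.append(blank_symbol)   # blank in the prediction line
        else:
            out.append(state)
    return "".join(out)


# Last block of the PHDsec output for this protein (residues 241-269).
sub_sec = subset_line("HHHHHH" + " " * 23, "66775259975467555457776422699")
```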

 Abbreviations: PHDacc
 ~~~~~~~~~~~~~~~~~~~~~

    SS : secondary structure
    HEL: H=helix, E=extended (sheet), blank=other (loop)
 solvent accessibility:
    3st: relative solvent accessibility (acc) in 3 states:
         b = 0-9%, i = 9-36%, e = 36-100%.
    PHD: Profile network prediction HeiDelberg
    Rel: Reliability index of prediction (0-9)
    O_3: observed relative acc. in 3 states: B, I, E
         note: for convenience a blank is used for intermediate (i).
    P_3: predicted relative accessibility in 3 states
    10st:relative accessibility in 10 states:
         a value n (0-9) corresponds to a relative acc. of n*n %
 subset:
    SUB: a subset of the prediction, for all residues with an expected
         average correlation > 0.69 (tables in header)
         note: for this subset the following symbols are used:
    "I": is intermediate (for which above " " is used)
    ".": means that no prediction is made for this residue, as the
         reliability is: Rel < 4
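
 The relation between the 10-state and the 3-state accessibility lines
 can be sketched in Python as follows.  The function names are
 illustrative.  How a value lying exactly on a class boundary (9% or 36%)
 is assigned is an assumption: here it goes to the more exposed state,
 which matches the output below, where the 10-state value 6 (36%)
 appears as "e".

```python
def three_state(n):
    """Map a 10-state value n (0-9) to b, i or e via n*n percent."""
    acc = n * n                  # relative accessibility in percent
    if acc < 9:
        return "b"               # buried
    if acc < 36:
        return "i"               # intermediate
    return "e"                   # exposed


def p3_line(ten_state_digits):
    """Convert a 10-state line to a 3-state line (blank = intermediate)."""
    symbols = []
    for digit in ten_state_digits:
        state = three_state(int(digit))
        symbols.append(" " if state == "i" else state)
    return "".join(symbols)


# First twelve residues of the PHDacc output for this protein.
p3_start = p3_line("997706650005")
```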

 Abbreviations: PHDhtm
 ~~~~~~~~~~~~~~~~~~~~~

 secondary structure:
    HL:  T=helical transmembrane region, blank=other (loop)
    PHD: Profile network prediction HeiDelberg
    PHDF:filtered prediction, i.e., too long transmembrane segments
         are split, too short ones are deleted
    Rel: Reliability index of prediction (0-9)
 detail:
    prH: 'probability' for assigning helical transmembrane region
    prL: 'probability' for assigning loop
         note: the 'probabilities' are scaled to the interval 0-9, e.g.,
               prH=5 means that the first output node is 0.5-0.6
 subset:
    SUB: a subset of the prediction, for all residues with an expected
         average accuracy > 82% (tables in header)
         note: for this subset the following symbols are used:
      L: is loop (for which above " " is used)
    ".": means that no prediction is made for this residue, as the
         reliability is:  Rel < 5

 protein:       predict        length      269

                  ....,....1....,....2....,....3....,....4....,....5....,....6
         AA      |MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALGFNYPLERNQTLVQDNVKVSLAFGLSI|
         PHD sec |       HHHHHHHHHHHHHHHHHHHHHHHHHHEE            HHHHHHHHHHHHH|
         Rel sec |998443148899999999999998997676530312469989998623353579999999|
 detail:
         prH sec |001223468899999999999998888777653112210000000145566788999999|
         prE sec |000011000000000000000001001111233542100000000000323211000000|
         prL sec |998665420100000000000000000011112244578988998753100000000000|
 subset: SUB sec |LLL.....HHHHHHHHHHHHHHHHHHHHHHH......LLLLLLLLL...H.HHHHHHHHH|

 ACCESSIBILITY
 3st:    P_3 acc |eeeebee bbb bbbbbbbbbbbbbbbbbbbbbebeee eeeeeeeeebbbbbbbbbbbb|
 10st:   PHD acc |997706650005000000000000000000000607775779776677000000000000|
         Rel acc |735421110541467608662789996343122133420454330023453975664547|
 subset: SUB acc |e.ee.....bb.bbbb.bbb.bbbbbb.b.......e..eee......bb.bbbbbbbbb|
                  ....,....7....,....8....,....9....,....10...,....11...,....12
         AA      |ATLAQSVGHISGAHSNPAVTLGLLLSCQISILRAVMYIIAQCVGAIVASAILSGITSSLL|
         PHD sec |HHHHHHHHHE      HHHHEHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH     |
         Rel sec |999996412122653167703135552356779999999999999999999998467213|
 detail:
         prH sec |998986544334223477843456665567779999999999999999999998611343|
         prE sec |001001123420010000145432101221110000000000000000000000000000|
         prL sec |000001232245765521000000123210000000000000000000000000278555|
 subset: SUB sec |HHHHHH......LL..HHH....HHH..HHHHHHHHHHHHHHHHHHHHHHHHHH.LL...|

 ACCESSIBILITY
 3st:    P_3 acc |bbbbebbbebbbbbb bbbbbbbbbbbebbbbbbbbbbbbbbbbbbbbbbbbeebbeeeb|
 10st:   PHD acc |000060006000000500000000000600000000000000000000000067006760|
         Rel acc |456515321655013144869663400154551757478936465465467713401400|
 subset: SUB acc |bbbb.b...bbb....bbbbbbb.b...bbbb.bbbbbbb.bbbbbbbbbbb..b..e..|
                  ....,....13...,....14...,....15...,....16...,....17...,....18
         AA      |ENSLGRNDLARGVNSGQGLGIEIIGTLQLVLCVLATTDRRRRDLGGSAPLAIGLSVALGH|
         PHD sec |       HHH       EEEEEEEEEEEEEEEEEEE            E E   HHHHHH|
         Rel sec |359985212134223651899898866789799875436658889963211351457756|
 detail:
         prH sec |320002345432332111000000000000100000221120000000001113567767|
         prE sec |100000000000011014899888877789789886100000000013544222221111|
         prL sec |568986543466545763100000011100000112567768889975454564210111|
 subset: SUB sec |.LLLLL.........LL.EEEEEEEEEEEEEEEEEE..LLLLLLLLL.....L..HHHHH|

 ACCESSIBILITY
 3st:    P_3 acc |eeebbbebbbeebeebeebbbbbbbbbbbbbbbbbbbeeeeeeeebbbbbbbbbbbbbbb|
 10st:   PHD acc |677000600077076077000000000000000000077767767000000000000000|
         Rel acc |133100124043040233247198656399879530035414413123255869586654|
 subset: SUB acc |........b.e..e.....bb.bbbbb.bbbbbb....ee.ee......bbbbbbbbbbb|
                  ....,....19...,....20...,....21...,....22...,....23...,....24
         AA      |LLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVGPFIGSALAVLIYDFILAPRSSDFTD|
         PHD sec |HEEEE E          HHHEEEE    EEEEEE HHHHHHHHHHHHHEEEEE       |
         Rel sec |321341126989622145152653534229996251699999999973147525556642|
 detail:
         prH sec |521100000000145432463121122000000114789999999875421111121124|
         prE sec |244564431000000000015765121358997510000000000013467642110000|
         prL sec |233234457889754567411012655530002364200000000010010136667765|
 subset: SUB sec |........LLLLL....H.H.EE.L....EEEE.L.HHHHHHHHHHH...EE.LLLLL..|

 ACCESSIBILITY
 3st:    P_3 acc |bbbbebbbbbbebb bbbbbbbbeebeebbbbbbbbbbbbbbbbbbbbbbbbeeeee ee|
 10st:   PHD acc |000060000006005000000007606600000000000000000000000076777577|
         Rel acc |754424240102242141047612131118967874356346635751777031345044|
 subset: SUB acc |bbbb.b.b.....b..b..bbb.......bbbbbbb.bb.bbb.bbb.bbb....ee.ee|
                  ....,....25...,....26...,....27...,....28...,....29...,....30
         AA      |RMKVWTSGQVEEYDLDADDINSRVEMKPK|
         PHD sec |HHHHHH                       |
         Rel sec |66775259975467555457776422699|
 detail:
         prH sec |77887520012221222221111100000|
         prE sec |00000000000000000000001233200|
         prL sec |11112379987678777678887655799|
 subset: SUB sec |HHHHH.LLLLL.LLLLL.LLLLL...LLL|

 ACCESSIBILITY
 3st:    P_3 acc |ebebbeeeeeeeeeeeeeeeeeebeeeee|
 10st:   PHD acc |60700787677777677777767067789|
         Rel acc |10411563134335144444514212559|
 subset: SUB acc |..e..ee...e..e.eeeeee.e...eee|
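
 Because the prediction is printed in blocks of 60 residues, each row
 has to be joined across blocks before further processing.  The Python
 sketch below is one possible way to do this; the function name is
 hypothetical, and it assumes that the row label is the text directly
 before the first "|" (after an optional "subset:" or "10st:" prefix)
 and that the row content lies between the first and the last "|".

```python
def collect_row(report_text, label):
    """Join all rows with the given label (e.g. 'PHD sec') across blocks."""
    pieces = []
    for line in report_text.splitlines():
        first, last = line.find("|"), line.rfind("|")
        if first == -1 or first == last:
            continue                      # no "|...|" pair on this line
        # Drop a prefix such as " subset:" and compare the bare label.
        head = line[:first].split(":")[-1].strip()
        if head == label:
            pieces.append(line[first + 1:last])
    return "".join(pieces)


sample = "\n".join([
    "         AA      |MASE|",
    "         PHD sec |  HH|",
    " subset: SUB sec |L..H|",
    "         AA      |IK|",
    "         PHD sec |H |",
])
sequence = collect_row(sample, "AA")
```

 Blanks inside the row are kept, since they carry meaning (loop or
 intermediate) in the PHD lines.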

 PHDhtm Helical transmembrane prediction
        note: PHDacc and PHDsec are reliable for water-
              soluble globular proteins, only.  Thus,
              please take the  predictions above with
              particular caution wherever transmembrane
              helices are predicted by PHDhtm!

 PHDhtm
---
--- PhdTopology REFINEMENT AND TOPOLOGY PREDICTION: SYMBOLS
--- AA           : amino acid in one-letter code
--- PHD htm      : HTM's predicted by the PHD neural network
---                system (T=HTM, ' '=not HTM)
--- Rel htm      : Reliability index of prediction (0-9, 0 is low)
--- detail       : Neural network output in detail
--- prH htm      : 'Probability' for assigning a helical trans-
---                membrane region (HTM)
--- prL htm      : 'Probability' for assigning a non-HTM region
---          note: 'Probabilities' are scaled to the interval
---                0-9, e.g., prH=5 means that the first
---                output node is 0.5-0.6
--- subset       : Subset of more reliable predictions
--- SUB htm      : All residues for which the expected average
---                accuracy is > 82% (tables in header).
---          note: for this subset the following symbols are used:
---             L: is loop (for which above ' ' is used)
---           '.': means that no prediction is made for this
---                residue, as the reliability is:  Rel < 5
--- other        : predictions derived based on PHDhtm
--- PHDFhtm      : filtered prediction, i.e., too long HTM's are
---                split, too short ones are deleted
--- PHDRhtm      : refinement of neural network output
--- PHDThtm      : topology prediction based on refined model
---                symbols used:
---             i: intra-cytoplasmic
---             T: transmembrane region
---             o: extra-cytoplasmic
---
--- PhdTopology REFINEMENT AND TOPOLOGY PREDICTION
                  ....,....1....,....2....,....3....,....4....,....5....,....6
         AA      |MASEIKKKLFWRAVVAEFLAMTLFVFISIGSALGFNYPLERNQTLVQDNVKVSLAFGLSI|
         PHD htm |              TTTTTTTTTTTTTTTTTTT               TTTTTTTTTTTT|
 detail:         |                                                            |
         prH htm |000000000001136788999999999988875321110000000123678889999988|
         prL htm |999999999998863211000000000011124678889999999876321110000011|
  other:         |                                                            |
         PHDFhtm |              TTTTTTTTTTTTTTTTTTT                TTTTTTTTTTT|
         PHDRhtm |              TTTTTTTTTTTTTTTTTT                 TTTTTTTTTTT|
         PHDThtm |iiiiiiiiiiiiiiTTTTTTTTTTTTTTTTTToooooooooooooooooTTTTTTTTTTT|
 subset:         |                                                            |
         SUB htm |............................................................|
                  ....,....7....,....8....,....9....,....10...,....11...,....12
         AA      |ATLAQSVGHISGAHSNPAVTLGLLLSCQISILRAVMYIIAQCVGAIVASAILSGITSSLL|
         PHD htm |TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT     |
 detail:         |                                                            |
         prH htm |888888877777666677788888888888888888888888888888888876543211|
         prL htm |111111122222333322211111111111111111111111111111111123456788|
  other:         |                                                            |
         PHDFhtm |TTTTTTTTTTTTTTTT    TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT       |
         PHDRhtm |TTTTTTTT             TTTTTTTTTTTTTTTTTTTTTTTTT              |
         PHDThtm |TTTTTTTTiiiiiiiiiiiiiTTTTTTTTTTTTTTTTTTTTTTTTToooooooooooooo|
 subset:         |                                                            |
         SUB htm |............................................................|
                  ....,....13...,....14...,....15...,....16...,....17...,....18
         AA      |ENSLGRNDLARGVNSGQGLGIEIIGTLQLVLCVLATTDRRRRDLGGSAPLAIGLSVALGH|
         PHD htm |               TTTTTTTTTTTTTTTTTTT             TTTTTTTTTTTTT|
 detail:         |                                                            |
         prH htm |000000000001234567788888999988887643211111111235788899998888|
         prL htm |999999999998765432211111000011112356788888888764211100001111|
  other:         |                                                            |
         PHDFhtm |               TTTTTTTTTTTTTTTTTTT             TTTTTTTTTTTTT|
         PHDRhtm |                TTTTTTTTTTTTTTTTTT              TTTTTTTTTTTT|
         PHDThtm |ooooooooooooooooTTTTTTTTTTTTTTTTTTiiiiiiiiiiiiiiTTTTTTTTTTTT|
 subset:         |                                                            |
         SUB htm |............................................................|
                  ....,....19...,....20...,....21...,....22...,....23...,....24
         AA      |LLAIDYTGCGINPARSFGSAVLTRNFSNHWIFWVGPFIGSALAVLIYDFILAPRSSDFTD|
         PHD htm |TTTTTTTTT            TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT        |
 detail:         |                                                            |
         prH htm |888887765443432233334566777777788888888888888888887542100000|
         prL htm |111112234556567766665433222222211111111111111111112457899999|
  other:         |                                                            |
         PHDFhtm |TTTTTTTTT            TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT        |
         PHDRhtm |TTTTTT                         TTTTTTTTTTTTTTTTTTT          |
         PHDThtm |TTTTTToooooooooooooooooooooooooTTTTTTTTTTTTTTTTTTTiiiiiiiiii|
 subset:         |                                                            |
         SUB htm |............................................................|
                  ....,....25...,....26...,....27...,....28...,....29...,....30
         AA      |RMKVWTSGQVEEYDLDADDINSRVEMKPK|
         PHD htm |                             |
 detail:         |                             |
         prH htm |00000000000000000000000000000|
         prL htm |99999999999999999999999999999|
  other:         |                             |
         PHDFhtm |                             |
         PHDRhtm |                             |
         PHDThtm |iiiiiiiiiiiiiiiiiiiiiiiiiiiii|
 subset:         |                             |
         SUB htm |.............................|
---
--- PhdTopology REFINEMENT AND TOPOLOGY PREDICTION END
---
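
The start and end positions of the predicted transmembrane helices can
be read off the PHDThtm line by collecting runs of "T".  The Python
sketch below uses a made-up function name and 1-based, inclusive
positions; it expects the PHDThtm rows of all blocks joined into one
string.

```python
from itertools import groupby


def transmembrane_segments(topology):
    """Return (start, end) of every run of 'T' in a topology string."""
    segments = []
    position = 1                          # 1-based residue number
    for symbol, run in groupby(topology):
        length = len(list(run))
        if symbol == "T":
            segments.append((position, position + length - 1))
        position += length
    return segments


# PHDThtm row of the first block only (residues 1-60).
first_block = "i" * 14 + "T" * 18 + "o" * 17 + "T" * 11
segments = transmembrane_segments(first_block)
```

For the first block alone this gives the segments 15-32 and 50-60; the
second one continues into the next block, which is why the rows should
be joined before the segments are extracted.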

________________________________________________________________________________

________________________________________________________________________________

-----------------------------------------------------------------------------
---   PredictProtein: NEWS from January, 1997                             ---
---                                                                       ---
---   Dear user,                                                          ---
---                                                                       ---
---      as of  January 1, 1997,  EMBL has effectively decided to not     ---
---   support the PredictProtein service by personal resources.  I do     ---
---   maintain the program, so to speak, in my private time. However,     ---
---   my contract obliges me to do science, instead.   Unfortunately,     ---
---   the computer environment at  EMBL  is at the same time starting     ---
---   to become increasingly unstable.   Consequence of these two re-     ---
---   cent developments is that the  PredictProtein service is not as     ---
---   stable as it was.                                                   ---
---                                                                       ---
---      I apologise for the problems this may cause.  In particular,     ---
---   I apologise for my inability to reply to the 20-30 daily,  per-     ---
---   sonal mails, and suggest to  re-submit requests after 24 hours!     ---
---                                                                       ---
---   Hoping  that I shall  find  a more convenient  solution for the     ---
---   future of the PredictProtein I remain with my best regards,         ---
---                                                                       ---
---   Burkhard Rost                                                       ---
-----------------------------------------------------------------------------
---   PredictProtein: NEWS from April, 1998                               ---
---                                                                       ---
--------------------------------                                          ---
---   MOVING PredictProtein                                               ---
---   There appears to be light on the horizon! PP may be having          ---
---   many hiccups over the next months (as I shall leave EMBL).          ---
---   However, the server seems to have a fair chance of survival         ---
---   thanks to major support that is being raised by Columbia            ---
---   University, New York, U.S.A.  I hope that this will settle the      ---
---   issue for the years to come ...                                     ---
--------------------------------                                          ---
---   WARNING                                                             ---
---   After a  major  rewriting of most of the PP code over the last,     ---
---   I am afraid  that  not all errors have been traced by me,  yet.     ---
---   Thus, please have mercy and report any bug you'll encounter!        ---
---                                             THANKS, Burkhard Rost     ---
--------------------------------                                          ---
---   NEW PREDICTION DEFAULTS                                             ---
---   * Coiled-coil regions: now by default the program COILS written by  ---
---     Andrei Lupas is run on your sequence. An output is returned if a  ---
---     coiled-coil region has been detected.                             ---
---   * Functional sequence motifs: now by default the PROSITE database   ---
---     written by Amos Bairoch, Philip Bucher and Kay Hofmann is scanned ---
---     for sequence motifs. An output is returned if any motif has been  ---
---     detected.                                                         ---
--------------------------------                                          ---
---   see http://www.embl-heidelberg.de/predictprotein/ppNews.html        ---
---     for a description of the following new options.                   ---
---   NEW INPUT OPTION                                                    ---
---   * Your input sequence(s) in FASTA-list format ("# FASTA list ")     ---
---   NEW OUTPUT OPTIONS                                                  ---
---   * Return also BLASTP output ("return blast")                        ---
---   * Return prediction additionally in RDB format ("return phd rdb")   ---
---   * Return topits hssp  ("return topits hssp")                        ---
---   * Return topits strip ("return topits strip")                       ---
---   * Return topits own   ("return topits own")                         ---
---   * Return no coils     ("return no coils")                           ---
---   * Return no prosite   ("return no prosite")                         ---
-----------------------------------------------------------------------------