CAFASP3 SCOP assessment

by Julian Gough


Results based on SCOP 1.61 (and Alexey Murzin's provisional classification of CASP targets)

This evaluation was carried out on those models submitted by automatic servers in the CASP5 competition via CAFASP. This 'fold recognition' assessment compares the ability of servers to detect the correct fold or superfamily, and disregards the quality of the model or alignment generated from the template. Since this evaluation requires a single template from each server for each target, there are limitations, so please read the full description and caveats below.

Furthermore, this is a goal-driven assessment. It provides an objective ranking for the servers that aim at the prediction of protein fold and probable evolutionary relationship as classified in SCOP. Note that many servers do not necessarily have the same goal, but instead aim at the recognition of a large fragment of similar structure in a protein of known structure, regardless of its SCOP classification. I think that for such servers this ranking will be largely irrelevant.

Also available is the same evaluation of LiveBench results.


SCOP assessment of templates for the 56 targets in CAFASP 3.

Rank  Included  Server        Sensitivity  Specificity   1   2   3   4   5   6   7   8   9  10
1     56        fugu3         41           40.7         38  41  41  41  41  41  41  41  41  41
1     55        ffas03        41           40.5         37  40  41  41  41  41  41  41  41  41
1     56        fugsa         41           40.2         38  40  40  40  40  40  41  41  41  41
1     55        pmodel3       41           40.1         36  40  40  40  40  41  41  41  41  41
1     56        3ds5          41           40.1         38  38  38  41  41  41  41  41  41  41
1     56        3dsn          40           40.0         40  40  40  40  40  40  40  40  40  40
1     56        3ds3          40           40.0         40  40  40  40  40  40  40  40  40  40
2     53        pcons3        40           39.1         36  37  39  39  40  40  40  40  40  40
2     55        pmodel        40           39.0         39  39  39  39  39  39  39  39  39  39
2     55        inbgu         40           38.6         38  38  38  38  38  39  39  39  39  40
2     55        shgu          39           38.6         37  37  39  39  39  39  39  39  39  39
2     54        pcons2        39           38.5         36  37  39  39  39  39  39  39  39  39
3     56        supfampp      38           37.0         37  37  37  37  37  37  37  37  37  37
3     55        samt99        39           36.9         34  36  37  37  37  37  37  37  38  39
3     56        mgenthreader  39           36.3         18  37  38  38  38  38  39  39  39  39
3     55        forte1        39           36.2         34  34  35  35  35  35  37  39  39  39
3     53        orfeus        38           36.0         32  33  34  37  37  37  37  37  38  38
3     54        foldfit       37           35.8         32  33  34  37  37  37  37  37  37  37
4     56        superfamily   35           35.0         35  35  35  35  35  35  35  35  35  35
4     56        mpalign       36           35.0         35  35  35  35  35  35  35  35  35  35
5     55        burnham       35           34.3         31  33  34  35  35  35  35  35  35  35
5     56        genthreader   36           34.0         30  34  34  34  34  34  34  34  36  36
5     56        pcomb         36           33.5         26  28  29  36  36  36  36  36  36  36
5     34        alax          33           33.0         33
5     52        pspt          36           32.8         31  33  33  33  33  33  33  33  33  33
5     56        raptor        37           32.3         27  28  32  32  34  34  34  34  34  34
5     53        pdbblast      32           32.0         32  32  32  32  32  32  32  32  32  32
5     56        prospect      38           31.4         14  30  30  31  31  34  34  36  37  37
6     42        samt02        30           29.7         27  30  30  30  30  30  30  30  30  30
7     45        jigsaw        33           27.4         23  25  25  27  28  28  28  29  29  32
8     27        pilot         21           21.0         21  21  21  21  21  21
9     38        libellula     26           19.3          3  13  17  18  20  24  24  24  24  26
9     52        rpfold        19           19.0         19  19  19  19  19  19  19  19  19  19
10    47        protinfocm    23           17.6         11  15  18  18  18  18  18  20  20  20
11    43        loopp         12           8.2           6   7   7   8   9   9   9   9   9   9
12    13        frost         5            4.9           4   5   5   5   5   5   5   5
13    25        cnbpred       4            1.8           0   2   2   2   2   2   2   2   2   2
14    44        protfinder    7            0.7           0   0   0   1   1   1   1   1   1   1

The sensitivity score is the total number of true positives. The specificity score is the average of the 10 numbered columns. Each numbered column shows the number of true positives before the 'n'th false positive, where 'n' is the column number. The 'Included' column shows the number of targets for which template information was available.
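
As a rough illustration of the scoring described above, the following Python sketch computes the sensitivity, the numbered columns and the specificity from one server's list of judged models. It assumes that the models are already ordered from most to least confident (the ordering criterion is not stated here), that ambiguous judgements are simply skipped, and that a column is left empty when the server has fewer than 'n' false positives. The name score_server and its arguments are hypothetical and are not taken from the actual evaluation code.

```python
from statistics import mean
from typing import List, Optional, Tuple


def score_server(
    judgements: List[Optional[bool]], max_fp: int = 10
) -> Tuple[int, Optional[float], List[int]]:
    """Score one server (hypothetical helper, not the original code).

    judgements: one entry per model, most confident first (assumed order).
                True = true positive, False = false positive,
                None = ambiguous (ignored).
    Returns (sensitivity, specificity, columns).
    """
    columns: List[int] = []  # columns[n-1] = true positives before the n-th false positive
    true_positives = 0
    for verdict in judgements:
        if verdict is True:
            true_positives += 1
        elif verdict is False and len(columns) < max_fp:
            columns.append(true_positives)
    # Specificity is the mean of the columns that exist; undefined without false positives.
    specificity = mean(columns) if columns else None
    return true_positives, specificity, columns


# Toy example with made-up judgements.
toy = [True, True, False, True, None, False, True]
print(score_server(toy))
```

For the toy list in the sketch the columns are [2, 3], so the sensitivity is 4 and the specificity is 2.5.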

The full data is available in a flat file here. Alternative tables are also available; these classify at the fold or superfamily level, and can include the targets split into domains.

Level        CASP only  include split domains
superfamily  table      table
fold         table      table

Sub sets     CASP only  include split domains
HM           table      table
FR           table      table

General procedure

The exact ruleset used is available here.
  • A single template is assumed for every model.
  • The template used for each model will be the first one listed in the 'PARENT' field of the submitted model (raw submission).
  • All SCOP domains in the template and the target will be compared.
  • If any pair of domains in the target and the template belong to the same SCOP superfamily, the model is judged as 'true'.
  • When judging at the superfamily level, if any pair of domains in the target and the template belong to the same SCOP fold, but no pair belongs to the same superfamily, the model is judged as ambiguous and is neither 'true' nor 'false'.
  • If no pair of domains in the target and the template belong to the same SCOP fold, the model is judged as 'false'.
  • There are documented exceptions to the classification in SCOP, and these are taken into consideration wherever humanly possible. The major ones are listed in the three points that follow.
  • If any pair of domains in the target and the template belong to the TIM-barrel fold the model is judged as 'true'.
  • If any pair of domains in the target and the template belong to any of the Rossmann folds (NAD(P), FAD/NAD(P), or Nucleotide binding domains) the model is judged as 'true'.
  • The families in the Membrane all-alpha superfamily are more like superfamilies and are treated accordingly. If the template and the target belong to different families within the Membrane all-alpha superfamily, the model is judged as 'false'.
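
The superfamily-level rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the exact ruleset: the Domain class, the function names and the fold and superfamily labels are hypothetical placeholders rather than real SCOP identifiers, and the order in which the exceptions are applied is a guess.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, Optional


@dataclass(frozen=True)
class Domain:
    """A SCOP domain reduced to three labels (hypothetical representation)."""
    fold: str
    superfamily: str
    family: str


# Placeholder labels for the documented exceptions (not real SCOP codes).
TIM_BARREL = "TIM-barrel"
ROSSMANN_FOLDS = {"NAD(P)", "FAD/NAD(P)", "Nucleotide binding"}
MEMBRANE_ALL_ALPHA = "Membrane all-alpha"


def both_membrane(a: Domain, b: Domain) -> bool:
    return a.superfamily == b.superfamily == MEMBRANE_ALL_ALPHA


def pair_is_true(a: Domain, b: Domain) -> bool:
    if both_membrane(a, b):
        # Families here are treated like superfamilies.
        return a.family == b.family
    if a.superfamily == b.superfamily:
        return True
    if a.fold == b.fold == TIM_BARREL:
        return True
    return a.fold in ROSSMANN_FOLDS and b.fold in ROSSMANN_FOLDS


def pair_same_fold(a: Domain, b: Domain) -> bool:
    # Different membrane all-alpha families count as 'false', not ambiguous.
    return a.fold == b.fold and not both_membrane(a, b)


def judge(target: Iterable[Domain], template: Iterable[Domain]) -> Optional[bool]:
    """Return True, False, or None (ambiguous) at the superfamily level."""
    pairs = list(product(list(target), list(template)))
    if any(pair_is_true(a, b) for a, b in pairs):
        return True
    if any(pair_same_fold(a, b) for a, b in pairs):
        return None
    return False
```

For example, judge([Domain("x", "x.1", "x.1.1")], [Domain("x", "x.2", "x.2.1")]) returns None (same fold, different superfamily), where "x", "x.1" and so on are made-up labels.
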

Limitations

It is accepted that, like any other comparison, this one has limitations.
  • The classification of the targets into SCOP is provisional and generously provided with the caveat that it may change before the next SCOP release.
  • There is a small chance that an incorrect model will be judged 'true' when built from a template which by chance contains a SCOP domain in the same superfamily as the target, although the model does not use that part of the template. As this is unlikely to happen often by chance, it is not expected to affect the results.
  • Models which are built from more than one template will only have one template (the first listed) considered. Since a binary (true/false) decision is being made, by judging only the first listed template, the result is not affected unless there are other templates listed which would be 'true' when the first is 'false'.
  • Methods using models which make modifications to the backbone of the template will not be as relevant for this comparison.
  • It is not possible in all cases to extract a template at all. This is reflected in the numbers in the 'Included' column of the tables.
  • Templates using theoretical models are ignored. Templates which have no record in PDB are ignored. Templates which are old or have been replaced are kept and the classification obtained from old SCOP releases.
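
To illustrate how a single template might be taken from a submission, and why some models yield no template at all, here is a minimal Python sketch. It assumes that the model is plain text with a line beginning with 'PARENT' followed by whitespace-separated template identifiers, and that 'N/A' marks a model without a template; both are assumptions about the submission format. The function name first_parent and the identifiers in the example are hypothetical. Checking whether the template is a theoretical model or has a PDB record is not covered.

```python
from typing import Optional


def first_parent(model_text: str) -> Optional[str]:
    """Return the first template listed in the first PARENT line, or None.

    Hypothetical helper: a missing PARENT line, an empty PARENT line, or
    'N/A' (assumed convention) all mean that no template can be extracted.
    """
    for line in model_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "PARENT":
            if len(fields) < 2 or fields[1].upper() == "N/A":
                return None
            return fields[1]  # only the first listed template is considered
    return None


# Made-up submission fragment; the identifiers are placeholders.
example = """PFRMAT TS
TARGET T0000
MODEL 1
PARENT 1abc_A 2xyz
TER
END
"""
print(first_parent(example))
```

Models for which this returns None would not be counted among the targets with template information.
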

Results

Due to the sensitive nature of this I leave the discussion of the results as an exercise for the reader. If you want my opinion, or an explanation of why the results here look different from those evaluated in other ways, then please E-mail me (below).

Acknowledgements

Thanks especially to Alexey Murzin for his provisional classification of the CASP targets. Thanks also to Dani Fischer and Leszek Rychlewski for producing the CAFASP data.


Julian Gough
Last modified: Thu Dec 19 15:30:55 PST 2002