Description of the SUPERFAMILY Database and Web Site
SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes.
The SUPERFAMILY annotation is based on a collection of hidden Markov models, which represent structural protein domains at the SCOP superfamily level. A superfamily groups together domains which have an evolutionary relationship. The annotation is produced by scanning protein sequences from over 2,478 completely sequenced genomes against the hidden Markov models.
For each protein you can:
Submit sequences for SCOP classification
View domain organisation, sequence alignments and protein sequence details
For each genome you can:
Examine superfamily assignments, phylogenetic trees, domain organisation lists and networks
Check for over- and under-represented superfamilies within a genome
For each superfamily you can:
Inspect SCOP classification, functional annotation, Gene Ontology annotation, InterPro abstract and genome assignments
Explore taxonomic distribution of a superfamily across the tree of life
All annotation, models and the database dump are freely available for download to everyone.
SUPERFAMILY is a member of the InterPro consortium of protein annotation databases, and has been integrated into the Ensembl eukaryotic genome project and The Arabidopsis Information Resource. To date, the SUPERFAMILY publications have been cited over 1,000 times. SUPERFAMILY has been used in structural, functional, evolutionary and phylogenetic research projects.
Server Purpose
The purpose of this server is to provide structural (and hence implied functional) assignments to protein sequences, primarily at the SCOP superfamily level. A superfamily contains all proteins for which there is structural evidence of a common evolutionary ancestor. What this service offers is sophisticated and expertly chosen remote homology detection. What it does not offer is faster searching, or the assignment of superfamilies that have no known structure.
There is a facility to compute assignments for your own DNA or protein sequences, and there is access to genome assignments and to multiple sequence alignments of SCOP superfamilies. If you are interested in running large numbers of sequences, please don't hesitate to contact us at superfamily@mrc-lmb.cam.ac.uk.
The web site includes services such as domain architectures and alignment details for all protein assignments, searchable domain combinations, domain occurrence network visualization, detection of over- or under-represented superfamilies for a given genome by comparison with other genomes, assignment of manually submitted sequences and keyword searches.
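This page does not say which statistic SUPERFAMILY uses to flag over- or under-represented superfamilies. As an illustrative stand-in only, the sketch below compares a genome's share of domains in each superfamily with the mean and spread of that share across the other genomes, using a z-score. The input layout (`counts_by_genome`), the function names and the placeholder superfamily labels "A" and "B" are all hypothetical, and at least two comparison genomes are needed for the spread to mean anything.

```python
from statistics import mean, pstdev


def shares(counts):
    """Convert superfamily domain counts into fractions of the genome total."""
    total = sum(counts.values())
    return {sf: n / total for sf, n in counts.items()}


def representation_zscores(counts_by_genome, target):
    """Illustrative z-score of each superfamily's share in `target`.

    counts_by_genome (hypothetical layout) maps a genome name to a dict of
    {superfamily: domain count}. Each score compares the target genome's
    share with the mean and spread of that share in all other genomes.
    """
    target_shares = shares(counts_by_genome[target])
    background = [shares(c) for g, c in counts_by_genome.items() if g != target]
    if len(background) < 2:
        raise ValueError("need at least two comparison genomes")
    # Every superfamily seen in the target or in any comparison genome.
    all_sfs = set(target_shares).union(*background)
    scores = {}
    for sf in sorted(all_sfs):
        values = [b.get(sf, 0.0) for b in background]
        spread = pstdev(values)
        if spread == 0:
            continue  # identical share in every comparison genome: undefined
        scores[sf] = (target_shares.get(sf, 0.0) - mean(values)) / spread
    return scores


counts = {
    "g1": {"A": 10, "B": 10},
    "g2": {"A": 12, "B": 8},
    "g3": {"A": 8, "B": 12},
    "target": {"A": 18, "B": 2},
}
print(representation_zscores(counts, "target"))
```

A large positive score suggests over-representation and a large negative score under-representation; any cutoff for calling a superfamily unusual would be a further assumption.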
Sequence Search Description
The sequence search method uses a library (covering all proteins of known structure) consisting of 1,962 SCOP 1.75 superfamilies from classes a to g. Each superfamily is represented by a group of hidden Markov models. Your query sequences are assigned E-value scores for all models, and the significant hits are returned. A sequence may well hit the same superfamily more than once, because each superfamily is represented by several overlapping models; however, it is the hit to the superfamily that is meaningful. Each model is created from a seed sequence that is aligned to many homologues in its superfamily, and the model is then built from that alignment (please see the SAM website for a detailed explanation). A hit to a model is therefore not a hit to the seed, but a hit to the superfamily that the model represents. You can view the sequences aligned to each model; these give a view of the superfamily, although it may be biased towards the seed. You can also see the genome assignments for each superfamily or view alignments of the genome sequences.
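To illustrate why the superfamily, rather than the individual model, is the unit of assignment, the sketch below collapses model hits into superfamily calls. Among significant hits it keeps the most significant one for each region of the query and drops overlapping hits to the same superfamily from other models. The `ModelHit` record, the 1e-4 E-value cutoff and the greedy overlap rule are assumptions made for this example, not the server's actual procedure. Non-overlapping hits to the same superfamily are both kept, so repeated domains survive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelHit:
    """One HMM hit on a query sequence (hypothetical record layout)."""
    model_id: str     # identifier of the model that was hit
    superfamily: str  # SCOP superfamily the model represents
    evalue: float     # E-value of the hit
    start: int        # first query residue covered (1-based)
    end: int          # last query residue covered (inclusive)


def collapse_to_superfamilies(hits, threshold=1e-4):
    """Reduce model hits to one call per superfamily per query region.

    threshold is an assumed significance cutoff on the E-value.
    """
    # Consider significant hits from most to least significant.
    significant = sorted(
        (h for h in hits if h.evalue <= threshold), key=lambda h: h.evalue
    )
    kept = []
    for hit in significant:
        # A hit is redundant if a better hit to the same superfamily
        # already covers an overlapping stretch of the query.
        redundant = any(
            k.superfamily == hit.superfamily
            and hit.start <= k.end
            and k.start <= hit.end
            for k in kept
        )
        if not redundant:
            kept.append(hit)
    # Report calls in order along the query sequence.
    return sorted(kept, key=lambda h: h.start)


hits = [
    ModelHit("m1", "a.4.5", 1e-20, 10, 90),
    ModelHit("m2", "a.4.5", 1e-12, 15, 95),   # overlaps m1, same superfamily
    ModelHit("m3", "c.37.1", 3e-8, 120, 300),
    ModelHit("m4", "b.1.1", 0.5, 320, 400),   # above the assumed cutoff
]
print([h.model_id for h in collapse_to_superfamilies(hits)])  # ['m1', 'm3']
```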
The SUPERFAMILY server is based upon release 1.75 of the SCOP structural classification of proteins, the corresponding sequences from ASTRAL, and the SAM and HMMER3 hidden Markov model software packages.
Comparative Genomics Tools
The SUPERFAMILY web site provides a number of comparative genomics tools for analysing superfamily and family domains from across the tree of life. These tools include lists of unusual (over- and under-represented) superfamilies and families, adjacent domain pair lists and graphs, unique domain pairs, domain combinations, domain architecture co-occurrence networks and domain distribution across taxonomic kingdoms for each organism. A detailed description of what these tools can do and how to use them can be found on the comparative genomics page.
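Adjacent domain pairs and unique domain pairs can be derived from ordered domain architectures. The sketch below assumes each protein's architecture is a list of superfamily identifiers in N- to C-terminal order and treats pairs as ordered, so (A, B) and (B, A) are counted separately. The data layout and the helper names `adjacent_domain_pairs` and `unique_pairs` are hypothetical, and the site's own definitions may differ.

```python
from collections import Counter


def adjacent_domain_pairs(architectures):
    """Count ordered pairs of neighbouring domains across all proteins.

    Each architecture is a list of superfamily identifiers, N- to C-terminus.
    """
    pairs = Counter()
    for domains in architectures:
        # zip pairs each domain with the one that follows it.
        pairs.update(zip(domains, domains[1:]))
    return pairs


def unique_pairs(pairs_by_genome, target):
    """Adjacent pairs found in `target` but in no other genome."""
    seen_elsewhere = set()
    for genome, pairs in pairs_by_genome.items():
        if genome != target:
            seen_elsewhere.update(pairs)
    return set(pairs_by_genome[target]) - seen_elsewhere


genome_1 = [["a.4.5", "c.37.1"], ["a.4.5", "c.37.1", "b.1.1"], ["b.1.1"]]
genome_2 = [["a.4.5", "c.37.1"]]
pairs_by_genome = {
    "g1": adjacent_domain_pairs(genome_1),
    "g2": adjacent_domain_pairs(genome_2),
}
print(pairs_by_genome["g1"])
print(unique_pairs(pairs_by_genome, "g1"))  # {('c.37.1', 'b.1.1')}
```

Single-domain proteins contribute no pairs, which is why `["b.1.1"]` adds nothing to the counts.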
Downloads
Downloads are instantly available upon application for a free license. The model library, genome assignments and some software are available to download. Genome assignments are updated weekly. There is a low-traffic announcement mailing list for notification of updates and changes.
Citation
Groups using results derived from this project for publication are asked to cite:
A detailed list of the SUPERFAMILY publications can be found here.