SUPERFAMILY 1.75 HMM library and genome assignments server

SUPERFAMILY 2 can be accessed from supfam.org. Please contact us if you experience any problems.

Description of the SUPERFAMILY Database and Web Site

SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes.

The SUPERFAMILY annotation is based on a collection of hidden Markov models that represent structural protein domains at the SCOP superfamily level. A superfamily groups together domains that share an evolutionary relationship. The annotation is produced by scanning protein sequences from more than 2,478 completely sequenced genomes against the hidden Markov models.

For each protein you can:
    Submit sequences for SCOP classification
    View domain organisation, sequence alignments and protein sequence details

For each genome you can:
    Examine superfamily assignments, phylogenetic trees, domain organisation lists and networks
    Check for over- and under-represented superfamilies within a genome

For each superfamily you can:
    Inspect SCOP classification, functional annotation, Gene Ontology annotation, InterPro abstract and genome assignments
    Explore taxonomic distribution of a superfamily across the tree of life

All annotation, models and the database dump are freely available for download to everyone.

SUPERFAMILY is a member of the InterPro consortium of protein annotation databases, and has been integrated into the Ensembl eukaryotic genome project and The Arabidopsis Information Resource. To date, the SUPERFAMILY publications have been cited over 1,000 times. SUPERFAMILY has been used in structural, functional, evolutionary and phylogenetic research projects.

Server Purpose

The purpose of this server is to provide structural (and hence implied functional) assignments to protein sequences, primarily at the SCOP superfamily level. A superfamily contains all proteins for which there is structural evidence of a common evolutionary ancestor. The service offers sophisticated, expertly chosen remote homology detection. It does not offer an improvement in speed, nor does it assign superfamilies that have no known structure.

There is a facility to compute assignments for your own DNA or protein sequences, and there is access to genome assignments and to multiple sequence alignments of SCOP superfamilies. If you have an interest in running large numbers of sequences, then please don't hesitate to contact us via superfamily@mrc-lmb.cam.ac.uk.

The web site includes services such as domain architectures and alignment details for all protein assignments, searchable domain combinations, domain occurrence network visualization, detection of over- or under-represented superfamilies for a given genome by comparison with other genomes, assignment of manually submitted sequences and keyword searches.

Sequence Search Description

The sequence search method uses a library (covering all proteins of known structure) consisting of 1,962 SCOP 1.75 superfamilies from classes a to g. Each superfamily is represented by a group of hidden Markov models. Your query sequences will be assigned e-value scores for all models, and the significant ones will be returned. A sequence may well hit a superfamily more than once because there are several overlapping models for each superfamily; however, it is the hit to the superfamily that is meaningful. Each model is created from a seed sequence which is aligned to many superfamily homologues. The model is built from the alignment (please see the SAM website for a detailed explanation). A hit to a model is not a hit to the seed but a hit to the superfamily which the model represents. You may view the sequences aligned to each model; these give a view of the superfamily, although that view may be biased towards the seed. You may also see the genome assignments for each superfamily or view alignments of the genome sequences.
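
The point that several overlapping models can hit the same superfamily, and that only the superfamily-level hit matters, can be sketched in a few lines of Python. This is an illustrative sketch only, not the server's actual procedure: the ModelHit record, the function name and the 1e-4 cutoff are all assumptions made for the example, and positions along the sequence are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelHit:
    """One query-to-model match (hypothetical record, not a SUPERFAMILY format)."""
    model_id: str
    superfamily_id: str
    evalue: float


def best_hit_per_superfamily(hits, evalue_cutoff=1e-4):
    """Collapse model-level hits to one hit per superfamily.

    Hits with an e-value above the cutoff are discarded; among the rest,
    the hit with the lowest e-value is kept for each superfamily.
    The cutoff value is an assumption chosen for illustration.
    """
    best = {}
    for hit in hits:
        if hit.evalue > evalue_cutoff:
            continue  # not significant under the assumed cutoff
        current = best.get(hit.superfamily_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.superfamily_id] = hit
    return best


if __name__ == "__main__":
    example_hits = [
        ModelHit("model_1", "sf_A", 1e-10),
        ModelHit("model_2", "sf_A", 1e-20),  # second model, same superfamily
        ModelHit("model_3", "sf_B", 0.5),    # too weak to report
    ]
    for sf, hit in best_hit_per_superfamily(example_hits).items():
        print(sf, hit.model_id, hit.evalue)
```

A real multi-domain protein can carry the same superfamily in more than one region, so a fuller version would group hits by sequence region as well as by superfamily.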

The SUPERFAMILY server is based upon release 1.75 of the SCOP structural classification of proteins, the corresponding sequences from ASTRAL, and the SAM and HMMER3 hidden Markov model software packages.

Comparative Genomics Tools

The SUPERFAMILY web site provides a number of comparative genomics tools for the analysis of superfamily and family domains from across the tree of life. These tools include: lists of unusual (over- and under-represented) superfamilies and families, adjacent domain pair lists and graphs, unique domain pairs, domain combinations, domain architecture co-occurrence networks and domain distribution across taxonomic kingdoms for each organism. A detailed description of what these tools can do, and how to use them, can be found on the comparative genomics page.
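
To make the idea of adjacent domain pairs concrete, the following Python sketch derives them from domain architectures, where each architecture is taken to be the ordered list of superfamily identifiers assigned along one protein. The representation, the function names and the reading of "unique domain pairs" as pairs seen in one genome but in none of the comparison genomes are assumptions for illustration; the site's own definitions may differ.

```python
from collections import Counter


def adjacent_domain_pairs(architecture):
    """Return the ordered pairs of neighbouring domains in one protein."""
    return list(zip(architecture, architecture[1:]))


def count_adjacent_pairs(architectures):
    """Count every adjacent domain pair across a genome's proteins."""
    counts = Counter()
    for architecture in architectures:
        counts.update(adjacent_domain_pairs(architecture))
    return counts


def unique_pairs(target_genome, other_genomes):
    """Pairs present in the target genome but in none of the others.

    This is one plausible reading of "unique domain pairs", assumed here.
    """
    seen_elsewhere = set()
    for architectures in other_genomes:
        seen_elsewhere.update(count_adjacent_pairs(architectures))
    return set(count_adjacent_pairs(target_genome)) - seen_elsewhere


if __name__ == "__main__":
    # Hypothetical superfamily identifiers, one list per protein.
    genome_a = [["sf1", "sf2", "sf3"], ["sf2", "sf3"]]
    genome_b = [["sf1", "sf2"]]
    print(count_adjacent_pairs(genome_a))
    print(unique_pairs(genome_a, [genome_b]))
```

Pairs are kept ordered here, so (sf1, sf2) and (sf2, sf1) count as different pairs; sorting each pair before counting would treat them as the same.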

Downloads

Downloads are available immediately once you have applied for a free license. The model library, genome assignments and some software are available. Genome assignments are updated weekly. There is a low-traffic announcement mailing list for notification of updates and changes.

Citation

Groups using results derived from this project for publication are asked to cite:

Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure.

Gough J, Karplus K, Hughey R, Chothia C.

J Mol Biol. 2001 Nov 2;313(4):903-19.

A detailed list of the SUPERFAMILY publications is also available on the web site.