SUPERFAMILY 1.75 HMM library and genome assignments server

SUPERFAMILY 2 is now available. Please contact us if you experience any problems.

Recent news

Recent changes and updates to SUPERFAMILY.

SUPERFAMILY Twitter feed: follow us on Twitter.

29th November 2012 SUPERFAMILY news on TWITTER:
From now on please refer to twitter in order to stay up to date with our news. Follow us @SUPERFAMILY.
2nd August 2011 SUPERFAMILY online user survey:
We are always looking for ways to improve the SUPERFAMILY resource and now is your chance to help. We have just launched our first online user survey to get your feedback. It will take just a few minutes to fill out and it will help shape the future of SUPERFAMILY. To take the survey click here.
1st August 2011 SUPERFAMILY is recruiting:
We are looking for a post-doctoral research associate to help develop and improve existing features of SUPERFAMILY. For more details click here.
5th July 2011 SUPERFAMILY newsfeed is back!:
Sorry that the news feed has been so quiet for so long but now we are back! We have been busy and SUPERFAMILY has many new features which are summarised below:
  • Ancestral Nodes: For example the ancestral node for all Eukaryota can be seen here.
  • Updated to SCOP 1.75: Details of this release can be found on the SCOP website.
  • HMMER 3: We are now using HMMER 3 for model scoring.
  • MySQL schema restructuring: The details of the new schema can be found here.
  • Over 1700 Genomes: The genomes can be viewed here.
  • GO/Phenotype Ontologies: All domains and supra-domains are now annotated with GO and Phenotype information.

20th May 2010 Loaded domain assignments for 120 bacteria and archaea:

Highlights include:
1) First genome from the Veillonella genus: Veillonella parvula DSM 2008
2) Gardnerella vaginalis 409-05 which can cause bacterial vaginosis
3) Conexibacter woesei DSM 14684 named after Carl Woese
4) Fibrobacter succinogenes ssp. succinogenes S85 which is present in the rumen of cattle
5) First genome from the Streptobacillus genus: Streptobacillus moniliformis DSM 12112 which is associated with Haverhill fever
6) First two genomes from the Synergistetes phylum: Thermanaerovibrio acidaminovorans DSM 6589 and Aminobacterium colombiense DSM 12261 which have been implicated in periodontal disease
7) Hydrogenobacter thermophilus TK-6 which has the deepest phylogenetic branching point among the bacteria
8) First genome from the Ferroglobus genus: Ferroglobus placidus DSM 10642 which is the first hyperthermophile discovered to grow anaerobically (without oxygen)
9) First genome from the Haloferax genus: Haloferax volcanii DS2 which exists in extreme saline conditions

11th May 2010 Added 9 new eukaryotic genomes:

1 plant, 4 fungi, 2 amoebozoa, 1 apicomplexan and 1 euglenozoa (plus 1 partially sequenced euglenozoa).

Prunus persica (peach)
Arthroderma benhamiae CBS 112371
Trichophyton verrucosum HKI 0517
Candida dubliniensis CD36
Encephalitozoon intestinalis (second microsporidian genome)
Entamoeba dispar
Entamoeba invadens
Neospora caninum Nc-Liverpool
Leishmania mexicana
Trypanosoma vivax (partial sequencing)

22nd Mar 2010 Twitter updates for Feb/Mar:

21st Jan 2010 Twitter updates for Dec/Jan:

13th Jan 2010 Added links to PDBeMotif:
For example, the CheY-like superfamily.
Six types of links have been added:
  • PDB entries list for given superfamily
  • Ligand binding statistics (association of small molecules and their fragments from the PDBe fragments library with the superfamily)
  • Nucleic-acid binding statistics (association of nucleotides and their fragments from the PDBe fragments library with the superfamily)
  • Enzymes covered by the superfamily
  • Occurrence of secondary structure elements like helices, strands and loops in the superfamily
  • Occurrence of small 3D structural motifs, beta-turn like, in the superfamily

17th Dec 2009 Spiricoil website released:
  • The Spiricoil website has finally gone live!
  • Genome annotation of coiled coil proteins: the Coilome
  • Oligomeric state prediction
  • Coiled coil regions and alignments predicted on the sequences
  • 3D homology models

23rd Nov 2009 Twitter updates for Oct/Nov:
  • Annotation loaded for draft Cassava sequence. Tip of the hat to @doe_jgi.
  • UniProt describe their approach to complete proteomes: proteomes for 1,428 organisms (includes viruses).
  • Further improvements to search: added stopwords, paging corrections and removed duplicate results. Result ranking needs further work.
  • Added another 7 eukaryotic genomes, including sequences for 2 Penicillium species from @fungalgenomes.
  • Added domain assignments for pig (Sus scrofa) and marmoset (Callithrix jacchus) genomes from Ensembl.
  • Added over 150 bacterial genomes from NCBI. More than 1,200 organisms in SUPERFAMILY. Pig and marmoset coming soon.

20th Oct 2009 9 new Eukaryotic Genomes:
2 animals from Ensembl:
Sus scrofa (Pig) and
Callithrix jacchus (Marmoset).

3 fungi and a diatom from the JGI:
Pleurotus ostreatus, the oyster mushroom,
Sporotrichum thermophile, a proficient decomposer of cellulose,
Thielavia terrestris, a fungal thermophile, and
Fragilariopsis cylindrus, an Arctic and Antarctic diatom.

Protein sequences for 2 Penicillium species:
Penicillium chrysogenum, the source of penicillin, and
Penicillium marneffei.

Fungal protein sequences from the NCBI:
Pichia pastoris, frequently used as an expression system for the production of proteins.

16th Oct 2009 Over 1,200 organisms included in SUPERFAMILY:

With the recent addition of more than 150 bacterial genomes, there are now over 1,200 organisms in the SUPERFAMILY database.

Notable new organisms include:
i) The first organism from the Thaumarchaeota taxonomic phylum, Nitrosopumilus maritimus SCM1, which is an ammonia-oxidizing archaeon. Study of this micro-organism could aid the understanding of nitrification in soil and marine environments. See also the description from the Joint Genome Institute.
ii) The first bacterium from the Gemmatimonadetes taxonomic phylum, Gemmatimonas aurantiaca T-27, which is an anaerobic-aerobic bacterium isolated from sewage waste in Japan.
iii) Six Sulfolobus islandicus strains, from 3 locations, were sequenced and used to analyse the "biogeographical structure of the pan-genome of this species" [PubMed].

21st Sep 2009 Twitter updates for Aug/Sep:

22nd July 2009 Twitter updates for June/July:

29th June 2009 Added 5 genome and superfamily shortcut searches to comparative genomics page:

There are quite a few specialised comparative genomics functions on the SUPERFAMILY site, but they are all deeply embedded in the web site hierarchy. To help rectify this problem we have added 5 genome and superfamily search options.

So, it should no longer be necessary to browse through over 1,000 genomes or 1,800 superfamilies. We hope these shortcut searches help to make the comparative genomics functionality on the web site more usable.

12th June 2009 Monthly twitter roundup:

27th May 2009 SUPERFAMILY 1.73 major update:

The downloadable scripts, models, and genome data have been updated. New features have been added.

  • the website has been re-organised and we hope you find the interface improved
  • we now have a twitter feed

12th May 2009 SUPERFAMILY job vacancy:

Job vacancy: Post-doctoral position (Research Associate) to work on SUPERFAMILY and related research. Official application procedure. Please contact Julian Gough (homepage email) for further information before applying.

30th Apr 2009 Added 1.73 domain assignments for UniProt:

Domain assignments for the major 15.0 release of UniProt have been loaded into the database and onto the web site. They will be available for download from the ftp site on Monday 4th May. The percentage of sequences with one or more domain assignments is 60%.

6th Mar 2009 Over 1,000 organisms included in SUPERFAMILY:

With the recent addition of several hundred bacterial genomes and 15 fungal strains, there are now over 1,000 organisms in the SUPERFAMILY database.

Notable new organisms include:
i) The first bacterium from the Nitrospirae taxonomic phylum, Thermodesulfovibrio yellowstonii DSM 11347, which is a thermophilic bacterium isolated from a thermal vent in Yellowstone Lake, Wyoming, USA.
ii) The second endosymbiont of cellulose-degrading termite gut protozoa, Elusimicrobium minutum Pei191.
iii) Three Histoplasma capsulatum strains:
Histoplasma capsulatum G186AR,
Histoplasma capsulatum H143,
Histoplasma capsulatum H88,
from the Fungal Genomes Initiative, which are closely related to the first sequenced Histoplasma capsulatum class NAmI strain WU24 genome. Histoplasma capsulatum is a soil-borne fungus that causes histoplasmosis - a disease of humans, dogs and cats.
16th Feb 2009 Two new Neurospora genomes:
Neurospora tetrasperma and Neurospora discreta FGSC 8579.
These two species are closely related to the model Sordariomycete fungus Neurospora crassa OR74A.
Release announcement from The Hyphal Tip, the fungal genomes blog.
31st Jan 2009 Integrated 77 new prokaryotic genomes. Highlights include:

The first 2 bacteria from the Dictyoglomus phylum:
Dictyoglomus turgidum DSM 6724 and
Dictyoglomus thermophilum H-6-12.

The first complete genome sequence of a termite gut symbiont, an uncultured bacterium named Rs-D17 belonging to the candidate phylum Termite Group 1 (TG1): uncultured Termite group 1 bacterium phylotype Rs-D17.
Termites, plus prokaryotic organisms and protists living in the termite gut, have been found to produce enzymes involved in cellulose digestion. The bacterial colonies found in the termite gut produce large quantities of hydrogen as a byproduct of cellulose digestion. A better understanding of these pathways could significantly aid research into renewable sources of energy.

30th Dec 2008 New article describing SUPERFAMILY, updated superfamily functional annotation, new eukaryotic genomes:

28th Nov 2008 Loaded 1.73 domain assignments for 4 new nematode genomes:
Pristionchus pacificus,
Caenorhabditis japonica,
Caenorhabditis brenneri and
Caenorhabditis remanei.

21st Nov 2008 Integrated 6 new animal genomes:
Tarsius syrichta, a primate [Philippine tarsier],
Dipodomys ordii, a rodent [Kangaroo rat],
Vicugna pacos, a camelid [Alpaca],
Tursiops truncatus, a cetacean and descendant of land-living mammals [Bottlenosed dolphin],
Pteropus vampyrus, also known as the megabat [Large flying fox],
Procavia capensis, arguably the closest living relative of the elephant [Cape rock hyrax].
These genomes are part of the Mammalian Genome Project funded by the NIH (National Institutes of Health), which will sequence 24 mammals to low coverage (typically 2X). Protein predictions are from Ensembl.
And updated the guinea pig genome Cavia porcellus.
15th Oct 2008 Added the ability to browse through the SCOP hierarchy to superfamilies and families.
Browse SCOP hierarchy in SUPERFAMILY: starting at the top of the SCOP hierarchy, one can browse through the classes and folds to superfamilies and families of interest. These pages contain the SCOP entries and, where available, expert annotation from InterPro (including Gene Ontology terms) and functional annotation from Christine Vogel. At the bottom of the superfamily and family pages are a number of links to other relevant pages elsewhere on this site.

10th Oct 2008 Martin Madera has published a paper describing PRC in Bioinformatics.
PRC is a stand-alone program for scoring and aligning profile hidden Markov models (profile HMMs) of protein families. PRC can read models produced by SAM and HMMER, two popular profile HMM packages, as well as PSI-BLAST checkpoint files. To cite PRC:

Madera M. Profile Comparer (PRC): a program for scoring and aligning profile hidden Markov models. Bioinformatics 2008 Nov 15;24(22):2630-1.

Abstract [ PubMed ]   Full text [ HTML · PDF ]

29th Sept 2008
A new HMM library based on SCOP 1.73 is now available for download from the ftp site. The 1.73 model library consists of 13,920 models representing 1,776 superfamilies.
Genome assignments will be added to a beta web site as they become available. We will first add assignments for the main model organisms, followed by the most important animal genomes, then the remaining eukaryotes and finally the bacteria and archaea. When all genomes have been updated the beta site will become the live site.
Update: The beta web site is now the live site.

13th Sept 2008 Integrated Phylogenetic trees:

Genome combinations or specific clades can be displayed as if individual trees had been produced. The data used is extracted from a single large tree generated from a presence/absence matrix using protein domain architecture data for all genomes in SUPERFAMILY. The PAUP software is used to produce a single, large tree topology using either neighbour joining or heuristic parsimony methods.
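
As an illustration of the presence/absence idea, here is a minimal Python sketch. It is not the SUPERFAMILY pipeline and does not call PAUP. The genome names, the architecture strings and the use of a simple Hamming distance are assumptions made for the example only. It shows how a presence/absence matrix over domain architectures could be built and turned into pairwise distances of the kind a neighbour joining method takes as input.

```python
from itertools import combinations


def presence_absence_matrix(genome_architectures):
    """Build a 0/1 row per genome over the sorted list of all architectures."""
    architectures = sorted(set().union(*genome_architectures.values()))
    matrix = {
        genome: [1 if arch in present else 0 for arch in architectures]
        for genome, present in genome_architectures.items()
    }
    return architectures, matrix


def hamming_distance(row_a, row_b):
    """Count the architectures present in exactly one of the two genomes."""
    return sum(a != b for a, b in zip(row_a, row_b))


def pairwise_distances(matrix):
    """Distance for every unordered genome pair, keyed by sorted name pair."""
    return {
        (g1, g2): hamming_distance(matrix[g1], matrix[g2])
        for g1, g2 in combinations(sorted(matrix), 2)
    }


# Hypothetical toy data: each genome maps to the set of domain
# architectures (comma-joined superfamily labels) observed in it.
toy = {
    "genome_A": {"sf1", "sf1,sf2", "sf3"},
    "genome_B": {"sf1", "sf3"},
    "genome_C": {"sf2,sf4"},
}

architectures, matrix = presence_absence_matrix(toy)
distances = pairwise_distances(matrix)
```

In this toy data genome_A and genome_B differ in a single architecture, so they would be grouped together before genome_C by any distance-based method.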

12th Sept 2008 Integrated similar domain architectures tool:
We have added a tool to find functionally similar proteins. Our approach compares the domain architecture of interest with all the other domain architectures in the SUPERFAMILY database. The 10 architectures which are most similar to the architecture of interest are selected for display. Documentation is available describing the similarity function used to find domain architectures with similar genomic distribution. Links to the similar domain architectures can be found on any of the gene pages or the domain combination pages.
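
The ranking step can be sketched as follows. The similarity function actually used by SUPERFAMILY is described in its own documentation and is not reproduced here. This sketch substitutes a Jaccard index over the sets of genomes in which each architecture occurs, and all names and data in it are hypothetical.

```python
def jaccard(set_a, set_b):
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def most_similar(query, distribution, top_n=10):
    """Rank all other architectures by similarity of genomic distribution.

    distribution maps an architecture to the set of genomes containing it.
    Returns up to top_n (architecture, score) pairs, best first; ties are
    broken alphabetically so the output is deterministic.
    """
    query_genomes = distribution[query]
    scored = [
        (jaccard(query_genomes, genomes), arch)
        for arch, genomes in distribution.items()
        if arch != query
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(arch, score) for score, arch in scored[:top_n]]


# Hypothetical toy data, not taken from the SUPERFAMILY database.
toy_distribution = {
    "a,b": {"g1", "g2", "g3"},
    "a,c": {"g1", "g2"},
    "d": {"g3"},
    "e": {"g4"},
}

ranked = most_similar("a,b", toy_distribution, top_n=2)
```

With `top_n=10` the function returns a list of the same size as the one displayed on the site, provided at least 10 other architectures exist.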
1st Sept 2008 Major update to TaxViz taxonomic distribution of domains tool:
Over 200 genome sequences for "model" organisms have been added since TaxViz was initially integrated into the SUPERFAMILY web site. This resulted in some TaxViz pages being unusable because they contained too many genomes. We have corrected this problem by adding additional "subkingdom" taxonomic groups for the largest kingdoms: metazoa (animals), euryarchaeota, proteobacteria, firmicutes and actinobacteria.
30th Aug 2008 New fungal genomes: added domain predictions for 5 new fungal genomes:
Schizosaccharomyces octosporus yFS286, Verticillium dahliae VdLs.17, Verticillium albo-atrum VaMs.102, Schizosaccharomyces pombe 972h- and Aspergillus fumigatus A1163.
20th Aug 2008 Integrated domain assignments for the Dictyostelium purpureum genome from the JGI.
The dictyostelids are a group of cellular slime molds, or social amoebae, which belong to the amoebozoa supergroup of eukaryotes that form a sister clade to the fungi and animals. Under normal growth conditions the dictyostelids take the individual amoeba form, but under starvation they form multicellular organisms. Further details.
18th Aug 2008 Added domain assignments for over 50 bacterial genomes from the NCBI.
Highlights include the first genome to be sequenced from a novel bacterial phylum comprising endosymbionts of cellulose-degrading termite gut protozoa (Endomicrobia), Elusimicrobium minutum Pei191; new genomes from the Verrucomicrobia phylum, Akkermansia muciniphila ATCC BAA-835 and Methylacidiphilum infernorum V4; and new genomes from the Aquificae phylum, Sulfurihydrogenibium sp. YO3AOP1 and Hydrogenobaculum sp. Y04AAS1.
8th Aug 2008 Integrated domain assignments for new green algae Ostreococcus RCC809.
Note: There does not appear to be an associated NCBI taxonomy identifier for this genome. Using the Ostreococcus genus identifier for now.
Update: NCBI taxonomy identifier now available for this genome and integrated.
30th July 2008 Integrated Christine Vogel's functional annotation.
Christine annotated domain superfamilies with respect to their usual role in a protein, in a particular pathway or in the cell/organism. She prepared a scheme of 50 detailed function categories which map to 7 more general function categories. For example, C2H2 and C2HC zinc fingers superfamily and Globin-like superfamily.
11th July 2008 New documentation:

How to download, install and use the SUPERFAMILY database. A description of how to download the MySQL database dump, install it and query it. Each of the database tables are described and a diagram showing the relationships between the database tables is included.

3rd July 2008 Domain assignments for 2 new fungal genomes from the JGI:
Trichoderma atroviride and Cochliobolus heterostrophus. Trichoderma atroviride is best known for its biocontrol capabilities against a range of phytopathogenic fungi, which are pests of hundreds of plant crops. Cochliobolus heterostrophus has caused major crop losses in the past.
23rd June 2008 Loaded domain assignments for the TargetDB sequences.
TargetDB is a structural genomics target registration database, which provides status and tracking information on the progress of the production and solutions of 3D protein structures. TargetDB contains over 175,000 sequences from 25 contributing sites.
20th June 2008 Loaded domain assignments for the microalgae Chlorella sp. NC64A, which is a model system for studying DNA virus/algal interactions.
19th June 2008 New documentation:

How to download, install and run the SUPERFAMILY hidden Markov models. A description of how to download the hidden Markov models (HMMs) used to generate the SUPERFAMILY domain assignments. The installation of the scripts required to run the HMMs is explained.

2nd June 2008 Added 10 Eukaryotic genomes, and updated 12 drosophilid genomes.
Including several newly sequenced fungal strains such as the Chytridiomycota Batrachochytrium dendrobatidis JAM81, early releases of the disease vector Ixodes scapularis (tick) and the livestock pathogen Trypanosoma congolense.
12th May 2008 Added the transgenic papaya (Carica papaya) genome, and over 50 prokaryote genomes.
Highlights among the prokaryote genomes include the first genome from the Verrucomicrobia phylum of bacteria (Opitutus terrae), and the first genome from the Korarchaeota phylum of archaea (Candidatus Korarchaeum cryptofilum).
30th Apr 2008 Added the phytoplankton Emiliania huxleyi, which is of interest because of its production of polyketides with antimicrobial, antifungal, antiparasitic, antitumor and agrochemical properties. Updated the beetle Tribolium castaneum assignments as analysis of the genome sequence recently became available [PubMed]. Updated the Schizosaccharomyces pombe genome for a fungi researcher.
28th Apr 2008 Modified taxonomic position of the Monosiga brevicollis, Dictyostelium discoideum and Entamoeba histolytica eukaryotic genomes.
Monosiga brevicollis now occurs between the metazoa and fungi [PubMed]. Both Dictyostelium discoideum [PubMed] and Entamoeba histolytica [PubMed] now occur between the fungi and remaining eukaryotes.
22nd Apr 2008 Loaded domain assignment results for viral sequences from the NCBI.
15th Apr 2008 Integrated InterPro abstracts and Gene Ontology (GO) terms.
For example: Cytochrome c, Mitochondrial carrier, Sigma3 and sigma4 domains of RNA polymerase sigma factors.
InterPro have added abstracts for 1,052 superfamilies, and 763 superfamilies have some gene ontology annotation.
11th Apr 2008 Loaded 2 new early release plant genomes: Glycine max (Soybean) and Zea mays (Maize).
8th Apr 2008 Major update of all Ensembl genomes, including new genomes: Horse and Orangutan.
31st Mar 2008 Added new plant genome from the JGI: Sorghum bicolor.
18th Feb 2008 Added 2 new algae genomes from the JGI:
Micromonas sp. RCC299, Micromonas sp. CCMP490.
23rd Jan 2008 Added 3 fungal genomes, 59 bacteria and updated UniProt.
10th Jan 2008 Added 1 plant and 7 fungal genomes:
Selaginella moellendorffii (Spikemoss), Vanderwaltozyma polyspora, Podospora anserina, Trichoderma virens, Saccharomyces cerevisiae YJM789, Saccharomyces cerevisiae RM11-1a, Cryptococcus neoformans var. grubii H99, Cryptococcus neoformans B-3501A.
18th Dec 2007 Updated the mouse genome and added 2 new animal genomes:
Microcebus murinus (mouse lemur), Ochotona princeps (American pika).
13th Dec 2007 Web site re-design goes live. Please report any inconsistencies or errors to
6th Dec 2007 Post-doc position to work with Julian Gough on SUPERFAMILY available. Enquiries to Julian Gough.
Update: position has been filled.
12th Oct 2007 Added 10 new Eukaryotic genomes:
Aureococcus anophagefferens, Giardia lamblia, Helobdella robusta, Capitella sp. I, Nasonia vitripennis, Trichoplax adhaerens, Vitis vinifera, Toxoplasma gondii, Xenopus laevis, Mycosphaerella fijiensis.
1st Oct 2007 Completed inclusion of 200 new, and 100 updated, prokaryotic genomes.
7th Sept 2007 Exciting new tool for the visualisation of domains across genomes:

On every page that lists the number of domains in each genome for a given superfamily (or family), there is a new link to a tool called TaxViz. TaxViz provides a graphic representation of the occurrence of a domain across all the taxonomic kingdoms included in SUPERFAMILY.

21st Aug 2007 Family level data and analysis has been extended to include: pages listing family assignments for each genome, and unusual (over- and under-represented) families within each genome.
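
The statistic SUPERFAMILY uses to flag unusual families is not stated in this entry. As a rough illustration only, the sketch below flags a family as over- or under-represented in a genome when its domain count lies more than a chosen number of standard deviations from the mean count across all genomes. The z-score criterion, the threshold and the toy counts are assumptions.

```python
from statistics import mean, pstdev


def unusual_families(counts, genome, threshold=1.5):
    """Flag families whose count in `genome` deviates strongly from the mean.

    counts maps family -> {genome: number of domains}. Every family is
    assumed to list a count (possibly 0) for every genome under comparison.
    Returns {family: "over" | "under"} for families beyond the threshold.
    """
    flagged = {}
    for family, per_genome in counts.items():
        values = list(per_genome.values())
        spread = pstdev(values)
        if spread == 0:
            # Identical counts everywhere: nothing can be unusual.
            continue
        z_score = (per_genome.get(genome, 0) - mean(values)) / spread
        if abs(z_score) >= threshold:
            flagged[family] = "over" if z_score > 0 else "under"
    return flagged


# Hypothetical toy counts, not taken from the SUPERFAMILY database.
toy_counts = {
    "family_x": {"g1": 1, "g2": 1, "g3": 1, "g4": 1, "g5": 11},
    "family_y": {"g1": 2, "g2": 2, "g3": 2, "g4": 2, "g5": 2},
}

flags_g5 = unusual_families(toy_counts, "g5")
flags_g1 = unusual_families(toy_counts, "g1")
```

A real analysis would normally also account for genome size, since larger genomes tend to carry more copies of most families.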
9th July 2007 Added more new low coverage vertebrate genomes from Ensembl, fungal genomes from FGI and basal metazoa from JGI. Highlights include the sea anemone Nematostella vectensis, the crustacean Daphnia pulex, the moss Physcomitrella patens subsp. patens and the colony forming algal species Volvox carteri f. nagariensis.
11th May 2007 Added 4 new low coverage vertebrate genomes from Ensembl:
Cavia porcellus, Myotis lucifugus, Spermophilus tridecemlineatus, Otolemur garnettii.
4th Apr 2007 Added 7 new low coverage vertebrate genomes from Ensembl:
Dasypus novemcinctus, Echinops telfairi, Erinaceus europaeus, Felis catus, Loxodonta africana, Oryctolagus cuniculus, Tupaia belangeri.
12th Mar 2007 Updated to the Ensembl (43.36e) Homo sapiens genome.
19th Feb 2007 InterPro update to SUPERFAMILY 1.69.
4th Jan 2007 Added 11 AAA drosophilid genomes to the web site and database:
Drosophila ananassae, Drosophila persimilis, Drosophila virilis, Drosophila simulans, Drosophila mojavensis, Drosophila yakuba, Drosophila sechellia, Drosophila grimshawi, Drosophila erecta, Drosophila pseudoobscura, Drosophila willistoni.
11th Dec 2006 Moved SUPERFAMILY website and database to new server.