- Models
The hidden Markov models are available for download in HMMER3 format (/models/hmmlib_1.75.gz) in the
models directory.
Either the /models/model.tab file or the SQL database is needed in addition to the
model library. The model library will be updated with every release of
SCOP. It is also strongly advised to get the SCOP files (cla, des, hie).
The family level classification
requires the self hits file (/models/self_hits.tab.gz) in addition to the
above-mentioned SCOP parseable files. There is a separate description of
how to download, set up and run the models.
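As a rough illustration of running the models, the sketch below assembles an hmmscan command line for a search against the model library. It assumes the library has already been decompressed and indexed with hmmpress (which hmmscan requires), and the file names proteins.fa and proteins.res are hypothetical. The optional E-value argument is an illustrative parameter, not a recommended setting.

```python
def build_hmmscan_command(hmm_library, fasta_file, output_file, evalue=None):
    """Return the argument list for an hmmscan search of the model library.

    hmm_library must point to a decompressed, hmmpress-indexed HMM file.
    """
    command = ["hmmscan", "-o", str(output_file)]
    if evalue is not None:
        # -E sets the per-sequence E-value reporting threshold in HMMER3.
        command += ["-E", str(evalue)]
    # hmmscan expects the HMM database first, then the sequence file.
    command += [str(hmm_library), str(fasta_file)]
    return command


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    cmd = build_hmmscan_command("hmmlib_1.75", "proteins.fa", "proteins.res")
    print(" ".join(cmd))
    # To actually run the search, pass cmd to subprocess.run(cmd, check=True).
```

Building the argument list separately from running it makes it easy to inspect the exact command before starting a long search.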
- Scripts
There is no longer a wrapper script for running the HMMs; simply use the hmmscan program from HMMER3.
One script is very strongly recommended for
parsing the scores and alignments generated by a complete SUPERFAMILY model library search.
It is /scripts/ass3.pl, and it is designed to be run on the output from hmmscan (HMMER3).
The output is an '.ass' file which should contain a list of
non-conflicting domain assignments. N.B. To run a small number of sequences, turn file checking off, because one of the checks requires a minimum number of sequences in the FASTA file.
In addition, you may run sequences through the script /scripts/superfamily.pl. The superfamily.pl script is a wrapper that calls several other programs, which are responsible for formatting the sequences, calling HMMER3, parsing the output and creating an HTML-formatted version of it (using /scripts/ass_to_html.pl). See the instructions on how to use the script on the Amazon cloud for a detailed description of the scripts and the output files.
There are also other useful scripts here.
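Because one of the file checks depends on the number of sequences in the FASTA file, it can help to count them before deciding whether to turn checking off. The following is a minimal sketch; the function name is hypothetical, and the minimum number that the check requires is not stated here, so no threshold is assumed.

```python
def count_fasta_records(lines):
    """Count sequences in FASTA-formatted text by counting '>' header lines."""
    return sum(1 for line in lines if line.startswith(">"))


if __name__ == "__main__":
    # Small in-memory example; with a real file, pass an open file handle,
    # e.g. count_fasta_records(open("proteins.fa")).
    example = [
        ">seq1 first protein\n",
        "MKTAYIAKQR\n",
        ">seq2 second protein\n",
        "GSHMLEDPVA\n",
    ]
    print(count_fasta_records(example))
```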
- Genome assignments
The genome assignments are contained in the SQL database, but may
also be accessed in the /genomes directory in flat file format. The file
(/genomes/ass_date.tab) contains the genome assignments for all genomes. The file
/genomes/genome.tab contains information on the genomes. This directory may not be as up
to date as the web site. If you would like some data that is on the web site but not yet
in the download directory, then e-mail
superfamily@mrc-lmb.cam.ac.uk.
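The flat files can be read with any tab-delimited parser. The sketch below assumes only that each line holds tab-separated fields. The column layout of the .tab files is not described here, so the code returns raw fields without naming them, and the function name is hypothetical.

```python
import csv


def read_tab_rows(lines):
    """Split tab-delimited lines into lists of fields, without interpreting them."""
    # QUOTE_NONE keeps any quote characters in the data as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


if __name__ == "__main__":
    # Made-up rows for illustration only; these are not real genome.tab contents.
    example = ["field_a\tfield_b\tfield_c\n", "x\ty\tz\n"]
    for fields in read_tab_rows(example):
        print(fields)
```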
- SUPERFAMILY MySQL database
There is a MySQL dump of the SUPERFAMILY database (in the
/sql/supfam_date.sql.gz file, where the appropriate date must be substituted). This contains
all of the genome assignments and associated information. It is recommended that this be
downloaded and installed if the genome assignments are to be used. There is a description of
how to download, install and query the database.
Each of the database tables is described in turn
here, and an entity-relationship model
diagram is included.
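Before the dump can be loaded it has to be decompressed. The sketch below streams a .sql.gz file to a plain .sql file without reading it all into memory. The function name is hypothetical, and the real file name depends on the date in the dump's name, so the path is left to the caller.

```python
import gzip
import shutil


def decompress_dump(src, dst=None):
    """Decompress a gzip-compressed SQL dump and return the output path."""
    if dst is None:
        # Drop a trailing ".gz"; otherwise append ".out" to avoid overwriting src.
        dst = src[:-3] if src.endswith(".gz") else src + ".out"
    with gzip.open(src, "rb") as compressed, open(dst, "wb") as plain:
        # copyfileobj copies in chunks, so large dumps do not need to fit in memory.
        shutil.copyfileobj(compressed, plain)
    return dst
```

The resulting .sql file can then be loaded with the standard MySQL command-line client, for example by redirecting it into `mysql` for a database you have created beforehand.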
- Seed sequences of SUPERFAMILY models
The SCOP domains used as seed sequences to build the SUPERFAMILY
models are available for download. They have been filtered to different levels of
redundancy, and can be found in the /sequences directory.
- EC2 AWS cloud image
The SUPERFAMILY pipeline for analysing a FASTA format file of protein sequences is available pre-installed on an image suitable for the Amazon EC2 AWS cloud computing facility. All you need is an account with Amazon, and you can upload your sequences and run our pipeline directly using one command. This is significantly easier than downloading and installing the SUPERFAMILY package. After registration you will be given access to the image, and there are also instructions on how to use the image on the Amazon cloud.
The genome assignments and MySQL database dump are updated weekly.