Supra-domain2GO Inference and Supra-domain Phenotype Ontology (SPPO)

This document explains the details behind PO annotations of supra-domains. The Structural Classification of Proteins (SCOP) database (Andreeva, et al., 2008) defines and classifies domains as structural units and as the smallest units of evolution. Like GO, phenotype ontologies (PO, listed below) have been developed to classify and organize phenotypic information for human and model organisms, running from very general terms at the top of the DAG to more specific terms further down.

Disease Ontology (DO) is a standardized ontology for human disease that semantically integrates disease and medical vocabularies through extensive cross-mapping of DO terms to MeSH, ICD, NCI's thesaurus, SNOMED and OMIM (Schriml, et al., 2012). Mappings of DO terms onto the human genome are also available (Osborne, et al., 2009).
Human Phenotype Ontology (HP) captures phenotypic abnormalities that are described in OMIM, along with the corresponding disease-causing genes (Robinson, et al., 2008). It includes three complementary biological concepts: Mode of Inheritance (MI), ONset and clinical course (ON), and Phenotypic Abnormality (PA).
Mammalian/Mouse Phenotype Ontology (MP) describes phenotypes of the mouse after a specific gene is genetically disrupted (Smith, et al., 2009). Using it, Mouse Genome Informatics (MGI) provides high-coverage gene-level phenotypes for the mouse.
Worm Phenotype Ontology (WP) classifies and organizes phenotype descriptions for C. elegans and other nematodes (Schindelman, et al., 2011). Using it, WormBase provides the primary resource for phenotype annotations for C. elegans.
Yeast Phenotype Ontology (YP) is the major contributor to the ‘Ascomycete phenotype ontology’. Using it, Saccharomyces Genome Database (SGD) provides single mutant phenotypes for every gene in the yeast genome (Engel, et al., 2010).
Fly Phenotype Ontology (FP) refers to FlyBase controlled vocabulary. Specifically, a structured controlled vocabulary is used for the annotation of alleles (for their mutagen etc) in FlyBase (Grumbling, et al., 2006).
Fly Anatomy Ontology (FA) is a structured controlled vocabulary of the anatomy of Drosophila melanogaster, used for the description of phenotypes and where a gene is expressed (Grumbling, et al., 2006).
Zebrafish Anatomy Ontology (ZA) displays anatomical terms of the zebrafish using standard anatomical nomenclature, together with affected genes (Bradford, et al., 2011).
Xenopus Anatomy Ontology (XA) represents the lineage of tissues and the timing of development for frogs (Xenopus laevis and Xenopus tropicalis). It is used to annotate Xenopus gene expression patterns and mutant and morphant phenotypes (Bowes, et al., 2009).
Arabidopsis Plant Ontology (AP) is a major contributor to Plant Ontology which describes plant ANatomical and morphological structures (PAN) and growth and DEvelopmental stages (PDE). The Arabidopsis Information Resource (TAIR) provides arabidopsis plant ontology annotations for the model higher plant Arabidopsis thaliana (Ilic, et al., 2006; Pujar, et al., 2006).

To incorporate ontologies that are not in OBO format, the approach is also applied to other ontologies with a fixed-length or much-simplified hierarchy:

Enzyme Commission (EC) is a resource focused on enzyme nomenclature, a system for naming enzymes (protein catalysts), with cross-references to UniProt (Fleischmann et al., 2004). It uses a four-digit EC number: the first three digits define the reaction catalysed and the fourth is a unique identifier (serial number).
DrugBank ATC code (DB) classifies drugs at five different levels according to the organ or system on which they act (1st level, anatomical main group) and their therapeutic (2nd level, therapeutic subgroup), pharmacological (3rd level, pharmacological subgroup) and chemical properties (4th level, chemical subgroup; 5th level, chemical substance). Only drugs that are in DrugBank and have an Anatomical Therapeutic Chemical (ATC) classification are considered (Knox et al., 2011).
UniProtKB KeyWords (KW) is a controlled vocabulary that summarises entry content and is used to index UniProtKB/Swiss-Prot entries under 10 categories (the category "Technical term" being excluded here). Keywords are attributed manually to UniProtKB/Swiss-Prot entries and automatically to UniProtKB/TrEMBL entries (according to specific annotation rules) (Bairoch et al., 2005).
UniProtKB UniPathway (UP) is a fully manually curated resource for the representation and annotation of metabolic pathways, used as a controlled vocabulary for pathway annotation in UniProtKB (Morgat et al., 2012).
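
As a rough illustration of what a fixed-length hierarchy means in practice, the sketch below derives the chain of parent classes directly from an identifier, with no OBO file needed. It assumes the common conventions of writing partial EC numbers with '-' placeholders and of reading ATC levels 1 to 4 as code prefixes of length 1, 3, 4 and 5; the function names are hypothetical and not part of the SUPERFAMILY pipeline.

```python
# Prefix lengths of ATC levels 1-4; the full 7-character code is level 5.
ATC_PREFIX_LENGTHS = (1, 3, 4, 5)


def ec_ancestors(ec_number):
    """Return the parent classes of a four-field EC number, most general first."""
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    # Replace trailing fields with '-' to name each enclosing class.
    return [".".join(parts[:depth] + ["-"] * (4 - depth)) for depth in range(1, 4)]


def atc_ancestors(atc_code):
    """Return the level 1-4 groups that enclose a 7-character ATC code."""
    if len(atc_code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {atc_code!r}")
    return [atc_code[:length] for length in ATC_PREFIX_LENGTHS]


print(ec_ancestors("1.1.1.1"))   # ['1.-.-.-', '1.1.-.-', '1.1.1.-']
print(atc_ancestors("A10BA02"))  # ['A', 'A10', 'A10B', 'A10BA']
```

With parents derived this way, each identifier sits on a single path to the root, so the same parent-child machinery used for the OBO ontologies can be reused unchanged.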

Together with genome-wide domain assignments for proteins in the SUPERFAMILY database (Gough, 2006), we have made statistical inferences to detect the relatedness of PO terms to structural domains (de Lima Morais, et al., 2011). Although domain-centric phenotype annotations hold great promise for describing independent domains, most domains do not work alone. In multi-domain proteins, they are combined to form distinct domain architectures, and the recombination of existing domains is considered one of the major driving forces of phenotypic diversification. In particular, certain pair-wise domain combinations (or triplets or more) occur in diverse domain architectures and can thus be viewed as larger evolutionary units (termed supra-domains). Although supra-domains are clearly of evolutionary importance, their functions/phenotypes remain uncharacterized. In practice, they are far more difficult to curate than individual domains, because curation requires manually examining the functions of the multi-domain proteins they reside in. To facilitate the understanding of how domain combinations contribute to functional/phenotypic diversification, we here extend the previous framework to capture PO terms suitable for supra-domains in addition to individual superfamilies (Figures 1-4). The core idea of this framework is that, if a PO term tends to annotate proteins containing an individual domain (or a supra-domain), then this term should also carry a functional signal for that domain (or supra-domain).

The pipeline of building supra-domain PO annotations

The implementation of this framework starts from high-coverage domain architectures and protein/gene-level PO annotations for the longest transcript of each gene (to keep the one-gene-one-protein mapping valid, as these phenotypic annotations are gene-oriented rather than protein-based), available respectively from SUPERFAMILY (de Lima Morais, et al., 2011) and the PO annotations of interest (Figure 1). We respect the hierarchical structure of PO, which is organized as a directed acyclic graph (DAG) by viewing an individual term as a node and its relations to parental terms (allowing for multiple parents) as directed edges. Accordingly, two types of inference between a supra-domain (or individual superfamily) and a PO term are performed: one relative to the root and one relative to the direct parental PO terms (Figure 2). These dual constraints ensure that only the most relevant PO terms are retained.

Figure 1. A general framework for inferring PO annotations for evolutionary SCOP domains and supra-domains using domain architectures and PO annotations for the longest transcripts in the model organism of interest.

Figure 2. The statistical significance of inference is assessed based on the hypergeometric distribution, generating overall over-representation in terms of the whole annotations (left panel) and relative over-representation in terms of all direct parents (middle panel). Based on the maximal P-values, the statistical significance of SP-PO term associations is assessed by the FDR method, which accounts for multiple hypothesis tests (right panel). SP denotes both supra-domains and individual superfamilies.
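
The two tests and the FDR step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the pipeline's actual code: the background of the relative test is taken to be the proteins annotated to any direct parent of the term, the FDR is computed with the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995), and all function and variable names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n items are drawn from N, of which K are successes."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / comb(N, n)


def sp_term_pvalue(sp, term, parents, everything):
    """Maximal P-value over the overall test and the direct-parent test.

    All arguments are sets of protein identifiers. The true-path rule is
    assumed to hold, so that term <= parents <= everything.
    """
    hits = len(sp & term)
    overall = hypergeom_upper_tail(hits, len(sp & everything), len(term), len(everything))
    relative = hypergeom_upper_tail(hits, len(sp & parents), len(term), len(parents))
    return max(overall, relative)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy example: 10 annotated proteins, 6 under the direct parents, 3 under the term.
everything = set(range(10))
parents = {0, 1, 2, 3, 4, 5}
term = {0, 1, 2}
sp = {0, 1, 2}
print(sp_term_pvalue(sp, term, parents, everything))  # the larger of 1/120 and 1/20
```

Taking the maximum of the two P-values means an association is only as strong as its weaker test, which is how the dual constraints remove terms that are enriched overall but not relative to their parents.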

Obtaining supra-domain mammalian phenotype ontology

Based on predicted SP2PO annotations, we have also derived a trimmed-down version of PO that is the most informative for annotating supra-domains (including individual superfamilies) (Figure 3).

Figure 3. Flowchart of creating supra-domains mammalian phenotype ontology (SPPO) based on information theoretic analysis of SP2PO annotation profiles.

First, we apply information theory to define the information content (IC) of a PO term: the negative log10-transformation of the frequency of observing SPs annotated to that term. For any SP, the PO terms annotated to that SP constitute an SP-PO annotation profile in the DAG, including direct annotations as well as inherited annotations. Considering the dependencies among PO terms (the so-called true-path rule), an SP directly annotated to a specific PO term (a direct annotation) should also be annotated by inheritance to its parental terms (inherited annotations). The PO annotations generated above are treated as direct annotations. The complete PO annotations (direct and inherited) are used to calculate the IC of all PO terms. Of note, PO terms with similar IC can represent a partition of the DAG in terms of SP2PO.
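
A minimal sketch of this IC calculation is given below. It assumes the frequency of a term is the number of SPs annotated to it (directly or by inheritance) divided by the number of SPs with at least one annotation; the data structures, term names and function names are hypothetical.

```python
from math import log10


def propagate(direct, parents):
    """Apply the true-path rule.

    direct maps each SP to its directly annotated terms; parents maps each
    term to its direct parental terms. Returns SP -> direct plus inherited terms.
    """
    full = {}
    for sp, terms in direct.items():
        seen = set()
        stack = list(terms)
        while stack:
            term = stack.pop()
            if term not in seen:
                seen.add(term)
                stack.extend(parents.get(term, ()))
        full[sp] = seen
    return full


def information_content(full):
    """IC(term) = -log10(fraction of SPs annotated to the term)."""
    total = len(full)
    counts = {}
    for terms in full.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return {term: -log10(count / total) for term, count in counts.items()}


# Toy DAG: root -> {B, C} -> D (D has two parents).
parents = {"B": {"root"}, "C": {"root"}, "D": {"B", "C"}}
direct = {"sp0": {"D"}}
direct.update({f"sp{i}": {"root"} for i in range(1, 10)})
ic = information_content(propagate(direct, parents))
```

In this toy example every SP reaches the root, so the root has an IC of zero, while D is seen in one SP out of ten and so has an IC of about 1.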

Second, given a predefined IC (say, 1) as a seed and its corresponding range (say, [0.75 1.25]), the proposed algorithm starts with all PO terms unmarked and iteratively identifies the unmarked PO terms closest to the predefined IC until all PO terms are marked (Figure 4). To make sure that one and only one PO term is identified per path in the DAG, the following constraints must be met: if multiple PO terms with identical IC are identified in the same path, the parental terms are filtered out; once a PO term is identified, all terms in the paths through that term are marked and so excluded from further search.

Last, the outputs are those identified PO terms with IC falling in the range. We run the algorithm with each of four seed ICs (i.e., 0.5, 1, 1.5 and 2) to create SPPO, corresponding respectively to PO terms at four levels (least informative, moderately informative, informative, highly informative). In summary, we provide a meta-PO as a proxy for annotating both supra-domains and individual superfamilies.

Figure 4. Illustration of how the algorithm iteratively creates the supra-domain mammalian phenotype ontology (SPPO). I). Initially, all PO terms in the DAG are unmarked (open circles); II). Identify those unmarked PO terms (filled in pink) with IC closest to a predefined IC (e.g., 1); III). Filter out parental PO terms from the PO terms identified in Step II; IV). Mark the identified PO terms as well as all of their ancestors and descendants; V-VI). Repeat Steps II-IV to iteratively identify unmarked PO terms until all PO terms are marked; VII). Output only those identified PO terms with IC falling in the range (e.g., [0.75 1.25]) as SPPO.
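
Steps I-VII can be sketched as follows. This is a minimal sketch, assuming that "closest" means the smallest absolute difference from the seed IC and that ties are identified together; the input dictionaries, term names and function names are hypothetical.

```python
def reach(term, edges):
    """All terms reachable from term by following edges (term -> neighbours)."""
    seen = set()
    stack = list(edges.get(term, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, ()))
    return seen


def select_terms(ic, parents, seed, low, high):
    """Pick at most one term per DAG path, as close as possible to the seed IC."""
    children = {}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, set()).add(child)

    unmarked = set(ic)                      # Step I: nothing is marked yet
    identified = []
    while unmarked:
        # Step II: unmarked terms whose IC is closest to the seed.
        best = min(abs(ic[t] - seed) for t in unmarked)
        hits = {t for t in unmarked if abs(ic[t] - seed) == best}
        # Step III: drop a hit if one of its descendants is also a hit.
        hits = {t for t in hits if not (reach(t, children) & hits)}
        identified.extend(hits)
        # Step IV: mark each hit with all of its ancestors and descendants.
        for t in hits:
            unmarked -= {t} | reach(t, parents) | reach(t, children)
    # Step VII: keep only identified terms whose IC falls in the range.
    return sorted(t for t in identified if low <= ic[t] <= high)


# Toy DAG: root -> A -> {A1, A2}, root -> B -> B1.
parents = {"A": {"root"}, "B": {"root"}, "A1": {"A"}, "A2": {"A"}, "B1": {"B"}}
ic = {"root": 0.0, "A": 0.4, "B": 0.9, "A1": 1.1, "A2": 2.0, "B1": 1.6}
print(select_terms(ic, parents, seed=1.0, low=0.75, high=1.25))  # ['A1', 'B']
```

In the toy run, A2 is identified in the last iteration but is dropped in the final step because its IC lies outside the range, which is why the range check is applied only at the end.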

Data Availability

In addition to the PO Hierarchy for browsing, we also provide SP2PO mapping results in two parsable formats (i.e., plain files and MySQL tables). The abbreviations used below are explained in the browsable hierarchy.

SP2PO mapping plain files

SP2DO mapping results

Full DO annotations for supra-domains (including individual superfamilies) are available in the SP2DO.txt file.
DO terms regarded as SPDO (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPDO.txt file. Unlike the whole DO hierarchy, these DO terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPDO corresponds only to the SCOP superfamily level.
We highly recommend that users take the DO terms in SPDO.txt together with the supra-domains they annotate, extracted from SP2DO.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

SP2HP mapping results

Full HP annotations for supra-domains (including individual superfamilies) are available in the SP2HP.txt file.
HP terms regarded as SPHO (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPHO.txt file. Unlike the whole HP hierarchy, these HP terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPHO corresponds only to the SCOP superfamily level.
We highly recommend that users take the HP terms in SPHO.txt together with the supra-domains they annotate, extracted from SP2HP.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

SP2MP mapping results

Full MP annotations for supra-domains (including individual superfamilies) are available in the SP2MP.txt file.
MP terms regarded as SPMP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPMP.txt file. Unlike the whole MP hierarchy, these MP terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPMP corresponds only to the SCOP superfamily level.
We highly recommend that users take the MP terms in SPMP.txt together with the supra-domains they annotate, extracted from SP2MP.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

SP2WP mapping results

Full WP annotations for supra-domains (including individual superfamilies) are available in the SP2WP.txt file.
WP terms regarded as SPWP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPWP.txt file. Unlike the whole WP hierarchy, these WP terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPWP corresponds only to the SCOP superfamily level.
We highly recommend that users take the WP terms in SPWP.txt together with the supra-domains they annotate, extracted from SP2WP.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

SP2YP mapping results

Full YP annotations for supra-domains (including individual superfamilies) are available in the SP2YP.txt file.
YP terms regarded as SPYP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPYP.txt file. Unlike the whole YP hierarchy, these YP terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPYP corresponds only to the SCOP superfamily level.
We highly recommend that users take the YP terms in SPYP.txt together with the supra-domains they annotate, extracted from SP2YP.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

SP2FP mapping results

Full FP annotations for supra-domains (including individual superfamilies) are available in the SP2FP.txt file.
FP terms regarded as SPFP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPFP.txt file. Unlike the whole FP hierarchy, these FP terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPFP corresponds only to the SCOP superfamily level.
We highly recommend that users take the FP terms in SPFP.txt together with the supra-domains they annotate, extracted from SP2FP.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

SP2FA mapping results

Full FA annotations for supra-domains (including individual superfamilies) are available in the SP2FA.txt file.
FA terms regarded as SPFA (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPFA.txt file. Unlike the whole FA hierarchy, these FA terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPFA corresponds only to the SCOP superfamily level.
We highly recommend that users take the FA terms in SPFA.txt together with the supra-domains they annotate, extracted from SP2FA.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

SP2ZA mapping results

Full ZA annotations for supra-domains (including individual superfamilies) are available in the SP2ZA.txt file.
ZA terms regarded as SPZA (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPZA.txt file. Unlike the whole ZA hierarchy, these ZA terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPZA corresponds only to the SCOP superfamily level.
We highly recommend that users take the ZA terms in SPZA.txt together with the supra-domains they annotate, extracted from SP2ZA.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

SP2XA mapping results

Full XA annotations for supra-domains (including individual superfamilies) are available in the SP2XA.txt file.
XA terms regarded as SPXA (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPXA.txt file. Unlike the whole XA hierarchy, these XA terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPXA corresponds only to the SCOP superfamily level.
We highly recommend that users take the XA terms in SPXA.txt together with the supra-domains they annotate, extracted from SP2XA.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

SP2AP mapping results

Full AP annotations for supra-domains (including individual superfamilies) are available in the SP2AP.txt file.
AP terms regarded as SPAP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPAP.txt file. Unlike the whole AP hierarchy, these AP terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPAP corresponds only to the SCOP superfamily level.
We highly recommend that users take the AP terms in SPAP.txt together with the supra-domains they annotate, extracted from SP2AP.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

SP2EC mapping results

Full EC annotations for supra-domains (including individual superfamilies) are available in the SP2EC.txt file.
EC terms regarded as SPEC (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPEC.txt file. Unlike the whole EC hierarchy, these EC terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPEC corresponds only to the SCOP superfamily level.
We highly recommend that users take the EC terms in SPEC.txt together with the supra-domains they annotate, extracted from SP2EC.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

SP2DB mapping results

Full DB annotations for supra-domains (including individual superfamilies) are available in the SP2DB.txt file.
DB terms regarded as SPDB (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPDB.txt file. Unlike the whole DB hierarchy, these DB terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPDB corresponds only to the SCOP superfamily level.
We highly recommend that users take the DB terms in SPDB.txt together with the supra-domains they annotate, extracted from SP2DB.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

SP2KW mapping results

Full KW annotations for supra-domains (including individual superfamilies) are available in the SP2KW.txt file.
KW terms regarded as SPKW (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPKW.txt file. Unlike the whole KW hierarchy, these KW terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPKW corresponds only to the SCOP superfamily level.
We highly recommend that users take the KW terms in SPKW.txt together with the supra-domains they annotate, extracted from SP2KW.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

SP2UP mapping results

Full UP annotations for supra-domains (including individual superfamilies) are available in the SP2UP.txt file.
UP terms regarded as SPUP (four levels: least informative, moderately informative, informative, and highly informative) can be found in the SPUP.txt file. Unlike the whole UP hierarchy, these UP terms at different granularities are representative and comprehensive in terms of their relevance to supra-domains (including individual superfamilies). Keep in mind that SPUP corresponds only to the SCOP superfamily level.
We highly recommend that users take the UP terms in SPUP.txt together with the supra-domains they annotate, extracted from SP2UP.txt. They are of potential use in comparative functional genomics, particularly in understanding how multi-domain proteins have evolved under functional constraints along the tree of life.

SP2PO MySQL tables

SP2PO.sql.gz

Domain2PO mapping results

PO_info

    > DESC PO_info;
    +------------+---------------------+------+-----+---------+-------+
    | Field      | Type                | Null | Key | Default | Extra |
    +------------+---------------------+------+-----+---------+-------+
    | obo        | char(2)             | NO   | PRI | NULL    |       |
    | po         | varchar(20)         | NO   | PRI | NULL    |       |
    | namespace  | varchar(50)         | NO   |     | NULL    |       |
    | name       | varchar(255)        | NO   | MUL | NULL    |       |
    | synonym    | text                | YES  |     | NULL    |       |
    | definition | text                | YES  |     | NULL    |       |
    | distance   | tinyint(3) unsigned | NO   |     | NULL    |       |
    +------------+---------------------+------+-----+---------+-------+

The obo column indicates the type of PO. Can be one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
The po column is the corresponding PO id. It is browsable via PO Hierarchy.
The namespace column can be one of three GO sub-ontologies, otherwise root.
The name column shows the full name of PO terms.
The synonym column is the synonym of PO terms.
The definition column is the definition of PO terms.
The distance column shows the distance of PO terms to the corresponding sub-ontology.

PO_hie

    > DESC PO_hie;
    +----------+---------------------+------+-----+---------+-------+
    | Field    | Type                | Null | Key | Default | Extra |
    +----------+---------------------+------+-----+---------+-------+
    | obo      | char(2)             | NO   | PRI | NULL    |       |
    | parent   | varchar(20)         | NO   | PRI | NULL    |       |
    | child    | varchar(20)         | NO   | PRI | NULL    |       |
    | distance | tinyint(3) unsigned | NO   | PRI | NULL    |       |
    +----------+---------------------+------+-----+---------+-------+

The obo column indicates the type of PO. Can be one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
The parent column is the parental PO id.
The child column is the child PO id.
The distance column shows the distance from the parental PO id to the child PO id: 1 for direct parent-child relationships, and larger values indicating the existence of a path between them (reachable but indirect).
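
The sketch below shows how the distance column might be used to tell direct parents from all ancestors. It is only an illustration: an in-memory SQLite table stands in for the MySQL table, the column types are simplified, and the term ids 'T1' to 'T3' are hypothetical placeholders rather than real PO ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PO_hie (obo TEXT, parent TEXT, child TEXT, distance INTEGER)")
# Toy chain T1 -> T2 -> T3, stored with its indirect (distance 2) relationship.
conn.executemany(
    "INSERT INTO PO_hie VALUES (?, ?, ?, ?)",
    [("MP", "T1", "T2", 1), ("MP", "T2", "T3", 1), ("MP", "T1", "T3", 2)],
)


def direct_parents(conn, obo, child):
    """Parents linked to the child by a single edge (distance = 1)."""
    rows = conn.execute(
        "SELECT parent FROM PO_hie WHERE obo = ? AND child = ? AND distance = 1 "
        "ORDER BY parent",
        (obo, child),
    )
    return [row[0] for row in rows]


def all_ancestors(conn, obo, child):
    """Every term from which the child is reachable, at any distance."""
    rows = conn.execute(
        "SELECT DISTINCT parent FROM PO_hie WHERE obo = ? AND child = ? ORDER BY parent",
        (obo, child),
    )
    return [row[0] for row in rows]


print(direct_parents(conn, "MP", "T3"))  # ['T2']
print(all_ancestors(conn, "MP", "T3"))   # ['T1', 'T2']
```

Because distance is part of the primary key, the same parent-child pair may appear at several distances when more than one path connects them, hence the DISTINCT in the ancestor query.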

PO_mapping_supradomain

    > DESC PO_mapping_supradomain;
    +----------------+---------------------------+------+-----+---------+-------+
    | Field          | Type                      | Null | Key | Default | Extra |
    +----------------+---------------------------+------+-----+---------+-------+
    | supradomain    | text                      | NO   | MUL | NULL    |       |
    | level          | enum('cl','cf','sf','fa') | NO   |     | NULL    |       |
    | obo            | char(2)                   | NO   | MUL | NULL    |       |
    | po             | varchar(20)               | NO   |     | NULL    |       |
    | all_score      | double                    | NO   |     | 1       |       |
    | inherited_from | text                      | YES  |     | NULL    |       |
    +----------------+---------------------------+------+-----+---------+-------+

The supradomain column is a comma-separated list of SCOP unique identifiers (sunids). It is browsable via SCOP Hierarchy.
The level in the SCOP hierarchy. Can be one of 'cl' for class, 'cf' for fold, 'sf' for superfamily, 'fa' for family.
The obo column indicates the type of PO. Can be one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
The po column is the corresponding PO id.
The all_score column is the FDR supported by all longest-transcript worm genes/proteins (including multidomain proteins).
The inherited_from column marks the status of SP2PO predicted annotations. 1) If it is marked with 'directed' (i.e., 'all_score'<0.001), the SP2PO annotation is significantly supported by all longest-transcript worm genes/proteins (including multidomain proteins). 2) If it is a comma-separated list of PO ids (numeric part; the column 'all_score' is not less than 0.001), the SP2PO annotation is inherited from descendant PO terms (significantly associated) when applying the true-path rule in the DAG. 3) It is empty otherwise. Hence, the list of SP2PO annotations supported by all proteins can be obtained by selecting rows whose 'inherited_from' column is NOT EMPTY.
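
The three cases of the inherited_from column can be decoded with a small helper such as the one below. It is a sketch only: the function name and the status labels are hypothetical, and empty values are assumed to arrive as either NULL (None in Python) or an empty string.

```python
def annotation_status(inherited_from):
    """Classify one inherited_from value.

    Returns a (status, source_terms) pair, where source_terms lists the
    descendant PO ids that an inherited annotation comes from.
    """
    if inherited_from is None or inherited_from.strip() == "":
        return ("unsupported", [])            # case 3: empty
    if inherited_from.strip() == "directed":
        return ("direct", [])                 # case 1: all_score < 0.001
    # case 2: comma-separated list of descendant PO ids
    sources = [item.strip() for item in inherited_from.split(",") if item.strip()]
    return ("inherited", sources)


print(annotation_status("directed"))  # ('direct', [])
print(annotation_status("12, 34"))    # ('inherited', ['12', '34'])
print(annotation_status(None))        # ('unsupported', [])
```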

PO_ic_supra

    > DESC PO_ic_supra;
    +---------+---------------------------+------+-----+---------+-------+
    | Field   | Type                      | Null | Key | Default | Extra |
    +---------+---------------------------+------+-----+---------+-------+
    | level   | enum('cl','cf','sf','fa') | NO   | PRI | NULL    |       |
    | obo     | char(2)                   | NO   | PRI | NULL    |       |
    | po      | varchar(20)               | NO   | PRI | NULL    |       |
    | ic      | double                    | YES  |     | NULL    |       |
    | include | tinyint(2)                | YES  | MUL | NULL    |       |
    +---------+---------------------------+------+-----+---------+-------+

The level in the SCOP hierarchy. Can be one of 'cl' for class, 'cf' for fold, 'sf' for superfamily, 'fa' for family.
The obo column indicates the type of PO. Can be one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
The po column is the corresponding PO id.
The ic column shows the information content of the PO term.
The include column indicates whether or not the PO term belongs to the SPPO. If the column is set to '0' then it is not a member of SPPO. Otherwise, '1' for least informative (i.e., the most general), '2' for moderately informative, '3' for informative, '4' for highly informative (i.e., the most specific).
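
As an illustration of how the two tables fit together, the sketch below joins PO_mapping_supradomain to PO_ic_supra to list supported annotations that fall on SPPO terms, with their informativeness level. It uses an in-memory SQLite database as a stand-in for MySQL, with simplified column types; all rows, sunids and term ids are hypothetical toy values. The backtick-quoted identifiers are accepted by both SQLite and MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PO_mapping_supradomain (
    supradomain TEXT, `level` TEXT, obo TEXT, po TEXT,
    all_score REAL, inherited_from TEXT
);
CREATE TABLE PO_ic_supra (
    `level` TEXT, obo TEXT, po TEXT, ic REAL, `include` INTEGER
);
""")
conn.executemany(
    "INSERT INTO PO_mapping_supradomain VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("100001,100002", "sf", "MP", "T2", 0.0001, "directed"),
        ("100001,100002", "sf", "MP", "T1", 0.2, "T2"),
        ("100003", "sf", "MP", "T3", 0.5, None),
    ],
)
conn.executemany(
    "INSERT INTO PO_ic_supra VALUES (?, ?, ?, ?, ?)",
    [("sf", "MP", "T1", 0.5, 1), ("sf", "MP", "T2", 1.5, 3), ("sf", "MP", "T3", 2.5, 0)],
)

QUERY = """
SELECT m.supradomain, m.po, i.`include`
FROM PO_mapping_supradomain AS m
JOIN PO_ic_supra AS i
  ON i.`level` = m.`level` AND i.obo = m.obo AND i.po = m.po
WHERE m.obo = ?
  AND i.`include` > 0
  AND m.inherited_from IS NOT NULL AND m.inherited_from <> ''
ORDER BY i.`include`, m.po
"""


def sppo_annotations(conn, obo):
    """Supported annotations on SPPO terms, most general level first."""
    return conn.execute(QUERY, (obo,)).fetchall()


for row in sppo_annotations(conn, "MP"):
    print(row)
```

In the toy data, the T3 row is excluded twice over: its include value is 0 and its inherited_from column is empty.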

References

Andreeva, A., Howorth, D., Chandonia, J.M., Brenner, S.E., Hubbard, T.J., Chothia, C. and Murzin, A.G. (2008) Data growth and its impact on the SCOP database: new developments, Nucleic Acids Res, 36, D419-425.
Bairoch, A., Apweiler, R., et al. (2005) The Universal Protein Resource (UniProt), Nucleic Acids Res, 33, D154-9.
Benjamini, Y. and Hochberg, Y. (1995) Controlling the False Discovery Rate - a Practical and Powerful Approach to Multiple Testing, Journal of the Royal Statistical Society Series B-Methodological, 57, 289-300.
Bowes, J.B., Snyder, K.A., Segerdell, E., Jarabek, C.J., Azam, K., Zorn, A.M. and Vize, P.D. (2009) Xenbase: gene expression and improved integration, Nucleic Acids Res, 38, D607-12.
Bradford, Y., Conlin, T., Dunn, N., et al. (2011) ZFIN: enhancements and updates to the Zebrafish Model Organism Database, Nucleic Acids Res, 39, D822-9.
Engel, S.R., Balakrishnan, R., Binkley, G., et al. (2010) Saccharomyces Genome Database provides mutant phenotype data, Nucleic Acids Res, 38, D433-6.
Fleischmann, A., Darsow, M., Degtyarenko, K., Fleischmann, W., Boyce, S., Axelsen, K.B., Bairoch, A., Schomburg, D., Tipton, K.F. and Apweiler, R. (2004) IntEnz, the integrated relational enzyme database, Nucleic Acids Res, 32, D434-7.
Gough, J. (2006) Genomic scale sub-family assignment of protein domains, Nucleic Acids Res, 34, 3625-3633.
Grumbling, G. and Strelets, V. (2006) FlyBase: anatomical data, images and queries, Nucleic Acids Res, 34, D484-8.
Ilic, K., Kellogg, E.A., Jaiswal, P., et al. (2006) The plant structure ontology, a unified vocabulary of anatomy and morphology of a flowering plant, Plant Physiol, 143, 587-99.
Knox, C., Law, V., et al. (2011) DrugBank 3.0: a comprehensive resource for 'omics' research on drugs, Nucleic Acids Res, 39, D1035-41.
Morgat, A., Coissac, E., et al. (2012) UniPathway: a resource for the exploration and annotation of metabolic pathways, Nucleic Acids Res, 40, D761-9.
Osborne, J.D., Flatow, J., Holko, M., Lin, S.M., Kibbe, W.A., Zhu, L.J., Danila, M.I., Feng, G. and Chisholm, R.L. (2009) Annotating the human genome with Disease Ontology, BMC Genomics, 10, S1-S6.
Pujar, A., Jaiswal, P., Kellogg, E.A., et al. (2006) Whole-plant growth stage ontology for angiosperms and its application in plant biology, Plant Physiol, 142, 414-28.
Robinson, P.N., Kohler, S., Bauer, S., Seelow, D., Horn, D. and Mundlos, S. (2008) The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease, Am J Hum Genet, 83, 610-615.
Schindelman, G., Fernandes, J.S., Bastiani, C.A., Yook, K. and Sternberg, P.W. (2011) Worm Phenotype Ontology: integrating phenotype data within and beyond the C. elegans community, BMC Bioinformatics, 12:32.
Schriml, L.M., Arze, C., Nadendla, S., et al. (2012) Disease Ontology: a backbone for disease semantic integration, Nucleic Acids Res, 40, D940-D946.
Smith, C.L. and Eppig, J.T. (2009) The Mammalian Phenotype Ontology: enabling robust annotation and comparative analysis, Wiley Interdiscip Rev Syst Biol Med, 1, 390-399.
