Species Tree Of Life (sTOL)
This tree is fully resolved (bifurcating, with estimated branch lengths) and takes advantage of both the domain content of genomes and the existing taxonomic status quo. It is reconstructed using RAxML, under the constraint of the NCBI taxonomy, from the presence/absence of molecular characters including SCOP domains (at both the superfamily and family levels) and supra-domains. After tree reconstruction, each internal node is either mapped onto a unique NCBI taxon_id or left empty (as a hypothetical unknown ancestor).
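As a rough illustration of what "presence/absence of molecular characters" means as tree-building input, the sketch below turns per-genome character sets into a binary matrix and writes it in a relaxed PHYLIP-style layout. The genome codes, the character labels (sf_A, sf_B, fa_C) and both function names are hypothetical placeholders, and this is not necessarily how the sTOL input was actually prepared.

```python
def presence_absence_matrix(genome_to_chars):
    """Encode each genome as a 0/1 string over the sorted union of characters."""
    chars = sorted(set().union(*genome_to_chars.values()))
    rows = {
        genome: "".join("1" if c in present else "0" for c in chars)
        for genome, present in sorted(genome_to_chars.items())
    }
    return chars, rows


def to_phylip(rows):
    """Serialize the matrix as relaxed PHYLIP text: a header, then one row per taxon."""
    n_chars = len(next(iter(rows.values())))
    lines = [f"{len(rows)} {n_chars}"]
    lines += [f"{name} {bits}" for name, bits in rows.items()]
    return "\n".join(lines) + "\n"


# Toy input: hypothetical genome codes mapped to hypothetical character labels.
example = {
    "hs": {"sf_A", "sf_B", "fa_C"},
    "mm": {"sf_A", "fa_C"},
    "sc": {"sf_B"},
}
chars, rows = presence_absence_matrix(example)
print(to_phylip(rows), end="")
```

Each column corresponds to one character (a superfamily, family or supra-domain), so a matrix of this kind could be analysed under a binary substitution model together with a constraint tree.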
The NCBI taxonomy
The NCBI taxonomy incorporates taxonomic knowledge from a variety of sources into an only partially resolved species tree representing the status quo of mostly basic taxonomic groupings. Topologically, it is multifurcating, with most nodes having very many descendants. There is no measure of evolutionary distance (branch lengths) in the NCBI taxonomy. It does, however, include some common taxonomic ranks (from high to low): Superkingdom, Kingdom, Phylum, Class, Order, Family, Genus, Species.
Gateway to tree browsing
To navigate the sTOL, we display a path from a given node upward, leading as far as the ancestral superkingdom (i.e., all ancestral nodes of the current node of interest, in sequential order). Direct children are also listed. For each specific clade along the path, we use TreeVector for visualization and provide trees in Newick format for download, with nodes labeled by Codes (i.e., the 2-letter genome identifiers used by the SUPERFAMILY database), TaxIDs (i.e., NCBI taxonomy IDs), or Names (full names).
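The navigation described above can be sketched with a small tree structure: walk from a node up to the root to obtain the ancestral path, and serialize a clade to Newick using one of the three label types. The Node class, the function names, the branch lengths and the 2-letter codes in this toy tree are illustrative assumptions, not the SUPERFAMILY implementation; the unlabeled internal node stands for an ancestor that was not mapped onto an NCBI taxon.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    code: Optional[str] = None    # 2-letter genome code (leaves only)
    taxid: Optional[int] = None   # NCBI taxonomy ID, None if unmapped
    name: Optional[str] = None    # full name, None if unmapped
    length: float = 0.0           # branch length to the parent
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def path_to_root(node: Node) -> List[Node]:
    """Return the node followed by all of its ancestors, nearest first."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def _subtree(node: Node, label: str) -> str:
    value = getattr(node, label)
    # Simplification: spaces become underscores instead of quoting the label.
    text = "" if value is None else str(value).replace(" ", "_")
    if node.children:
        inner = ",".join(f"{_subtree(c, label)}:{c.length}" for c in node.children)
        text = f"({inner}){text}"
    return text


def to_newick(root: Node, label: str = "name") -> str:
    """Serialize a clade; label is one of 'code', 'taxid' or 'name'."""
    return _subtree(root, label) + ";"


# Toy tree with made-up branch lengths and illustrative genome codes.
root = Node(taxid=2759, name="Eukaryota")
opis = root.add(Node(taxid=33154, name="Opisthokonta", length=0.1))
anc = opis.add(Node(length=0.05))  # unmapped hypothetical ancestor
human = anc.add(Node(code="hs", taxid=9606, name="Homo sapiens", length=0.3))
mouse = anc.add(Node(code="mm", taxid=10090, name="Mus musculus", length=0.3))
yeast = opis.add(Node(code="sc", taxid=4932, name="Saccharomyces cerevisiae", length=0.5))
plant = root.add(Node(code="at", taxid=3702, name="Arabidopsis thaliana", length=0.6))

print([n.name for n in path_to_root(human)])
print("children of Opisthokonta:", [c.name for c in opis.children])
print(to_newick(root, label="code"))
```

Switching the label argument between "code", "taxid" and "name" yields the three download variants from the same topology.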
Applications in small-scale studies
The tree or its derived subtrees can be used to display the distribution of: 1) a specific domain, such as the Nuclear receptor ligand-binding domain (sunid=48508), over the path from human upward to Eukaryota; or 2) a whole set of domains annotated to a specific term, such as the domains annotated to stem cell maintenance (GO:0019827 from Gene Ontology) or immune system cancer (DOID:0060083 from Disease Ontology), over the same path from human upward to Eukaryota.
Applications in large-scale studies
It is possible to use the sTOL to annotate whole extant/ancestral domain repertoires. We reconstruct ancestral genome content by applying Dollo parsimony over eukaryotic evolution, and calculate the gains and losses of molecular characters at each node relative to its direct parents/children. Then, we use domain-centric annotations to perform enrichment analysis of the present/gained/lost ancestral domain repertoires. The terms (such as those from GO) inferred for ancestral eukaryotes by enrichment analysis indicate the functional implications of changes to the protein repertoire during eukaryotic evolution.
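The reconstruction step can be illustrated with a minimal sketch of Dollo parsimony, assuming the textbook formulation: each character is gained exactly once, at the last common ancestor of all leaves that carry it, and may then be lost any number of times. The tree layout (a dict of children), the function name and the toy characters d1 to d3 are hypothetical; the actual sTOL pipeline may differ in its details.

```python
def dollo(children, root, leaf_states):
    """Infer present/gained/lost character sets for every node.

    children: dict mapping each internal node to a list of its child nodes
    leaf_states: dict mapping each leaf to the set of characters it carries
    """
    parent = {kid: node for node, kids in children.items() for kid in kids}

    # Pre-order traversal: every node appears before its descendants.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    # below[node]: characters carried by at least one leaf under node.
    below = {}
    for node in reversed(order):
        kids = children.get(node, [])
        if kids:
            below[node] = set().union(*(below[k] for k in kids))
        else:
            below[node] = set(leaf_states[node])

    present, gains, losses = {}, {}, {}
    for node in order:
        kids = children.get(node, [])
        inherited = present[parent[node]] if node in parent else set()
        here = set()
        for c in below[node]:
            carriers = sum(c in below[k] for k in kids)
            # Present if inherited from the parent, or if this node is the
            # single origin: a leaf, or the point where carrier lineages meet.
            if c in inherited or not kids or carriers >= 2:
                here.add(c)
        present[node] = here
        gains[node] = here - inherited
        losses[node] = inherited - here
    return present, gains, losses


# Toy example with hypothetical node names and characters.
children = {"root": ["A", "yeast"], "A": ["human", "mouse"]}
leaf_states = {"human": {"d1", "d2"}, "mouse": {"d1"}, "yeast": {"d2", "d3"}}
present, gains, losses = dollo(children, "root", leaf_states)
for node in ("root", "A", "human", "mouse", "yeast"):
    print(node, sorted(present[node]), sorted(gains[node]), sorted(losses[node]))
```

In this toy tree, d2 is gained at the root and lost on the mouse branch, d1 is gained at the internal node A, and d3 is gained on the yeast branch. The gained and lost sets per node are the kind of repertoires that could then be passed to an enrichment analysis.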