Guides

2/15/24 » Database Release: Bacterial/Fungal

Good news! We have updated our Bacteria and Fungi databases, now leveraging GTDB-based nomenclature.

We are excited to release updated bacterial and fungal databases with more than 15,000 new curated genomes, bringing our total to more than 170,000 microbial genomes and gene sequences representing bacteria, viruses, protists, and fungi, as well as antibiotic resistance and virulence-associated genes. With thousands of new strains and species, we can now characterize complex microbial communities with increased sensitivity and specificity.

The CosmosID databases are organized phylogenetically and contain hundreds of millions of marker gene sequences. The markers represent both coding and non-coding sequences that are unique to a taxon and/or to distinct nodes of phylogenetic trees. In other words, the tree structure reflects the genomic relatedness of organisms rather than a predetermined taxonomy based on phenotype. This gives CosmosID a high degree of accuracy when identifying microorganisms from their DNA in metagenomic samples. It also helps identify the closest match for genomes that have no strain-level reference in the database (if, for example, they have never been sequenced before).


We have now switched from NCBI- to GTDB-based nomenclature.

Classifying bacteria is crucial for understanding their ecology, evolution, and potential roles in health and disease. Two prominent systems for bacterial taxonomy are the National Center for Biotechnology Information (NCBI) taxonomy and the Genome Taxonomy Database (GTDB). While both aim to categorize bacteria, they differ in their underlying philosophies and methodologies, leading to discrepancies in classification.

The NCBI taxonomy, a long-standing system, curates classifications based on a combination of phenotypic and genotypic data [1]. In recent years, however, researchers have highlighted inconsistencies arising from historical practices and the subjective nature of phenotypic traits [2]. Additionally, the NCBI taxonomy is not strictly rank-normalized, meaning that equivalent evolutionary distances can be assigned different taxonomic ranks [1].

GTDB, a newer system, focuses on a phylogenetically consistent and rank-normalized classification based on whole-genome sequences. It uses Average Nucleotide Identity (ANI) to delineate species boundaries and Relative Evolutionary Divergence (RED) to define higher taxonomic ranks [2]. This approach aims to provide a more objective and evolutionarily informative classification scheme. As a result, GTDB often recognizes finer taxonomic levels and proposes novel lineages that are not present in the NCBI taxonomy.

Here's a breakdown of the key differences between the GTDB and NCBI taxonomies:

  • Philosophy: NCBI - Curator-driven, phenotypic and genotypic data; GTDB - Genome-based, phylogenetically consistent.
  • Data source: NCBI - RefSeq and GenBank; GTDB - NCBI RefSeq and GenBank.
  • Species definition: NCBI - Less stringent, often based on 16S rRNA gene sequence similarity [1]; GTDB - Stricter, based on ANI values (>95% for bacteria) [2].
  • Higher-rank classification: NCBI - Traditional Linnaean ranks (phylum, class, order, etc.); GTDB - Ranks are normalized based on evolutionary divergence, so a traditional group may be split into suffixed lineages (for example, the phylum Bacillota can appear as Bacillota, Bacillota_A, Bacillota_B, and so on) [2].
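
To make the ANI-based species definition above more concrete, here is a minimal Python sketch of a threshold check. It is an illustration only, not the GTDB or CosmosID implementation: the function name `assign_species`, the input format, and the ANI values in the usage note are all hypothetical, and only the 95% cutoff comes from the comparison above.

```python
from typing import Dict, Optional

# Species-level ANI cutoff in percent, taken from the ">95% for bacteria"
# figure in the comparison above.
ANI_SPECIES_THRESHOLD = 95.0


def assign_species(ani_to_representatives: Dict[str, float],
                   threshold: float = ANI_SPECIES_THRESHOLD) -> Optional[str]:
    """Pick the species whose representative genome has the highest ANI
    to the query, but only if that ANI exceeds the threshold.

    `ani_to_representatives` maps a species name to the query's ANI (in
    percent) against that species' representative genome. This input
    format is an assumption made for the sketch.
    """
    if not ani_to_representatives:
        return None
    best_species, best_ani = max(ani_to_representatives.items(),
                                 key=lambda item: item[1])
    # Below the cutoff the query is treated as unassigned at species level.
    return best_species if best_ani > threshold else None
```

For example, with made-up values, `assign_species({"Escherichia coli": 97.2, "Escherichia fergusonii": 91.0})` returns `"Escherichia coli"`, while a query whose best hit is 94.9% returns `None` and would be treated as a potential novel species.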

A few notable reclassifications relative to the NCBI taxonomy are listed below, with links to material justifying each reclassification.

  • Shigella genus -> Escherichia genus: https://www.biorxiv.org/content/10.1101/2021.09.22.461432v1
  • Gardnerella genus -> Bifidobacterium genus: https://www.atcc.org/resources/posters/2019-posters/reclassification-of-the-bifidobacterium-and-gardnerella-genera
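
If you need to compare older NCBI-based results with the new GTDB-based names, a small lookup can help. The Python sketch below is a hypothetical helper, not part of any CosmosID or GTDB tool: the function names are invented for illustration, the mapping contains only the two genus-level changes listed above, and species names within those genera may change as well.

```python
import re
from typing import Tuple

# Genus-level reclassifications listed above (NCBI name -> GTDB name).
# This table is intentionally limited to the two examples in this post.
NCBI_TO_GTDB_GENUS = {
    "Shigella": "Escherichia",
    "Gardnerella": "Bifidobacterium",
}

# GTDB marks split lineages with an underscore and capital letters,
# e.g. "Bacillota_A".
_GTDB_SUFFIX = re.compile(r"^(?P<base>.+?)_(?P<suffix>[A-Z]+)$")


def split_gtdb_suffix(name: str) -> Tuple[str, str]:
    """Split a GTDB taxon name into (base name, suffix).

    Names without a suffix are returned with an empty suffix.
    """
    match = _GTDB_SUFFIX.match(name)
    if match:
        return match.group("base"), match.group("suffix")
    return name, ""


def to_gtdb_genus(ncbi_genus: str) -> str:
    """Translate an NCBI genus name to its GTDB equivalent.

    Genera not present in the table are returned unchanged.
    """
    return NCBI_TO_GTDB_GENUS.get(ncbi_genus, ncbi_genus)
```

With this sketch, `split_gtdb_suffix("Bacillota_A")` gives `("Bacillota", "A")`, and `to_gtdb_genus("Shigella")` gives `"Escherichia"`. Stripping the suffix is useful for rough grouping only, since suffixed lineages are distinct taxa in GTDB.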

In conclusion, the GTDB taxonomy offers a more objective and phylogenetically consistent framework, which aligns with Kepler’s philosophy of phylogenetic profiling. For this reason, GTDB was chosen as the taxonomic engine behind Kepler. To learn more about GTDB, please visit https://gtdb.ecogenomic.org/.

Thank you for being a valued member of the CosmosID-HUB community. Your research journey is important to us, and we’re committed to ensuring it’s as impactful as possible.

Here’s to exploring new frontiers together!

References:

  1. Schoch, C.L., Ciufo, S., Domrachev, M., Hotton, C.L., Kannan, S., Khovanskaya, R., Leipe, D., McVeigh, R., O’Neill, K., Robbertse, B., Sharma, S., Soussov, V., Sullivan, J.P., Sun, L., Turner, S. and Karsch-Mizrachi, I., 2020. NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database, 2020, baaa062.
  2. Parks, D.H., Chuvochina, M., Waite, D.W., Rinke, C., Skarshewski, A., Chaumeil, P.-A. and Hugenholtz, P., 2018. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology, 36(10), pp. 996-1004.