Metagenomic Classification

The CosmosID-HUB enables fast and accurate metagenomic analysis of microbiome data, including strain-level detection across multiple kingdoms, antimicrobial resistance (AMR) and virulence factor (VF) profiling, and functional analysis, all in a single processing pipeline. The HUB is powered by a curated multi-kingdom reference database (GenBook), a patented k-mer based algorithm (Kepler), and machine learning filters.

  1. GenBook is a comprehensive, curated metagenomic database of over 180,000 genomes and gene sequences across bacteria, fungi, viruses, phages, and protists. The curation process maximizes sensitivity by limiting redundancy and homogeneity within overpopulated clades (e.g., Staphylococcus aureus). The database is also curated agnostically of sample type, so the same pipeline can be applied to every sample in your project. Together with genome QC, this ensures that only high-quality entries enter the database, which reduces false positive calls.
  2. The HUB's algorithm is a patented and benchmarked k-mer based algorithm for efficient and highly accurate profiling. K-mers are categorized as either unique or shared at every level of the phylogenetic tree, enabling near-neighbor phylogenetic placement. K-mers are then validated to confirm that they are phylogenetically stable and do not hit mobile elements or the human genome. The phylogenetic ontology of GenBook™ enables accurate differentiation down to the strain level.
  3. Machine learning filters trained on 10,000+ samples enable the pipeline to separate signal from noise and maintain a high level of sensitivity without sacrificing precision. This results in a higher F1 score, as demonstrated in benchmarks and community challenges.
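
The unique-versus-shared k-mer distinction in item 2 can be illustrated with a small Python sketch. This is not the Kepler algorithm, whose internals are not described here. It is a simplified illustration at a single level of the tree (strain), and the function names, toy sequences, and choice of k = 3 are all assumptions made for the example.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_kmers(genomes, k):
    """Split k-mers into those found in exactly one genome and those found in several.

    genomes maps a (hypothetical) genome name to its sequence.
    Returns (unique, shared):
      unique maps a k-mer to the single genome that contains it,
      shared maps a k-mer to the set of genomes that contain it.
    """
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)

    unique = {km: next(iter(names)) for km, names in owners.items() if len(names) == 1}
    shared = {km: set(names) for km, names in owners.items() if len(names) > 1}
    return unique, shared


# Toy reference set: two made-up "strains" that share a common prefix.
genomes = {"strain_A": "ACGTAC", "strain_B": "ACGTTG"}
unique, shared = classify_kmers(genomes, k=3)

print(sorted(shared))  # k-mers present in both strains
print(sorted(unique))  # k-mers that identify exactly one strain
```

In this toy case the k-mers from the common prefix are shared, while those from the diverging ends are unique to one strain. A read that contains a unique k-mer points to a specific strain, whereas a shared k-mer only supports the parent group. Repeating the same split at each rank (species, genus, and so on) would give the per-level categorization that item 2 describes.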

Samples for metagenomic analysis are whole genome shotgun sequencing files, usually in FASTQ or FASTA format. Paired-end files can be combined into one sample upon upload as long as they are uploaded together. See Upload Samples & Analyze Samples for more details on how to run your samples.

After upload, CosmosID automatically analyzes your samples. The results include tables and visualizations for our genome databases: bacteria, fungi, protista, viruses, and respiratory viruses, and for our gene databases: antimicrobial resistance and virulence factors.


An extensive academic study shows CosmosID's “Best in Class Accuracy” and “Unrivaled Detection Resolution”. As shown in the figures below, CosmosID offers the best identification accuracy across the entire benchmarking dataset based on F1, precision, recall, and AUPR. Most importantly, unlike other tools, it maintains identification accuracy at all taxonomic levels. Most of the other tools fall well short when classifying organisms at sub-species (and strain) level resolution, whereas CosmosID provides unrivaled accuracy at the sub-species level.
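
For readers unfamiliar with the metrics named above, the following Python sketch shows how precision, recall, and F1 are commonly computed for a single sample by comparing the set of detected taxa with the set of taxa known to be present. It is a minimal illustration, not the scoring code used in the study; the function name and the taxon labels are hypothetical.

```python
def precision_recall_f1(expected, detected):
    """Compute precision, recall, and F1 from two sets of taxon names.

    expected: taxa known to be in the sample (ground truth).
    detected: taxa reported by a classifier.
    """
    true_pos = len(expected & detected)   # correctly reported taxa
    false_pos = len(detected - expected)  # reported but not present
    false_neg = len(expected - detected)  # present but missed

    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1


# Hypothetical mock community of four taxa; the classifier finds three
# of them and reports one taxon that is not present.
expected = {"taxon_1", "taxon_2", "taxon_3", "taxon_4"}
detected = {"taxon_1", "taxon_2", "taxon_3", "taxon_5"}

precision, recall, f1 = precision_recall_f1(expected, detected)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a tool cannot score well by reporting everything (high recall, low precision) or by reporting almost nothing (high precision, low recall).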
