CHAMP is published in Frontiers in Microbiology! Read about the pipeline and its benchmarking statistics in our recent paper, “CHAMP delivers accurate taxonomic profiles of the prokaryotes, eukaryotes, and bacteriophages in the human microbiome”.

Overview

CHAMP is a next-generation human microbiome profiling pipeline developed to deliver high-resolution taxonomic and functional insights from shotgun metagenomic sequencing data. By leveraging an expansive and highly curated reference database, CHAMP offers precise species-level identification, robust functional potential profiling, and superior performance compared to other commonly used tools in the field. Some highlights of the pipeline include:
  • Species-level resolution across prokaryotes, eukaryotes, and viruses.
  • Built on 400,000+ metagenome-assembled genomes (MAGs) from >30,000 human microbiome samples
  • Benchmark-leading accuracy compared to MetaPhlAn4, Kraken2, Bracken, and Centrifuge
  • Functional annotations from Gut-Brain Modules (GBMs), Gut Metabolic Modules (GMMs), and KEGG
  • Supports phage/virome detection, strain-level profiling, and clonal resolution (custom analysis; contact info@cmbio.io)
[Figure: CHAMP infographic (minus viral genomes)]

Use Cases

  • Human health studies (e.g., IBD, metabolic disorders)
  • Probiotic development (strain-specific tracking, clonal resolution)
  • Population-scale microbiome research
  • Virome and bacteriophage profiling

Reference Database Construction

  1. Reference catalog is derived from:
    • Publicly available MAGs (UHGG, ELGG)
    • Clinical Microbiomics/Cmbio in-house MAGs
    • NCBI/PATRIC genomes (to capture pathogens, probiotics, food-related species)
  2. MAG Processing
    • Assembly tools: Megahit, metaSPAdes
    • Binning: VAMB
    • Quality control: CheckM2 (>90% completeness, <5% contamination), GUNC
    • Annotation: GTDB-Tk with GTDB r214
  3. Species Clustering
    • MAGs clustered at 95% ANI using dRep/FastANI
    • Pan-genomes constructed in 3 stages using MMseqs2 and CD-HIT
    • Final database includes ~6,567 prokaryotic species clusters and 244 eukaryotic species
    • Final non-redundant gene catalog: >25 million genes
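
The quality filter and the 95% ANI clustering step can be illustrated with a minimal Python sketch. Everything here is hypothetical: the `Mag` record, the function names, and the toy ANI values are invented for illustration, and single-linkage clustering is used only for brevity. dRep and FastANI apply their own procedures, so this is not the CHAMP implementation.

```python
from dataclasses import dataclass


@dataclass
class Mag:
    name: str
    completeness: float   # percent, as a QC tool such as CheckM2 would report
    contamination: float  # percent


def passes_qc(mag, min_completeness=90.0, max_contamination=5.0):
    """Apply the thresholds quoted above: >90% completeness, <5% contamination."""
    return mag.completeness > min_completeness and mag.contamination < max_contamination


def cluster_by_ani(names, ani, threshold=95.0):
    """Single-linkage clustering (an assumption made for this sketch).

    `ani` maps frozenset({a, b}) -> percent ANI for two distinct genomes in
    `names`; pairs missing from `ani` are treated as unrelated.
    """
    parent = {n: n for n in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, value in ani.items():
        a, b = tuple(pair)
        if value >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Toy data, invented for illustration only.
mags = [
    Mag("mag_a", 97.2, 1.1),
    Mag("mag_b", 93.5, 0.8),
    Mag("mag_c", 88.0, 0.5),  # fails the completeness threshold
    Mag("mag_d", 95.0, 2.0),
]
kept = [m.name for m in mags if passes_qc(m)]
ani = {
    frozenset({"mag_a", "mag_b"}): 98.4,
    frozenset({"mag_a", "mag_d"}): 82.0,
    frozenset({"mag_b", "mag_d"}): 81.5,
}
species_clusters = cluster_by_ani(kept, ani)
```

In this toy input, `mag_c` is dropped by the quality filter, `mag_a` and `mag_b` fall into one species cluster, and `mag_d` forms its own.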

Species Quantification

Signature Gene Selection
Each species is represented by up to 250 signature genes, selected for:
  • Uniqueness (no cross-species matches at >97% identity for >=100 bp)
  • Core gene status (present in >=60% of MAGs)
  • Length (>=200 bp and <=20 kbp)
If fewer than 20 unique genes are available, segments without homologs are selected and masked accordingly.
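
The selection criteria above can be written as a simple filter. This is a sketch under stated assumptions: the `Gene` record and its fields are hypothetical, the uniqueness check is assumed to have been computed upstream, and the ranking used to pick the top 250 (most prevalent first) is an assumption, since the text does not say how candidates are prioritized. The fallback for species with fewer than 20 unique genes is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    gene_id: str
    length_bp: int
    prevalence: float  # fraction of the species' MAGs that carry the gene
    is_unique: bool    # no cross-species match at >97% identity over >=100 bp


def select_signature_genes(genes, max_genes=250, min_prevalence=0.60,
                           min_len=200, max_len=20_000):
    """Keep unique core genes within the length bounds, capped at max_genes."""
    candidates = [
        g for g in genes
        if g.is_unique
        and g.prevalence >= min_prevalence
        and min_len <= g.length_bp <= max_len
    ]
    # Ranking rule is an assumption: most prevalent first, ties broken by ID.
    candidates.sort(key=lambda g: (-g.prevalence, g.gene_id))
    return candidates[:max_genes]


# Toy genes, invented for illustration only.
genes = [
    Gene("g1", 900, 0.95, True),
    Gene("g2", 150, 0.99, True),      # too short
    Gene("g3", 1200, 0.40, True),     # not a core gene
    Gene("g4", 3000, 0.80, False),    # matches another species
    Gene("g5", 25_000, 0.90, True),   # too long
    Gene("g6", 600, 0.60, True),
]
selected = [g.gene_id for g in select_signature_genes(genes)]
```

Only `g1` and `g6` satisfy all three criteria in this toy set.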

Read Mapping & Quantification

  1. Preprocessing:
    • Human host-filtering (GRCh38 via Bowtie2)
    • Adapter and quality trimming (AdapterRemoval)
    • Paired-read retention (both reads >=100 bp)
  2. Read Mapping:
    • BWA-MEM with >=95% identity over >=100 bp and MAPQ >=20
    • Reads categorized as uniquely mapped, multi-mapped, or unmapped
  3. Abundance Estimation:
    • Based on a negative binomial model using signature gene counts
    • Adjusted for gene length and mapping confidence
    • Outlier filtering via quantile boundaries
    • Final species abundances normalized to sum to 100% per sample
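
The mapping thresholds and the final normalization can be sketched as follows. Note the swap: CHAMP's estimator is described above as a negative binomial model, whereas this sketch substitutes a plain trimmed mean of length-normalized gene counts to show the roles of gene-length adjustment, outlier trimming, and normalization to 100%. All function names and the `trim` fraction are hypothetical.

```python
def passes_mapping_filter(identity, aligned_bp, mapq):
    """Thresholds quoted above: >=95% identity over >=100 bp and MAPQ >=20."""
    return identity >= 0.95 and aligned_bp >= 100 and mapq >= 20


def species_depth(gene_counts, gene_lengths, trim=0.1):
    """Trimmed mean of reads per base across a species' signature genes.

    This is a simplified stand-in for the negative binomial model; the
    trim fraction is an assumption.
    """
    depths = sorted(gene_counts[g] / gene_lengths[g] for g in gene_counts)
    if not depths:
        return 0.0
    k = int(len(depths) * trim)  # number of genes dropped at each extreme
    kept = depths[k:len(depths) - k] if len(depths) > 2 * k else depths
    return sum(kept) / len(kept)


def normalize_to_percent(depth_by_species):
    """Scale species depths so they sum to 100% within one sample."""
    total = sum(depth_by_species.values())
    if total == 0:
        return {s: 0.0 for s in depth_by_species}
    return {s: 100.0 * d / total for s, d in depth_by_species.items()}


# Toy sample, invented for illustration: ten 1,000 bp signature genes each.
lengths = {f"gene{i}": 1000 for i in range(10)}
counts_x = {f"gene{i}": 10 for i in range(10)}
counts_x["gene9"] = 100  # one outlier gene, removed by trimming
counts_y = {f"gene{i}": 30 for i in range(10)}

abundance = normalize_to_percent({
    "species_x": species_depth(counts_x, lengths),
    "species_y": species_depth(counts_y, lengths),
})
```

With the outlier gene trimmed away, `species_x` and `species_y` come out at roughly 25% and 75% of the sample.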

Functional Profiling

Functional Annotation
  • Prokaryotic genes: Annotated using eggNOG-mapper (eggNOG v5.0) for orthologous groups and KEGG Orthology (KO)
  • Eukaryotic genes: Annotated with KofamScan using KO assignments
Functional Modules
CHAMP links functional potential to taxonomic identity through defined modules:
  1. KEGG Modules
    • Defined as sets of KO identifiers encoding a biological function or pathway
    • A species is considered to possess a KEGG module if:
      1. It contains genes for >= 2/3 of the required steps
      2. For modules with alternative paths, only one must meet the 2/3 threshold
      3. Modules with <=3 steps require all steps present
    • Module Abundance (Cellular Abundance): Computed as the sum of relative abundances of species possessing the module
  2. Gut-Brain Modules (GBMs)
    • 56 microbial pathways involved in the synthesis/degradation of neuroactive compounds (e.g., GABA, dopamine)
    • Pathways defined using KO, TIGRFAM, and EggNOG orthologs
    • Same 2/3-rule applies for species inclusion
  3. Gut Metabolic Modules (GMMs)
    • 103 conserved gut microbiome pathways (e.g., SCFA production, bile acid metabolism)
    • Module completion and quantification rules follow KEGG module logic
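
The module completeness rule and the cellular abundance calculation can be expressed in a few lines of Python. This is a sketch under assumptions: a module is modeled as a list of alternative paths, each path as a list of steps, and each step as a set of KO identifiers of which any one satisfies the step. That data layout, the function names, and the placeholder identifiers (`ko1`, `ko2`, ...) are hypothetical.

```python
def path_complete(path, species_kos):
    """Apply the 2/3 rule to one path; paths with <=3 steps need every step."""
    if not path:
        return False
    present = sum(1 for step in path if step & species_kos)
    if len(path) <= 3:
        return present == len(path)
    # Integer form of present / len(path) >= 2/3, avoiding float rounding.
    return 3 * present >= 2 * len(path)


def has_module(paths, species_kos):
    """With alternative paths, a single qualifying path is sufficient."""
    return any(path_complete(p, species_kos) for p in paths)


def module_abundance(paths, kos_by_species, abundance):
    """Sum the relative abundances of all species that possess the module."""
    return sum(
        abundance[s] for s, kos in kos_by_species.items() if has_module(paths, kos)
    )


# Toy module, invented for illustration: a 6-step path and a 2-step alternative.
long_path = [{"ko1"}, {"ko2"}, {"ko3"}, {"ko4"}, {"ko5"}, {"ko6"}]
short_path = [{"ko7"}, {"ko8", "ko9"}]
module = [long_path, short_path]

kos_by_species = {
    "species_a": {"ko1", "ko2", "ko3", "ko4"},  # 4/6 steps: meets 2/3
    "species_b": {"ko1", "ko2", "ko3"},         # 3/6 steps: below 2/3
    "species_c": {"ko7", "ko9"},                # both steps of the short path
    "species_d": {"ko7"},                       # 1/2 steps: short path needs all
}
abundance = {"species_a": 40.0, "species_b": 30.0, "species_c": 25.0, "species_d": 5.0}

total = module_abundance(module, kos_by_species, abundance)
```

Here `species_a` and `species_c` possess the module, so its cellular abundance is 40 + 25 = 65% of the community.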
Cellular Abundance vs. Relative Abundance
Metric | Cellular Abundance (CHAMP) | Relative Abundance (e.g., Kepler)
Basis | Species-level gene presence & pathway completeness | Read alignment to functional gene signatures
Weighting | Based on species abundance | Based on normalized read counts (e.g., CPM)
Output Interpretation | Fraction of species capable of function | Proportional activity/presence of gene products
Data Flow | Species -> Function | Reads -> Function

CHAMP Benchmarking

CHAMP demonstrates best-in-class performance on both the CAMI and NIBSC benchmarks against the most widely used profiling pipelines in the field.
  • Compared to MetaPhlAn4, CHAMP showed 16% greater sensitivity (recall) across different human body sites. In the NIBSC mock community benchmark, it produced 400 times lower false signal than state-of-the-art profilers (MetaPhlAn4, Centrifuge, Kraken, and Bracken), so a species reported by CHAMP is very likely to be present in the sample.
  • CHAMP uses the latest GTDB annotation, which includes more rare species and, compared to NCBI, often classifies subspecies as species in their own right.
  • For full details, read the CHAMP Benchmark Whitepaper.
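
For readers unfamiliar with the two metrics, here is a minimal sketch of how sensitivity and false signal can be computed against a mock community of known composition. The definition of false signal used here (summed abundance assigned to species that are not in the mock community) is an assumption, as are the function names; the numbers are toy values and are not benchmark results.

```python
def recall(expected, detected):
    """Sensitivity: fraction of truly present species that were detected."""
    expected, detected = set(expected), set(detected)
    return len(expected & detected) / len(expected)


def false_signal(expected, profile):
    """Summed abundance (percent) assigned to species absent from the mock.

    This definition is an assumption made for the sketch.
    """
    expected = set(expected)
    return sum(a for species, a in profile.items() if species not in expected)


# Toy mock community and profile, invented for illustration only.
mock_community = ["sp1", "sp2", "sp3", "sp4"]
profile = {"sp1": 40.0, "sp2": 35.0, "sp3": 24.5, "sp_unexpected": 0.5}

sensitivity = recall(mock_community, profile)
spurious = false_signal(mock_community, profile)
```

In this toy case three of four expected species are recovered (recall 0.75) and 0.5% of the reported abundance is false signal.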

Detailed Methods Summary