
Functional Profiling in Cosmos-Hub

Cosmos-Hub delivers an end-to-end, host-agnostic workflow for characterizing the functional potential of microbiomes across research, clinical, and environmental projects. The platform integrates state-of-the-art bioinformatics for read QC, translated gene family assignment, annotation, normalization, and interactive visualization.

Functional Databases Available in the Cosmos-Hub

Database | Role (What It Captures) | Application Example | Functional Insight
EC | Enzymatic reactions | Enzyme-level annotation | Mechanistic metabolic view
MetaCyc | Pathway networks | Pathway prediction | System-level metabolism
Pfam | Protein domains/motifs | Domain prediction | Structural adaptation
CAZy | Carbohydrate enzymes | Fiber/glycan profiling | Fiber vs. mucin processing
GO Terms | Broad gene function | Activity trends | Broad functional overview

Pipeline Overview

  1. QC & Processing: Reads are quality-controlled and adapter-trimmed using BBDuk.
  2. Gene Family Mapping: Reads are aligned via translated search to UniRef90, a non-redundant protein cluster database, enabling robust assignment to community-wide gene families.
  3. Functional Annotation: Gene families are mapped to five cornerstone databases:
    • EC (Enzyme Commission): Classifies sequences by biochemical reactions.
    • MetaCyc Pathways: Links genes to known metabolic networks, supporting pathway-level inference.
    • Pfam: Identifies protein domains and conserved motifs.
    • CAZy: Focuses on carbohydrate-active enzymes, resolving fiber, mucin, and glycan processing.
    • Gene Ontology (GO) terms: Hierarchical terms describing molecular functions, biological processes, and cellular components.
  4. Normalization: Abundances are normalized to copies per million (CPM), facilitating cross-sample and cross-study comparisons.
  5. Visualization & Analysis: Results are rendered as heatmaps and bar charts, and functional categories can be tested for differential abundance (LEfSe, MaAsLin) to support biomarker discovery, R&D, and regulatory submissions.
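Step 3 regroups gene families into functional categories through a crosswalk. A minimal sketch of that regrouping follows, assuming abundances are held in plain dictionaries and that a family mapped to several categories counts toward each of them. The gene-family IDs and the UniRef90-to-EC assignments are hypothetical placeholders, not real crosswalk entries, and the function is an illustration rather than the platform's own code.

```python
from collections import defaultdict


def regroup(gene_family_abundance, crosswalk):
    """Sum gene-family abundances into functional categories via a crosswalk.

    gene_family_abundance: {gene_family_id: abundance}
    crosswalk: {gene_family_id: [category_id, ...]}
    A family mapped to several categories contributes its full abundance to
    each of them; families with no mapping are pooled under "UNGROUPED".
    """
    grouped = defaultdict(float)
    for family, abundance in gene_family_abundance.items():
        categories = crosswalk.get(family)
        if not categories:
            grouped["UNGROUPED"] += abundance
            continue
        for category in categories:
            grouped[category] += abundance
    return dict(grouped)


# Hypothetical gene-family IDs and a toy (invented) UniRef90 -> EC crosswalk.
families = {"UniRef90_A": 120.0, "UniRef90_B": 30.0, "UniRef90_C": 5.0}
to_ec = {"UniRef90_A": ["2.7.2.7"], "UniRef90_B": ["2.7.2.7", "2.8.3.8"]}
ec_table = regroup(families, to_ec)
print(ec_table)  # {'2.7.2.7': 150.0, '2.8.3.8': 30.0, 'UNGROUPED': 5.0}
```

Because multi-mapped families count toward every category they map to, category totals can exceed the original family total; keeping an explicit UNGROUPED bucket makes the unannotated fraction visible.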

Viewing the Functional Profiling Results

Each sample’s functional analysis can be navigated using the “Results” dropdown in the Sample Menu, which includes:
  • A tabular summary of database-level annotations
  • Visual charts for intuitive inspection of pathway and enzymatic profiles
  • Links for feature-level exploration on source database websites
The single-sample view of the functional workflow shows a table for each database, along with a stacked bar chart and a donut chart for visually inspecting the functional capabilities of the microbial community.
[Figure: Functional Workflow]
Clicking an entry in the first column of any database table opens that feature's description on the corresponding database's website.

Technical Notes

  • Read mappings are weighted by mapping quality, coverage, and gene sequence length.
  • Annotations use curated crosswalks from UniRef90 to EC, MetaCyc, Pfam, CAZy, and GO Terms.
  • Abundance values are normalized for sequencing depth using total-sum scaling (TSS) and reported as copies per million.
  • Key references: Bushnell 2021, UniProt 2016, Franzosa et al. 2018, Caspi et al. 2007, Carbon et al. 2008.
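The weighting and normalization notes above can be illustrated with a simplified sketch. It assumes each read hit already carries a mapping-quality weight (a hypothetical input), splits multi-mapping reads in proportion to those weights, divides by gene length in kilobases, and then applies TSS to reach copies per million. It omits the coverage weighting and the exact scoring used by Franzosa et al., so treat it as an illustration of the arithmetic only.

```python
from collections import defaultdict


def weighted_rpk(hits, gene_lengths_bp):
    """Quality-weighted, length-normalized abundance per gene family (reads per kilobase).

    hits: (read_id, gene_family, weight) tuples; weight is a hypothetical
          mapping-quality score. Each read's weights are rescaled to sum to 1,
          so a read that hits several families is split among them.
    gene_lengths_bp: {gene_family: length in base pairs}
    """
    read_totals = defaultdict(float)
    for read_id, _, weight in hits:
        read_totals[read_id] += weight
    rpk = defaultdict(float)
    for read_id, family, weight in hits:
        if read_totals[read_id] == 0:
            continue  # skip reads with no usable weight
        share = weight / read_totals[read_id]
        rpk[family] += share / (gene_lengths_bp[family] / 1000.0)
    return dict(rpk)


def to_cpm(abundance):
    """Total-sum scaling (TSS): rescale so all features sum to one million."""
    total = sum(abundance.values())
    if total == 0:
        return {feature: 0.0 for feature in abundance}
    return {feature: value / total * 1_000_000 for feature, value in abundance.items()}


# Toy example: read r2 is split 75/25 between two hypothetical families.
hits = [("r1", "UniRef90_A", 1.0), ("r2", "UniRef90_A", 0.75),
        ("r2", "UniRef90_B", 0.25), ("r3", "UniRef90_B", 1.0)]
lengths = {"UniRef90_A": 1000, "UniRef90_B": 500}
rpk = weighted_rpk(hits, lengths)  # {'UniRef90_A': 1.75, 'UniRef90_B': 2.5}
cpm = to_cpm(rpk)
print({k: round(v, 1) for k, v in cpm.items()})
```

Dividing by gene length before TSS keeps longer genes from looking more abundant simply because they attract more reads, which is why CPM is often described as analogous to TPM in RNA-seq.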

Why perform functional analysis on your microbiome data?

By integrating all these annotation types, Cosmos-Hub (and similar functional profiling frameworks) can:
  • Map the “wiring” of metabolism: EC → MetaCyc lets you see how individual enzymatic steps may assemble into pathways.
  • Go beyond metabolism: GO and Pfam capture regulatory, structural, transport, signaling, and other auxiliary functions.
  • Provide specialization where it’s needed: CAZy focuses on carbohydrate metabolism, which is often key in microbiome studies (e.g. gut fiber degradation).
  • Run differential functional analysis: compare which functional categories are enriched or depleted across sample groups (e.g., disease vs. healthy) in an interpretable way (e.g., “glycolysis up,” “cell wall polysaccharide degradation down”).
In other words, rather than only asking “which taxa are present?”, you can ask “what can they do or are likely capable of doing?” in biochemical, ecological, or physiological terms.
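As a rough illustration of comparing functional categories across groups, the sketch below computes a per-feature log2 fold change of mean CPM between two groups. It is a simplified effect-size screen, not LEfSe or MaAsLin: it performs no significance testing or covariate adjustment. The sample names, group labels, and CPM values are invented toy data, and the pseudocount of 1.0 is an assumption.

```python
import math
from statistics import mean


def log2_fold_changes(profiles, groups, case, control, pseudocount=1.0):
    """Per-feature log2 fold change of mean abundance (case vs. control).

    profiles: {sample_id: {feature_id: cpm}}
    groups:   {sample_id: group_label}; both groups must be non-empty
    The pseudocount avoids log(0) when a feature is absent from one group.
    """
    features = sorted({f for profile in profiles.values() for f in profile})
    case_ids = [s for s, g in groups.items() if g == case]
    control_ids = [s for s, g in groups.items() if g == control]
    changes = {}
    for f in features:
        case_mean = mean(profiles[s].get(f, 0.0) for s in case_ids)
        control_mean = mean(profiles[s].get(f, 0.0) for s in control_ids)
        changes[f] = math.log2((case_mean + pseudocount) / (control_mean + pseudocount))
    return changes


# Invented toy CPM values for two EC numbers, two samples per group.
profiles = {
    "healthy_1": {"2.7.2.7": 398.0, "1.17.1.9": 14.0},
    "healthy_2": {"2.7.2.7": 400.0, "1.17.1.9": 16.0},
    "ibd_1": {"2.7.2.7": 98.0, "1.17.1.9": 62.0},
    "ibd_2": {"2.7.2.7": 100.0, "1.17.1.9": 64.0},
}
groups = {"healthy_1": "healthy", "healthy_2": "healthy", "ibd_1": "IBD", "ibd_2": "IBD"}
result = log2_fold_changes(profiles, groups, case="IBD", control="healthy")
print(result)  # {'1.17.1.9': 2.0, '2.7.2.7': -2.0}
```

A positive value means higher mean abundance in the case group. In practice, a screen like this would be followed by a proper statistical test before calling a function enriched or depleted.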

Example Use Cases of Functional Profiling for Gut Microbiome

Healthy vs. Disease (e.g., Inflammatory Bowel Disease)

Cosmos-Hub identifies specific enzyme families (via EC numbers) whose abundance changes between groups.
  • Healthy gut: Higher abundance of enzymes such as butyrate kinase (EC 2.7.2.7) and acetate-CoA transferase (EC 2.8.3.8), reflecting active short-chain fatty acid (SCFA) production that supports gut barrier integrity.
  • IBD gut: Increased nitrate reductase (EC 1.7.99.4) and formate dehydrogenase (EC 1.17.1.9), suggesting a shift toward oxidative and nitrate respiration, consistent with inflammatory and dysbiotic states.
EC-level insights give you mechanistic clues about which chemical reactions dominate in each condition.
Using EC annotations, Cosmos-Hub infers pathway presence and abundance through MetaCyc.
  • Healthy gut: Enrichment in pathways like butyrate biosynthesis I (PWY-5676), methanogenesis from H₂/CO₂ (METHANOGENESIS-PWY), and vitamin K₂ (menaquinone) biosynthesis (PWY-5838) — all key for maintaining energy balance and mucosal health.
  • IBD gut: Enrichment in lipopolysaccharide (LPS) biosynthesis and nitrate reduction pathways, consistent with a pro-inflammatory metabolic signature.
MetaCyc translates raw gene potential into interpretable, system-level metabolism.
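Pathway inference can be approximated with two numbers: how many of a pathway's required reactions are detected (coverage) and how many complete copies the least abundant step could support. The sketch below assumes a pathway is a flat list of required reactions with placeholder IDs R1 to R3; real MetaCyc pathways have branches and alternative reactions, and the procedure described by Franzosa et al. is more elaborate than this minimum-based estimate.

```python
def pathway_summary(reaction_abundance, pathway_reactions):
    """Coverage and a bottleneck abundance estimate for one pathway.

    reaction_abundance: {reaction_id: abundance}, e.g. EC-level CPM values
    pathway_reactions:  reaction IDs the pathway requires (hypothetical definition)
    Returns (coverage, abundance): coverage is the fraction of required
    reactions detected; abundance is the lowest abundance among them
    (0.0 if any required reaction is missing).
    """
    if not pathway_reactions:
        raise ValueError("pathway must list at least one reaction")
    values = [reaction_abundance.get(r, 0.0) for r in pathway_reactions]
    coverage = sum(1 for v in values if v > 0) / len(values)
    return coverage, min(values)


# Toy three-step pathway with placeholder reaction IDs R1..R3.
toy_pathway = ["R1", "R2", "R3"]
print(pathway_summary({"R1": 120.0, "R2": 80.0}, toy_pathway))               # partial: coverage 2/3, abundance 0.0
print(pathway_summary({"R1": 120.0, "R2": 80.0, "R3": 40.0}, toy_pathway))  # (1.0, 40.0)
```

The zero abundance for the partial pathway echoes the limitation noted below that predicted pathway presence does not guarantee metabolic flux.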
Pfam domain profiling highlights structural motifs whose abundance shifts between conditions.
  • Healthy gut: More ABC transporter permease domains (PF02653) supporting nutrient import/export.
  • IBD gut: Elevated TonB-dependent receptor domains (PF00593) and heat shock protein domains (PF00012), reflecting stress adaptation and host-derived substrate utilization.
Pfam captures subtle domain-level adaptations beyond classical enzyme functions.
Carbohydrate-active enzymes show clear ecological reprogramming:
  • Healthy gut: Enrichment of GH43 and GH3 families involved in dietary fiber degradation (xylan, arabinoxylan, cellulose).
  • IBD gut: Drop in fiber-degrading CAZymes and rise in GH98 and GH92 families targeting host mucins, suggesting a shift toward mucosal glycan foraging.
CAZy data reveals how the microbiome’s “diet” shifts from plant polysaccharides to host glycans.
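The shift from plant fiber to host glycans can be summarized per sample with a simple log ratio. The index below is a hypothetical illustration, not a Cosmos-Hub output: it sums CPM for the fiber-degrading families named above (GH43, GH3) and the host-mucin-targeting families (GH98, GH92), adds an assumed pseudocount of 1.0, and takes log2 of the ratio. The profiles are invented toy values.

```python
import math

FIBER_FAMILIES = ("GH43", "GH3")   # fiber-degrading families named in the text
MUCIN_FAMILIES = ("GH98", "GH92")  # host-mucin-targeting families named in the text


def fiber_to_mucin_index(cazy_cpm, pseudocount=1.0):
    """Hypothetical log2 ratio of fiber- to mucin-targeting CAZy family abundance.

    cazy_cpm: {cazy_family: cpm} for one sample.
    Positive values lean toward plant-fiber degradation, negative values
    toward host-glycan foraging.
    """
    fiber = sum(cazy_cpm.get(f, 0.0) for f in FIBER_FAMILIES)
    mucin = sum(cazy_cpm.get(f, 0.0) for f in MUCIN_FAMILIES)
    return math.log2((fiber + pseudocount) / (mucin + pseudocount))


# Invented toy profiles, not measured data.
fiber_leaning = {"GH43": 2000.0, "GH3": 1999.0, "GH98": 100.0, "GH92": 149.0}
mucin_leaning = {"GH43": 50.0, "GH3": 13.0, "GH98": 120.0, "GH92": 135.0}
print(fiber_to_mucin_index(fiber_leaning))  # 4.0
print(fiber_to_mucin_index(mucin_leaning))  # -2.0
```

A real analysis would draw on many more CAZy families; the two-family lists here only mirror the examples in the text.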
Considerations & Limitations
  • DNA/transcript-based annotation infers functional potential, not direct activity.
  • Some taxa and functions may be underrepresented in current databases.
  • Pathway completeness is predicted; presence does not guarantee metabolic flux.
For custom protocols, sample-type recommendations, or application-specific database advice, contact help@cosmos-hub.com.

Technical Appendix

Functional Workflow:

Initial QC, adapter trimming, and preprocessing of metagenomic sequencing reads are performed with BBDuk (1). The quality-controlled reads are then subjected to a translated search against UniRef90, a comprehensive, non-redundant protein sequence database. UniRef90, provided by UniProt (2), clusters UniProt protein sequences such that each sequence in a cluster has at least 90% sequence identity to, and 80% overlap with, the longest sequence in the cluster. Read-to-gene mappings are weighted by mapping quality, coverage, and gene sequence length to estimate community-wide gene family abundances, as described by Franzosa et al. (3). Gene families are then annotated to MetaCyc (4) reactions (metabolic enzymes) to reconstruct and quantify MetaCyc metabolic pathways in the community, again following Franzosa et al. (3). In addition, the UniRef90 gene families are regrouped into Enzyme Commission (EC) enzymes, Pfam protein domains, CAZy enzymes, and GO terms (5) to give a comprehensive overview of gene functions in the community. Finally, to allow comparisons across samples with different sequencing depths, abundance values are normalized using total-sum scaling (TSS) to produce copies-per-million units (analogous to TPM in RNA-seq).
References:
  1. Bushnell, B. (2021). BBDuk Guide - DOE Joint Genome Institute. Retrieved 1 August 2021, from https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/
  2. The UniProt Consortium. (2016). UniProt: the universal protein knowledgebase. Nucleic Acids Research, 45(D1), D158-D169. doi: 10.1093/nar/gkw1099
  3. Franzosa, E., McIver, L., Rahnavard, G., Thompson, L., Schirmer, M., Weingart, G., et al. (2018). Species-level functional profiling of metagenomes and metatranscriptomes. Nature Methods, 15(11), 962-968. doi: 10.1038/s41592-018-0176-y
  4. Caspi, R., Foerster, H., Fulcher, C., Kaipa, P., Krummenacker, M., Latendresse, M., et al. (2007). The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Research, 36(Database issue), D623-D631. doi: 10.1093/nar/gkm900
  5. Carbon, S., Ireland, A., Mungall, C., Shu, S., Marshall, B., & Lewis, S. (2008). AmiGO: online access to ontology and annotation data. Bioinformatics, 25(2), 288-289. doi: 10.1093/bioinformatics/btn615