Functional Classification

NOTE: If your sample has a high host background, please contact [email protected] before uploading your samples to CosmosID-HUB Microbiome. Our support team will assist you in de-hosting the raw sequencing reads and then uploading the data to the platform. Specimens with high host content include skin and oral swabs, among others.

Functional profiling of whole-genome shotgun microbiome or metatranscriptomic short-read sequencing data provides crucial insight into the genomic potential of a microbial community: the molecular, biochemical, and metabolic activities it is able to carry out. Understanding this functional potential also makes it possible to test hypotheses that link specific molecular or biochemical activities to environmental and health-associated phenotypes. To help scientists explore and investigate these hypotheses, we are pleased to introduce the functional workflow in CosmosID-HUB Microbiome, which leverages the MetaCyc Pathways database and the GO Terms database to characterize the functional potential of the microbiome community.

The single-sample view of the functional workflow provides a tabular view of both MetaCyc Pathways and GO Terms, along with a stacked bar chart and a donut chart to aid visual inspection of the functional capabilities of the microbiome population.

Clicking a Pathway ID or GO Term ID takes you to that feature's description in the MetaCyc or GO Terms database, respectively.

Technical Appendix

Initial QC, adapter trimming, and preprocessing of metagenomic sequencing reads are done using BBDuk (1). The quality-controlled reads are then subjected to a translated search against a comprehensive and non-redundant protein sequence database, UniRef90. The UniRef90 database, provided by UniProt (2), represents a clustering of all non-redundant protein sequences in UniProt, such that each sequence in a cluster aligns with at least 90% identity and 80% coverage to the longest sequence in the cluster. The mapping of metagenomic reads to gene sequences is weighted by mapping quality, coverage, and gene sequence length to estimate community-wide weighted gene family abundances, as described by Franzosa et al. (3). Gene families are then annotated to MetaCyc (4) reactions (Metabolic Enzymes) to reconstruct and quantify MetaCyc metabolic pathways in the community, again following Franzosa et al. (3). Furthermore, the UniRef90 gene families are regrouped to GO terms (5) to give an overview of GO functions in the community. Lastly, to facilitate comparisons across multiple samples with different sequencing depths, the abundance values are normalized using total-sum scaling (TSS) to produce "copies per million" units (analogous to TPM in RNA-Seq).
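As an illustration of the final normalization step only, the sketch below shows how total-sum scaling can convert raw abundances into copies per million. It is a minimal example under stated assumptions, not the platform's actual implementation: the function name to_copies_per_million, the feature IDs, and the abundance values are all hypothetical.

```python
from typing import Dict


def to_copies_per_million(abundances: Dict[str, float]) -> Dict[str, float]:
    """Total-sum scaling: rescale one sample so its abundances sum to 1,000,000.

    `abundances` maps a feature ID (for example a pathway or GO term) to its
    raw abundance in a single sample. Hypothetical helper for illustration.
    """
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("sample has no abundance to normalize")
    # Multiply before dividing to limit floating-point rounding.
    return {feature: value * 1_000_000 / total
            for feature, value in abundances.items()}


# Two made-up samples with the same composition but a tenfold
# difference in sequencing depth.
shallow_sample = {"FEATURE-A": 30.0, "FEATURE-B": 10.0, "FEATURE-C": 10.0}
deep_sample = {"FEATURE-A": 300.0, "FEATURE-B": 100.0, "FEATURE-C": 100.0}

shallow_cpm = to_copies_per_million(shallow_sample)
deep_cpm = to_copies_per_million(deep_sample)

for feature in shallow_sample:
    print(feature, round(shallow_cpm[feature]), round(deep_cpm[feature]))
```

Because each value is expressed as a share of the sample total, the two samples end up with matching copies-per-million values despite their different depths, which is what makes cross-sample comparison possible.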

1. Bushnell, B. (2021). BBDuk Guide - DOE Joint Genome Institute. Retrieved 1 August 2021, from

2. UniProt: the universal protein knowledgebase. (2016). Nucleic Acids Research, 45(D1), D158-D169. doi: 10.1093/nar/gkw1099

3. Franzosa, E., McIver, L., Rahnavard, G., Thompson, L., Schirmer, M., & Weingart, G. et al. (2018). Species-level functional profiling of metagenomes and metatranscriptomes. Nature Methods, 15(11), 962-968. doi: 10.1038/s41592-018-0176-y

4. Caspi, R., Foerster, H., Fulcher, C., Kaipa, P., Krummenacker, M., & Latendresse, M. et al. (2007). The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Research, 36(Database), D623-D631. doi: 10.1093/nar/gkm900

5. Carbon, S., Ireland, A., Mungall, C., Shu, S., Marshall, B., & Lewis, S. (2008). AmiGO: online access to ontology and annotation data. Bioinformatics, 25(2), 288-289. doi: 10.1093/bioinformatics/btn615