Building Custom Prevalence Models
MaAsLin3 in Cosmos‑Hub simultaneously models both abundance and prevalence associations from the same microbiome feature table, making it one of the most comprehensive tools available for differential microbiome analysis. This guide explains how MaAsLin3 defines prevalence internally, how to export the correct data from Cosmos‑Hub, and how to use that data to build custom logistic regression models for secondary prevalence analyses.
How MaAsLin3 Models Prevalence
MaAsLin3 is a generalized multivariable modeling framework designed to identify microbial associations in complex, high-dimensional datasets. Unlike approaches that model only abundance, MaAsLin3 captures two complementary biological signals from the same input feature table:
- Prevalence – how often a feature is detected across samples, modeled via logistic regression on a binary presence/absence profile.
- Abundance – how much of a feature is present when detected, modeled via log‑linear regression on non‑zero abundances.
“MaAsLin 3 takes as input a table of microbial community feature abundances and metadata. These feature data are normalized, filtered, split into prevalence (present versus absent) and log‑transformed nonzero abundances, and fit with a modified logistic model and a linear model, respectively.”
When MaAsLin3 runs in Cosmos‑Hub, the workflow proceeds as follows:
- The input feature abundance table (taxa, pathways, or functional features) is normalized — by default using total‑sum scaling to relative abundances.
- Optional filtering is applied to remove extremely rare or low-variance features based on minimum prevalence, minimum abundance, and minimum variance thresholds.
- The filtered table is split into:
- A binary prevalence profile (present = 1, absent = 0) for logistic regression.
- A non‑zero abundance subset (log‑transformed) for linear regression.
The Input Data export is the source of both the prevalence and abundance models.
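The normalize-then-split workflow above can be illustrated with a minimal sketch for a single sample. This is not the MaAsLin3 implementation: the function names and the toy counts are hypothetical, filtering is omitted, and log base 2 is assumed for the abundance transform.

```python
import math


def total_sum_scale(counts):
    """Scale one sample's feature counts so they sum to 1 (relative abundances)."""
    total = sum(counts.values())
    if total == 0:
        return {feature: 0.0 for feature in counts}
    return {feature: value / total for feature, value in counts.items()}


def split_prevalence_abundance(rel_abund):
    """Return (binary presence profile, log2 of non-zero abundances)."""
    prevalence = {f: 1 if v > 0 else 0 for f, v in rel_abund.items()}
    # Zeros are left out of the abundance part rather than log-transformed.
    abundance = {f: math.log2(v) for f, v in rel_abund.items() if v > 0}
    return prevalence, abundance


# Hypothetical toy counts for one sample (not real data).
sample = {"taxon_a": 30, "taxon_b": 0, "taxon_c": 10}
rel = total_sum_scale(sample)
prevalence, abundance = split_prevalence_abundance(rel)

print(prevalence)            # {'taxon_a': 1, 'taxon_b': 0, 'taxon_c': 1}
print(abundance["taxon_c"])  # -2.0, since 10 / 40 = 0.25 and log2(0.25) = -2
```

Note that `taxon_b` contributes a 0 to the prevalence profile but has no entry at all in the abundance part, which is why the two models answer different questions about the same table.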
Defining Presence for Custom Logistic Regression
A common question when extending MaAsLin3 results is: what threshold should I use to define “presence”? The answer is straightforward. MaAsLin3 defines presence as any non‑zero value in the normalized feature table. There is no alternative recommended cutoff (such as >1e‑4 or >1% relative abundance) in the MaAsLin3 documentation or paper. Sparsity is managed through feature filtering parameters applied before the model, not by changing the definition of presence itself.
Recommended binary coding
| Value in Input Data | Prevalence Code | Interpretation |
|---|---|---|
| > 0 | 1 | Present |
| = 0 | 0 | Absent |
Exporting MaAsLin3 Input Data from Cosmos‑Hub
To build a presence/absence matrix that mirrors MaAsLin3, start from the exact feature table used in the Hub run.
Open your MaAsLin3 comparative analysis
Click Export
Locate the three key files
- Input Data: A .tsv abundance matrix used as MaAsLin3’s feature input (samples as rows, features as columns). This is the primary file for building presence/absence variables.
- Input Metadata: A .tsv file with all metadata variables used as covariates and outcomes in the model.
- Association Results: The full MaAsLin3 output table with model type (abundance vs prevalence), beta coefficients, p‑values, and q‑values (FDR).
https://docs.cosmosid.com/docs/maaslin3-view-results
Building a Presence/Absence Matrix
Once you have exported the Input Data file, follow these steps to create a binary presence/absence matrix for logistic regression.
Load the Input Data table
Load the Input Data .tsv file into R, Python, or your preferred statistical environment. The table has samples as rows and microbial features (taxa, pathways, etc.) as columns.
Recode each feature column to binary
- Assign 1 if the value is > 0 (present).
- Assign 0 if the value is exactly 0 (absent).
Apply a minimum prevalence filter
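The loading and recoding steps can be sketched with Python's standard library alone. This is a minimal illustration, assuming the export is tab-separated with sample IDs in the first column and numeric values everywhere else; the function name `read_binary_matrix`, the placeholder filename in the comment, and the demo table are hypothetical.

```python
import csv
import io


def read_binary_matrix(handle):
    """Read a samples-by-features TSV and recode every value to 1 (> 0) or 0."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    features = header[1:]  # assumption: the first column holds sample IDs
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        sample_id, values = row[0], row[1:]
        matrix[sample_id] = {
            feature: 1 if float(value) > 0 else 0
            for feature, value in zip(features, values)
        }
    return features, matrix


# Hypothetical stand-in for the exported file. With a real export, pass
# open("input_data.tsv", newline="") instead (the filename is a placeholder).
demo = io.StringIO(
    "sample\ttaxon_a\ttaxon_b\n"
    "S1\t0.75\t0\n"
    "S2\t0\t0.002\n"
)
features, binary = read_binary_matrix(demo)
print(binary["S1"])  # {'taxon_a': 1, 'taxon_b': 0}
print(binary["S2"])  # {'taxon_a': 0, 'taxon_b': 1}
```

Any value above zero, however small, is coded as present, which matches the non-zero definition of presence described earlier. Missing or non-numeric cells are not handled in this sketch and would raise an error.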
Choosing a Minimum Prevalence Threshold
Running logistic regression on extremely rare features (those present in only a few samples) leads to unstable model estimates, separation issues, and uninterpretable results. It is therefore recommended to apply a minimum prevalence filter to your feature set before modeling.
How to calculate prevalence
For each feature, compute its prevalence as:
Prevalence (%) = ([Number of samples where feature is present (> 0)] / [Total number of samples]) × 100
Recommended thresholds
| Study size | Suggested minimum prevalence |
|---|---|
| Large cohort (n > 100) | 5% of samples |
| Medium cohort (n = 50–100) | 10% of samples |
| Small cohort (n < 50) | 10–20% of samples |
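The prevalence calculation and filter can be sketched as follows. This is a minimal example on a hypothetical four-sample binary matrix; the function names are illustrative, prevalence is expressed as a fraction rather than a percentage, and the 0.5 cutoff is chosen only so that the toy data has something to filter. A real analysis would use a threshold such as those in the table above.

```python
def feature_prevalence(binary, features):
    """Fraction of samples in which each feature is present (coded 1)."""
    n_samples = len(binary)
    if n_samples == 0:
        raise ValueError("the binary matrix contains no samples")
    return {
        feature: sum(row[feature] for row in binary.values()) / n_samples
        for feature in features
    }


def filter_by_prevalence(binary, features, min_prevalence):
    """Keep features whose prevalence is at least min_prevalence (a fraction)."""
    prevalence = feature_prevalence(binary, features)
    kept = [f for f in features if prevalence[f] >= min_prevalence]
    return kept, prevalence


# Hypothetical presence/absence matrix: samples as keys, features as inner keys.
binary = {
    "S1": {"taxon_a": 1, "taxon_b": 0, "taxon_c": 1},
    "S2": {"taxon_a": 1, "taxon_b": 0, "taxon_c": 0},
    "S3": {"taxon_a": 0, "taxon_b": 0, "taxon_c": 1},
    "S4": {"taxon_a": 1, "taxon_b": 1, "taxon_c": 1},
}
features = ["taxon_a", "taxon_b", "taxon_c"]

kept, prevalence = filter_by_prevalence(binary, features, min_prevalence=0.5)
print(prevalence)  # {'taxon_a': 0.75, 'taxon_b': 0.25, 'taxon_c': 0.75}
print(kept)        # ['taxon_a', 'taxon_c']
```

The comparison is inclusive, so a feature sitting exactly at the threshold is retained; only features strictly below it are dropped.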
Example Methods Text
Use the following template for your manuscript or internal SOP methods section, adapted to your study:
“For secondary prevalence analyses, the MaAsLin3 input abundance matrix was exported from Cosmos‑Hub (www.cosmos-hub.com, Cmbio, Germantown, MD). Each microbial feature was coded as present (1) when its input abundance was non‑zero and absent (0) otherwise, consistent with the binary presence/absence framework used by MaAsLin3 for logistic prevalence modeling [cite MaAsLin3 paper]. Features with prevalence below [X]% of samples were excluded prior to logistic regression to reduce model instability driven by extremely rare taxa.”
Replace [X]% with your chosen minimum prevalence threshold (e.g., 5%, 10%).
Frequently Asked Questions
Should I use the Input Data or the Association Results file to build my prevalence matrix?
Does MaAsLin3 recommend a specific presence threshold higher than >0?
Can I use a relative abundance cutoff (e.g., >0.01%) to define presence?
What filtering does MaAsLin3 apply before splitting the data into prevalence and abundance?
Filtering is already reflected in the Input Data export, so any features that were excluded by MaAsLin3 prior to modeling will not appear in the export.