Building Custom Prevalence Models
MaAsLin3 in Cosmos‑Hub simultaneously models both abundance and prevalence associations from the same microbiome feature table, making it one of the most comprehensive tools available for differential microbiome analysis. This guide explains how MaAsLin3 defines prevalence internally, how to export the correct data from Cosmos‑Hub, and how to use that data to build custom logistic regression models for secondary prevalence analyses.
This guide is intended for researchers who have already run MaAsLin3 in Cosmos‑Hub and want to extend their prevalence analysis using external tools such as R or Python. If you have not yet run MaAsLin3, start with the MaAsLin3 overview.
How MaAsLin3 Models Prevalence
MaAsLin3 is a generalized multivariable modeling framework designed to identify microbial associations in complex, high-dimensional datasets. Unlike approaches that model only abundance, MaAsLin3 captures two complementary biological signals from the same input feature table:
- Prevalence – how often a feature is detected across samples, modeled via logistic regression on a binary presence/absence profile.
- Abundance – how much of a feature is present when detected, modeled via log‑linear regression on non‑zero abundances.
“MaAsLin 3 takes as input a table of microbial community feature abundances and metadata. These feature data are normalized, filtered, split into prevalence (present versus absent) and log‑transformed nonzero abundances, and fit with a modified logistic model and a linear model, respectively.”
When MaAsLin3 runs in Cosmos‑Hub, the workflow proceeds as follows:
- The input feature abundance table (taxa, pathways, or functional features) is normalized — by default using total‑sum scaling to relative abundances.
- Optional filtering is applied to remove extremely rare or low-variance features based on minimum prevalence, minimum abundance, and minimum variance thresholds.
- The filtered table is split into:
  - A binary prevalence profile (present = 1, absent = 0) for logistic regression.
  - A non‑zero abundance subset (log‑transformed) for linear regression.
The Input Data export is the source of both the prevalence and abundance models.
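The normalization and split steps above can be sketched in a few lines of Python. This is a minimal illustration, not MaAsLin3's implementation: the sample and feature names and counts are made up, the optional filtering step is omitted, and log base 2 is an assumption made for the example.

```python
import math

# Toy counts: samples as rows, features as columns (values are made up).
counts = {
    "S1": {"taxon_A": 30, "taxon_B": 0, "taxon_C": 70},
    "S2": {"taxon_A": 0, "taxon_B": 5, "taxon_C": 95},
    "S3": {"taxon_A": 10, "taxon_B": 0, "taxon_C": 0},
}


def total_sum_scale(row):
    """Divide each value by the sample total so the row sums to 1."""
    total = sum(row.values())
    return {f: (v / total if total > 0 else 0.0) for f, v in row.items()}


normalized = {s: total_sum_scale(row) for s, row in counts.items()}

# Prevalence profile: 1 where the normalized value is non-zero, else 0.
prevalence = {
    s: {f: int(v > 0) for f, v in row.items()}
    for s, row in normalized.items()
}

# Abundance profile: log2 of non-zero values. Zeros become None (missing)
# so they are left out of the linear model rather than treated as low values.
abundance = {
    s: {f: (math.log2(v) if v > 0 else None) for f, v in row.items()}
    for s, row in normalized.items()
}
```

Both profiles derive from the same normalized table, which is why a single export is enough to reproduce the presence/absence coding.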
Defining Presence for Custom Logistic Regression
A common question when extending MaAsLin3 results is: what threshold should I use to define “presence”? The answer is straightforward. MaAsLin3 defines presence as any non‑zero value in the normalized feature table. There is no alternative recommended cutoff (such as >1e‑4 or >1% relative abundance) in the MaAsLin3 documentation or paper. Sparsity is managed through feature filtering parameters applied before the model, not by changing the definition of presence itself.
Recommended binary coding
| Value in Input Data | Prevalence Code | Interpretation |
|---|---|---|
| > 0 | 1 | Present |
| = 0 | 0 | Absent |
Exporting MaAsLin3 Input Data from Cosmos‑Hub
To build a presence/absence matrix that mirrors MaAsLin3, start from the exact feature table used in the Hub run.
Open your MaAsLin3 comparative analysis
Navigate to your MaAsLin3 run from the Comparative Analysis dashboard in Cosmos‑Hub.
Click Export
Click the Export button in the top‑right of the analysis view. Cosmos‑Hub will generate and download a ZIP archive containing all output files for your MaAsLin3 run.
Locate the three key files
Inside the ZIP, you will find:
- Input Data: A .tsv abundance matrix used as MaAsLin3’s feature input (samples as rows, features as columns). This is the primary file for building presence/absence variables.
- Input Metadata: A .tsv file with all metadata variables used as covariates and outcomes in the model.
- Association Results: The full MaAsLin3 output table with model type (abundance vs prevalence), beta coefficients, p‑values, and q‑values (FDR).
https://docs.cosmosid.com/docs/maaslin3-view-results
Building a Presence/Absence Matrix
Once you have exported the Input Data file, follow these steps to create a binary presence/absence matrix for logistic regression.
Load the Input Data table
Import the Input Data .tsv file into R, Python, or your preferred statistical environment. The table has samples as rows and microbial features (taxa, pathways, etc.) as columns.
Recode each feature column to binary
For each feature (column), apply the following rule:
- Assign 1 if the value is > 0 (present).
- Assign 0 if the value is = 0 (absent).
Apply a minimum prevalence filter
Before fitting logistic models, exclude features that are extremely rare. See the section below on choosing a minimum prevalence threshold.
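The load and recode steps above could look like the following in Python. This is a hedged sketch: the inline table stands in for the exported Input Data file, the function name `load_presence_absence` is hypothetical, and it assumes the first column holds sample IDs and that no cells are blank.

```python
import csv
import io

# Stand-in for the exported Input Data .tsv (hypothetical contents).
# For a real export, replace io.StringIO(...) with open(path, newline="").
example_tsv = (
    "sample\ttaxon_A\ttaxon_B\ttaxon_C\n"
    "S1\t0.30\t0\t0.70\n"
    "S2\t0\t0.05\t0.95\n"
    "S3\t1.0\t0\t0\n"
)


def load_presence_absence(handle):
    """Read a samples-by-features TSV and return {sample: {feature: 0 or 1}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    features = header[1:]  # first column is assumed to hold sample IDs
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip empty trailing lines
        sample_id, values = row[0], row[1:]
        # Present (1) for any value > 0, absent (0) otherwise.
        matrix[sample_id] = {
            feature: int(float(value) > 0)
            for feature, value in zip(features, values)
        }
    return matrix


presence = load_presence_absence(io.StringIO(example_tsv))
```

The resulting `presence` mapping keeps the export's sample and feature labels, so it can be joined to the Input Metadata file by sample ID.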
Choosing a Minimum Prevalence Threshold
Running logistic regression on extremely rare features (those present in only a few samples) leads to unstable model estimates, separation issues, and uninterpretable results. It is therefore recommended to apply a minimum prevalence filter to your feature set before modeling.
How to calculate prevalence
For each feature, compute its prevalence as:
Prevalence (%) = (number of samples where the feature is present (> 0) / total number of samples) × 100
Recommended thresholds
| Study size | Suggested minimum prevalence |
|---|---|
| Large cohort (n > 100) | 5% of samples |
| Medium cohort (n = 50–100) | 10% of samples |
| Small cohort (n < 50) | 10–20% of samples |
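A minimal sketch of the prevalence calculation and filter, assuming a presence/absence matrix stored as `{sample: {feature: 0 or 1}}`. The function names and the ten-sample toy cohort are illustrative, and the 20% threshold is simply the upper end of the small-cohort suggestion in the table.

```python
# Toy presence/absence matrix for a 10-sample cohort (made-up data):
# taxon_A is present in 1 sample, taxon_B in 5, taxon_C in 9.
presence = {
    f"S{i + 1}": {
        "taxon_A": int(i < 1),
        "taxon_B": int(i < 5),
        "taxon_C": int(i < 9),
    }
    for i in range(10)
}


def feature_prevalence(matrix):
    """Return {feature: percent of samples in which the feature is present}."""
    n_samples = len(matrix)
    features = list(next(iter(matrix.values())))
    return {
        f: 100.0 * sum(row[f] for row in matrix.values()) / n_samples
        for f in features
    }


def filter_by_prevalence(matrix, min_percent):
    """Keep only features whose prevalence is at least min_percent."""
    prevalence = feature_prevalence(matrix)
    kept = [f for f, pct in prevalence.items() if pct >= min_percent]
    return {s: {f: row[f] for f in kept} for s, row in matrix.items()}


prevalence_pct = feature_prevalence(presence)
filtered = filter_by_prevalence(presence, min_percent=20.0)
kept_features = sorted(next(iter(filtered.values())))
```

Here `taxon_A` (10% prevalence) falls below the threshold and is dropped, while `taxon_B` and `taxon_C` are kept for modeling.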
Example Methods Text
Use the following template for your manuscript or internal SOP methods section, adapted to your study:
“For secondary prevalence analyses, the MaAsLin3 input abundance matrix was exported from Cosmos‑Hub (www.cosmos-hub.com, Cmbio, Germantown, MD). Each microbial feature was coded as present (1) when its input abundance was non‑zero and absent (0) otherwise, consistent with the binary presence/absence framework used by MaAsLin3 for logistic prevalence modeling [cite MaAsLin3 paper]. Features with prevalence below [X]% of samples were excluded prior to logistic regression to reduce model instability driven by extremely rare taxa.”
Replace [X]% with your chosen minimum prevalence threshold (e.g., 5%, 10%).
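The template refers to logistic regression without fixing a particular model. As a minimal sketch of the simplest case, assuming a single binary covariate and no adjustment variables, the slope of a logistic model equals the log odds ratio of the 2×2 presence-by-group table. The function name `log_odds_ratio` and the toy data are illustrative only. Models with several covariates need a statistics package, and MaAsLin3's own modified logistic model may not give identical estimates.

```python
import math


def log_odds_ratio(present, group):
    """Slope of a logistic model `present ~ group` for a binary group (0/1).

    Returns None when any cell of the 2x2 table is empty, because the
    estimate is then not finite (complete or quasi-complete separation).
    """
    pairs = list(zip(present, group))
    a = sum(1 for p, g in pairs if p == 1 and g == 1)  # present, group 1
    b = sum(1 for p, g in pairs if p == 0 and g == 1)  # absent, group 1
    c = sum(1 for p, g in pairs if p == 1 and g == 0)  # present, group 0
    d = sum(1 for p, g in pairs if p == 0 and g == 0)  # absent, group 0
    if 0 in (a, b, c, d):
        return None
    return math.log((a * d) / (b * c))


# Made-up example: 6 samples in each group.
group = [1] * 6 + [0] * 6
common_feature = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0]  # present in 6 of 12
rare_feature = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]    # present in 1 of 12

common_slope = log_odds_ratio(common_feature, group)
rare_slope = log_odds_ratio(rare_feature, group)
```

The rare feature yields no finite estimate, which illustrates the instability that a minimum prevalence filter is meant to avoid.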
Frequently Asked Questions
Should I use the Input Data or the Association Results file to build my prevalence matrix?
Use the Input Data file. This contains the normalized abundance matrix that MaAsLin3 used as input, from which you can derive binary presence/absence values. The Association Results file contains model outputs (coefficients, p‑values), not the raw feature table.
Does MaAsLin3 recommend a specific presence threshold higher than >0?
No. The MaAsLin3 paper and Bioconductor documentation do not recommend any threshold other than non-zero for defining presence. Sparsity is controlled via pre-model filtering parameters (minimum prevalence, minimum abundance), not by adjusting the presence definition.
Can I use a relative abundance cutoff (e.g., >0.01%) to define presence?
You can, but only if justified by a study-specific rationale such as a known limit of detection (LOD). A cutoff above zero is not consistent with how MaAsLin3 internally defines prevalence, and it may reduce comparability with your MaAsLin3 results.
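To gauge how much a custom cutoff changes the picture, the prevalence of a feature can be computed under both definitions. This is a small sketch with made-up relative abundances; the helper name `prevalence_percent` is hypothetical, and the cutoff of 1e-4 corresponds to the 0.01% example in the question.

```python
def prevalence_percent(values, cutoff=0.0):
    """Percent of samples whose value is strictly above the cutoff."""
    present = sum(1 for v in values if v > cutoff)
    return 100.0 * present / len(values)


# Made-up relative abundances of one feature across 8 samples.
rel_abund = [0.0, 0.00005, 0.0002, 0.01, 0.0, 0.00008, 0.03, 0.0]

nonzero_prevalence = prevalence_percent(rel_abund)              # MaAsLin3-style: > 0
cutoff_prevalence = prevalence_percent(rel_abund, cutoff=1e-4)  # custom: > 0.01%
```

In this toy case the feature is present in 5 of 8 samples under the non-zero definition but only 3 of 8 under the cutoff, so the two codings would feed different outcomes into a logistic model.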
What filtering does MaAsLin3 apply before splitting the data into prevalence and abundance?
By default, MaAsLin3 applies minimum prevalence, minimum abundance, and minimum variance filters to remove uninformative features. The exact parameters used in your Cosmos-Hub run are reflected in the filtered Input Data export, so any features that were excluded by MaAsLin3 prior to modeling will not appear in the export.
Reference Links
MaAsLin3 in Cosmos‑Hub
Overview and setup guide for running MaAsLin3 in the Cosmos‑Hub statistics toolbox.
Viewing MaAsLin3 Results
Guide to interpreting MaAsLin3 outputs and exporting Input Data, Input Metadata, and Association Results.
MaAsLin3 Paper (Nature Methods)
Nickols et al. (2025). MaAsLin 3: refining and extending generalized multivariable linear models for meta‑omic association discovery.
MaAsLin3 on PubMed / PMC
Open-access version of the MaAsLin3 methods paper via PubMed Central.
MaAsLin3 User Manual
Full Bioconductor vignette covering MaAsLin3 parameters, normalization, filtering, and model types.
MaAsLin3 Tutorial
Step-by-step MaAsLin3 tutorial from the Huttenhower Lab covering input formats, model options, and outputs.
Create Comparative Analysis
How to create and configure a comparative analysis in Cosmos‑Hub for running MaAsLin3 and other statistics.
MaAsLin2 Workflow & Filtering
Filtering guidance for MaAsLin-style workflows, including minimum prevalence thresholds applicable to downstream MaAsLin3 analyses.