Building Custom Prevalence Models

MaAsLin3 in Cosmos‑Hub models both abundance and prevalence associations from the same microbiome feature table. This guide explains how MaAsLin3 defines prevalence internally, how to export the correct data from Cosmos‑Hub, and how to use that data to build custom logistic regression models for secondary prevalence analyses.

This guide is intended for researchers who have already run MaAsLin3 in Cosmos‑Hub and want to extend their prevalence analysis using external tools such as R or Python. If you have not yet run MaAsLin3, start with the MaAsLin3 overview.

How MaAsLin3 Models Prevalence

MaAsLin3 is a generalized multivariable modeling framework designed to identify microbial associations in complex, high-dimensional datasets. Unlike approaches that model only abundance, MaAsLin3 captures two complementary biological signals from the same input feature table:

Prevalence – how often a feature is detected across samples, modeled via logistic regression on a binary presence/absence profile.
Abundance – how much of a feature is present when detected, modeled via log‑linear regression on non‑zero abundances.

As described in the MaAsLin3 paper (Nature Methods, 2025):

“MaAsLin 3 takes as input a table of microbial community feature abundances and metadata. These feature data are normalized, filtered, split into prevalence (present versus absent) and log‑transformed nonzero abundances, and fit with a modified logistic model and a linear model, respectively.”

When MaAsLin3 runs in Cosmos‑Hub, the workflow proceeds as follows:

The input feature abundance table (taxa, pathways, or functional features) is normalized — by default using total‑sum scaling to relative abundances.
Optional filtering is applied to remove extremely rare or low-variance features based on minimum prevalence, minimum abundance, and minimum variance thresholds.
The filtered table is split into:
- A binary prevalence profile (present = 1, absent = 0) for logistic regression.
- A non‑zero abundance subset (log‑transformed) for linear regression.

This means that the same abundance values in the Input Data export are the source of both the prevalence and abundance models.
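The split described above can be sketched in a few lines of Python. This is only an illustration on made-up counts, not the MaAsLin3 implementation: the feature names are hypothetical, and the use of a base-2 logarithm is an assumption made here for the example.

```python
import numpy as np
import pandas as pd

# Toy feature table (made-up counts): samples as rows, features as columns.
counts = pd.DataFrame(
    {"taxon_a": [10, 0, 30], "taxon_b": [90, 50, 0], "taxon_c": [0, 50, 70]},
    index=["s1", "s2", "s3"],
)

# Total-sum scaling: divide each sample (row) by its total.
rel_abund = counts.div(counts.sum(axis=1), axis=0)

# Prevalence profile: 1 where the feature is detected, 0 otherwise.
prevalence = (rel_abund > 0).astype(int)

# Abundance profile: log of the non-zero values only; zeros become NaN
# so they drop out of the linear model (log base 2 is assumed here).
log_abund = np.log2(rel_abund.where(rel_abund > 0))
```

Both `prevalence` and `log_abund` are derived from the same `rel_abund` table, which is the point made in the paragraph above.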

Defining Presence for Custom Logistic Regression

A common question when extending MaAsLin3 results is: what threshold should I use to define “presence”? The answer is straightforward. MaAsLin3 defines presence as any non‑zero value in the normalized feature table. There is no alternative recommended cutoff (such as >1e‑4 or >1% relative abundance) in the MaAsLin3 documentation or paper. Sparsity is managed through feature filtering parameters applied before the model, not by changing the definition of presence itself.

Recommended binary coding

Value in Input Data	Prevalence Code	Interpretation
> 0	1	Present
= 0	0	Absent

Using this coding ensures your external logistic regression models are conceptually consistent with the prevalence associations reported by MaAsLin3 in Cosmos‑Hub.

If your study has a validated limit of detection (LOD), for example from spike‑in calibration or qPCR, you may choose to define presence as “above LOD” rather than strictly “> 0.” This is a valid study‑specific decision but is not required by MaAsLin3 itself.
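To see how much an LOD-based definition would change the coding, the two rules can be compared side by side. The sketch below uses made-up relative abundances, and the `LOD` value is a hypothetical placeholder; a real value would have to come from your own calibration data.

```python
import pandas as pd

# Hypothetical limit of detection, in the same units as the feature table.
LOD = 1e-4  # assumed value, for illustration only

abund = pd.DataFrame(
    {"taxon_a": [0.0, 5e-5, 0.02], "taxon_b": [0.3, 0.0, 2e-4]},
    index=["s1", "s2", "s3"],
)

present_default = (abund > 0).astype(int)  # coding consistent with MaAsLin3
present_lod = (abund > LOD).astype(int)    # study-specific alternative

# Cells whose coding differs between the two rules.
changed = present_default != present_lod
```

Counting the `True` cells in `changed` per feature gives a quick sense of which features would diverge from the MaAsLin3 prevalence results under the LOD rule.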

Exporting MaAsLin3 Input Data from Cosmos‑Hub

To build a presence/absence matrix that mirrors MaAsLin3, start from the exact feature table used in the Hub run.

Open your MaAsLin3 comparative analysis

Navigate to your MaAsLin3 run from the Comparative Analysis dashboard in Cosmos‑Hub.

Click Export

Click the Export button in the top‑right of the analysis view. Cosmos‑Hub will generate and download a ZIP archive containing all output files for your MaAsLin3 run.

Locate the three key files

Inside the ZIP, you will find:

Input Data — A .tsv abundance matrix used as MaAsLin3’s feature input (samples as rows, features as columns). This is the primary file for building presence/absence variables.
Input Metadata — A .tsv file with all metadata variables used as covariates and outcomes in the model.
Association Results — The full MaAsLin3 output table with model type (abundance vs prevalence), beta coefficients, p‑values, and q‑values (FDR).

The official documentation for MaAsLin3 results and exports is available here:
https://docs.cosmosid.com/docs/maaslin3-view-results

Building a Presence/Absence Matrix

Once you have exported the Input Data file, follow these steps to create a binary presence/absence matrix for logistic regression.

Load the Input Data table

Import the Input Data .tsv file into R, Python, or your preferred statistical environment. The table has samples as rows and microbial features (taxa, pathways, etc.) as columns.
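In Python, loading the table might look like the sketch below. The file name inside the ZIP is not assumed here, so the example reads from an in-memory string; in practice you would pass the path of the exported Input Data file to `pd.read_csv` instead. The sample and feature names are made up.

```python
import io
import pandas as pd

# Stand-in for the exported file contents (tab-separated, samples as rows).
tsv_text = "sample\ttaxon_a\ttaxon_b\ns1\t0.25\t0.75\ns2\t0.0\t1.0\n"

# First column holds the sample IDs, so use it as the row index.
input_data = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col=0)

# Basic sanity checks before recoding.
assert input_data.index.is_unique, "duplicate sample IDs"
assert input_data.dtypes.map(pd.api.types.is_numeric_dtype).all(), "non-numeric feature column"
assert not input_data.isna().any().any(), "missing values in feature table"

n_samples, n_features = input_data.shape
```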

Recode each feature column to binary

For each feature (column), apply the following rule:

Assign 1 if the value is > 0 (present).
Assign 0 if the value is = 0 (absent).
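A minimal sketch of this recoding rule is shown below. The helper name `to_presence_absence` and the toy values are illustrative; the checks for missing and negative values are an added precaution, since neither has an obvious presence/absence coding.

```python
import pandas as pd

def to_presence_absence(abund: pd.DataFrame) -> pd.DataFrame:
    """Code each cell as 1 if its value is > 0 and 0 otherwise."""
    if abund.isna().any().any():
        raise ValueError("missing values present; resolve them before recoding")
    if (abund < 0).any().any():
        raise ValueError("negative abundances present; check the input table")
    return (abund > 0).astype("int8")

# Toy table: samples as rows, features as columns.
abund = pd.DataFrame(
    {"taxon_a": [0.0, 0.1, 0.4], "taxon_b": [0.5, 0.0, 0.0]},
    index=["s1", "s2", "s3"],
)
presence = to_presence_absence(abund)
```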

Apply a minimum prevalence filter

Before fitting logistic models, exclude features that are extremely rare. See the section below on choosing a minimum prevalence threshold.

Run logistic regression

Use the binary presence/absence matrix as the response variable and the Input Metadata variables as predictors.
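Before fitting, the presence/absence matrix and the metadata need to refer to the same samples in the same order. The sketch below shows one way to align them and build a numeric design matrix; the column names `group` and `age` are hypothetical examples of metadata variables.

```python
import pandas as pd

presence = pd.DataFrame({"taxon_a": [1, 0, 1]}, index=["s1", "s2", "s3"])
metadata = pd.DataFrame(
    {"group": ["case", "control", "case"], "age": [34, 51, 47]},
    index=["s3", "s1", "s2"],  # deliberately in a different order
)

# Keep samples found in both tables, in the order of the presence matrix.
shared = presence.index[presence.index.isin(metadata.index)]
presence_aligned = presence.loc[shared]
metadata_aligned = metadata.loc[shared]

# One-hot encode the categorical predictor, dropping the reference level.
design = pd.get_dummies(
    metadata_aligned, columns=["group"], drop_first=True, dtype=float
)
```

With `drop_first=True` the alphabetically first level (`case` here) becomes the reference, so the remaining column `group_control` contrasts controls against cases.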

Because you are using the same feature table that MaAsLin3 used, your custom prevalence models will be directly comparable to the “Prevalence” associations shown in the MaAsLin3 Association Results tab and export files.
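As a rough illustration of the modeling step, the sketch below fits an ordinary maximum-likelihood logistic regression with Newton-Raphson updates, using only NumPy. Note that this is plain logistic regression, not the modified logistic model that the MaAsLin3 paper describes, so estimates may differ somewhat from the Hub results. The function name `fit_logistic` and the toy data are made up for this example; in practice an established statistics package would normally be used.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-8):
    """Plain logistic regression by Newton-Raphson.

    X: (n, p) predictors without an intercept column; y: (n,) array of 0/1.
    Returns (coefficients, standard errors), intercept first.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))        # fitted probabilities
        w = p * (1.0 - p)                          # observation weights
        hessian = X.T @ (X * w[:, None])           # Fisher information
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))

# Toy data: presence of one feature against a binary group label.
group = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
present = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1])
beta, se = fit_logistic(group.reshape(-1, 1), present)
```

Here `beta[1]` is the log odds ratio of presence between the two groups. The loop does not guard against separation; a feature that is always present or always absent within a group would make the updates diverge, which is one reason to apply a minimum prevalence filter before fitting.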

Choosing a Minimum Prevalence Threshold

Running logistic regression on extremely rare features — those present in only a few samples — leads to unstable model estimates, separation issues, and uninterpretable results. It is therefore recommended to apply a minimum prevalence filter to your feature set before modeling.

How to calculate prevalence

For each feature, compute its prevalence as: Prevalence (%) = [Number of samples where feature is present (> 0)] / [Total number of samples] × 100
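Since the presence/absence matrix contains only 0s and 1s, the column mean is the fraction of samples in which a feature is present. The sketch below computes prevalence that way and applies a filter; the toy data and the 20% cutoff are chosen purely for illustration.

```python
import pandas as pd

# Toy presence/absence matrix: 10 samples (rows) by 3 features (columns).
presence = pd.DataFrame(
    {
        "taxon_a": [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],
        "taxon_b": [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
        "taxon_c": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0],
    }
)

# Prevalence (%) = samples where present / total samples * 100.
prevalence_pct = 100 * presence.mean(axis=0)

# Keep features at or above the chosen minimum prevalence.
min_prevalence_pct = 20  # illustrative threshold only
kept = presence.loc[:, prevalence_pct >= min_prevalence_pct]
```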

Recommended thresholds

Study size	Suggested minimum prevalence
Large cohort (n > 100)	5% of samples
Medium cohort (n = 50–100)	10% of samples
Small cohort (n < 50)	10–20% of samples

These thresholds are informed by common MaAsLin‑style workflow guidance, which recommends removing low-prevalence features prior to association testing to improve model stability and interpretability. Reference: Running MaAsLin2 Workflow – Cosmos‑Hub (prevalence filtering guidance is equally applicable to MaAsLin3 downstream analyses).

Setting your minimum prevalence threshold too low (e.g., 1% of samples in a small cohort) may introduce unstable logistic models with inflated or non-convergent estimates. Always check that your included features have sufficient “events” (presence observations) to support the number of predictors in your model.
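One way to run this check is to count, for each feature, the smaller of its presence and absence counts and divide by the number of predictors. The sketch below does this on made-up data. The cutoff of 10 events per predictor is a commonly quoted rule of thumb that is assumed here, not a value taken from MaAsLin3 or Cosmos‑Hub, and the helper name is hypothetical.

```python
import numpy as np
import pandas as pd

MIN_EVENTS_PER_PREDICTOR = 10  # assumed rule of thumb, adjust for your study

def events_per_predictor(presence: pd.DataFrame, n_predictors: int) -> pd.Series:
    """Smaller of (# present, # absent) per feature, divided by n_predictors."""
    n_present = presence.sum(axis=0)
    n_absent = len(presence) - n_present
    return np.minimum(n_present, n_absent) / n_predictors

# Toy matrix: 40 samples; taxon_a is common, taxon_b is rare.
presence = pd.DataFrame(
    {"taxon_a": [1] * 20 + [0] * 20, "taxon_b": [1] * 3 + [0] * 37}
)

epv = events_per_predictor(presence, n_predictors=2)
flagged = epv < MIN_EVENTS_PER_PREDICTOR  # True = too few events for this model
```

The minimum of present and absent counts is used because a feature detected in nearly every sample is as uninformative for logistic regression as one detected in almost none.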

Example Methods Text

Use the following template for your manuscript or internal SOP methods section, adapted to your study:

“For secondary prevalence analyses, the MaAsLin3 input abundance matrix was exported from Cosmos‑Hub (www.cosmos-hub.com, Cmbio, Germantown, MD). Each microbial feature was coded as present (1) when its input abundance was non‑zero and absent (0) otherwise, consistent with the binary presence/absence framework used by MaAsLin3 for logistic prevalence modeling [cite MaAsLin3 paper]. Features with prevalence below [X]% of samples were excluded prior to logistic regression to reduce model instability driven by extremely rare taxa.”

Replace [X]% with your chosen minimum prevalence threshold (e.g., 5%, 10%).

Frequently Asked Questions

Should I use the Input Data or the Association Results file to build my prevalence matrix?

Use the Input Data file. This contains the normalized abundance matrix that MaAsLin3 used as input, from which you can derive binary presence/absence values. The Association Results file contains model outputs (coefficients, p‑values), not the raw feature table.

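Does MaAsLin3 recommend a specific presence threshold above 0?

No. MaAsLin3 defines presence as any non‑zero value in the normalized feature table, and neither the MaAsLin3 documentation nor the paper recommends an alternative cutoff.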

Can I use a relative abundance cutoff (e.g., >0.01%) to define presence?

You can, but only if justified by a study-specific rationale such as a known LOD. It is not consistent with how MaAsLin3 internally defines prevalence, and it may reduce comparability with your MaAsLin3 results.

What filtering does MaAsLin3 apply before splitting the data into prevalence and abundance?

MaAsLin3 can apply minimum prevalence, minimum abundance, and minimum variance filters to remove uninformative features. The Input Data export reflects the filter settings used in your Cosmos‑Hub run, so any features that MaAsLin3 excluded before modeling will not appear in it.

Reference Links

MaAsLin3 in Cosmos‑Hub

Overview and setup guide for running MaAsLin3 in the Cosmos‑Hub statistics toolbox.

Viewing MaAsLin3 Results

Guide to interpreting MaAsLin3 outputs and exporting Input Data, Input Metadata, and Association Results.

MaAsLin3 Paper (Nature Methods)

Nearing et al. (2025). MaAsLin 3: refining and extending generalized multivariable linear models for meta‑omic association discovery.

MaAsLin3 on PubMed / PMC

Open-access version of the MaAsLin3 methods paper via PubMed Central.

MaAsLin3 User Manual

Full Bioconductor vignette covering MaAsLin3 parameters, normalization, filtering, and model types.

MaAsLin3 Tutorial

Step-by-step MaAsLin3 tutorial from the Huttenhower Lab covering input formats, model options, and outputs.

Create Comparative Analysis

How to create and configure a comparative analysis in Cosmos‑Hub for running MaAsLin3 and other statistics.

MaAsLin2 Workflow & Filtering

Filtering guidance for MaAsLin-style workflows, including minimum prevalence thresholds applicable to downstream MaAsLin3 analyses.
