Running MaAsLin 2 Workflow

We recommend running MaAsLin analyses independently of other comparative analyses. Once the parameters are set, click the Create button to start the MaAsLin2 workflow. CosmosID-HUB will process the data and return the results when the analysis is complete. Depending on the size of the dataset, the analysis may take from a few minutes to over an hour.

🚧

A note about running MaAsLin2:

Users are responsible for selecting parameters that best suit their study data and research objectives. While we provide basic recommendations, it is crucial to have a thorough understanding of the MaAsLin2 tool and the statistical methods employed to ensure accurate and reproducible results. We strongly encourage users to familiarize themselves with the tool’s capabilities and limitations to make informed decisions regarding their analysis settings.


Input Data Preparation

Before running MaAsLin2, you will need to process your .fastq files through one of the taxonomic or functional workflows available within the CosmosID-HUB. You will also need to add metadata to each of your samples to define cohorts needed for comparative analysis. Ensure your metadata encompasses all variables that you want to measure and that may affect your results, with clear groupings for main comparative variables.
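
For reference, a tidy metadata table has one row per sample and one column per variable, with unambiguous group labels for the main comparative variables. Below is a minimal, hypothetical sketch in Python/pandas; the column names (condition, time_point, subject_id, etc.) are illustrative only, not required by the HUB.

```python
import pandas as pd

# Hypothetical per-sample metadata with clear cohort groupings.
# Column names are illustrative; use the variables your own study design requires.
metadata = pd.DataFrame(
    {
        "sample_id":  ["S1", "S2", "S3", "S4"],
        "condition":  ["control", "control", "treatment", "treatment"],
        "time_point": ["baseline", "endpoint", "baseline", "endpoint"],
        "subject_id": ["P01", "P01", "P02", "P02"],  # repeated-measures grouping
        "age":        [34, 34, 51, 51],
        "bmi":        [22.4, 22.4, 27.9, 27.9],
    }
).set_index("sample_id")

print(metadata)
```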



Initiating MaAsLin2 Comparative

After profiling, select the samples that you would like to compare in the "Cohorts and Metadata" menu and click "Create CA" to initiate the configuration of your parameters.





Selecting the MaAsLin2 Comparative Workflow

Select one of the following workflows depending on your input data:




  • MaAsLin2 KEPLER-Taxa - conduct differential abundance analysis on Taxa-Kepler results for host agnostic profiling
  • MaAsLin2 CHAMP-Taxa - conduct differential abundance analysis on Taxa-CHAMP results for human-based profiling
  • MaAsLin2 HostAgnostic Functional - conduct differential abundance analysis on Functional 2.0 (MetaCyc, GO Terms, Pfam, CAZy, EnzymeCommission) profiling results
  • MaAsLin2 CHAMP-Functional - conduct differential abundance analysis on GBM/GMM/KEGG results

Recommended Workflow Parameters

This section outlines suggested settings for MaAsLin2 to optimize differential abundance analysis. Proper configuration ensures accurate results and helps address specific research questions effectively. Use these guidelines to select appropriate fixed and random effects, normalization techniques, and statistical models tailored to your study design. Adjusting these parameters based on your dataset’s characteristics and research objectives will enhance the reliability and interpretability of your findings.

Parameter           | KEPLER-Taxa        | HostAgnostic Functional | CHAMP-Taxa         | CHAMP-Functional
Metric              | Relative Abundance | CPM                     | Relative Abundance | Cellular Abundance
Taxonomic Level     | Species            | NA                      | Species            | NA
Min. Abundance*     | 0.001              | 1000                    | 0.001              | 0.01
Min. Prevalence**   | 0.05               | 0.75                    | 0.05               | 0.75
Q Threshold         | 0.01               | 0.01                    | 0.01               | 0.01
Normalization       | NONE               | TSS                     | NONE               | NONE
Transformation      | LOG                | LOG                     | LOG                | LOG
Analysis Method     | LM                 | LM                      | LM                 | LM
Multiple Correction | BH                 | BH                      | BH                 | BH
Standardization     |                    |                         |                    |
N Heatmap           | 100                | 100                     | 100                | 100

*A single hit can be a sequencing error, contamination, or other noise. Applying a minimum abundance cutoff together with a minimum prevalence cutoff helps ensure more accurate multivariate association results.

**Samples are far richer in functional hits than in taxonomic hits. Non-prevalent functions can sharply increase computational time and may include false positives, which is why more stringent cutoffs are recommended.


Selecting Fixed and Random Effects

Choosing appropriate fixed and random effects is critical for the success of MaAsLin2 analyses. These variables define how samples are grouped into comparative cohorts, directly impacting the results. Fixed effects are the primary variables of interest, such as treatment groups or time points, while random effects account for potential confounding factors, like age or sample ID. Properly defining these effects ensures that your analysis accurately reflects the biological questions you are exploring and minimizes biases in the model.

Fixed Effects

Fixed effects are metadata variables that relate to your hypothesis. These are the primary variables for which you want specific comparative results. A fixed effect must have 2 or more groups for comparison (we recommend a maximum of 5 groups).

For example, a fixed effect could be:

  • "Condition" [Control vs. Treatment]
  • "DiseaseState" [Healthy vs. IBD vs. Crohn's]
  • "Time Point" [baseline vs. mid-study vs. endpoint]
  • Drug Dosage [0ng vs. 50ng vs. 100ng]

Multiple fixed effects can be selected for comparisons across multiple metadata variables. Results will be presented in the analysis results for every variable/level.

Random Effects

Random effects are metadata variables that you may not be interested in evaluating directly but that may introduce variability into the study data. Use these to account for confounding variables that might introduce noise, such as subject-specific differences (e.g., BMI, age). This helps minimize bias from uncontrolled factors. Only include relevant confounders to prevent model overfitting.

For example, a random effect could be:

  • BMI
  • Age
  • Cage number
  • Days of antibiotic administration
  • SampleID (for longitudinal samplings from the same individual/host)
  • Experimental blocks

📘

Selecting Effect Variable Parameters

When defining fixed and random effects, it’s essential to select appropriate data types and reference values.

Fixed effects should include categorical variables with clear groups or continuous variables. For categorical data, specify a reference value (if applicable) to serve as a baseline for comparison (e.g., day0, control, healthy). Should a categorical data type be selected without a reference value, MaAsLin2 by default sets the first category in alphabetical order as the reference.

Random effects account for variability due to confounding factors (e.g., subject ID in longitudinal studies, age, cage number). Correctly setting these parameters ensures robust statistical modeling and accurate interpretation of results.
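
The following Python sketch (using statsmodels, not MaAsLin2 itself) illustrates these ideas on hypothetical data: a categorical fixed effect with an explicit reference level, a continuous fixed effect, and a subject-level random effect supplied as a grouping factor. All column names and values are invented for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical single-feature abundance (already normalized/transformed)
# with invented metadata; 4 subjects, 3 samples each.
df = pd.DataFrame(
    {
        "abundance": [0.8, 1.0, 0.9, 1.2, 1.1, 1.3,
                      2.1, 2.4, 2.2, 1.9, 2.0, 2.3],
        "condition": ["control"] * 6 + ["treatment"] * 6,
        "age":       [30] * 3 + [45] * 3 + [38] * 3 + [52] * 3,
        "subject_id": ["P1"] * 3 + ["P2"] * 3 + ["P3"] * 3 + ["P4"] * 3,
    }
)

# Fixed effects: "condition" (categorical, with an explicit reference level)
# and "age" (continuous). Without Treatment(reference=...), the first level
# in alphabetical order ("control" here) would be used as the baseline.
formula = "abundance ~ C(condition, Treatment(reference='control')) + age"

# Random effect: repeated samples from the same subject enter as a grouping
# factor rather than as a variable of interest. (Toy data this small may
# trigger convergence warnings; that is expected for an illustration.)
model = smf.mixedlm(formula, df, groups=df["subject_id"])
result = model.fit()
print(result.summary())
```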


Selecting Other Model Parameters

Filterset

Filtered results contain only the calls that have met the machine-learning threshold for high confidence that the organism called is actually present in the sample.

Total results contain all calls, providing a more comprehensive but potentially noisier dataset. Calls that fall below the filtering threshold may need further validation to determine whether they are truly present in the sample.


Metric

Options for taxonomic data:

  • Relative abundance: percentage abundance per sample
  • Normalized reads frequency: counts normalized to the size of their matching genome; not synonymous with the "Normalization" parameter
  • Reads frequency: raw read counts

Options for functional data:

  • CPM (Kepler): copies per million (normalized counts for functional data)
  • Relative abundance (Kepler): percentage abundance per sample
  • Cellular abundance (CHAMP): represents the relative percentage of species that are able to perform a given function in the microbiome community
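
The sketch below shows, on hypothetical counts, roughly how relative abundance and CPM-style values relate to raw read counts; it does not reproduce the HUB's exact pipelines (e.g., genome-size normalization for normalized reads frequency, or CHAMP's cellular abundance calculation).

```python
import pandas as pd

# Hypothetical raw counts: rows are features, columns are samples.
counts = pd.DataFrame(
    {"sampleA": [120, 30, 850], "sampleB": [400, 10, 90]},
    index=["featX", "featY", "featZ"],
)

# Relative abundance: each feature as a fraction of its sample total.
rel_abund = counts.div(counts.sum(axis=0), axis=1)

# CPM-style scaling: counts per million of the sample total.
cpm = rel_abund * 1_000_000

print(rel_abund.round(4))
print(cpm.round(1))
```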

Taxonomy Level

Options include: Phylum, Class, Order, Family, Genus, Species, Strain

Select the desired phylogenetic level of calls for differential abundance analysis.

Recommended: Species


Minimum Abundance

The minimum abundance threshold for each feature.

Set thresholds to filter low-abundance features, reducing noise and focusing on biologically relevant data.


Minimum Prevalence Percentage

The minimum percentage of samples in which a feature must be detected at the minimum abundance.

This parameter helps filter out rare features that may not provide meaningful insights by defining the minimum proportion of samples in which a feature must be present to be included in the analysis. Selecting an appropriate threshold reduces noise and focuses on features that are consistently observed across the study cohort. Typically, a threshold of 5% is recommended, but this can vary based on the study design and specific research questions.

📘

Prevalence and abundance filtering in MaAsLin2

Typically, it only makes sense to test for feature-metadata associations if a feature is non-zero "enough" of the time. "Enough" can vary between studies, but a 10-50% minimum prevalence threshold is not unusual (and up to 70-90% can be reasonable). Selecting a minimum prevalence filter of 5% will test only features with at least 5% non-zero values.

Similarly, it's often desirable to test only features that reach a minimum abundance threshold in at least this many samples. By default, MaAsLin2 will consider any non-zero value to be reliable, and if you've already done sufficient QC in your dataset, this is appropriate. However, if you'd like to filter more or be conservative, you can set a minimum abundance threshold like min_abundance = 0.001 to test only features reaching at least this (relative) abundance.
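
As a rough illustration of how these two cutoffs interact, the hypothetical Python sketch below keeps only features that reach the minimum abundance in at least the minimum fraction of samples; MaAsLin2's exact boundary handling may differ slightly.

```python
import pandas as pd

# Hypothetical relative-abundance table: rows are features, columns are samples.
abund = pd.DataFrame(
    {
        "S1": [0.30, 0.0005, 0.0,   0.02],
        "S2": [0.25, 0.0,    0.004, 0.0],
        "S3": [0.40, 0.0002, 0.003, 0.05],
        "S4": [0.35, 0.0,    0.002, 0.0],
    },
    index=["featA", "featB", "featC", "featD"],
)

min_abundance = 0.001   # feature must reach at least this abundance ...
min_prevalence = 0.5    # ... in at least this fraction of samples

prevalence = (abund >= min_abundance).mean(axis=1)
kept = abund.loc[prevalence >= min_prevalence]
print(kept.index.tolist())   # features passing both cutoffs
```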


Q Value Significance Threshold

The maximum q-value for a result to be considered significant.

The Q value threshold controls the false discovery rate, helping to reduce false positives in multiple comparisons. A typical threshold is ≤ 0.1, but this can vary depending on the dataset (0.05-0.25). Adjust based on your study’s balance between identifying true associations and minimizing errors.


Normalization

Different normalization techniques adjust for varying sequencing depths or compositional biases:

  • TSS (Total Sum Scaling): (default) normalizes the feature table by dividing each feature by the total sum of features per sample.
  • TMM (Trimmed Mean of M-values): assumes that most taxa are not differentially abundant between samples; computationally intensive for large datasets
  • CLR (Centered Log Ratio): Useful for compositional data, particularly when the data is sparse.
  • CSS (Cumulative Sum Scaling): Suited for sparse count data, normalizing features based on cumulative sums.
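
Here is a minimal sketch of the first two options on a hypothetical count table (TMM and CSS involve reference-sample and quantile logic and are not shown); the pseudocount used before the CLR log step is an assumption for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical count table: rows are features, columns are samples.
counts = pd.DataFrame(
    {"S1": [100, 300, 600], "S2": [10, 20, 70]},
    index=["featA", "featB", "featC"],
)

# TSS: divide each feature by the total sum of features per sample.
tss = counts.div(counts.sum(axis=0), axis=1)

# CLR: log of each value relative to the geometric mean of its sample,
# after adding a small pseudocount to avoid log(0).
log_vals = np.log(counts + 0.5)
clr = log_vals - log_vals.mean(axis=0)

print(tss.round(3))
print(clr.round(3))
```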

Transformation

Choose transformations depending on the data distribution:

  • Log: (default) Apply to skewed data, typical of microbial abundance data
  • Log10: A base-10 log transformation, less aggressive than natural log
  • Logit: For binary data (e.g., presence/absence).
  • None: Use with caution; generally reserved for count models like NEGBIN or ZINB.
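
A small illustration of these transformations on hypothetical relative abundances follows; the pseudocount handling here is purely illustrative and does not reproduce MaAsLin2's internal scheme.

```python
import numpy as np

x = np.array([0.0, 0.01, 0.25, 0.60])   # hypothetical relative abundances

# LOG / LOG10: add a small pseudocount so zeros do not become -inf.
pseudo = x + 1e-6
log_nat = np.log(pseudo)
log_10 = np.log10(pseudo)

# LOGIT: log(p / (1 - p)), for proportions strictly between 0 and 1.
p = np.clip(x, 1e-6, 1 - 1e-6)
logit = np.log(p / (1 - p))

print(log_nat.round(3), log_10.round(3), logit.round(3), sep="\n")
```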

Analysis Method (Statistical Model)

Several different types of statistical models can be used for association testing in MaAsLin2. The pros and cons of each model are discussed in the MaAsLin2 manuscript. While the default is generally appropriate for most analyses, a user might want to select a different model under some circumstances. This holds across many microbial community data types (taxonomic or functional profiles), environments (human or otherwise), and measurements (counts or relative proportions), as long as alternative models are paired with appropriately modified normalization/transformation schemes.

For non-count data (relative abundance, CPM), use LM or CPLM.

For count data (reads frequency, normalized reads frequency), use NEGBIN or ZINB.

Options include:

  • LM (Linear Model): A traditional linear regression model suitable for both positive and negative values. It assumes normally distributed errors and a linear relationship between dependent and independent variables. Ideal for data with no zeros or minimal skewness, such as relative abundance data that has been transformed (e.g., log-transformed).
  • CPLM (Compound Poisson Linear Models): Models continuous data with exact zeros using the Tweedie distribution, which combines a point mass at zero with a continuous positive distribution. Ideal for zero-inflated continuous data, such as sparse microbiome functional profiles where many samples have zero abundance for specific features.
    • CPLM requires the data to be positive. Therefore, any transformation that produces negative values will typically NOT work for CPLM.
  • NEGBIN (Negative binomial regression): A count-based model that handles overdispersion, where variance exceeds the mean. It includes an extra parameter to model this variability. Suitable for count data with high variability and zero counts, such as raw microbial read counts.
  • ZINB (Zero-inflated negative binomial regression): Extends the NEGBIN model by incorporating an additional component to model excess zeros, accounting for true and false zeros separately. Ideal for highly sparse count data where zero inflation is prominent, such as environmental or clinical microbiome datasets with many undetected features.

👍

If you're not sure which to use, select LM

LM is the only model that works on both positive and negative values (following normalization and transformation), and, as per the manuscript, it is generally much more robust to parameter changes (which are typically limited for non-LM models). Intuitively, CPLM or a zero-inflated alternative should perform better in the presence of zeroes, but based on tool benchmarking, there is no evidence that CPLM is significantly better than LM in practice.

All non-LM models use an intrinsic log link transformation due to their close connection to GLMs, and they are recommended to be run with no transformation.

Model  | Data Type           | Normalization       | Transformation
LM     | count and non-count | TSS, CLR, NONE      | LOG, LOGIT, AST, NONE
CPLM   | count and non-count | TSS, TMM, CSS, NONE | NONE
NEGBIN | count               | TMM, CSS, NONE      | NONE
ZINB   | count               | TMM, CSS, NONE      | NONE
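
To make the LM case concrete, here is a minimal per-feature sketch in Python/statsmodels (not MaAsLin2 itself), assuming the abundances have already been normalized and log-transformed; CPLM, NEGBIN, and ZINB would swap in Tweedie or count-model fits instead. All data and names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical inputs: "features" holds normalized + transformed abundances
# (samples x features), "meta" holds the sample metadata.
features = pd.DataFrame(
    {"featA": [-2.1, -1.9, -0.8, -0.9], "featB": [-3.0, -2.8, -2.9, -3.1]},
    index=["S1", "S2", "S3", "S4"],
)
meta = pd.DataFrame(
    {"condition": ["control", "control", "treatment", "treatment"]},
    index=["S1", "S2", "S3", "S4"],
)

# LM: fit one ordinary linear model per feature against the fixed effect
# and keep the coefficient and p-value for the non-reference level.
rows = []
for feat in features.columns:
    df = meta.assign(y=features[feat])
    fit = smf.ols("y ~ C(condition)", data=df).fit()
    term = "C(condition)[T.treatment]"
    rows.append({"feature": feat,
                 "coef": fit.params[term],
                 "pval": fit.pvalues[term]})

results = pd.DataFrame(rows)
print(results)
```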

Multiple Hypothesis Correction Method

Correction for multiple testing is a statistical method used to reduce the likelihood of false positive results when performing multiple comparisons or tests. This produces a q value, which complements the corresponding p value. When many statistical tests are conducted simultaneously, the chance of finding a significant result purely by chance increases. To address this, correction methods adjust the significance levels to control the overall error rate.

Options include:

  • BH (Benjamini-Hochberg FDR): (default) This method controls the False Discovery Rate (FDR), which is the expected proportion of false positives among the declared significant results. The p-values are ranked, and a threshold is determined based on the desired FDR.
    • Strengths/Weaknesses: Less conservative than Bonferroni, so it is more likely to detect true positives, making it useful in large datasets. While it reduces the chance of false positives, some false discoveries may still occur.
  • Bonferroni Correction: This is the simplest and most conservative method ideal for small datasets. The p-value threshold is adjusted by dividing the original significance level (α) by the number of comparisons (m). For example, if you are testing 100 hypotheses with a significance level of 0.05, the adjusted p-value threshold would be 0.05/100 = 0.0005.
    • Strengths/Weaknesses: Controls the family-wise error rate (FWER), meaning it reduces the risk of any false positives. However, it can be overly conservative, especially when there are many comparisons, leading to false negatives (missing true associations).
  • Holm: A stepwise version of the Bonferroni method. It adjusts the p-value thresholds sequentially, beginning with the smallest p-value. Each p-value is compared to a threshold based on its rank (smallest p-value gets divided by the number of tests, second smallest by one less, and so on).
    • Strengths/Weaknesses: More powerful than the Bonferroni correction, especially in situations with a smaller number of comparisons. However, it is still fairly conservative, which can reduce power.
  • None: No correction applied.
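
A short sketch of how BH and Bonferroni behave on a hypothetical set of p-values, using statsmodels; the BH-adjusted values are the q-values that are compared against the Q value significance threshold.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

pvals = np.array([0.001, 0.008, 0.020, 0.041, 0.300, 0.750])  # hypothetical

# BH: controls the false discovery rate; returns adjusted p-values (q-values).
rej_bh, qvals, _, _ = multipletests(pvals, alpha=0.10, method="fdr_bh")

# Bonferroni: controls the family-wise error rate; far more conservative.
rej_bonf, p_bonf, _, _ = multipletests(pvals, alpha=0.10, method="bonferroni")

print(np.round(qvals, 4))
print(rej_bh, rej_bonf)
```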

Standardize Numeric/Continuous Metadata Variable

Apply z-score standardization so continuous metadata are on the same scale.

Standardizing a numeric or continuous metadata variable involves transforming the values to have a mean of zero and a standard deviation of one. This process, also known as z-score normalization, allows variables with different units or scales to be compared directly. Standardization is crucial when variables differ significantly in their ranges or units, ensuring that each variable contributes equally to the model and preventing any one variable from disproportionately influencing the analysis. It is commonly used in linear models and clustering algorithms.

📘

Why standardize your numeric variables?

Suppose you have a microbiome dataset where you want to analyze the impact of participants' age and BMI on microbial abundance. Since age and BMI have different scales, standardizing these variables (e.g., converting age from years and BMI from kg/m² to z-scores) ensures that both contribute equally to the analysis. This prevents the model from being disproportionately influenced by the variable with the larger numerical range, allowing for more balanced and interpretable results.
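
A minimal z-score sketch on hypothetical age and BMI values:

```python
import pandas as pd

meta = pd.DataFrame({"age": [25, 34, 51, 62], "bmi": [21.5, 24.0, 27.9, 31.2]})

# z-score standardization: (value - mean) / standard deviation per variable,
# so age (years) and BMI (kg/m^2) end up on the same unitless scale.
z = (meta - meta.mean()) / meta.std()
print(z.round(2))
```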


Plot Heatmap

Generate a heatmap for the significant associations.


Number of Significance Features in Heatmap

In the heatmap, plot the top N features with significant associations.


Plot Scatter

Generate scatter plots for the significant associations.


Workflow

Select the version of your taxonomic/functional workflows.