A Python package for analyzing nasal microbiome data from MetaPhlAn outputs, specifically designed for clinical studies with time-series data and clinical variables.
- Import and parse MetaPhlAn output files
- Combine multiple samples into a unified abundance table
- Filter to species-level taxonomic data
- Join with clinical metadata
- Perform diversity analyses (alpha and beta diversity)
- Identify differentially abundant species between clinical groups
- Analyze longitudinal changes in microbiome composition
- Generate publication-quality visualizations
- Command-line interface for easy usage
# Clone the repository
git clone https://github.com/yourusername/metaphlan_tools.git
cd metaphlan_tools
# Install the package
pip install -e .
The package provides a command-line interface for common analysis tasks:
metaphlan_tools process --input-dir /path/to/metaphlan/files --output-dir /path/to/output
metaphlan_tools diversity --metadata-file metadata.csv --output-dir /path/to/output --group-var Severity
metaphlan_tools differential --metadata-file metadata.csv --output-dir /path/to/output --group-var Symptoms
metaphlan_tools longitudinal --metadata-file metadata.csv --output-dir /path/to/output --time-var Timing --subject-var SubjectID --group-var Severity
metaphlan_tools report --metadata-file metadata.csv --output-dir /path/to/output --group-var Severity
You can also use the package directly in your Python scripts:
import pandas as pd
from metaphlan_tools import parse_metaphlan_file, combine_samples, load_metadata
from metaphlan_tools import calculate_alpha_diversity, differential_abundance_analysis
from metaphlan_tools import plot_relative_abundance_heatmap
# Process files
files = ['sample1.txt', 'sample2.txt', 'sample3.txt']
abundance_df = combine_samples(files)
# Load metadata
metadata_df = load_metadata('metadata.csv')
# Calculate alpha diversity
alpha_df = calculate_alpha_diversity(abundance_df)
# Find differentially abundant species
diff_results = differential_abundance_analysis(abundance_df, metadata_df, 'Severity')
# Create visualization
fig = plot_relative_abundance_heatmap(abundance_df, metadata_df, 'Severity')
fig.savefig('heatmap.png')
The package expects standard MetaPhlAn output files, which are tab-delimited with taxonomic classifications in the first column and relative abundances in subsequent columns.
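To make the expected layout concrete, here is a minimal standard-library sketch of reading such a file and keeping only species-level rows. It is not the package's own parser: the function name `parse_species_abundances`, the `abundance_col` parameter, and the sample text are hypothetical. It assumes header or comment lines start with `#` and that the abundance sits in column index 1. Newer MetaPhlAn versions typically place an NCBI taxonomy ID column before the abundance, in which case `abundance_col=2` would be needed.

```python
import csv
import io

# Made-up example profile (illustrative values only, not real data).
SAMPLE_PROFILE = (
    "#SampleID\tMetaphlan_Analysis\n"
    "k__Bacteria\t100.0\n"
    "k__Bacteria|p__Firmicutes\t60.0\n"
    "k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacillales|f__Staphylococcaceae"
    "|g__Staphylococcus|s__Staphylococcus_aureus\t60.0\n"
    "k__Bacteria|p__Proteobacteria|c__Gammaproteobacteria|o__Pseudomonadales"
    "|f__Moraxellaceae|g__Moraxella|s__Moraxella_catarrhalis\t40.0\n"
)


def parse_species_abundances(text, abundance_col=1):
    """Return {species_name: relative_abundance} for species-level rows.

    abundance_col is an assumption about the file layout (hypothetical
    parameter); adjust it to match your MetaPhlAn version.
    """
    species = {}
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for row in reader:
        # Skip blank lines and '#'-prefixed header/comment lines.
        if not row or row[0].startswith("#"):
            continue
        # The last element of the lineage tells us the taxonomic rank.
        last_rank = row[0].split("|")[-1]
        # Keep species ('s__') rows; deeper strain ('t__') rows are skipped.
        if not last_rank.startswith("s__"):
            continue
        species[last_rank[3:]] = float(row[abundance_col])
    return species


if __name__ == "__main__":
    for name, value in parse_species_abundances(SAMPLE_PROFILE).items():
        print(f"{name}\t{value}")
```

Filtering on the last lineage element, rather than searching the whole string for `s__`, keeps strain-level rows from being counted as species.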
The metadata should be in CSV or Excel format with:
- A SampleID column matching the sample IDs in MetaPhlAn files
- Clinical variables including Timing, Severity, Symptoms, and SubjectID
- Additional metadata can be included and used in analyses
Example metadata structure:
SampleID,SubjectID,Timing,Severity,Symptoms,Age
S001,P1,Prior,0,Asymptomatic,5
S002,P1,Acute,1,Mild,5
S003,P1,Post,0,Asymptomatic,5
S004,P2,Prior,0,Asymptomatic,7
S005,P2,Acute,2,Severe,7
S006,P2,Post,0,Asymptomatic,7
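Before running an analysis, it can help to check that every sample in the abundance table has a metadata row. The sketch below does this with the standard library only, using the example metadata above. The helper names `index_metadata` and `samples_by_subject`, and the sample ID `S999`, are hypothetical and are not part of the package API.

```python
import csv
import io
from collections import defaultdict

# The example metadata from this README, embedded as text for the sketch.
METADATA_CSV = """SampleID,SubjectID,Timing,Severity,Symptoms,Age
S001,P1,Prior,0,Asymptomatic,5
S002,P1,Acute,1,Mild,5
S003,P1,Post,0,Asymptomatic,5
S004,P2,Prior,0,Asymptomatic,7
S005,P2,Acute,2,Severe,7
S006,P2,Post,0,Asymptomatic,7
"""


def index_metadata(text, id_column="SampleID"):
    """Map each sample ID to its metadata row (all values stay strings)."""
    return {row[id_column]: row for row in csv.DictReader(io.StringIO(text))}


def samples_by_subject(metadata, subject_col="SubjectID"):
    """Group sample IDs by subject, preserving file order."""
    groups = defaultdict(list)
    for sample_id, row in metadata.items():
        groups[row[subject_col]].append(sample_id)
    return dict(groups)


metadata = index_metadata(METADATA_CSV)

# Hypothetical sample IDs taken from an abundance table; S999 has no metadata.
abundance_samples = ["S001", "S002", "S999"]
missing = [s for s in abundance_samples if s not in metadata]

if __name__ == "__main__":
    print("Samples without metadata:", missing)
    print("Samples per subject:", samples_by_subject(metadata))
```

Note that `csv.DictReader` returns every value as a string, so a numeric column such as Severity would need an explicit conversion before any numeric comparison.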
A typical analysis workflow consists of:
- Process all MetaPhlAn files into a combined abundance table
- Join with metadata for sample information
- Calculate alpha diversity and compare between clinical groups
- Perform beta diversity analysis to examine community-level differences
- Identify specific species that differ between clinical groups
- Analyze longitudinal changes across time points
- Generate visualizations and reports
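As a rough illustration of what the diversity steps compute, the sketch below implements the Shannon index (one common alpha diversity measure) and Bray-Curtis dissimilarity (one common beta diversity measure) with the standard library. This README does not state which metrics the package uses, so treat these as assumed examples; the function names are hypothetical and the abundance values are made up.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over non-zero abundances."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total  # Normalize so the proportions sum to 1.
            h -= p * math.log(p)
    return h


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Made-up relative abundances for two samples over the same three species.
    sample_a = [60.0, 40.0, 0.0]
    sample_b = [20.0, 30.0, 50.0]
    print("Shannon A:", round(shannon_index(sample_a), 3))
    print("Shannon B:", round(shannon_index(sample_b), 3))
    print("Bray-Curtis A vs B:", round(bray_curtis(sample_a, sample_b), 3))
```

Both functions expect samples that share the same species order, which is what a combined abundance table provides.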
- Python 3.8 or higher
- pandas
- numpy
- matplotlib
- seaborn
- scipy
- scikit-bio
- statsmodels
- scikit-learn
- networkx
This project is licensed under the MIT License - see the LICENSE file for details.