Soil Metal Concentrations Drive Functional Gene Shifts in the Environmental Microbiome
CompletedResearch Question
Which microbial functional gene categories (COGs) are significantly associated with soil
metal concentrations at global scale, and do these associations reflect metabolic adaptation,
resistance, or passive environmental filtering?
Key Findings
- 2,355 significant COG-metal associations (FDR < 0.05) across 9 metals (Cu, Co,
Cr, Ni, Zn, Pb, As, Cd, Hg) in 51,748 soil samples. Chromium and Lead drive the
strongest signals; transporters (ABC, RND) and biosynthesis genes dominate top hits. - Copper-specific analysis (116 COGs, FDR < 0.05; 7,566 samples with nearby KBase
genomes at 10 km): top positives include cell division and nucleotide transport (BQ,
FQ, FK, CF); top negatives include energy production (DI, CE), suggesting energetic
trade-offs under copper stress. - db-RDA: R² = 0.799, p = 0.005 (permutation test, 999 permutations) — metal
concentrations explain 80% of variance in community COG profiles after conditioning
on batch/project effects. - Biome-stratified PGLS: soil, marine, and wastewater show distinct metal-COG
relationships, indicating environment-specific functional responses rather than a
universal resistance programme.
Interpretation
Soil metal concentrations are strongly associated with the functional gene content of
co-located microbial communities (db-RDA R²=0.799), and this relationship is not a
statistical artefact of batch effects (project accession conditioned out). The direction
of COG associations with copper (energetic trade-offs, membrane transport enrichment)
is consistent with known copper toxicity mechanisms in bacteria.
However, all associations are observational. Co-contamination by multiple metals is a
significant confound: Cr, Cu, Pb, Zn co-vary in many industrial soils. Whether
individual COG-metal associations are metal-specific or reflect a generic stress
response to multi-metal contamination is unresolved (NB05 partial correlation models).
Effect sizes (Spearman ρ) have not yet been systematically reported across all 2,355
associations; many may be statistically significant but biologically small.
Status
Preliminary — Spearman correlation, db-RDA, and PGLS analyses complete (NB01–NB04).
NB05 effect size and spatial validation pending. NB06 figures pending.
Critical Assessment
R² = 0.799 requires careful interpretation. db-RDA conditions on project accession
before fitting the metal concentration predictors. If batch effects (project) account
for the majority of community COG variance, the reported R² describes how well metals
explain the residual variance, not the total variance. The unconditional R² of metals
alone — without first removing project effects — is not reported and may be substantially
lower. This must be reported alongside the conditional figure.
2,355 significant associations at FDR < 0.05 in 51,748 samples: With 9 metals ×
435 COGs = 3,915 tests, a 60% discovery rate is high. Metals co-contaminate (Cr, Pb,
Zn, Cu co-vary in industrial soils), so tests are not independent — Benjamini-Hochberg
FDR correction assumes independent tests and will be anti-conservative under positive
correlation. True FDR is likely higher than reported. The partial correlation model
(NB05 item 3) is the correct remedy.
Copper-specific analysis uses a 10 km proximity criterion to match soil samples to
KBase genomes. At 10 km, many "nearby" genomes may not be co-located with the copper
measurements. Sensitivity analysis at 5 km and 20 km thresholds is needed before
COG-copper associations can be attributed to the measured sites.
Spark-required analyses: NB05 items 1–4 all require re-running from the
kescience_mgnify/kbase_ke_pangenome Spark tables. No local CSV output of the
Spearman ρ values or model residuals is available, so these validations cannot be
completed without Spark access.
Pending Validation
- Effect size audit: ρ distribution across 2,355 associations; flag ρ < 0.05
- Spatial autocorrelation test (Moran's I on residuals; SEVM if significant)
- Partial correlation model: COG ~ Cr | Cu + Zn + Pb (metal-specific vs. general)
- Mechanistic classification of significant COGs into resistance, stress, membrane,
energy, and unknown categories - Report unconditional db-RDA R² (metals only, without project accession conditioning)
- Copper proximity sensitivity: rerun at 5 km and 20 km thresholds
Data Collections
Atlas Reuse
Derived products, review objects, and tensions connected to this project in the BERIL Atlas.
Metal specificity versus general stress
Metal fitness hits include both specific metal biology and broad stress response, so engineering targets need specificity filters and counter-ion controls.
unresolvedMetal-AMR co-selection readiness
BERIL has the pieces to test metal-AMR co-selection, but the current evidence is a strong opportunity rather than a resolved result.