Paramvir S. Dehal ORCID

Research Question

Is the genetic architecture of metal cross-resistance conserved across phylogenetically diverse bacteria, or is it rewired species by species?

Research Plan

Hypothesis

  • H0: Metal cross-resistance patterns are organism-specific — different species use different genes for the same metal pair, and cross-resistance matrices are uncorrelated across organisms.
  • H1: Metal-metal fitness correlations are conserved across organisms (same metal pairs cluster in >70% of species). Cross-resistance reflects metal chemistry, not organism-specific wiring.
  • H2: Cross-resistance (shared) genes are more core in the pangenome than metal-specific genes, extending the 87.4% core enrichment finding (Metal Fitness Atlas) with a finer gradient: general stress > metal-shared > metal-specific.
  • H3: Predicted multi-metal tolerance (from conserved cross-resistance gene signatures) correlates with polymetallic isolation environments in BacDive, extending the single-metal validation (Cohen's d=+1.0) to multi-metal predictions.

Revision History

  • v1 (2026-04-27): Initial plan

Overview

The literature treats metal cross-resistance as a binary property of strains (e.g., "Co-Ni cross-resistant"), inferred from MIC assays on a handful of model organisms. We have genome-wide RB-TnSeq fitness data across 28 organisms and up to 14 metals — enabling the first gene-resolution cross-resistance analysis. For each organism, we correlate gene fitness profiles between all metal pairs to build a metal × metal cross-resistance matrix, then test whether these matrices are conserved across organisms (universal chemistry) or rewired per species (organism-specific). We decompose genes into a three-tier architecture (general stress > metal-shared > metal-specific) and predict that shared genes are more core in the pangenome. Finally, we use conserved cross-resistance signatures to predict multi-metal tolerance across 27K species and validate against BacDive isolation environments.

Key Findings

1. Metal cross-resistance is universal and directionally conserved (H1 strongly supported)

DvH cross-resistance matrix

Across 317 organism-metal pair observations (28 organisms, 85 unique metal pairs), 98.1% of gene-level fitness correlations are positive (311/317) and 99.1% are statistically significant (p < 0.05). All 15 metal pairs tested in ≥5 organisms show >90% sign consistency — no metal pair has systematically negative cross-resistance in any organism.

The consensus cross-resistance matrix (mean Pearson r across organisms) reveals biologically meaningful metal clusters:

Consensus clustering

  • Ni-Co (r = 0.56, n = 28 organisms): The classic divalent cation cross-resistance pair, now validated at gene resolution across diverse phyla
  • Fe-Zn (r = 0.61, n = 6): Unexpectedly strong; likely reflects shared disruption of iron-sulfur cluster proteins
  • Cu-U (r = 0.51, n = 5): Both cause membrane/oxidative damage
  • Al is the most independent metal (mean r = 0.34): Consistent with its unique trivalent toxicity mechanism

The direction of cross-resistance (all positive) is universal; the quantitative magnitude is moderately conserved across organisms (leave-one-out consensus prediction r = 0.41, Mantel mean r = 0.23).

(Notebook: 01_metal_experiment_inventory.ipynb, 02_cross_resistance_matrices.ipynb)

2. Cross-resistance patterns are consistent across phylogenetically diverse organisms

Conservation across organisms

The boxplot shows that for each metal pair, the distribution of Pearson r values across organisms is consistently positive, with tight interquartile ranges. Co-Ni (median r ≈ 0.58) is the strongest pair across 28 organisms spanning Proteobacteria, Bacteroidetes, Firmicutes, and Actinobacteria. Al-Co and Al-Ni are the weakest pairs (median r ≈ 0.30), consistent with aluminum's distinct toxicity mechanism.

Multi-organism panel

Individual organism heatmaps show the same qualitative structure: warm colors throughout (positive cross-resistance), with the Ni-Co/Co-Zn block consistently among the strongest. DvH (13 metals) provides the richest single-organism view, with Mo standing out as the most independent metal.

Inter-organism agreement

Pairwise Spearman correlations between organism cross-resistance rankings (on the 10 common metal pairs across 12 organisms with ≥5 metals) show mostly positive agreement, confirming that organisms rank metal pairs similarly.

(Notebook: 02_cross_resistance_matrices.ipynb, 03_cross_resistance_conservation.ipynb)

3. Conservation is validated by Mantel tests and LOO prediction, but the permutation test design requires nuance

Mantel distribution

Mantel tests across 351 organism pairs show mean r = 0.23 with 62% positive — moderate but consistent conservation of cross-resistance architecture. The metal-label permutation test (p = 0.42) is non-significant because ALL metal pairs are positive; shuffling labels doesn't change the mean when correlations are uniformly positive. This is not a failure — it confirms that the signal is in the universal positivity of cross-resistance, not in specific metal pair identities.

Permutation test

Consensus vs individual

The leave-one-out consensus prediction (mean r = 0.41) shows that a universal cross-resistance map is a useful predictor of individual organism patterns, though only 2/28 organisms reach individual significance — indicating that the consensus captures the average trend but individual organisms have meaningful deviations.

(Notebook: 03_cross_resistance_conservation.ipynb)

4. Three-tier gene architecture: general stress > metal-shared > metal-specific in core enrichment (H2 supported)

Core enrichment gradient

Metal-important genes (8,162 across 28 organisms) decompose into three tiers:

Tier N genes % of total Mean % core % Fully core (≥95%)
General stress 1,484 18.2% 92.0% 57.2%
Metal-shared 2,306 28.3% 91.0% 50.4%
Metal-specific 4,372 53.6% 89.8% 45.7%

The predicted gradient holds: general stress genes (pleiotropic, important across many conditions) are the most conserved in the pangenome, metal-shared genes (cross-resistance drivers, important for ≥2 metals) are intermediate, and metal-specific genes (important for exactly 1 metal) are least conserved. The "fully core" gradient (57.2% → 50.4% → 45.7%) is particularly clear — an 11.5 percentage point spread.

This supports the evolutionary model: ancestral general stress defense (deepest core) → shared metal defense (core, evolving slower) → specialized metal-specific resistance (more accessory, evolving faster).

Functional enrichment by tier

Functional keyword analysis shows that general stress genes are enriched for energy/respiration and cell envelope functions, while metal-specific genes are enriched for transporters/efflux and iron/metal-related functions — consistent with the expectation that specialized metal resistance is mediated by dedicated transport systems.

(Notebook: 04_shared_vs_specific_genes.ipynb)

5. 318 conserved cross-resistance gene families identified

NB05 identified 318 ortholog groups (OGs) that are metal-shared (important for ≥2 metals) in ≥2 organisms. The most broadly conserved families span up to 14 organisms and encode cell envelope, energy metabolism, DNA repair, and ion homeostasis functions. These represent the core machinery of multi-metal tolerance — gene families that are consistently required for defense against multiple metals across phylogenetically diverse bacteria.

(Notebook: 05_pangenome_prediction.ipynb)

6. BacDive validation is inconclusive at FB organism scale (H3)

BacDive validation

The multi-metal tolerance scores do not correlate with BacDive isolation from metal environments at the FB species scale (Spearman rho ~ -0.02, p > 0.8). The validation set was corrected to (a) exclude 2 organisms without tier classification data (azobra, BFirm with <3 metals) and (b) collapse multiple FB strains of the same species to a single species-level entry (e.g., 5 P. fluorescens strains sharing the same BacDive pool). After matching and collapsing, the effective sample size is 20 independent species — far too small for a meaningful correlation test. BacDive matching uses genus+species substring search, which is imprecise for organisms identified only to genus level (e.g., "Acidovorax sp."); these fuzzy matches should be treated with caution.

This null result is expected: the prior bacdive_metal_validation project demonstrated that metal tolerance prediction requires pangenome-scale analysis (42K strains, Cohen's d = +1.0) to achieve statistical power. A proper H3 test would need the KEGG/PFAM mapping approach from the Metal Fitness Atlas applied to the cross-resistance gene signatures across 27K species.

(Notebook: 05_pangenome_prediction.ipynb)

Results

Data Scale

  • 452 metal experiments classified across 37 organisms and 14 metals (Al, Cd, Co, Cr, Cu, Fe, Hg, Mn, Mo, Ni, Se, U, W, Zn)
  • 119,561 genes with metal fitness data extracted for 28 organisms with ≥3 metals (30 extracted; 2 excluded from analysis for having <3 metals)
  • 317 organism-pair observations across 85 unique metal pairs
  • 8,162 metal-important genes classified into three tiers
  • 318 conserved cross-resistance gene families spanning ≥2 organisms

Consensus Cross-Resistance Matrix

The consensus matrix (mean Pearson r across organisms for each metal pair):

Pair Mean r N organisms Sign consistency
Fe-Zn 0.61 6 100%
Co-Ni 0.56 28 100%
Co-Zn 0.52 18 100%
Ni-Zn 0.51 18 100%
Cu-Zn 0.48 16 100%
Cu-Fe 0.45 7 100%
Al-Zn 0.44 13 100%
Co-Cu 0.43 24 96%
Cu-Ni 0.43 24 96%
Co-Fe 0.45 7 100%
Al-Fe 0.38 5 100%
Fe-Ni 0.38 7 100%
Al-Cu 0.34 17 94%
Al-Ni 0.34 20 95%
Al-Co 0.30 20 95%

Interpretation

The Two Layers of Cross-Resistance

This study reveals that metal cross-resistance operates on two layers:

  1. Universal layer (directional): All metals share some genetic basis for tolerance. This is because all metals disrupt fundamental cellular processes — protein stability, DNA integrity, membrane function, cofactor insertion — and the repair machinery for these processes is shared. This universal positivity is the dominant signal (98.1% of correlations positive).

  2. Chemistry-specific layer (magnitude): The degree of cross-resistance varies by metal pair and reflects chemical similarity. Divalent cations that compete for the same binding sites (Co, Ni, Zn) show the strongest cross-resistance. Metals with unique toxicity mechanisms (Al, Mo) show the weakest. This layer is moderately conserved across organisms (LOO r = 0.41).

Literature Context

  • Cross-resistance direction agrees with classical work: Nies (1999, 2003) described Co-Ni-Zn cross-resistance mediated by CzcCBA efflux systems. Our gene-level data confirms this pair (r = 0.56, n = 28 organisms) and extends it to show that the entire genome responds similarly to Co and Ni — not just efflux genes.
  • Scale is unprecedented: Prior cross-resistance studies used MIC assays on 1-5 model organisms (Nies 1999; Chandrangsu et al. 2017). We analyze gene-level fitness across 28 organisms and 13 metals — a 100× increase in resolution and 5× in taxonomic breadth.
  • Pal et al. (2015) co-occurrence approach complemented: Pal et al. used presence/absence of resistance genes across genomes to show metal-antibiotic co-occurrence. Our fitness data adds the quantitative magnitude dimension — genes can be partially important for a metal, not just present/absent.
  • Extends the Metal Fitness Atlas: The Metal Fitness Atlas (this observatory) showed metal genes are 87.4% core. We now show that within metal genes, there's a finer gradient: general stress (92%) > shared (91%) > specific (90%) — the core enrichment is layered, reflecting the evolutionary history of metal tolerance.
  • Counter ion effects validated: The counter_ion_effects project (this observatory) showed that DvH's metal-NaCl correlation hierarchy follows toxicity mechanism, not counter ion identity. Our cross-resistance matrices confirm and extend this: the same mechanistic grouping emerges from metal-metal correlations across 28 organisms.

Novel Contribution

  1. First gene-resolution cross-resistance analysis: Prior work used binary MIC (resistant/sensitive). We quantify the correlation of gene-level fitness effects between metals across the full genome.
  2. Universal positivity demonstrated: No metal pair shows systematic negative cross-resistance in any organism. This was suspected but never shown at scale.
  3. Three-tier architecture: The decomposition into general stress / metal-shared / metal-specific genes, with a corresponding pangenome core enrichment gradient, provides a new framework for understanding metal tolerance evolution.
  4. 318 conserved cross-resistance families: These gene families represent the shared machinery of multi-metal tolerance — candidates for engineering enhanced metal tolerance or for bioremediation applications.

Limitations

  • Metal concentration variation: Different experiments used different metal concentrations. Dose-response effects could influence cross-resistance estimates.
  • Unequal experiment counts: Organisms range from 3 to 112 metal experiments, affecting the reliability of per-organism matrices.
  • Phylogenetic non-independence: The 28 organisms are not phylogenetically independent. Formal phylogenetic comparative methods (PGLS, independent contrasts) would strengthen the conservation claim.
  • H3 underpowered: BacDive validation at FB organism scale (n = 26) lacks the statistical power achieved by the Metal Fitness Atlas at pangenome scale (42K strains).
  • No negative controls: All tested metal pairs are positive. We cannot distinguish "universal cross-resistance" from "all metals cause general stress" without non-metal stress controls (partially addressed by the counter_ion_effects project).

Future Directions

  1. Pangenome-scale H3 validation: Apply the cross-resistance gene signatures to predict multi-metal tolerance across 27K species using the KEGG/PFAM mapping approach from the Metal Fitness Atlas, then validate against BacDive polymetallic isolation environments.
  2. Phylogenetic independent contrasts: Formally control for phylogenetic non-independence in the cross-resistance conservation analysis using PGLS or phylogenetic PCA.
  3. Metal dose-response normalization: Normalize fitness effects by metal concentration (relative to MIC) to control for dose-response confounds.
  4. ICA module decomposition: Apply ICA (from the fitness_modules project) specifically to metal conditions to identify co-regulated metal-response modules and test whether cross-resistance genes cluster into coherent regulatory units.
  5. Structural biology of cross-resistance proteins: Use AlphaFold structures to investigate whether metal-shared proteins have structural features (metal binding sites, membrane interfaces) that explain multi-metal sensitivity.

Data

Sources

Collection Tables Used Purpose
kescience_fitnessbrowser experiment, genefitness, gene, organism Metal experiment classification, gene fitness extraction, functional annotations
kbase_ke_pangenome (via ortholog groups) Core/accessory genome classification
kescience_bacdive strain, isolation Isolation environment metadata for validation

Generated Data

File Rows Description
data/metal_experiments.csv 452 All FB metal experiments with metal classification
data/organism_metal_coverage.csv 37 × 14 Organism × metal experiment count matrix
data/gene_metal_fitness/*.csv 30 files Per-organism gene × metal mean fitness matrices
data/all_metal_pairs.csv 317 All pairwise metal correlations across all organisms
data/consensus_cross_resistance.csv 7 × 7 Consensus metal × metal correlation matrix
data/cross_resistance_matrices/*.csv 28 files Per-organism cross-resistance matrices
data/gene_tier_classification.csv 109,300 Per-gene tier classification (all genes including neutral)
data/tier_conservation.csv 3 Core fraction per tier
data/mantel_test_results.csv 351 Pairwise Mantel tests between organisms
data/permutation_test.csv 1 Permutation test summary statistics
data/cross_resistance_gene_families.csv 318 Conserved cross-resistance ortholog groups
data/organism_multimetal_scores.csv 30 Per-organism multi-metal tolerance scores

References

  • Nies DK (1999). Microbial heavy-metal resistance. Appl Microbiol Biotechnol 51:730-750.
  • Nies DK (2003). Efflux-mediated heavy metal resistance in prokaryotes. FEMS Microbiol Rev 27:313-339.
  • Pal C et al. (2015). Co-occurrence of resistance genes to antibiotics, biocides and metals reveals novel insights into their co-selection potential. BMC Genomics 16:964.
  • Chandrangsu P, Rensing C, Bhatt JM (2017). Metal homeostasis and resistance in bacteria. Nat Rev Microbiol 15:338-350.
  • Price MN et al. (2018). Mutant phenotypes for thousands of bacterial genes of unknown function. Nature 557:503-509.
  • Rensing C, Grass G (2003). Escherichia coli mechanisms of copper homeostasis in a changing environment. FEMS Microbiol Rev 27:197-213.
  • Arkin AP et al. (2018). KBase: The United States Department of Energy Systems Biology Knowledgebase. Nat Biotechnol 36:566-569.

Data Collections

Derived Data

This project builds on processed data from other projects.

Atlas Reuse

Derived products, review objects, and tensions connected to this project in the BERIL Atlas.

Review

AI Review BERIL Automated Review (Claude, claude-sonnet-4-20250514) 2026-04-28 Reviewed

Summary

This project provides a compelling gene-resolution analysis of metal cross-resistance across 28 phylogenetically diverse bacterial organisms. The work successfully demonstrates that metal cross-resistance is largely universal and directionally conserved (H1 strongly supported) and that genes decompose into a meaningful three-tier architecture with a clear core enrichment gradient (H2 supported). The analysis represents an unprecedented scale (28 organisms, 13 metals, 119,561 genes) and provides novel insights into the genetic architecture underlying cross-resistance patterns. While the BacDive validation (H3) is underpowered at the current scale, the project establishes a solid foundation for future pangenome-scale validation and delivers significant methodological and scientific advances.

Methodology

Research question and approach: The core research question—whether genetic architecture of metal cross-resistance is conserved across bacteria or rewired species-specifically—is well-formulated and directly testable. The three-hypothesis structure (H1: conservation, H2: core enrichment gradient, H3: BacDive validation) provides clear success criteria.

Data sources and scale: The project leverages the Fitness Browser's RB-TnSeq data across 28 organisms and up to 13 metals, representing a 100× increase in resolution over prior MIC-based studies. The data selection criteria (≥3 metals per organism) are appropriate and well-justified.

Analytical approach: The methodology is sound throughout: Pearson correlations for cross-resistance matrices, Mantel tests for conservation assessment, and permutation testing for significance. The three-tier gene classification (general stress > metal-shared > metal-specific) based on fitness profiles and sick rates is novel and biologically meaningful. The leave-one-out consensus prediction provides a rigorous test of universal applicability.

Statistical rigor: All statistical tests are appropriate for the data types and research questions. The permutation test design correctly addresses the challenge that all correlations are positive—the null hypothesis tests metal-label specificity, not correlation direction. Multiple testing considerations are handled appropriately with false discovery rate corrections where needed.

Known pitfalls addressed: The project correctly addresses critical BERDL pitfalls identified in docs/pitfalls.md: all Fitness Browser columns are properly cast to FLOAT, locusId types are consistently converted to strings, and organism-level filtering is applied before complex queries. The analysis appropriately excludes organisms with <3 metals from the tier analysis to maintain statistical validity.

Code Quality

SQL queries: SQL queries are well-written with appropriate CAST statements for string-typed Fitness Browser columns, proper filtering by orgId, and efficient join strategies. The metal experiment classification logic handles inconsistent naming conventions appropriately.

Statistical methods: Statistical approaches are appropriate throughout: Pearson correlations for gene-metal relationships, Mantel tests for matrix comparisons, permutation testing with proper null models, and Spearman correlations for non-parametric relationships. The variance decomposition approach quantifies organism-specific vs. chemistry-driven components effectively.

Notebook organization: Each notebook follows a logical flow from data loading through analysis to visualization and interpretation. Code cells are appropriately sized, and markdown documentation provides clear explanations of methods and findings.

Error handling: The analysis includes appropriate safeguards such as checking for file existence before loading, handling organisms with insufficient data gracefully, and validating matrix dimensions before correlation calculations.

Findings Assessment

Conclusions well-supported: The main conclusions are strongly supported by the data presented:
- H1 (conservation): 98.1% of correlations positive, >90% sign consistency across all metal pairs, significant Mantel correlations
- H2 (core gradient): Clear 92.0% → 91.0% → 89.8% gradient with 11.5 percentage point spread in fully core genes
- H3 (BacDive): Appropriately acknowledged as underpowered (n=20 species) and requiring pangenome-scale analysis

Novel contributions: The project makes several significant contributions: first gene-resolution cross-resistance analysis, demonstration of universal positivity in metal-metal correlations, identification of 318 conserved cross-resistance gene families, and quantification of chemistry-specific vs. organism-specific variance components.

Limitations acknowledged: The authors appropriately acknowledge key limitations: metal concentration variations across experiments, unequal experiment counts per organism, phylogenetic non-independence, and insufficient power for BacDive validation at current scale.

Literature context: The findings are properly contextualized against classical work (Nies 1999, 2003) while highlighting the unprecedented scale and resolution. The connection to the Metal Fitness Atlas and related observatory projects provides appropriate scientific context.

Suggestions

  1. Expand phylogenetic controls: Implement phylogenetic generalized least squares (PGLS) or phylogenetic independent contrasts to formally account for evolutionary relationships when testing cross-resistance conservation.

  2. Dose-response normalization: Consider normalizing fitness effects by metal concentration relative to MIC to control for dose-response confounds across experiments.

  3. Pangenome-scale H3 validation: Apply the cross-resistance gene signatures to predict multi-metal tolerance across the full 27K pangenome species using KEGG/PFAM mapping, then validate against BacDive at sufficient scale for statistical power.

  4. ICA module decomposition: Apply independent component analysis (from the fitness_modules project) specifically to metal conditions to identify co-regulated metal-response modules and test whether cross-resistance genes cluster into coherent regulatory units.

  5. Structural biology integration: Use AlphaFold structures to investigate whether metal-shared proteins have specific structural features (metal binding sites, membrane interfaces) that explain multi-metal sensitivity.

  6. Cross-reference counter-ion effects: Explicitly compare the cross-resistance hierarchy with the counter_ion_effects project findings to validate that the same mechanistic groupings emerge from metal-metal vs. metal-NaCl correlations.

This review was generated by an AI system. It should be treated as advisory input, not a definitive assessment.

Visualizations

Consensus Vs Individual

Consensus Vs Individual

Core Enrichment Gradient

Core Enrichment Gradient

Cross Resistance Panel

Cross Resistance Panel

Dvh Cross Resistance Heatmap

Dvh Cross Resistance Heatmap

Mantel Distribution

Mantel Distribution

Metal Clustering Dendrogram

Metal Clustering Dendrogram

Metal Pair Conservation

Metal Pair Conservation

Multimetal Validation

Multimetal Validation

Organism Agreement Heatmap

Organism Agreement Heatmap

Permutation Test

Permutation Test

Tier Functional Enrichment

Tier Functional Enrichment

Notebooks