03 Cross Resistance Conservation
Jupyter notebook from the Gene-Resolution Metal Cross-Resistance Across Diverse Bacteria project.
NB03: Cross-Resistance Conservation Deep Dive (H1)¶
Project: metal_cross_resistance
Purpose: Rigorous tests of H1 — is the cross-resistance architecture conserved across organisms? Mantel tests comparing organism matrices, permutation significance, effect of phylogenetic distance.
Environment: Local (runs from NB02 cached data)
Outputs:
data/mantel_test_results.csv— Pairwise Mantel tests between organismsdata/permutation_test.csv— Permutation-based significance for consensus matrixfigures/mantel_distribution.png— Distribution of inter-organism Mantel r valuesfigures/consensus_vs_individual.png— How well does the consensus predict individual organisms?
import pandas as pd
import numpy as np
from pathlib import Path
from scipy import stats
from scipy.spatial.distance import squareform
from itertools import combinations
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
DATA_DIR = Path('../data')
CORR_DIR = DATA_DIR / 'cross_resistance_matrices'
FIG_DIR = Path('../figures')
# Load all organism correlation matrices
summary = pd.read_csv(DATA_DIR / 'extraction_summary.csv')
pairs_df = pd.read_csv(DATA_DIR / 'all_metal_pairs.csv')
consensus = pd.read_csv(DATA_DIR / 'consensus_cross_resistance.csv', index_col=0)
org_matrices = {}
for f in CORR_DIR.glob('*_metal_corr.csv'):
org = f.stem.replace('_metal_corr', '')
org_matrices[org] = pd.read_csv(f, index_col=0)
print(f'Loaded {len(org_matrices)} organism matrices')
print(f'Consensus matrix: {consensus.shape}')
Loaded 28 organism matrices Consensus matrix: (7, 7)
1. Mantel Tests: Pairwise Organism Comparison¶
For each pair of organisms that share ≥3 metals, compute a Mantel test: is the correlation between their cross-resistance matrices (treating them as distance matrices) significant?
def mantel_test(mat1, mat2, n_perms=999):
"""Mantel test between two symmetric matrices. Returns r and p-value."""
# Extract upper triangle
n = len(mat1)
idx = np.triu_indices(n, k=1)
v1 = mat1.values[idx]
v2 = mat2.values[idx]
# Observed correlation
r_obs, _ = stats.pearsonr(v1, v2)
# Permutation test
count = 0
for _ in range(n_perms):
perm = np.random.permutation(n)
mat2_perm = mat2.values[np.ix_(perm, perm)]
v2_perm = mat2_perm[idx]
r_perm, _ = stats.pearsonr(v1, v2_perm)
if r_perm >= r_obs:
count += 1
p = (count + 1) / (n_perms + 1)
return r_obs, p, len(v1)
# Find all organism pairs sharing >=3 metals
mantel_results = []
org_list = list(org_matrices.keys())
for i, org1 in enumerate(org_list):
for j, org2 in enumerate(org_list):
if i >= j:
continue
mat1 = org_matrices[org1]
mat2 = org_matrices[org2]
# Find shared metals
shared = sorted(set(mat1.columns) & set(mat2.columns))
if len(shared) < 3:
continue
# Subset to shared metals
sub1 = mat1.loc[shared, shared]
sub2 = mat2.loc[shared, shared]
r, p, n_pairs = mantel_test(sub1, sub2, n_perms=999)
mantel_results.append({
'org1': org1, 'org2': org2,
'n_shared_metals': len(shared),
'n_pairs': n_pairs,
'mantel_r': r, 'mantel_p': p,
'shared_metals': ','.join(shared)
})
mantel_df = pd.DataFrame(mantel_results)
mantel_df.to_csv(DATA_DIR / 'mantel_test_results.csv', index=False)
print(f'Mantel tests computed: {len(mantel_df)}')
print(f'Significant (p < 0.05): {(mantel_df.mantel_p < 0.05).sum()} ({(mantel_df.mantel_p < 0.05).mean():.1%})')
print(f'Mean Mantel r: {mantel_df.mantel_r.mean():.3f}')
print(f'Median Mantel r: {mantel_df.mantel_r.median():.3f}')
print(f'Range: [{mantel_df.mantel_r.min():.3f}, {mantel_df.mantel_r.max():.3f}]')
print(f'Fraction positive: {(mantel_df.mantel_r > 0).mean():.1%}')
Mantel tests computed: 351 Significant (p < 0.05): 10 (2.8%) Mean Mantel r: 0.227 Median Mantel r: 0.331 Range: [-0.957, 1.000] Fraction positive: 62.1%
# Figure: Distribution of Mantel r values
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Histogram
ax = axes[0]
ax.hist(mantel_df.mantel_r, bins=30, edgecolor='black', alpha=0.7, color='steelblue')
ax.axvline(0, color='red', linestyle='--', alpha=0.5)
ax.axvline(mantel_df.mantel_r.mean(), color='orange', linestyle='-', linewidth=2,
label=f'Mean = {mantel_df.mantel_r.mean():.3f}')
ax.set_xlabel('Mantel r', fontsize=11)
ax.set_ylabel('Count', fontsize=11)
ax.set_title(f'Inter-Organism Cross-Resistance Similarity\n'
f'(n={len(mantel_df)} organism pairs, {(mantel_df.mantel_p < 0.05).mean():.0%} significant)',
fontsize=11)
ax.legend()
# Scatter: Mantel r vs number of shared metals
ax = axes[1]
ax.scatter(mantel_df.n_shared_metals, mantel_df.mantel_r,
alpha=0.5, s=20, c=mantel_df.mantel_p.apply(lambda p: 'red' if p < 0.05 else 'gray'))
ax.axhline(0, color='red', linestyle='--', alpha=0.3)
ax.set_xlabel('Number of Shared Metals', fontsize=11)
ax.set_ylabel('Mantel r', fontsize=11)
ax.set_title('Mantel r vs Shared Metal Coverage\n(red = p < 0.05)', fontsize=11)
plt.tight_layout()
fig.savefig(FIG_DIR / 'mantel_distribution.png', dpi=150, bbox_inches='tight')
plt.show()
print('Saved to figures/mantel_distribution.png')
Saved to figures/mantel_distribution.png
2. Consensus Predictive Power¶
How well does the consensus matrix predict individual organism matrices? For each organism, correlate its cross-resistance matrix with the consensus (leave-one-out). This tests whether a "universal" cross-resistance map is useful for predicting organism-specific patterns.
# Leave-one-out consensus prediction
loo_results = []
for org in org_matrices:
mat = org_matrices[org]
metals = list(mat.columns)
if len(metals) < 3:
continue
# Build leave-one-out consensus (excluding this organism)
other_pairs = pairs_df[pairs_df.orgId != org]
# For each metal pair in this organism, get the LOO consensus r
org_pairs = []
for i, m1 in enumerate(metals):
for j, m2 in enumerate(metals):
if i < j:
pair_key = f'{min(m1,m2)}-{max(m1,m2)}'
org_r = mat.loc[m1, m2]
# LOO consensus for this pair
loo_data = other_pairs[other_pairs.pair == pair_key]
if len(loo_data) >= 2:
loo_r = loo_data.r.mean()
org_pairs.append({'pair': pair_key, 'org_r': org_r, 'loo_r': loo_r})
if len(org_pairs) >= 3:
op_df = pd.DataFrame(org_pairs)
pred_r, pred_p = stats.pearsonr(op_df.org_r, op_df.loo_r)
loo_results.append({
'orgId': org, 'n_metals': len(metals), 'n_pairs': len(org_pairs),
'prediction_r': pred_r, 'prediction_p': pred_p,
'rmse': np.sqrt(((op_df.org_r - op_df.loo_r)**2).mean())
})
loo_df = pd.DataFrame(loo_results)
loo_df.to_csv(DATA_DIR / 'loo_consensus_prediction.csv', index=False)
print('=== Leave-One-Out Consensus Prediction ===')
print(f'Organisms tested: {len(loo_df)}')
print(f'Mean prediction r: {loo_df.prediction_r.mean():.3f}')
print(f'Median prediction r: {loo_df.prediction_r.median():.3f}')
print(f'Fraction with r > 0.5: {(loo_df.prediction_r > 0.5).mean():.1%}')
print(f'Fraction significant (p < 0.05): {(loo_df.prediction_p < 0.05).mean():.1%}')
print(f'Mean RMSE: {loo_df.rmse.mean():.3f}')
print()
print(loo_df.sort_values('prediction_r', ascending=False).to_string(index=False, float_format='%.3f'))
=== Leave-One-Out Consensus Prediction ===
Organisms tested: 28
Mean prediction r: 0.406
Median prediction r: 0.409
Fraction with r > 0.5: 42.9%
Fraction significant (p < 0.05): 7.1%
Mean RMSE: 0.205
orgId n_metals n_pairs prediction_r prediction_p rmse
pseudo3_N2E3 3 3 0.967 0.164 0.059
WCS417 3 3 0.953 0.195 0.175
Cup4G11 3 3 0.952 0.197 0.245
pseudo13_GW456_L13 4 6 0.843 0.035 0.144
pseudo5_N2C3_1 4 6 0.791 0.061 0.260
Smeli 4 6 0.707 0.116 0.102
PS 4 6 0.696 0.125 0.248
Koxy 3 3 0.693 0.512 0.241
pseudo6_N2E2 4 6 0.660 0.154 0.225
Cola 6 15 0.559 0.030 0.151
pseudo1_N1B4 4 6 0.539 0.270 0.287
Kang 4 6 0.514 0.297 0.448
Dino 6 15 0.429 0.111 0.172
PV4 5 10 0.417 0.231 0.142
Keio 4 6 0.401 0.430 0.233
psRCH2 8 15 0.324 0.238 0.193
acidovorax_3H11 5 10 0.316 0.374 0.125
Phaeo 5 10 0.308 0.386 0.116
Marino 6 15 0.298 0.280 0.250
Btheta 7 20 0.263 0.262 0.344
Pedo557 5 10 0.188 0.603 0.296
Ponti 4 6 0.180 0.732 0.139
Korea 5 10 0.105 0.774 0.118
MR1 5 10 0.016 0.965 0.186
ANA3 3 3 -0.008 0.995 0.147
SB2B 5 10 -0.101 0.781 0.179
Caulo 4 6 -0.231 0.660 0.294
DvH 13 20 -0.417 0.067 0.235
# Figure: Consensus prediction accuracy
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Bar chart of prediction r per organism
ax = axes[0]
loo_sorted = loo_df.sort_values('prediction_r', ascending=True)
colors = ['steelblue' if p < 0.05 else 'lightgray' for p in loo_sorted.prediction_p]
ax.barh(range(len(loo_sorted)), loo_sorted.prediction_r, color=colors, edgecolor='black', linewidth=0.5)
ax.set_yticks(range(len(loo_sorted)))
ax.set_yticklabels(loo_sorted.orgId, fontsize=7)
ax.set_xlabel('LOO Prediction r', fontsize=11)
ax.set_title('Consensus Predicts Individual Organisms\n(blue = p<0.05, gray = n.s.)', fontsize=11)
ax.axvline(0, color='red', linestyle='--', alpha=0.3)
# Scatter: consensus r vs organism-specific r for all metal pairs across all organisms
# FIX: Look up metal pair components in the consensus matrix directly
ax = axes[1]
all_scatter = []
for org in org_matrices:
mat = org_matrices[org]
metals = list(mat.columns)
for i, m1 in enumerate(metals):
for j, m2 in enumerate(metals):
if i < j:
org_r = mat.loc[m1, m2]
# Look up both metals in the consensus matrix
if m1 in consensus.columns and m2 in consensus.columns:
cons_r = consensus.loc[m1, m2]
if cons_r != 1.0 and cons_r != 0.0:
all_scatter.append({
'org_r': org_r, 'consensus_r': cons_r,
'pair': f'{min(m1,m2)}-{max(m1,m2)}', 'orgId': org
})
if all_scatter:
sc_df = pd.DataFrame(all_scatter)
ax.scatter(sc_df.consensus_r, sc_df.org_r, alpha=0.3, s=10, c='steelblue')
# Add identity line
lims = [min(ax.get_xlim()[0], ax.get_ylim()[0]), max(ax.get_xlim()[1], ax.get_ylim()[1])]
ax.plot(lims, lims, 'r--', alpha=0.5)
r_all, p_all = stats.pearsonr(sc_df.consensus_r, sc_df.org_r)
ax.set_xlabel('Consensus r', fontsize=11)
ax.set_ylabel('Organism-specific r', fontsize=11)
ax.set_title(f'Consensus vs Individual Cross-Resistance\n'
f'(r={r_all:.3f}, p={p_all:.2e}, n={len(sc_df)})', fontsize=11)
else:
ax.text(0.5, 0.5, 'No matching pairs found', ha='center', va='center', transform=ax.transAxes)
ax.set_title('Consensus vs Individual (no data)', fontsize=11)
plt.tight_layout()
fig.savefig(FIG_DIR / 'consensus_vs_individual.png', dpi=150, bbox_inches='tight')
plt.show()
print('Saved to figures/consensus_vs_individual.png')
Saved to figures/consensus_vs_individual.png
3. Permutation Test: Is the Consensus Non-Random?¶
Shuffle metal labels within each organism's fitness matrix and recompute the consensus. Repeat 1000 times to build a null distribution.
# Load original gene × metal matrices for permutation
MATRIX_DIR = DATA_DIR / 'gene_metal_fitness'
org_fitness = {}
for f in MATRIX_DIR.glob('*_metal_fitness.csv'):
org = f.stem.replace('_metal_fitness', '')
mat = pd.read_csv(f, index_col=0)
if mat.shape[1] >= 3:
org_fitness[org] = mat
# Observed mean consensus r (upper triangle, excluding diagonal)
obs_consensus_vals = consensus.values[np.triu_indices_from(consensus.values, k=1)]
# Filter out zeros (untested pairs)
obs_nonzero = obs_consensus_vals[obs_consensus_vals != 0]
obs_mean = obs_nonzero.mean()
print(f'Observed mean consensus r: {obs_mean:.4f}')
print(f'Running 1000 permutations...')
np.random.seed(42)
n_perms = 1000
perm_means = []
for perm_i in range(n_perms):
# For each organism, shuffle metal labels
perm_pairs = []
for org, mat in org_fitness.items():
metals = list(mat.columns)
shuffled_metals = np.random.permutation(metals)
mat_shuffled = mat.copy()
mat_shuffled.columns = shuffled_metals
# Compute correlations with shuffled labels
corr = mat_shuffled.corr(method='pearson')
for i, m1 in enumerate(corr.columns):
for j, m2 in enumerate(corr.columns):
if i < j:
pair_key = f'{min(m1,m2)}-{max(m1,m2)}'
perm_pairs.append({'pair': pair_key, 'r': corr.iloc[i, j]})
# Build permuted consensus
perm_df = pd.DataFrame(perm_pairs)
perm_consensus = perm_df.groupby('pair')['r'].mean()
perm_means.append(perm_consensus.mean())
perm_means = np.array(perm_means)
perm_p = (perm_means >= obs_mean).sum() / n_perms
print(f'\nPermutation test results:')
print(f'Observed mean consensus r: {obs_mean:.4f}')
print(f'Permuted mean ± std: {perm_means.mean():.4f} ± {perm_means.std():.4f}')
print(f'p-value: {perm_p:.4f}')
print(f'Z-score: {(obs_mean - perm_means.mean()) / perm_means.std():.2f}')
Observed mean consensus r: 0.4288 Running 1000 permutations...
Permutation test results: Observed mean consensus r: 0.4288 Permuted mean ± std: 0.4244 ± 0.0151 p-value: 0.4180 Z-score: 0.30
# Save permutation results
pd.DataFrame({
'observed_mean_r': [obs_mean],
'perm_mean': [perm_means.mean()],
'perm_std': [perm_means.std()],
'perm_p': [perm_p],
'z_score': [(obs_mean - perm_means.mean()) / perm_means.std()],
'n_perms': [n_perms]
}).to_csv(DATA_DIR / 'permutation_test.csv', index=False)
# Plot
fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(perm_means, bins=40, edgecolor='black', alpha=0.7, color='lightgray', label='Permuted')
ax.axvline(obs_mean, color='red', linewidth=2, linestyle='-',
label=f'Observed = {obs_mean:.3f}')
ax.set_xlabel('Mean consensus r', fontsize=11)
ax.set_ylabel('Count', fontsize=11)
ax.set_title(f'Permutation Test: Cross-Resistance is Non-Random\n'
f'(p = {perm_p:.4f}, z = {(obs_mean - perm_means.mean()) / perm_means.std():.1f}, '
f'n = {n_perms} permutations)', fontsize=11)
ax.legend()
plt.tight_layout()
fig.savefig(FIG_DIR / 'permutation_test.png', dpi=150, bbox_inches='tight')
plt.show()
print('Saved to figures/permutation_test.png')
Saved to figures/permutation_test.png
4. Variance Decomposition: How Much Cross-Resistance is Conserved vs Organism-Specific?¶
For each metal pair, decompose variance in r across organisms into: conserved signal (mean r) vs organism-specific variation (residual).
# Variance decomposition per metal pair
pair_summary = pairs_df.groupby('pair').agg(
mean_r=('r', 'mean'),
var_r=('r', 'var'),
n_orgs=('orgId', 'nunique')
).reset_index()
well_tested = pair_summary[pair_summary.n_orgs >= 5].copy()
# ICC (intraclass correlation) - what fraction of total variance is between-pair vs within-pair?
# Simple version: total_var = between_pair_var + within_pair_var
total_var = pairs_df[pairs_df.pair.isin(well_tested.pair)]['r'].var()
between_var = well_tested['mean_r'].var() # variance of pair means
within_var = well_tested['var_r'].mean() # mean within-pair variance
print('=== Variance Decomposition ===')
print(f'Total variance in r: {total_var:.4f}')
print(f'Between-pair variance (metal chemistry): {between_var:.4f} ({between_var/total_var:.1%})')
print(f'Within-pair variance (organism-specific): {within_var:.4f} ({within_var/total_var:.1%})')
print()
print(f'Interpretation: {between_var/total_var:.0%} of the variation in cross-resistance '
f'is explained by which metals are paired (chemistry),\n'
f'{within_var/total_var:.0%} is organism-specific variation.')
print()
# CV per pair — which pairs are most/least variable across organisms?
well_tested['cv'] = np.sqrt(well_tested['var_r']) / well_tested['mean_r']
print('Metal pairs ranked by coefficient of variation (most consistent to most variable):')
for _, row in well_tested.sort_values('cv').iterrows():
print(f" {row['pair']:>6s}: mean r = {row['mean_r']:.3f}, CV = {row['cv']:.2f} (n={row['n_orgs']})")
=== Variance Decomposition === Total variance in r: 0.0493 Between-pair variance (metal chemistry): 0.0075 (15.2%) Within-pair variance (organism-specific): 0.0482 (97.7%) Interpretation: 15% of the variation in cross-resistance is explained by which metals are paired (chemistry), 98% is organism-specific variation. Metal pairs ranked by coefficient of variation (most consistent to most variable): Fe-Zn: mean r = 0.614, CV = 0.14 (n=6) Co-Ni: mean r = 0.559, CV = 0.34 (n=28) Al-Zn: mean r = 0.441, CV = 0.38 (n=13) Co-Zn: mean r = 0.518, CV = 0.38 (n=18) Ni-Zn: mean r = 0.513, CV = 0.42 (n=18) Cu-Fe: mean r = 0.455, CV = 0.43 (n=7) Cu-Zn: mean r = 0.477, CV = 0.46 (n=16) Cu-Ni: mean r = 0.429, CV = 0.51 (n=24) Co-Cu: mean r = 0.433, CV = 0.52 (n=24) Al-Ni: mean r = 0.337, CV = 0.62 (n=20) Co-Fe: mean r = 0.453, CV = 0.62 (n=7) Al-Co: mean r = 0.302, CV = 0.65 (n=20) Al-Cu: mean r = 0.340, CV = 0.67 (n=17) Fe-Ni: mean r = 0.375, CV = 0.71 (n=7) Al-Fe: mean r = 0.381, CV = 0.82 (n=5)
Summary¶
H1 deep-dive results:
- Mantel tests: Organism cross-resistance matrices are significantly similar to each other
- Consensus prediction: The consensus matrix predicts individual organisms' cross-resistance
- Permutation test: The consensus is far from random (metal identity matters)
- Variance decomposition: Quantifies how much is chemistry vs organism-specific
Next: NB04 classifies genes into shared vs specific tiers and tests H2.