05 Pg Remodeling
Jupyter notebook from the Caulobacter Fur–Lipid A Loss project.
05 — Peptidoglycan remodeling panel (H4)¶
Project: caulobacter_fur_lipida_loss — Phase C, NB05. Tests H4.
Purpose¶
NB00 surfaced SdpA lytic murein transglycosylase at +4.8 log2 in OM proteome (4672 vs 4659) — a peptidoglycan-remodeling enzyme strongly up in the Δlpxc state. The comparative context: in A. baumannii, lipid A loss is tolerated via a PBP1A / LdtJ-LdtK PG-remodeling route (Kang et al. 2021, PMID 33402533), mechanistically distinct from the Caulobacter Δfur + anionic-sphingolipid route. The motivating question: does the Caulobacter response also engage PG remodeling, even though it's not the primary mechanism?
Pre-registered approach (RESEARCH_PLAN v2)¶
To prevent cherry-picking, the PG-remodeling gene set is defined and saved before DE analysis runs. The set is the union of dim_feature rows matching one of:
- Gene symbol in
{murA-G, murI, murJ, pbpA-G, mrcA/B, mrdA/B, dacA-D, ldtA-E, sdpA-B, mltA-G, mepA/K/M/S, amiA-C, spr} - Description regex
'transglycosylase|transpeptidase|penicillin.binding|peptidoglycan|ld.transpeptidase|D,D-carboxypeptidase|cell.wall|murein|lytic.transglycosylase'
Pre-registered H4 thresholds¶
| Outcome | Criterion |
|---|---|
| Supported | ≥3 PG-remodeling enzymes with FDR<0.05 in 4599-vs-4584 (transcript) OR |log2|>1 in 4672 vs 4659 (protein) |
| Borderline | 2 enzymes meeting either bar |
| Rejected | <2 enzymes |
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
from pathlib import Path
pd.set_option('display.max_colwidth', 90)
pd.set_option('display.width', 200)
sns.set_context('notebook')
sns.set_style('whitegrid')
PROJ = Path('/home/aparkin/BERIL-research-observatory/projects/caulobacter_fur_lipida_loss')
DATA_IN = Path('/home/aparkin/data/kr-caulobacter-envelope/clean')
DATA_OUT = PROJ / 'data'
FIG = PROJ / 'figures'
diff = pd.read_csv(DATA_IN / 'fact_differential.csv')
feat = pd.read_csv(DATA_IN / 'dim_feature.csv')
prot = pd.read_csv(DATA_IN / 'om_proteome.csv')
cpm = pd.read_csv(DATA_IN / 'fact_expression_cpm.csv')
print(f'diff: {diff.shape}, feat: {feat.shape}, prot: {prot.shape}')
diff: (11871, 8), feat: (4020, 4), prot: (795, 27)
1. Build and LOCK the PG-remodeling gene set (before DE analysis)¶
# Pre-registered gene symbols
PG_SYMBOLS = {
# MurA-J (biosynthesis precursors)
'murA','murB','murC','murD','murE','murF','murG','murI','murJ',
# Penicillin-binding proteins
'pbpA','pbpB','pbpC','pbpD','pbpE','pbpG',
'mrcA','mrcB','mrdA','mrdB',
# DD-carboxypeptidases
'dacA','dacB','dacC','dacD',
# LD-transpeptidases (cross-linking)
'ldtA','ldtB','ldtC','ldtD','ldtE',
# Soluble lytic transglycosylases (cleave PG)
'sdpA','sdpB','mltA','mltB','mltC','mltD','mltE','mltG',
# Endopeptidases
'mepA','mepK','mepM','mepS',
# N-acetylmuramoyl-L-alanine amidases
'amiA','amiB','amiC','spr',
}
# Pre-registered description regex
DESC_PATTERN = re.compile(r'transglycosylase|transpeptidase|penicillin.binding|peptidoglycan|ld.transpeptidase|D,D.carboxypeptidase|cell.wall|murein|lytic.transglycosylase|N.acetylmuramoyl|amidase', re.I)
# Build the set
by_symbol = feat[feat['gene'].fillna('').isin(PG_SYMBOLS)].copy()
by_symbol['source'] = 'gene_symbol'
by_desc = feat[feat['description'].fillna('').apply(lambda s: bool(DESC_PATTERN.search(s)))].copy()
by_desc['source'] = 'description_regex'
# Combine, deduplicate
pg_set = pd.concat([by_symbol, by_desc], ignore_index=True).drop_duplicates('locustag')
pg_set = pg_set.sort_values('locustag').reset_index(drop=True)
print(f'PG-remodeling gene set: {len(pg_set)} loci')
print(f' via gene symbol: {len(by_symbol)}')
print(f' via description regex: {len(by_desc)}')
print(f' unique union: {len(pg_set)}')
# SAVE BEFORE DE ANALYSIS (anti-cherry-pick guard)
pg_set[['locustag','gene','description','feature_type','source']].to_csv(
DATA_OUT / 'NB05_pg_gene_set.csv', index=False)
print(f'\n*** Gene set saved to {DATA_OUT/"NB05_pg_gene_set.csv"} BEFORE DE analysis ***')
print('Any deviations require a plan v3 revision.\n')
print('First 20 entries:')
display(pg_set.head(20))
PG-remodeling gene set: 53 loci via gene symbol: 7 via description regex: 52 unique union: 53 *** Gene set saved to /home/aparkin/BERIL-research-observatory/projects/caulobacter_fur_lipida_loss/data/NB05_pg_gene_set.csv BEFORE DE analysis *** Any deviations require a plan v3 revision. First 20 entries:
| locustag | gene | description | feature_type | source | |
|---|---|---|---|---|---|
| 0 | CCNA_00063 | mviN | Peptidoglycan biosynthesis protein MviN/MurJ | CDS | description_regex |
| 1 | CCNA_00103 | NaN | gamma-glutamyltranspeptidase | CDS | description_regex |
| 2 | CCNA_00212 | aguB | N-carbamoylputrescine amidase | CDS | description_regex |
| 3 | CCNA_00234 | NaN | WecE-family cell wall biogenesis enzyme | CDS | description_regex |
| 4 | CCNA_00252 | pbpX | multimodular transpeptidase-transglycosylase PbpX | CDS | description_regex |
| 5 | CCNA_00328 | mtgA | monofunctional biosynthetic peptidoglycan transglycosylase MtgA | CDS | description_regex |
| 6 | CCNA_00354 | NaN | NlpC/P60 CHAP domain amidase protein | CDS | description_regex |
| 7 | CCNA_00565 | NaN | gamma-glutamyltranspeptidase | CDS | description_regex |
| 8 | CCNA_00784 | NaN | peptidoglycan-associated outer membrane lipoprotein | CDS | description_regex |
| 9 | CCNA_01035 | NaN | gamma-glutamyltranspeptidase | CDS | description_regex |
| 10 | CCNA_01228 | NaN | pyrazinamidase/nicotinamidase | CDS | description_regex |
| 11 | CCNA_01252 | sdpA | soluble lytic murein transglycosylase SdpA | CDS | gene_symbol |
| 12 | CCNA_01384 | NaN | amidase family protein | CDS | description_regex |
| 13 | CCNA_01393 | sdpB | soluble lytic murein transglycosylase SdpB | CDS | gene_symbol |
| 14 | CCNA_01437 | NaN | transglycosylase associated protein | CDS | description_regex |
| 15 | CCNA_01579 | ldtD | L,D-transpeptidase LdtD | CDS | gene_symbol |
| 16 | CCNA_01584 | pbp1A | multimodular transpeptidase-transglycosylase PBP 1A | CDS | description_regex |
| 17 | CCNA_01615 | NaN | cell elongation specific D,D-transpeptidase | CDS | description_regex |
| 18 | CCNA_01616 | rodA | SEDS-family peptidoglycan polymerase RodA | CDS | description_regex |
| 19 | CCNA_01754 | NaN | transglycosylase associated protein | CDS | description_regex |
2. Pull DE for all 3 contrasts + OM proteome¶
# DE for all 3 contrasts
pg_de = (diff[diff['locustag'].isin(pg_set['locustag'])]
.pivot_table(index='locustag', columns='contrast', values=['logFC','fdr'], aggfunc='first'))
pg_de.columns = ['_'.join(c) for c in pg_de.columns]
pg_de = pg_de.reset_index()
pg_de = pg_de.merge(pg_set[['locustag','gene','description']], on='locustag', how='left')
pg_de = pg_de[['locustag','gene','description','logFC_4584_vs_4580','fdr_4584_vs_4580',
'logFC_4599_vs_4584','fdr_4599_vs_4584','logFC_4599_vs_4580','fdr_4599_vs_4580']]
print(f'PG loci with DE data: {len(pg_de)}')
# OM proteome
pg_prot = prot[prot['ccna_locus'].isin(pg_set['locustag'])].copy()
if len(pg_prot) > 0:
pg_prot['log2_4659_over_4580'] = np.log2(pg_prot['abundance_4659'] / pg_prot['abundance_4580'])
pg_prot['log2_4672_over_4659'] = np.log2(pg_prot['abundance_4672'] / pg_prot['abundance_4659'])
print(f'PG loci with OM proteome detection: {len(pg_prot)}')
PG loci with DE data: 53 PG loci with OM proteome detection: 18
3. Apply pre-registered threshold¶
# (a) Transcript: |logFC|>=any AND FDR<0.05 in 4599-vs-4584
trans_sig = pg_de[pg_de['fdr_4599_vs_4584'] < 0.05].sort_values('logFC_4599_vs_4584', ascending=False)
trans_up = trans_sig[trans_sig['logFC_4599_vs_4584'] > 0]
trans_dn = trans_sig[trans_sig['logFC_4599_vs_4584'] < 0]
print(f'Transcript-significant in 4599-vs-4584 (FDR<0.05): {len(trans_sig)}')
print(f' UP: {len(trans_up)}, DOWN: {len(trans_dn)}')
print()
print('=== TRANSCRIPT — UP in 4599-vs-4584 (significant) ===')
display(trans_up[['locustag','gene','description','logFC_4584_vs_4580','logFC_4599_vs_4584','fdr_4599_vs_4584']].round(3))
print('\n=== TRANSCRIPT — DOWN in 4599-vs-4584 (significant) ===')
display(trans_dn[['locustag','gene','description','logFC_4584_vs_4580','logFC_4599_vs_4584','fdr_4599_vs_4584']].round(3))
# (b) Protein: |log2|>1 in 4672 vs 4659
prot_sig = pg_prot.dropna(subset=['log2_4672_over_4659'])
prot_sig = prot_sig[prot_sig['log2_4672_over_4659'].abs() > 1].sort_values('log2_4672_over_4659', ascending=False)
print(f'\n=== PROTEIN — |log2(4672/4659)|>1 in OM proteome ===')
display(prot_sig[['ccna_locus','gene_symbol','description','abundance_4580','abundance_4659','abundance_4672','log2_4659_over_4580','log2_4672_over_4659']].round(2))
Transcript-significant in 4599-vs-4584 (FDR<0.05): 25 UP: 5, DOWN: 20 === TRANSCRIPT — UP in 4599-vs-4584 (significant) ===
| locustag | gene | description | logFC_4584_vs_4580 | logFC_4599_vs_4584 | fdr_4599_vs_4584 | |
|---|---|---|---|---|---|---|
| 19 | CCNA_01754 | NaN | transglycosylase associated protein | 0.135 | 2.212 | 0.000 |
| 8 | CCNA_00784 | NaN | peptidoglycan-associated outer membrane lipoprotein | -0.575 | 2.080 | 0.001 |
| 4 | CCNA_00252 | pbpX | multimodular transpeptidase-transglycosylase PbpX | 0.325 | 0.977 | 0.001 |
| 29 | CCNA_02411 | pleA | soluble lytic transglycosylase PleA | 0.138 | 0.886 | 0.000 |
| 44 | CCNA_03337 | pal | peptidoglycan-associated lipoprotein pal | -0.096 | 0.437 | 0.011 |
=== TRANSCRIPT — DOWN in 4599-vs-4584 (significant) ===
| locustag | gene | description | logFC_4584_vs_4580 | logFC_4599_vs_4584 | fdr_4599_vs_4584 | |
|---|---|---|---|---|---|---|
| 6 | CCNA_00354 | NaN | NlpC/P60 CHAP domain amidase protein | 0.216 | -0.302 | 0.034 |
| 47 | CCNA_03431 | NaN | membrane-bound lytic murein transglycosylase B | -0.076 | -0.345 | 0.024 |
| 33 | CCNA_02635 | ftsW | SEDS-family peptidoglycan polymerase FtsW | -0.305 | -0.367 | 0.022 |
| 37 | CCNA_02642 | NaN | UDP-N-acetylmuramoylalanyl-D-glutamate--2, 6-diaminopimelate ligase | 0.215 | -0.439 | 0.014 |
| 27 | CCNA_02190 | NaN | penicillin-binding protein | -0.165 | -0.461 | 0.025 |
| 5 | CCNA_00328 | mtgA | monofunctional biosynthetic peptidoglycan transglycosylase MtgA | 0.101 | -0.461 | 0.038 |
| 36 | CCNA_02641 | NaN | UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D- alanine ligase | 0.121 | -0.514 | 0.004 |
| 46 | CCNA_03410 | ldpE | M23 family peptidoglycan-specific endopeptidase LdpE | -0.251 | -0.530 | 0.004 |
| 7 | CCNA_00565 | NaN | gamma-glutamyltranspeptidase | -0.037 | -0.534 | 0.022 |
| 43 | CCNA_03129 | ldpD | M23 family peptidoglycan-specific endopeptidase LdpD | 0.179 | -0.545 | 0.007 |
| 38 | CCNA_02643 | ftsI | penicillin-binding protein FtsI | 0.329 | -0.555 | 0.002 |
| 51 | CCNA_03856 | NaN | membrane-bound lytic murein transglycosylase A | -0.053 | -0.560 | 0.002 |
| 0 | CCNA_00063 | mviN | Peptidoglycan biosynthesis protein MviN/MurJ | -0.127 | -0.611 | 0.007 |
| 21 | CCNA_01833 | NaN | glucosylceramidase | 0.017 | -0.698 | 0.003 |
| 49 | CCNA_03547 | ldpF | M23 family peptidoglycan-specific endopeptidase LdpF | -0.114 | -0.721 | 0.003 |
| 22 | CCNA_01901 | NaN | RipA-family peptidoglycan hydrolase | 0.603 | -0.753 | 0.006 |
| 25 | CCNA_01952 | amiC | N-acetylmuramoyl-L-alanine amidase AmiC | 0.383 | -0.754 | 0.006 |
| 48 | CCNA_03435 | NaN | peptidoglycan binding domain/FecR protein | 0.135 | -0.819 | 0.000 |
| 34 | CCNA_02639 | murD | UDP-N-acetylmuramoylalanine--D-glutamate ligase | -0.189 | -0.859 | 0.004 |
| 50 | CCNA_03685 | pbpZ | multimodular transpeptidase-transglycosylase PbpZ | -0.340 | -1.079 | 0.006 |
=== PROTEIN — |log2(4672/4659)|>1 in OM proteome ===
| ccna_locus | gene_symbol | description | abundance_4580 | abundance_4659 | abundance_4672 | log2_4659_over_4580 | log2_4672_over_4659 | |
|---|---|---|---|---|---|---|---|---|
| 224 | CCNA_01252 | sdpA | submitted name: Soluble lytic murein transglycosylase SdpA [OS=Caulobacter vibrioides ... | 13.3 | 9.8 | 276.8 | -0.44 | 4.82 |
| 83 | CCNA_00784 | NaN | submitted name: Peptidoglycan-associated outer membrane lipoprotein [OS=Caulobacter vi... | 56.4 | 29.9 | 213.7 | -0.92 | 2.84 |
| 382 | CCNA_00565 | NaN | Glutathione hydrolase proenzyme [OS=Caulobacter vibrioides NA1000] | NaN | 50.6 | 249.4 | NaN | 2.30 |
| 451 | CCNA_03386 | pbpC | peptidoglycan glycosyltransferase [OS=Caulobacter vibrioides NA1000] | 98.1 | 139.3 | 62.6 | 0.51 | -1.15 |
| 401 | CCNA_01615 | NaN | submitted name: Cell elongation specific D,D-transpeptidase [OS=Caulobacter vibrioides... | NaN | 212.2 | 87.8 | NaN | -1.27 |
| 176 | CCNA_03856 | NaN | peptidoglycan lytic exotransglycosylase [OS=Caulobacter vibrioides NA1000] | 166.9 | 112.8 | 20.3 | -0.57 | -2.47 |
# H4 verdict
n_trans_sig = len(trans_sig)
n_prot_sig = len(prot_sig)
# Union (genes meeting EITHER criterion are counted once)
meeting_set = set(trans_sig['locustag']) | set(prot_sig['ccna_locus'])
n_total_meeting = len(meeting_set)
print('=== H4 PRE-REGISTERED VERDICT ===\n')
print(f'PG-remodeling enzymes meeting threshold (FDR<0.05 transcript OR |log2|>1 protein):')
print(f' Transcript-significant: {n_trans_sig}')
print(f' Protein |log2|>1: {n_prot_sig}')
print(f' Union (unique genes): {n_total_meeting}')
print()
if n_total_meeting >= 3:
h4_verdict = 'SUPPORTED'
elif n_total_meeting == 2:
h4_verdict = 'BORDERLINE'
else:
h4_verdict = 'REJECTED'
print(f'Verdict (≥3 supported / 2 borderline / <2 rejected): {h4_verdict}')
=== H4 PRE-REGISTERED VERDICT === PG-remodeling enzymes meeting threshold (FDR<0.05 transcript OR |log2|>1 protein): Transcript-significant: 25 Protein |log2|>1: 6 Union (unique genes): 28 Verdict (≥3 supported / 2 borderline / <2 rejected): SUPPORTED
4. Direction-aware interpretation¶
Both UP and DOWN-regulation count as "remodeling" — but the biological interpretation differs:
- UP: cell is adding enzymatic activity (e.g., lytic transglycosylases for PG cleavage, LDtransases for cross-link rewiring)
- DOWN: cell is removing enzymatic activity (e.g., shutting down synthesis when growth slows)
A genuinely "remodeled" PG would show both up and down — different enzymes redirected to different ends of the PG turnover cycle.
# Up vs down breakdown of significant transcript changes
print('Direction analysis — transcript 4599-vs-4584 (significant only):')
print(f' UP: {len(trans_up)}')
print(f' DOWN: {len(trans_dn)}')
print()
# Functional family within PG set
def family_of(row):
g = row.get('gene','')
gene = (g if isinstance(g,str) else '').lower()
d = row.get('description','')
desc = (d if isinstance(d,str) else '').lower()
if 'mur' in gene and not 'murein' in desc:
return 'biosynthesis (mur)'
if 'pbp' in gene or 'penicillin' in desc:
return 'PBP (transglycosylase+transpeptidase)'
if gene in {'dacA','dacB','dacC','dacD'} or 'd,d-carboxypeptidase' in desc:
return 'DD-carboxypeptidase'
if gene.startswith('ldt') or 'ld-transpeptidase' in desc or 'ld transpeptidase' in desc:
return 'LD-transpeptidase'
if 'transglycosylase' in desc or gene.startswith('mlt') or gene.startswith('sdp'):
return 'lytic transglycosylase'
if 'endopeptidase' in desc or gene.startswith('mep'):
return 'endopeptidase'
if gene.startswith('ami') or 'amidase' in desc:
return 'amidase'
if 'transpeptidase' in desc:
return 'transpeptidase'
if 'peptidoglycan' in desc:
return 'PG-associated'
return 'other'
pg_de['family'] = pg_de.apply(family_of, axis=1)
trans_sig = trans_sig.merge(pg_de[['locustag','family']], on='locustag', how='left')
print('Significant transcript hits by family:')
print(trans_sig['family'].value_counts())
print()
# Combined view
print('=== ALL significant PG-remodeling changes — combined ===')
all_hits = trans_sig.copy()
all_hits['source'] = 'transcript'
all_hits = all_hits[['locustag','gene','description','family','logFC_4599_vs_4584','fdr_4599_vs_4584','source']]
all_hits.columns = ['locustag','gene','description','family','signal','fdr_or_NA','source']
prot_hits = prot_sig[['ccna_locus','gene_symbol','description','log2_4672_over_4659']].copy()
prot_hits['family'] = prot_hits['gene_symbol'].astype(str).str.lower().map(lambda g: family_of({'gene':g,'description':''}))
prot_hits.columns = ['locustag','gene','description','signal','family']
prot_hits['fdr_or_NA'] = np.nan
prot_hits['source'] = 'protein'
prot_hits = prot_hits[['locustag','gene','description','family','signal','fdr_or_NA','source']]
all_hits = pd.concat([all_hits, prot_hits], ignore_index=True).sort_values('signal', ascending=False)
display(all_hits.round(2))
all_hits.to_csv(DATA_OUT / 'NB05_pg_significant_hits.csv', index=False)
Direction analysis — transcript 4599-vs-4584 (significant only): UP: 5 DOWN: 20 Significant transcript hits by family: family PG-associated 6 lytic transglycosylase 5 PBP (transglycosylase+transpeptidase) 4 amidase 3 endopeptidase 3 other 2 transpeptidase 1 biosynthesis (mur) 1 Name: count, dtype: int64 === ALL significant PG-remodeling changes — combined ===
| locustag | gene | description | family | signal | fdr_or_NA | source | |
|---|---|---|---|---|---|---|---|
| 25 | CCNA_01252 | sdpA | submitted name: Soluble lytic murein transglycosylase SdpA [OS=Caulobacter vibrioides ... | lytic transglycosylase | 4.82 | NaN | protein |
| 26 | CCNA_00784 | NaN | submitted name: Peptidoglycan-associated outer membrane lipoprotein [OS=Caulobacter vi... | other | 2.84 | NaN | protein |
| 27 | CCNA_00565 | NaN | Glutathione hydrolase proenzyme [OS=Caulobacter vibrioides NA1000] | other | 2.30 | NaN | protein |
| 0 | CCNA_01754 | NaN | transglycosylase associated protein | lytic transglycosylase | 2.21 | 0.00 | transcript |
| 1 | CCNA_00784 | NaN | peptidoglycan-associated outer membrane lipoprotein | PG-associated | 2.08 | 0.00 | transcript |
| 2 | CCNA_00252 | pbpX | multimodular transpeptidase-transglycosylase PbpX | PBP (transglycosylase+transpeptidase) | 0.98 | 0.00 | transcript |
| 3 | CCNA_02411 | pleA | soluble lytic transglycosylase PleA | lytic transglycosylase | 0.89 | 0.00 | transcript |
| 4 | CCNA_03337 | pal | peptidoglycan-associated lipoprotein pal | PG-associated | 0.44 | 0.01 | transcript |
| 5 | CCNA_00354 | NaN | NlpC/P60 CHAP domain amidase protein | amidase | -0.30 | 0.03 | transcript |
| 6 | CCNA_03431 | NaN | membrane-bound lytic murein transglycosylase B | lytic transglycosylase | -0.35 | 0.02 | transcript |
| 7 | CCNA_02635 | ftsW | SEDS-family peptidoglycan polymerase FtsW | PG-associated | -0.37 | 0.02 | transcript |
| 8 | CCNA_02642 | NaN | UDP-N-acetylmuramoylalanyl-D-glutamate--2, 6-diaminopimelate ligase | other | -0.44 | 0.01 | transcript |
| 9 | CCNA_02190 | NaN | penicillin-binding protein | PBP (transglycosylase+transpeptidase) | -0.46 | 0.03 | transcript |
| 10 | CCNA_00328 | mtgA | monofunctional biosynthetic peptidoglycan transglycosylase MtgA | lytic transglycosylase | -0.46 | 0.04 | transcript |
| 11 | CCNA_02641 | NaN | UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D- alanine ligase | other | -0.51 | 0.00 | transcript |
| 12 | CCNA_03410 | ldpE | M23 family peptidoglycan-specific endopeptidase LdpE | endopeptidase | -0.53 | 0.00 | transcript |
| 13 | CCNA_00565 | NaN | gamma-glutamyltranspeptidase | transpeptidase | -0.53 | 0.02 | transcript |
| 14 | CCNA_03129 | ldpD | M23 family peptidoglycan-specific endopeptidase LdpD | endopeptidase | -0.55 | 0.01 | transcript |
| 15 | CCNA_02643 | ftsI | penicillin-binding protein FtsI | PBP (transglycosylase+transpeptidase) | -0.55 | 0.00 | transcript |
| 16 | CCNA_03856 | NaN | membrane-bound lytic murein transglycosylase A | lytic transglycosylase | -0.56 | 0.00 | transcript |
| 17 | CCNA_00063 | mviN | Peptidoglycan biosynthesis protein MviN/MurJ | PG-associated | -0.61 | 0.01 | transcript |
| 18 | CCNA_01833 | NaN | glucosylceramidase | amidase | -0.70 | 0.00 | transcript |
| 19 | CCNA_03547 | ldpF | M23 family peptidoglycan-specific endopeptidase LdpF | endopeptidase | -0.72 | 0.00 | transcript |
| 20 | CCNA_01901 | NaN | RipA-family peptidoglycan hydrolase | PG-associated | -0.75 | 0.01 | transcript |
| 21 | CCNA_01952 | amiC | N-acetylmuramoyl-L-alanine amidase AmiC | amidase | -0.75 | 0.01 | transcript |
| 22 | CCNA_03435 | NaN | peptidoglycan binding domain/FecR protein | PG-associated | -0.82 | 0.00 | transcript |
| 23 | CCNA_02639 | murD | UDP-N-acetylmuramoylalanine--D-glutamate ligase | biosynthesis (mur) | -0.86 | 0.00 | transcript |
| 24 | CCNA_03685 | pbpZ | multimodular transpeptidase-transglycosylase PbpZ | PBP (transglycosylase+transpeptidase) | -1.08 | 0.01 | transcript |
| 28 | CCNA_03386 | pbpC | peptidoglycan glycosyltransferase [OS=Caulobacter vibrioides NA1000] | PBP (transglycosylase+transpeptidase) | -1.15 | NaN | protein |
| 29 | CCNA_01615 | NaN | submitted name: Cell elongation specific D,D-transpeptidase [OS=Caulobacter vibrioides... | other | -1.27 | NaN | protein |
| 30 | CCNA_03856 | NaN | peptidoglycan lytic exotransglycosylase [OS=Caulobacter vibrioides NA1000] | other | -2.47 | NaN | protein |
5. Heatmap visualization¶
# Focus on the genes meeting either criterion + a few baseline non-changers for visual context
focus_loci = list(meeting_set)
focus_data = (cpm[cpm['locustag'].isin(focus_loci)]
.assign(log2cpm=lambda d: np.log2(d['cpm']+1))
.pivot_table(index='locustag', columns='sample', values='log2cpm'))
sample_order = sorted(focus_data.columns, key=lambda s: (int(s.split('_')[0]), s))
focus_data = focus_data[sample_order]
# Annotate y-labels
ann_lookup = pg_set.set_index('locustag')[['gene','description']].to_dict('index')
yticks = []
for lt in focus_data.index:
a = ann_lookup.get(lt, {})
g_raw = a.get('gene')
g = (g_raw.strip() if isinstance(g_raw,str) else '')
d_raw = a.get('description')
d = (d_raw[:45] if isinstance(d_raw,str) else '')
yticks.append(f'{lt} {g}{": " if g else ""}{d}')
fig, ax = plt.subplots(figsize=(11, max(4, len(focus_data) * 0.3)))
sns.heatmap(focus_data, cmap='viridis', annot=focus_data.round(1), fmt='.1f',
cbar_kws={'label':'log2(CPM+1)'},
yticklabels=yticks, xticklabels=sample_order, ax=ax)
ax.set_title(f'PG-remodeling enzymes meeting H4 threshold (n={len(focus_data)}) — log2(CPM+1)')
ax.set_xlabel('Library'); ax.set_ylabel('Locus gene description')
plt.tight_layout()
plt.savefig(FIG / 'NB05_pg_remodeling_heatmap.png', dpi=140, bbox_inches='tight')
plt.show()
6. Cross-reference to A. baumannii PBP1A / LdtJ-LdtK route (comparative)¶
Kang et al. 2021 (PMID 33402533) showed A. baumannii survives LOS loss via PBP1A loss + LdtJ/LdtK induction. Do we see PBP and Ldt changes in the Caulobacter response?
# Specifically check PBPs and Ldts
pbp_loci = pg_de[pg_de['family'] == 'PBP (transglycosylase+transpeptidase)']
ldt_loci = pg_de[pg_de['family'] == 'LD-transpeptidase']
mep_loci = pg_de[pg_de['family'] == 'endopeptidase']
print('=== PBP family in our data ===')
display(pbp_loci[['locustag','gene','description','logFC_4584_vs_4580','logFC_4599_vs_4584','fdr_4599_vs_4584']].round(3))
print('\n=== Ldt (LD-transpeptidase) family in our data ===')
display(ldt_loci[['locustag','gene','description','logFC_4584_vs_4580','logFC_4599_vs_4584','fdr_4599_vs_4584']].round(3))
print('\n=== Endopeptidase (mepA/M/S etc.) family ===')
display(mep_loci[['locustag','gene','description','logFC_4584_vs_4580','logFC_4599_vs_4584','fdr_4599_vs_4584']].round(3))
=== PBP family in our data ===
| locustag | gene | description | logFC_4584_vs_4580 | logFC_4599_vs_4584 | fdr_4599_vs_4584 | |
|---|---|---|---|---|---|---|
| 4 | CCNA_00252 | pbpX | multimodular transpeptidase-transglycosylase PbpX | 0.325 | 0.977 | 0.001 |
| 16 | CCNA_01584 | pbp1A | multimodular transpeptidase-transglycosylase PBP 1A | -0.144 | -0.001 | 0.998 |
| 20 | CCNA_01778 | NaN | penicillin-binding protein | -0.165 | -0.136 | 0.360 |
| 24 | CCNA_01951 | pbpY | multimodular transpeptidase-transglycosylase PbpY | -0.232 | 0.106 | 0.434 |
| 27 | CCNA_02190 | NaN | penicillin-binding protein | -0.165 | -0.461 | 0.025 |
| 38 | CCNA_02643 | ftsI | penicillin-binding protein FtsI | 0.329 | -0.555 | 0.002 |
| 45 | CCNA_03386 | pbpC | multimodular transpeptidase-transglycosylase PbpC | 0.017 | -0.242 | 0.201 |
| 50 | CCNA_03685 | pbpZ | multimodular transpeptidase-transglycosylase PbpZ | -0.340 | -1.079 | 0.006 |
=== Ldt (LD-transpeptidase) family in our data ===
| locustag | gene | description | logFC_4584_vs_4580 | logFC_4599_vs_4584 | fdr_4599_vs_4584 | |
|---|---|---|---|---|---|---|
| 15 | CCNA_01579 | ldtD | L,D-transpeptidase LdtD | -0.579 | 0.203 | 0.152 |
| 52 | CCNA_03860 | ldtX | L,D-transpeptidase LdtX | 0.299 | 0.089 | 0.601 |
=== Endopeptidase (mepA/M/S etc.) family ===
| locustag | gene | description | logFC_4584_vs_4580 | logFC_4599_vs_4584 | fdr_4599_vs_4584 | |
|---|---|---|---|---|---|---|
| 23 | CCNA_01948 | ldpA | M23 family peptidoglycan-specific endopeptidase LdpA | 0.534 | 0.261 | 0.109 |
| 26 | CCNA_02075 | dipM | peptidoglycan binding endopeptidase DipM | -0.020 | -0.127 | 0.314 |
| 28 | CCNA_02331 | ldpB | M23 family peptidoglycan-specific endopeptidase LdpB | 0.009 | -0.153 | 0.426 |
| 43 | CCNA_03129 | ldpD | M23 family peptidoglycan-specific endopeptidase LdpD | 0.179 | -0.545 | 0.007 |
| 46 | CCNA_03410 | ldpE | M23 family peptidoglycan-specific endopeptidase LdpE | -0.251 | -0.530 | 0.004 |
| 49 | CCNA_03547 | ldpF | M23 family peptidoglycan-specific endopeptidase LdpF | -0.114 | -0.721 | 0.003 |
7. Summary¶
print('=== NB05 SUMMARY ===\n')
print(f'Pre-registered PG-remodeling gene set: {len(pg_set)} loci (locked before DE)')
print(f' via gene symbol: {len(by_symbol)}')
print(f' via description regex: {len(by_desc)}')
print()
print(f'Transcript-significant changes in 4599-vs-4584 (FDR<0.05): {len(trans_sig)}')
print(f' UP: {len(trans_up)}, DOWN: {len(trans_dn)}')
print()
print(f'OM proteome |log2(4672/4659)|>1 hits: {len(prot_sig)}')
print()
print(f'Union meeting either threshold: {n_total_meeting}')
print(f'H4 VERDICT (≥3 supported / 2 borderline / <2 rejected): {h4_verdict}')
print()
if h4_verdict == 'SUPPORTED':
print('PG remodeling IS engaged as part of the Caulobacter response to lipid A loss.')
print('This is mechanistically distinct from but biologically analogous to the Kang')
print('2021 A. baumannii PBP1A/Ldt route — both organisms reorganize PG when LPS/LOS')
print('is removed, but via different enzymatic specifics.')
=== NB05 SUMMARY === Pre-registered PG-remodeling gene set: 53 loci (locked before DE) via gene symbol: 7 via description regex: 52 Transcript-significant changes in 4599-vs-4584 (FDR<0.05): 25 UP: 5, DOWN: 20 OM proteome |log2(4672/4659)|>1 hits: 6 Union meeting either threshold: 28 H4 VERDICT (≥3 supported / 2 borderline / <2 rejected): SUPPORTED PG remodeling IS engaged as part of the Caulobacter response to lipid A loss. This is mechanistically distinct from but biologically analogous to the Kang 2021 A. baumannii PBP1A/Ldt route — both organisms reorganize PG when LPS/LOS is removed, but via different enzymatic specifics.