Nb12 Phage Targetability
Jupyter notebook from the Metagenome-Prioritized Phage Cocktails for Crohn's Disease and IBD project.
NB12: Pathobiont × phage targetability matrix (Pillar 4 opener)
Project: ibd_phage_targeting — Pillar 4 first notebook
Depends on: ref_phage_biology (12-organism literature-curated phage synthesis); NB05 actionable Tier-A scoring; Pillar-3 per-target mechanism profile (iron / bile-acid / mediation)
Purpose
Build a per-pathobiont phage-targetability profile (Tier-B in the project's 4-tier rubric) for the 6 actionable Tier-A core species, combining NB05 Tier-A scoring (criteria A3-A6) with literature-curated phage availability + Pillar-3 mechanism profile to produce per-target priority classification for Pillar-5 cocktail design.
Method
Per plan v1.9 (no raw reads):
- ref_phage_biology (12 organisms, indicator_taxa_literature_review): literature-curated phage info per Tier-1/Tier-2 pathobiont (known_phages, therapeutic_targets, lifestyle, clinical trial status)
- NB05 Tier-A scoring (71 candidates × A3-A6 criteria + total_score)
- Pillar-3 per-target mechanism profile (iron specialization, bile-acid coupling cost, mechanism mediation — from REPORT §closure cocktail-design table)
- Phage-availability score (Tier-B): 0-3 ordinal scale
  - 0 = no known phages OR only historical phage-like particles
  - 1 = temperate / prophage only, or limited characterization
  - 2 = lytic phage(s) characterized in literature, but not in clinical trials
  - 3 = clinical trial / commercial cocktail OR published efficacy data
External phage DB queries (PhageFoundry BERDL, INPHARED, IMG/VR, NCBI Phage RefSeq, PhagesDB) flagged for follow-up — BERDL Spark auth currently blocks direct PhageFoundry query, so this NB12 establishes the curated-literature-based foundation.
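The 0-3 ordinal scale above can be sketched as a lookup from the lifestyle label to a score. This is a minimal illustration, not the notebook's actual code (which lives in run_nb12.py): the label strings are the ones that appear in the printed matrix, while the dictionary name, the function name, and the exact label-to-score mapping are assumptions inferred from the scale definition.

```python
import math

# Lifestyle labels as they appear in the notebook's printed matrix.
# ASSUMPTION: this label-to-score mapping is inferred from the 0-3 scale
# definition; it is not taken from run_nb12.py.
LIFESTYLE_TO_SCORE = {
    "none": 0,            # no known phages / only historical phage-like particles
    "temperate": 1,       # temperate or prophage only
    "temperate_only": 1,
    "lytic": 2,           # lytic phage(s) in the literature, no clinical trials
    "lytic_clinical": 3,  # clinical trial / commercial cocktail / efficacy data
}


def phage_availability_score(lifestyle: str) -> float:
    """Return the Tier-B ordinal score, or NaN for species outside the curated set."""
    return float(LIFESTYLE_TO_SCORE.get(lifestyle, math.nan))


for label in ("lytic_clinical", "temperate", "unknown_not_in_ref_phage_biology"):
    print(label, phage_availability_score(label))
```

Returning NaN rather than 0 for uncurated species keeps "no known phages" distinct from "not yet reviewed", which matters for the coverage-gap analysis.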
Tests
- Per-pathobiont phage-availability score
- Combined Tier-A × Tier-B × Pillar-3 mechanism profile → Pillar-5 cocktail-design priority class
- Coverage gap analysis: which actionable Tier-A core have NO phage-therapy options?
- External phage DB references for follow-up
# See run_nb12.py for full source.
§0. Load ref_phage_biology + NB05 Tier-A scoring + Pillar-3 mechanism profile
# 12 organisms in ref_phage_biology; 6 actionable + 9 Tier-B candidates from NB05
## §0. Load ref_phage_biology + NB05 Tier-A scoring + Pillar-3 mechanism profile
ref_phage_biology: 12 organisms × 8 columns
Tier breakdown: {'Tier1': 6, 'Tier2': 6}
NB05 actionable Tier-A core: 6 species
Hungatella hathewayi: total_score=4.0
Mediterraneibacter gnavus: total_score=3.8
Escherichia coli: total_score=3.6
Eggerthella lenta: total_score=3.3
Flavonifractor plautii: total_score=3.3
Enterocloster bolteae: total_score=2.8
NB05 Tier-B candidates (score 2.2-2.4):
Enterocloster asparagiformis: total_score=2.4
Streptococcus salivarius: total_score=2.4
Enterocloster citroniae: total_score=2.4
Enterocloster clostridioformis: total_score=2.4
Blautia coccoides: total_score=2.4
Veillonella atypica: total_score=2.2
Streptococcus parasanguinis: total_score=2.2
Actinomyces oris: total_score=2.2
Veillonella parvula: total_score=2.2
§1. Per-pathobiont phage-availability scoring (Tier-B)
# 0-3 ordinal scale based on lifestyle + clinical status
## §1. Per-pathobiont phage-availability score
Phage-availability scoring (top 20 by Tier-A score):
species tier_a_score_nb05 actionable_tier_a phage_score_tier_b lifestyle_dominant
Hungatella hathewayi 4.0 True 0.0 none
Mediterraneibacter gnavus 3.8 True 1.0 temperate
Escherichia coli 3.6 True 3.0 lytic_clinical
Eggerthella lenta 3.3 True 2.0 lytic
Flavonifractor plautii 3.3 True NaN unknown_not_in_ref_phage_biology
Enterocloster bolteae 2.8 True 2.0 lytic
Enterocloster asparagiformis 2.4 False 1.0 temperate_only
Streptococcus salivarius 2.4 False NaN unknown_not_in_ref_phage_biology
Enterocloster citroniae 2.4 False NaN unknown_not_in_ref_phage_biology
Enterocloster clostridioformis 2.4 False NaN unknown_not_in_ref_phage_biology
Blautia coccoides 2.4 False NaN unknown_not_in_ref_phage_biology
Veillonella atypica 2.2 False NaN unknown_not_in_ref_phage_biology
Streptococcus parasanguinis 2.2 False NaN unknown_not_in_ref_phage_biology
Actinomyces oris 2.2 False NaN unknown_not_in_ref_phage_biology
Veillonella parvula 2.2 False NaN unknown_not_in_ref_phage_biology
Hungatella symbiosa 2.1 False NaN unknown_not_in_ref_phage_biology
Blautia wexlerae 2.1 False NaN unknown_not_in_ref_phage_biology
Bacteroides cellulosilyticus 2.0 False NaN unknown_not_in_ref_phage_biology
Veillonella dispar 2.0 False NaN unknown_not_in_ref_phage_biology
Streptococcus mitis 1.9 False NaN unknown_not_in_ref_phage_biology
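The matrix above behaves like a left join of the NB05 scores onto the curated phage table: species missing from ref_phage_biology keep their Tier-A score but get a NaN phage score and the `unknown_not_in_ref_phage_biology` label. A plain-Python sketch of that join follows, using three rows copied from the printed output; the variable and function names are illustrative assumptions, not the notebook's own.

```python
import math

# Values copied from the printed output above; names are illustrative.
nb05_scores = {
    "Hungatella hathewayi": 4.0,
    "Escherichia coli": 3.6,
    "Flavonifractor plautii": 3.3,
}
ref_phage = {
    "Hungatella hathewayi": (0.0, "none"),
    "Escherichia coli": (3.0, "lytic_clinical"),
}
# Placeholder used when a species has no curated entry.
MISSING = (math.nan, "unknown_not_in_ref_phage_biology")


def build_matrix(scores, ref):
    """Left-join phage info onto Tier-A scores, highest Tier-A score first."""
    rows = []
    for species, tier_a in sorted(scores.items(), key=lambda kv: -kv[1]):
        phage_score, lifestyle = ref.get(species, MISSING)
        rows.append({
            "species": species,
            "tier_a_score_nb05": tier_a,
            "phage_score_tier_b": phage_score,
            "lifestyle_dominant": lifestyle,
        })
    return rows


matrix = build_matrix(nb05_scores, ref_phage)
for row in matrix:
    print(row)
```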
§2. Combined Tier-A × Tier-B × Pillar-3 priority for Pillar-5 cocktail design
# Per-actionable-target full profile + Pillar-5 priority class
## §2. Combined per-target priority for Pillar-5 cocktail design
Combined per-actionable-target priority (NB05 actionable Tier-A only):
Hungatella hathewayi | NB05=4.0 | phage=0 | none
iron: none; BA-cost: low; mediation: species-abundance + within-carrier-metabolic-shift
Pillar-5 class: Tier-1 phage target (highest NB05 score; low BA cost)
Mediterraneibacter gnavus | NB05=3.8 | phage=1 | temperate
iron: none; BA-cost: low; mediation: species-abundance (mucin-glucorhamnan; Henke 2019)
Pillar-5 class: Tier-1 phage target (low BA cost; mucin mechanism)
Escherichia coli | NB05=3.6 | phage=3 | lytic_clinical
iron: dominant (Yersiniabactin/Enterobactin/Colibactin); BA-cost: low; mediation: strain-content (AIEC subset specific)
Pillar-5 class: Tier-1 phage target with strain-resolution requirement
Eggerthella lenta | NB05=3.3 | phage=2 | lytic
iron: none; BA-cost: moderate (partial 7α-dehydroxylator); mediation: drug-metabolism (Koppel 2018 Cgr2)
Pillar-5 class: Tier-2 phage target (moderate BA cost; non-BGC drug-metabolism mechanism)
Flavonifractor plautii | NB05=3.3 | phage=? | unknown_not_in_ref_phage_biology
iron: weak; BA-cost: HIGHEST (active 7α-dehydroxylator); mediation: species-abundance (NB10a F. plautii informative null)
Pillar-5 class: Tier-2 phage target with HIGHEST BA-coupling cost — depletion shifts BA pool toward primary tauro-conjugated forms; consider co-administering UDCA / BA-binding agent
Enterocloster bolteae | NB05=2.8 | phage=2 | lytic
iron: none; BA-cost: moderate (active 7α-dehydroxylator, NB09c × deoxycholate=+0.17); mediation: mixed
Pillar-5 class: Tier-2 phage target (moderate BA cost)
§3. Coverage gap analysis
# Pathobionts with phage-availability score = 0 or unknown
## §3. Coverage gap analysis
Pathobionts with phage_score = 0 or unknown (no actionable phage options in current scope):
Hungatella hathewayi [ACTIONABLE] (NB05=4.0)
Flavonifractor plautii [ACTIONABLE] (NB05=3.3)
Streptococcus salivarius (NB05=2.4)
Enterocloster citroniae (NB05=2.4)
Enterocloster clostridioformis (NB05=2.4)
Blautia coccoides (NB05=2.4)
Veillonella atypica (NB05=2.2)
Streptococcus parasanguinis (NB05=2.2)
Actinomyces oris (NB05=2.2)
Veillonella parvula (NB05=2.2)
Hungatella symbiosa (NB05=2.1)
Blautia wexlerae (NB05=2.1)
Bacteroides cellulosilyticus (NB05=2.0)
Veillonella dispar (NB05=2.0)
Streptococcus mitis (NB05=1.9)
Streptococcus oralis (NB05=1.9)
Lactococcus lactis (NB05=1.9)
Gordonibacter pamelaeae (NB05=1.9)
Streptococcus vestibularis (NB05=1.9)
Eisenbergiella massiliensis (NB05=1.9)
Bifidobacterium dentium (NB05=1.9)
Alistipes onderdonkii (NB05=1.9)
Streptococcus sanguinis (NB05=1.9)
Streptococcus mutans (NB05=1.9)
Streptococcus australis (NB05=1.9)
Pseudoflavonifractor sp. An184 (NB05=1.9)
Anaerotruncus colihominis (NB05=1.7)
Bifidobacterium breve (NB05=1.7)
Gemella sanguinis (NB05=1.7)
Sellimonas intestinalis (NB05=1.7)
Streptococcus thermophilus (NB05=1.7)
Intestinibacter bartlettii (NB05=1.7)
Clostridium paraputrificum (NB05=1.7)
Butyricicoccus pullicaecorum (NB05=1.7)
Enterococcus faecalis (NB05=1.7)
Erysipelatoclostridium ramosum (NB05=1.6)
Erysipelatoclostridium innocuum (NB05=1.6)
Clostridium spiroforme (NB05=1.5)
Prevotella copri (NB05=1.5)
Actinomyces odontolyticus (NB05=1.4)
Actinomyces sp. HPA0247 (NB05=1.4)
Adlercreutzia caecimuris (NB05=1.4)
Collinsella intestinalis (NB05=1.4)
Blautia sp. CAG:257 (NB05=1.4)
Actinomyces sp. ICM47 (NB05=1.4)
Dorea sp. CAG:317 (NB05=1.4)
Enterocloster aldenensis (NB05=1.4)
Clostridium scindens (NB05=1.3)
Aeriscardovia aeriphila (NB05=1.2)
Actinomyces sp. HMSC035G02 (NB05=1.2)
Turicimonas muris (NB05=1.2)
Roseburia sp. CAG:471 (NB05=1.2)
Veillonella infantium (NB05=1.2)
Firmicutes bacterium CAG:424 (NB05=1.2)
Clostridium sp. CAG:242 (NB05=1.2)
Dielma fastidiosa (NB05=1.2)
Anaerostipes hadrus (NB05=1.2)
Butyricimonas synergistica (NB05=1.0)
Eubacterium sp. CAG:251 (NB05=1.0)
Anaerostipes caccae (NB05=1.0)
Bacteroides stercoris (NB05=1.0)
Eubacterium sp. CAG:38 (NB05=1.0)
Eubacterium sp. CAG:274 (NB05=1.0)
Romboutsia ilealis (NB05=1.0)
Holdemanella biformis (NB05=1.0)
Roseburia faecis (NB05=0.5)
# actionable Tier-A core with phage coverage gap: 2
### Specific gaps in actionable Tier-A core:
- ***F. plautii***: NOT in ref_phage_biology curated set; literature search required (INPHARED / IMG/VR / NCBI Phage RefSeq / PhagesDB). Tier-2 priority for Pillar-5 cocktail design due to HIGH BA-coupling cost; the bile-acid-cost annotation is more important than phage availability for this species: the first decision is whether to target F. plautii at all (BA pool consequence) before phage selection.
- ***H. hathewayi***: ref_phage_biology entry says "No specific phages identified; historical phage-like particles but no plaques". CRITICAL COVERAGE GAP for the highest-NB05-scored Tier-A. External DB query required: search INPHARED / IMG/VR for any Hungatella-host phages. Also possible: target the GAG-degrading enzyme directly (per ref_phage_biology therapeutic_targets) rather than via phage.
- ***M. gnavus*** (= R. gnavus): all 6 known phages are TEMPERATE; phage therapy is structurally limited because temperate phages can confer host fitness benefits (lysogeny) rather than reliably lyse. Therapeutic strategies: (a) engineer lytic-locked variants of the temperate phages; (b) target glucorhamnan synthesis biochemically (per ref_phage_biology therapeutic_targets) rather than via phage.
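The gap filter selects rows whose phage score is either 0 or missing. A minimal sketch of that test is below, with four actionable rows copied from the §1 matrix; the function name and row layout are assumptions. Because NaN never compares equal to 0, the missing case needs its own `isnan` check.

```python
import math


def coverage_gaps(rows):
    """Return species whose phage score is 0 or missing (NaN)."""
    return [
        r["species"]
        for r in rows
        # NaN == 0 is False, so the missing case is tested explicitly.
        if math.isnan(r["phage_score"]) or r["phage_score"] == 0
    ]


# Rows copied from the §1 matrix; the dict layout is illustrative.
rows = [
    {"species": "Hungatella hathewayi", "phage_score": 0.0, "actionable": True},
    {"species": "Mediterraneibacter gnavus", "phage_score": 1.0, "actionable": True},
    {"species": "Escherichia coli", "phage_score": 3.0, "actionable": True},
    {"species": "Flavonifractor plautii", "phage_score": math.nan, "actionable": True},
]

actionable_gaps = coverage_gaps([r for r in rows if r["actionable"]])
print(len(actionable_gaps), actionable_gaps)
```

Under this filter M. gnavus (score 1) is not counted as a gap, which is consistent with the printed count of 2 even though the notes above also discuss its temperate-only limitation.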
§4. External phage DB references for follow-up
# PhageFoundry, INPHARED, IMG/VR, NCBI Phage RefSeq, PhagesDB
## §4. External phage DB references (out-of-BERDL; Pillar 4 follow-up)
- **PhageFoundry (BERDL)**
  Access: BERDL Spark Connect (phagefoundry_strain_modelling, phagefoundry_ecoliphages_genomedepot, phagefoundry_klebsiella_*, phagefoundry_acinetobacter_*, phagefoundry_paeruginosa_*, phagefoundry_pviridiflava_*)
  Coverage: E. coli direct (phagefoundry_ecoliphages_genomedepot); K. oxytoca direct (phagefoundry_klebsiella_*). Other Tier-A core (gut commensal hosts) not directly covered by current PhageFoundry collections.
  Status: BLOCKED at NB12 execution: BERDL auth token in .env stale (KBASE_AUTH_TOKEN reports invalid). Refresh token + re-query as Pillar-4 follow-up.
- **Millard lab INPHARED**
  Access: http://millardlab.org/phages/inphared/ (downloadable phage genome annotations + host predictions)
  Coverage: Comprehensive: ~25K phage genomes with GenBank-quality annotations. Host predictions via BLAST + phylogenetic placement. Best single source for phage availability across all Tier-A pathobionts.
  Status: Out-of-BERDL; manual download + parse required. Promote to NB12-followup for the 3 actionable Tier-A coverage gaps (F. plautii, H. hathewayi, R. gnavus lytic alternatives).
- **IMG/VR v4**
  Access: https://genome.jgi.doe.gov/portal/IMG_VR/IMG_VR.home.html (~3M uncultivated viral genomes from metagenomes, with host predictions via CRISPR spacer matches and BLAST)
  Coverage: Strongest for uncultivated phages of gut-anaerobe hosts where culturing-based isolation has failed (H. hathewayi, F. plautii). UViGs in IMG/VR may include phages with CRISPR-spacer-derived host predictions.
  Status: Out-of-BERDL; JGI Portal API or direct download. Promote to NB12-followup.
- **NCBI Phage Virus RefSeq**
  Access: NCBI Virus / RefSeq Viral; ~5K curated phage reference genomes
  Coverage: Curated subset; covers well-studied phages (E. coli, Klebsiella, Salmonella, Pseudomonas) but limited gut-anaerobe coverage.
  Status: Out-of-BERDL; straightforward NCBI E-utils query.
- **PhagesDB**
  Access: https://phagesdb.org/ (Mycobacterium-phage focused; ~25K isolated phages)
  Coverage: Mostly mycobacteriophages; not directly relevant for IBD pathobionts.
  Status: Low priority for this project.
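For the NCBI follow-up, one possible starting point is an E-utils `esearch` request against the nucleotide database. The sketch below only builds the request URL and performs no network call. The endpoint and the `db`, `term`, `retmode`, and `retmax` parameters are standard E-utils; the search term itself and the function name are hypothetical, and a title match on "phage" is a crude filter that would need refinement before real use.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(host_genus: str, retmax: int = 20) -> str:
    """Build an esearch URL for nucleotide records naming a host genus and 'phage'."""
    # HYPOTHETICAL search term: matches record titles only, so it will miss
    # phages whose titles do not name the host genus.
    term = f"{host_genus}[Title] AND phage[Title]"
    params = {"db": "nuccore", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = build_esearch_url("Hungatella")
print(url)
```

The same helper could be called with "Flavonifractor" for the second coverage gap; the returned ID list would then be passed to `esummary` or `efetch` for record details.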
§5. Verdict + figure
# 2-panel: Tier-A × Tier-B scatter + per-actionable Pillar-5 priority bar
## §5. Save outputs + verdict
{
"date": "2026-04-25",
"plan_version": "v1.9",
"test": "NB12 \u2014 Pathobiont \u00d7 phage targetability matrix (Pillar 4 opener)",
"n_actionable_tier_a": 6,
"n_phage_score_3_clinical": 1,
"n_phage_score_2_lytic_literature": 2,
"n_phage_score_1_temperate_or_limited": 1,
"n_phage_score_0_gap": 1,
"phage_clinical_tier_a": [
"Escherichia coli"
],
"phage_lytic_literature_tier_a": [
"Eggerthella lenta",
"Enterocloster bolteae"
],
"phage_limited_tier_a": [
"Mediterraneibacter gnavus"
],
"phage_gap_tier_a": [
"Flavonifractor plautii",
"Hungatella hathewayi"
],
"pillar4_pillar5_handoff_note": "Phage availability stratifies the 6 actionable Tier-A core into 4 classes: clinical-trial-stage (E. coli AIEC); lytic-literature (E. lenta, E. bolteae); temperate-limited (M. gnavus); coverage-gap (H. hathewayi, F. plautii). The 2 coverage-gap targets are the highest-NB05-scored species and require external phage DB queries (INPHARED + IMG/VR) as Pillar-4 follow-up. F. plautii additionally carries the highest BA-coupling cost \u2014 phage targeting may be deprioritized in favor of bile-acid-pool monitoring or biochemical-target alternatives. M. gnavus temperate-only constraint may require either lytic-locked phage engineering or biochemical glucorhamnan-synthesis targeting.",
"limitations": [
"BERDL Spark auth blocked at NB12 execution (KBASE_AUTH_TOKEN stale); PhageFoundry collections not directly queried. Refresh token + cross-check ref_phage_biology curated synthesis as a Pillar-4 follow-up.",
"External phage DB queries (INPHARED / IMG/VR / NCBI Phage RefSeq) are out-of-BERDL and not run in NB12. Coverage gaps for F. plautii / H. hathewayi specifically require these queries.",
"ref_phage_biology has 12 organisms \u2014 F. plautii is NOT in the curated set (only 5 of 6 actionable Tier-A core covered).",
"Phage-availability scoring is qualitative (0-3 ordinal scale based on lifestyle + clinical status) \u2014 quantitative coverage metrics (n_phages, host-range CDS, receptor-binding-domain diversity) require PhageFoundry / INPHARED genomic data."
]
}
Wrote /home/aparkin/BERIL-research-observatory-ibd/projects/ibd_phage_targeting/figures/NB12_phage_targetability.png
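The verdict reports `n_phage_score_0_gap: 1` alongside a two-species `phage_gap_tier_a` list. One reading, inferred from the printed matrix rather than from run_nb12.py, is that the count covers only explicit zeros while the list also includes the unscored (NaN) species. The sketch below reproduces both numbers under that assumption, using the six actionable scores from §1.

```python
import math
from collections import Counter

# Tier-B scores of the 6 actionable Tier-A species, copied from the §1 matrix.
tier_b = {
    "Hungatella hathewayi": 0.0,
    "Mediterraneibacter gnavus": 1.0,
    "Escherichia coli": 3.0,
    "Eggerthella lenta": 2.0,
    "Flavonifractor plautii": math.nan,
    "Enterocloster bolteae": 2.0,
}

# Per-class counts; NaN gets its own bucket instead of being folded into 0.
counts = Counter("unscored" if math.isnan(s) else int(s) for s in tier_b.values())

# ASSUMPTION: the gap list is "score 0 or unscored", sorted by species name.
gap_list = sorted(sp for sp, s in tier_b.items() if math.isnan(s) or s == 0)

print(dict(counts))
print(gap_list)
```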
§6. Interpretation
Headline: Phage availability stratifies the 6 actionable Tier-A core into 4 priority classes; H. hathewayi (highest NB05) and F. plautii (highest BA-cost) are coverage gaps requiring external DB queries
Per-actionable Tier-A phage-targetability profile
| Pathobiont | NB05 score | Phage score | Lifestyle | BA cost | Pillar-5 class |
|---|---|---|---|---|---|
| H. hathewayi | 4.0 | 0 | none | low | GAP: highest NB05 but no known phages — external DB query (INPHARED + IMG/VR) priority |
| M. gnavus | 3.8 | 1 | temperate | low | Limited: 6 known phages all temperate — lytic-locked engineering OR biochemical glucorhamnan-synthesis target as alternatives |
| E. coli (AIEC) | 3.6 | 3 | lytic + clinical | low | Tier-1 clinical: EcoActive cocktail (7 lytic phages, clinical trials); HER259 (FimH-targeting, attenuates virulence). Most advanced. Strain-resolution requirement (AIEC subset) per NB07b/NB08a |
| E. lenta | 3.3 | 2 | lytic literature | moderate | Tier-2: PMBT5 siphovirus characterized; non-BGC drug-metabolism mechanism (Koppel 2018 Cgr2) — moderate-priority target |
| F. plautii | 3.3 | unknown (NaN) | unknown (not in ref) | HIGHEST | GAP + HIGH cost: not in ref_phage_biology; HIGHEST BA-coupling cost (active 7α-dehydroxylator). Phage targeting deprioritized in favor of bile-acid-pool monitoring or biochemical alternatives |
| E. bolteae | 2.8 | 2 | lytic literature | moderate | Tier-2: PMBT24 (virulent, 99,962 bp Kielviridae) — best-characterized lytic phage among gut-anaerobe Tier-A |
Stratification: four phage-availability classes among 6 actionable Tier-A
- Class 3 (clinical trial): 1 species — E. coli (EcoActive cocktail; Galtier 2017 mouse model precedent)
- Class 2 (lytic literature): 2 species — E. lenta (PMBT5), E. bolteae (PMBT24)
- Class 1 (temperate / limited): 1 species — M. gnavus (6 temperate siphoviruses; lifestyle limits therapy)
- Class 0 or unscored (gap): 2 species, H. hathewayi (score 0; no specific phages identified) and F. plautii (unscored; not in ref_phage_biology)
Critical observations
The 2 highest-NB05-scored species (H. hathewayi 4.0, M. gnavus 3.8) have the WEAKEST phage availability. Phage-therapy success requires resolving these gaps (INPHARED / IMG/VR for H. hathewayi; lytic-locked phage engineering or biochemical alternatives for M. gnavus).
F. plautii has both a phage gap and the highest BA-coupling cost (NB09c §13: active 7α-dehydroxylator, depletion shifts BA pool toward inflammatory primary tauro-conjugated forms). This makes F. plautii the lowest-priority Pillar-5 target despite its NB05 score of 3.3: the first decision is whether to target it at all (BA pool consequence) before phage selection. Alternative strategies: (a) co-administer UDCA / BA-binding agent; (b) target downstream of F. plautii (e.g., bile-acid-pool replenishment); (c) accept partial F. plautii depletion with clinical BA monitoring.
E. coli AIEC is the highest-Pillar-5-feasibility target: clinical-trial-stage phage cocktail (EcoActive, 7 lytic phages), low BA-coupling cost, mechanism well-characterized (NB05 §5g + NB07c §2 + NB08a §2 iron-acquisition narrative). The Pillar-4-feasibility decision for E. coli is sharpened by the AIEC strain-content requirement: target Yersiniabactin/Enterobactin/Colibactin-positive strains specifically (per NB08a), because generic E. coli phages may not deplete the right subset.
E. bolteae and E. lenta are mid-tier targets with lytic literature phages and moderate BA-coupling cost. Both are realistic Pillar-5 phage-cocktail components subject to BA pool monitoring.
Pillar 4 → Pillar 5 hand-off framework
The 6 actionable Tier-A core stratify into 3 Pillar-5 design strategies:
- Direct phage targeting (Tier-1): E. coli (AIEC subset, clinical-trial cocktail) → use EcoActive or build similar 7-phage cocktail; require strain-resolution diagnostic.
- Phage targeting with monitoring (Tier-2): E. lenta (PMBT5), E. bolteae (PMBT24) → include in cocktail with BA-pool monitoring; M. gnavus is here too if lytic-locked engineering succeeds.
- Phage GAP — alternative strategies needed:
- H. hathewayi: highest priority for external DB query (INPHARED / IMG/VR) — if no phages found, fall back to GAG-degrading enzyme inhibitors per ref_phage_biology therapeutic_targets.
- F. plautii: lowest Pillar-5 priority due to highest BA-cost — consider deprioritizing or replacing phage approach with BA-binding co-therapy.
- M. gnavus if lytic-locked engineering fails: biochemical glucorhamnan-synthesis targets (Henke 2019).
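The three strategies listed above line up with the Tier-B phage score: score 3 maps to direct targeting, score 2 to targeting with monitoring, and anything lower or unscored to the alternative-strategy branch. The function below sketches that reading. The thresholds and strategy labels are an interpretation of the list, not a rule stated in the notebook, and score 1 (M. gnavus) is placed in the alternative branch on the assumption that lytic-locked engineering has not yet succeeded.

```python
import math


def handoff_strategy(phage_score: float) -> str:
    """Map a Tier-B phage score to a Pillar-5 design strategy (interpretive sketch)."""
    # ASSUMPTION: unscored, 0, and 1 all fall into the alternative branch;
    # M. gnavus (score 1) would move up only if lytic-locked engineering succeeds.
    if math.isnan(phage_score) or phage_score <= 1:
        return "alternative strategy needed"
    if phage_score == 2:
        return "phage targeting with monitoring"
    return "direct phage targeting"


for species, score in [
    ("Escherichia coli", 3.0),
    ("Eggerthella lenta", 2.0),
    ("Mediterraneibacter gnavus", 1.0),
    ("Flavonifractor plautii", math.nan),
]:
    print(species, "->", handoff_strategy(score))
```

This keys the strategy on phage availability alone; the BA-coupling cost discussed for F. plautii is a separate consideration that the sketch does not model.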
Limitations
- BERDL Spark auth blocked at NB12 execution (KBASE_AUTH_TOKEN stale): PhageFoundry collections (phagefoundry_strain_modelling, phagefoundry_ecoliphages_genomedepot, phagefoundry_klebsiella_*) not directly queried. Refresh token + re-query as Pillar-4 follow-up. Direct PhageFoundry coverage would primarily augment E. coli (genomes + host-range CDS + receptor-binding-domain diversity) and K. oxytoca (Tier-2).
- External phage DB queries (INPHARED ~25K phages with host predictions; IMG/VR ~3M UViGs from metagenomes; NCBI Phage RefSeq ~5K curated) are out-of-BERDL and not run in NB12. The 2 actionable Tier-A coverage gaps (F. plautii, H. hathewayi) require these queries; this is the highest-priority Pillar-4 follow-up.
- ref_phage_biology has 12 organisms; F. plautii is not in the curated set (5 of 6 actionable Tier-A core covered).
- Phage-availability scoring is qualitative ordinal (0-3) based on lifestyle + clinical status. Quantitative coverage metrics (n_phages, host-range CDS, receptor-binding-domain diversity, plaque burst size, pH stability) require PhageFoundry / INPHARED genomic data.
Outputs
- data/nb12_phage_targetability_matrix.tsv: per-pathobiont scoring matrix (NB05 Tier-A score + phage Tier-B score + lifestyle + Pillar-5 class)
- data/nb12_phage_targetability_verdict.json: formal verdict + Pillar-4/5 hand-off note + limitations
- figures/NB12_phage_targetability.png: 2-panel figure (Tier-A × Tier-B scatter + per-actionable Pillar-5 priority bar)