06b NCBI Annotation Presence
Jupyter notebook from the Caulobacter Fur–Lipid A Loss project.
06b — Comparative arm: NCBI annotation-based presence/absence (post-review)
Project: caulobacter_fur_lipida_loss — Phase C, NB06b. Addresses adversarial review I4.
Purpose
NB06 used PaperBLAST description-text matching. Adversarial review I4 criticized it for a ~50% false-negative rate on known Caulobacter essential genes: LpxA, LpxC, LptA, LptB, LptD, and LptE all scored 0 in PaperBLAST despite being unambiguously present in C. crescentus. RESEARCH_PLAN v2 specified an NCBI BLAST/HMMER fallback against named accessions. NB06b implements an annotation-based variant of that fallback. Because neither hmmsearch nor the datasets CLI is available on JupyterHub, it does not run a local sequence search; instead it queries NCBI's protein database directly through Biopython Entrez for each focal gene/function across the four species. NCBI annotation is the most curated source available without local sequence-search infrastructure.
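Methodology
For each focal gene/function and each species, query NCBI's protein database with the gene-symbol and functional-name patterns joined by OR and restricted by an organism filter, then record the hit count. A gene is scored "present" if the query returns at least 1 hit. Because the filters use [Organism:exp], hits are counted across all protein records under the species' taxonomy node, not only the reference strain listed below; this is why the counts for A. baumannii and N. meningitidis run into the thousands.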
| Species | Taxonomy ID | Reference strain |
|---|---|---|
| C. crescentus | 565050 | NA1000 |
| A. baumannii | 470 | ATCC 17978 (and others) |
| N. meningitidis | 487 | MC58 (and others) |
| M. catarrhalis | 480 | BBH18 (and others) |
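The query construction described above can be sketched as a small helper. This is an illustrative sketch only: `build_query` is a hypothetical name that does not appear in the notebook, and the example terms and filter are the LpxA and A. baumannii entries used later in this notebook.

```python
def build_query(terms, species_filter):
    """Join search terms with OR and restrict them to one organism filter."""
    if not terms:
        raise ValueError("at least one search term is required")
    return "(" + " OR ".join(terms) + ") AND (" + species_filter + ")"


# Example: the LpxA terms scoped to the A. baumannii taxonomy filter
query = build_query(
    ['lpxA[Gene]', '"UDP-N-acetylglucosamine acyltransferase"'],
    'txid470[Organism:exp]',
)
print(query)
```

Keeping the term list and the organism filter as separate arguments makes it straightforward to reuse one term list across all four species.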
Caveats
- Annotation-based search misses genes that are present but unannotated. For well-studied reference genomes (the four above), annotation gaps are uncommon for canonical pathway enzymes.
- Some queries return many hits across paralog families (e.g., transglycosylase families); counts >0 are interpreted as "present" without distinguishing paralog count.
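- Network reliability: Entrez occasionally fails. Failures are recorded as NaN rather than 0 in the counts table. The Boolean step (count > 0) then maps NaN to 0, so a failed query is indistinguishable from a true absence in the presence matrix.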
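One way to keep failed queries distinct from true zeros is a three-state classification. The sketch below is an assumption about how that could be done, not the notebook's method; `presence_state` is a hypothetical helper, and the example counts are the bcerS row as logged later in this notebook.

```python
import math


def presence_state(count):
    """Map a hit count to 'present', 'absent', or 'unknown' (failed query)."""
    if count is None or (isinstance(count, float) and math.isnan(count)):
        return "unknown"
    return "present" if count > 0 else "absent"


# bcerS counts from this notebook's log; the C. crescentus query failed (NaN)
bcers = {
    "C_crescentus": float("nan"),
    "A_baumannii": 0,
    "N_meningitidis": 0,
    "M_catarrhalis": 0,
}
states = {species: presence_state(n) for species, n in bcers.items()}
print(states)
```

With this scheme a failed C. crescentus query is reported as unknown rather than absent, so it cannot be mistaken for evidence of loss.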
import pandas as pd
import numpy as np
from pathlib import Path
import time
from Bio import Entrez
# REQUIRED by NCBI: identify yourself
Entrez.email = '[email protected]'
PROJ = Path('/home/aparkin/BERIL-research-observatory/projects/caulobacter_fur_lipida_loss')
DATA = PROJ / 'data'
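SPECIES_TAXIDS = {
    # NOTE: the first term below lacks the 'txid' prefix used by the other filters,
    # so it is probably not read as a taxonomy ID; the two species-name terms do the matching.
    'C_crescentus': '565050[Organism:exp] OR "Caulobacter vibrioides"[Organism:exp] OR "Caulobacter crescentus"[Organism:exp]',
    'A_baumannii': 'txid470[Organism:exp]',
    'N_meningitidis': 'txid487[Organism:exp]',
    'M_catarrhalis': 'txid480[Organism:exp]',
}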
# Focal gene/function families to query
# Each entry: (label, query_terms_list)
FOCAL = [
    ('spt — serine palmitoyltransferase',
     ['"serine palmitoyltransferase"', 'spt[Gene]']),
    ('bcerS — bacterial ceramide synthase',
     ['"ceramide synthase"', '"bacterial ceramide synthase"', 'bcers[Gene]']),
    ('cerR — ceramide reductase',
     ['"ceramide reductase"', 'cerR[Gene]']),
    ('sphingosine kinase (sphk)',
     ['"sphingosine kinase"', 'sphk[Gene]']),
    ('LpxA — UDP-GlcNAc acyltransferase',
     ['lpxA[Gene]', '"UDP-N-acetylglucosamine acyltransferase"']),
    ('LpxC — UDP-GlcNAc deacetylase',
     ['lpxC[Gene]', '"UDP-3-O-acyl-N-acetylglucosamine deacetylase"', '"UDP-3-O-(3-hydroxymyristoyl)-N-acetylglucosamine deacetylase"']),
    ('LpxD — UDP-3-O-N-acyltransferase',
     ['lpxD[Gene]', '"UDP-3-O-acylglucosamine N-acyltransferase"']),
    ('LpxB — disaccharide synthase',
     ['lpxB[Gene]', '"lipid A disaccharide synthase"']),
    ('LpxK — tetraacyldisaccharide kinase',
     ['lpxK[Gene]', '"tetraacyldisaccharide 4\'-kinase"']),
    ('MsbA — LPS flippase',
     ['msbA[Gene]', '"lipid A flippase"', '"lipid A export ATP-binding"']),
    ('LptA — periplasmic LPS carrier',
     ['lptA[Gene]', '"LPS-assembly protein LptA"']),
    ('LptB — ATPase',
     ['lptB[Gene]', '"LPS-export ATP-binding"']),
    ('LptC',
     ['lptC[Gene]', '"LPS-assembly protein LptC"']),
    ('LptD — OM assembly',
     ['lptD[Gene]', '"LPS-assembly protein LptD"']),
    ('LptE',
     ['lptE[Gene]', '"LPS-assembly lipoprotein LptE"']),
    ('LptFG (any)',
     ['lptF[Gene]', 'lptG[Gene]', '"LPS export ABC transporter permease"']),
    ('Fur — ferric uptake regulator',
     ['fur[Gene]', '"ferric uptake regulator"', '"ferric uptake regulation"']),
    ('ChvG — sensor kinase',
     ['chvG[Gene]', '"sensor histidine kinase ChvG"', '"ExoS"[Gene]']),
    # NOTE: exoR encodes the periplasmic ExoS/ChvI inhibitor ExoR, not a ChvI ortholog,
    # so hits on that term alone should not be read as ChvI.
    ('ChvI — response regulator',
     ['chvI[Gene]', '"two-component system response regulator ChvI"', 'exoR[Gene]']),
    ('CtpA / Prc — C-terminal processing protease',
     ['ctpA[Gene]', 'prc[Gene]', '"carboxyl-terminal protease"', '"C-terminal processing protease"']),
    ('PBP1A / MrcA',
     ['mrcA[Gene]', 'pbpA[Gene]', '"penicillin-binding protein 1A"']),
    ('LD-transpeptidase (LdtJ/K-class)',
     ['ldtJ[Gene]', 'ldtK[Gene]', '"L,D-transpeptidase"', '"ld-transpeptidase"']),
    ('Capsule biosynthesis (siaD/cps)',
     ['siaD[Gene]', '"capsular polysaccharide biosynthesis"']),
    ('Polysialyltransferase',
     ['"polysialyltransferase"']),
    ('Late acyltransferase (lpxX/lpxL)',
     ['lpxX[Gene]', 'lpxL[Gene]', '"lauroyl-Kdo2-lipid A acyltransferase"', '"myristoyl-Kdo2-lipid A acyltransferase"']),
    ('Tol-Pal pal',
     ['pal[Gene]', '"peptidoglycan-associated lipoprotein"']),
    ('Tol-Pal tolA',
     ['tolA[Gene]', '"Tol-Pal system protein TolA"']),
    ('Tol-Pal tolB',
     ['tolB[Gene]', '"Tol-Pal system protein TolB"']),
]
print(f'Focal families: {len(FOCAL)}')
print(f'Species: {list(SPECIES_TAXIDS.keys())}')
Focal families: 28
Species: ['C_crescentus', 'A_baumannii', 'N_meningitidis', 'M_catarrhalis']
def search_count(query, retries=3, delay=0.4):
    """esearch with retry — returns int or NaN on persistent failure."""
    for attempt in range(retries):
        try:
            handle = Entrez.esearch(db='protein', term=query, retmax=0)
            res = Entrez.read(handle)
            handle.close()
            time.sleep(delay)
            return int(res['Count'])
        except Exception as e:
            if attempt == retries - 1:
                print(f' ! FAIL "{query[:80]}": {type(e).__name__}: {e}')
                return float('nan')
            time.sleep(1)
    return float('nan')
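The retry logic above waits a fixed 1 s between attempts. A possible variant, shown here as a sketch rather than as the notebook's implementation, doubles the wait after each failure. To keep it runnable offline, the network call is passed in as a function: `count_with_backoff`, `fetch`, and `flaky_fetch` are hypothetical names, and `flaky_fetch` is a stand-in that simulates two backend failures before succeeding.

```python
import time


def count_with_backoff(fetch, query, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(query) up to `retries` times, doubling the wait after each failure.

    Returns the integer count, or NaN if every attempt raises.
    """
    for attempt in range(retries):
        try:
            return int(fetch(query))
        except Exception:
            # Wait only if another attempt will follow
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    return float("nan")


# Offline demonstration: a stand-in fetcher that fails twice, then succeeds
calls = {"n": 0}


def flaky_fetch(query):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Search Backend failed")
    return "11"


waits = []  # record the requested delays instead of actually sleeping
result = count_with_backoff(flaky_fetch, "lpxA[Gene]", sleep=waits.append)
print(result, waits)
```

In real use, `fetch` would wrap the `Entrez.esearch` call from `search_count`; injecting it keeps the retry policy testable without network access.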
rows = []
for label, terms in FOCAL:
    row = {'family': label}
    for sp_key, sp_filter in SPECIES_TAXIDS.items():
        # Combine all terms with OR, scope by species
        combined = '(' + ' OR '.join(terms) + ') AND (' + sp_filter + ')'
        n = search_count(combined)
        row[sp_key] = n
    rows.append(row)
    # Per-row progress
    print(f'{label:55s} C={row.get("C_crescentus","?")} Ab={row.get("A_baumannii","?")} Nm={row.get("N_meningitidis","?")} Mc={row.get("M_catarrhalis","?")}')
ncbi = pd.DataFrame(rows)
ncbi.to_csv(DATA / 'NB06b_ncbi_presence_counts.csv', index=False)
print(f'\nSaved data/NB06b_ncbi_presence_counts.csv')
! FAIL "("serine palmitoyltransferase" OR spt[Gene]) AND (txid480[Organism:exp])": RuntimeError: Search Backend failed:
spt — serine palmitoyltransferase C=7 Ab=0 Nm=0 Mc=nan
! FAIL "("ceramide synthase" OR "bacterial ceramide synthase" OR bcers[Gene]) AND (56505": RuntimeError: Search Backend failed:
bcerS — bacterial ceramide synthase C=nan Ab=0 Nm=0 Mc=0
cerR — ceramide reductase C=6 Ab=0 Nm=0 Mc=0
sphingosine kinase (sphk) C=0 Ab=287 Nm=0 Mc=0
LpxA — UDP-GlcNAc acyltransferase C=11 Ab=2607 Nm=791 Mc=78
LpxC — UDP-GlcNAc deacetylase C=15 Ab=4459 Nm=1189 Mc=140
LpxD — UDP-3-O-N-acyltransferase C=15 Ab=3063 Nm=1031 Mc=115
LpxB — disaccharide synthase C=18 Ab=4471 Nm=1232 Mc=154
LpxK — tetraacyldisaccharide kinase C=18 Ab=4476 Nm=1717 Mc=174
MsbA — LPS flippase C=0 Ab=2871 Nm=1025 Mc=154
LptA — periplasmic LPS carrier C=0 Ab=2857 Nm=966 Mc=92
LptB — ATPase C=10 Ab=2254 Nm=828 Mc=78
LptC C=11 Ab=2074 Nm=304 Mc=80
LptD — OM assembly C=11 Ab=2444 Nm=602 Mc=115
LptE C=5 Ab=700 Nm=468 Mc=21
! FAIL "(lptF[Gene] OR lptG[Gene] OR "LPS export ABC transporter permease") AND (txid470": RuntimeError: Search Backend failed:
LptFG (any) C=21 Ab=nan Nm=1625 Mc=199
! FAIL "(fur[Gene] OR "ferric uptake regulator" OR "ferric uptake regulation") AND (txid": RuntimeError: Search Backend failed:
Fur — ferric uptake regulator C=10 Ab=nan Nm=917 Mc=100
! FAIL "(chvG[Gene] OR "sensor histidine kinase ChvG" OR "ExoS"[Gene]) AND (txid487[Orga": RuntimeError: Search Backend failed:
ChvG — sensor kinase C=5 Ab=0 Nm=nan Mc=0
ChvI — response regulator C=6 Ab=0 Nm=0 Mc=0
CtpA / Prc — C-terminal processing protease C=21 Ab=4801 Nm=1207 Mc=148
PBP1A / MrcA C=54 Ab=466 Nm=982 Mc=64
LD-transpeptidase (LdtJ/K-class) C=29 Ab=5267 Nm=996 Mc=170
Capsule biosynthesis (siaD/cps) C=64 Ab=1723 Nm=3019 Mc=20
Polysialyltransferase C=0 Ab=540 Nm=929 Mc=0
! FAIL "(lpxX[Gene] OR lpxL[Gene] OR "lauroyl-Kdo2-lipid A acyltransferase" OR "myristoy": RuntimeError: Search Backend failed:
Late acyltransferase (lpxX/lpxL) C=nan Ab=1302 Nm=0 Mc=6
Tol-Pal pal C=15 Ab=4377 Nm=0 Mc=138
! FAIL "(tolA[Gene] OR "Tol-Pal system protein TolA") AND (txid470[Organism:exp])": RuntimeError: Search Backend failed:
Tol-Pal tolA C=8 Ab=nan Nm=1 Mc=0
Tol-Pal tolB C=15 Ab=3258 Nm=1 Mc=3

Saved data/NB06b_ncbi_presence_counts.csv
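Eight queries in the log above ended in "Search Backend failed". Before interpreting the table, those cells could be listed and re-queried on their own. The sketch below only does the listing step; `failed_cells` is a hypothetical helper, and the two example rows are copied from the log above.

```python
import math


def failed_cells(rows, species_keys):
    """Return (family, species) pairs whose count is NaN, i.e. the query failed."""
    pending = []
    for row in rows:
        for sp in species_keys:
            value = row.get(sp)
            if isinstance(value, float) and math.isnan(value):
                pending.append((row["family"], sp))
    return pending


# Two rows copied from the log above
rows_example = [
    {"family": "LptE", "C_crescentus": 5, "A_baumannii": 700,
     "N_meningitidis": 468, "M_catarrhalis": 21},
    {"family": "LptFG (any)", "C_crescentus": 21, "A_baumannii": float("nan"),
     "N_meningitidis": 1625, "M_catarrhalis": 199},
]
species = ["C_crescentus", "A_baumannii", "N_meningitidis", "M_catarrhalis"]
print(failed_cells(rows_example, species))
```

Each returned pair identifies one query to rebuild and resubmit, which avoids repeating the full 28 × 4 sweep.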
Boolean presence and pattern analysis
# Boolean matrix
# NOTE: NaN > 0 evaluates to False, so failed queries (NaN counts) are scored 0 (absent) here.
ncbi_bool = ncbi.copy()
for c in ['C_crescentus','A_baumannii','N_meningitidis','M_catarrhalis']:
    ncbi_bool[c + '_present'] = (ncbi_bool[c] > 0).astype(int)
binary = ncbi_bool.set_index('family')[[c + '_present' for c in ['C_crescentus','A_baumannii','N_meningitidis','M_catarrhalis']]]
binary.columns = [c.replace('_present','') for c in binary.columns]
print('Boolean presence matrix (1=present, 0=absent in NCBI annotation):')
display(binary)
# Per-pattern summary
patterns = {}
for f in binary.index:
    s = ''.join('1' if v else '0' for v in binary.loc[f])
    patterns.setdefault(s, []).append(f)
print('\n=== Patterns (CANM = C.crescentus, A.baumannii, N.meningitidis, M.catarrhalis) ===')
for pat in sorted(patterns.keys(), key=lambda p: -sum(int(c) for c in p)):
    print(f' {pat}: {patterns[pat]}')
ncbi_bool.to_csv(DATA / 'NB06b_ncbi_presence_bool.csv', index=False)
Boolean presence matrix (1=present, 0=absent in NCBI annotation):
| family | C_crescentus | A_baumannii | N_meningitidis | M_catarrhalis |
|---|---|---|---|---|
| spt — serine palmitoyltransferase | 1 | 0 | 0 | 0 |
| bcerS — bacterial ceramide synthase | 0 | 0 | 0 | 0 |
| cerR — ceramide reductase | 1 | 0 | 0 | 0 |
| sphingosine kinase (sphk) | 0 | 1 | 0 | 0 |
| LpxA — UDP-GlcNAc acyltransferase | 1 | 1 | 1 | 1 |
| LpxC — UDP-GlcNAc deacetylase | 1 | 1 | 1 | 1 |
| LpxD — UDP-3-O-N-acyltransferase | 1 | 1 | 1 | 1 |
| LpxB — disaccharide synthase | 1 | 1 | 1 | 1 |
| LpxK — tetraacyldisaccharide kinase | 1 | 1 | 1 | 1 |
| MsbA — LPS flippase | 0 | 1 | 1 | 1 |
| LptA — periplasmic LPS carrier | 0 | 1 | 1 | 1 |
| LptB — ATPase | 1 | 1 | 1 | 1 |
| LptC | 1 | 1 | 1 | 1 |
| LptD — OM assembly | 1 | 1 | 1 | 1 |
| LptE | 1 | 1 | 1 | 1 |
| LptFG (any) | 1 | 0 | 1 | 1 |
| Fur — ferric uptake regulator | 1 | 0 | 1 | 1 |
| ChvG — sensor kinase | 1 | 0 | 0 | 0 |
| ChvI — response regulator | 1 | 0 | 0 | 0 |
| CtpA / Prc — C-terminal processing protease | 1 | 1 | 1 | 1 |
| PBP1A / MrcA | 1 | 1 | 1 | 1 |
| LD-transpeptidase (LdtJ/K-class) | 1 | 1 | 1 | 1 |
| Capsule biosynthesis (siaD/cps) | 1 | 1 | 1 | 1 |
| Polysialyltransferase | 0 | 1 | 1 | 0 |
| Late acyltransferase (lpxX/lpxL) | 0 | 1 | 0 | 1 |
| Tol-Pal pal | 1 | 1 | 0 | 1 |
| Tol-Pal tolA | 1 | 0 | 1 | 0 |
| Tol-Pal tolB | 1 | 1 | 1 | 1 |
=== Patterns (CANM = C.crescentus, A.baumannii, N.meningitidis, M.catarrhalis) ===
1111: ['LpxA — UDP-GlcNAc acyltransferase', 'LpxC — UDP-GlcNAc deacetylase', 'LpxD — UDP-3-O-N-acyltransferase', 'LpxB — disaccharide synthase', 'LpxK — tetraacyldisaccharide kinase', 'LptB — ATPase', 'LptC', 'LptD — OM assembly', 'LptE', 'CtpA / Prc — C-terminal processing protease', 'PBP1A / MrcA', 'LD-transpeptidase (LdtJ/K-class)', 'Capsule biosynthesis (siaD/cps)', 'Tol-Pal tolB']
0111: ['MsbA — LPS flippase', 'LptA — periplasmic LPS carrier']
1011: ['LptFG (any)', 'Fur — ferric uptake regulator']
1101: ['Tol-Pal pal']
0110: ['Polysialyltransferase']
0101: ['Late acyltransferase (lpxX/lpxL)']
1010: ['Tol-Pal tolA']
1000: ['spt — serine palmitoyltransferase', 'cerR — ceramide reductase', 'ChvG — sensor kinase', 'ChvI — response regulator']
0100: ['sphingosine kinase (sphk)']
0000: ['bcerS — bacterial ceramide synthase']
Compare PaperBLAST vs NCBI annotation: false-negative diagnosis
# Load the original PaperBLAST presence/absence
nb06_pb = pd.read_csv(DATA / 'NB06_comparative_presence_counts.csv')
# Align family labels: match exactly on the gene short name (text before "—", lowercased).
# Families whose short names differ between the two panels are dropped from the comparison.
def label_key(s):
    # Take portion before "—" if present
    return s.split('—')[0].strip().lower()
pb_map = {label_key(f): f for f in nb06_pb['family']}
ncbi_map = {label_key(f): f for f in ncbi['family']}
shared_keys = sorted(set(pb_map.keys()) & set(ncbi_map.keys()))
print(f'Comparable family labels (shared between PaperBLAST and NCBI panels): {len(shared_keys)}')
rows = []
for key in shared_keys:
    pb_row = nb06_pb[nb06_pb['family'] == pb_map[key]].iloc[0]
    nc_row = ncbi[ncbi['family'] == ncbi_map[key]].iloc[0]
    for sp_key in ['C_crescentus','A_baumannii','N_meningitidis','M_catarrhalis']:
        pb_present = pb_row[sp_key] > 0
        # NOTE: a NaN NCBI count (failed query) compares as False and is scored as absent
        nc_present = nc_row[sp_key] > 0
        rows.append({'family': pb_map[key], 'species': sp_key,
                     'paperblast_n': int(pb_row[sp_key]), 'ncbi_n': int(nc_row[sp_key]) if not pd.isna(nc_row[sp_key]) else None,
                     'paperblast_present': pb_present, 'ncbi_present': nc_present,
                     'discordance': 'agree' if pb_present == nc_present else ('PB_miss' if (not pb_present and nc_present) else 'NCBI_miss')})
cmp = pd.DataFrame(rows)
# Focus on C. crescentus — where false negatives in PaperBLAST were specifically flagged
print('\n=== C. crescentus PaperBLAST vs NCBI annotation ===')
cc = cmp[cmp['species'] == 'C_crescentus']
print(cc.to_string(index=False))
print(f'\nC. crescentus families: {len(cc)}, agree={int((cc["discordance"]=="agree").sum())}, PaperBLAST missed (NCBI says present) = {int((cc["discordance"]=="PB_miss").sum())}, NCBI missed = {int((cc["discordance"]=="NCBI_miss").sum())}')
cmp.to_csv(DATA / 'NB06b_paperblast_vs_ncbi_comparison.csv', index=False)
Comparable family labels (shared between PaperBLAST and NCBI panels): 21
=== C. crescentus PaperBLAST vs NCBI annotation ===
family species paperblast_n ncbi_n paperblast_present ncbi_present discordance
bcerS — bacterial ceramide synthase C_crescentus 1 NaN True False NCBI_miss
Capsule biosynthesis (siaD/cps) C_crescentus 3 64.0 True True agree
cerR — ceramide reductase C_crescentus 1 6.0 True True agree
CtpA / Prc — C-terminal processing protease C_crescentus 0 21.0 False True PB_miss
Fur — ferric uptake regulator C_crescentus 4 10.0 True True agree
Late acyltransferase (lpxX/lpxL) C_crescentus 0 NaN False False agree
Ld-transpeptidase (LdtJ/K-class) C_crescentus 2 29.0 True True agree
LptA — periplasmic LPS carrier C_crescentus 0 0.0 False False agree
LptC C_crescentus 1 11.0 True True agree
LptD — OM assembly C_crescentus 0 11.0 False True PB_miss
LptE C_crescentus 0 5.0 False True PB_miss
LpxA — UDP-GlcNAc acyltransferase C_crescentus 0 11.0 False True PB_miss
LpxB — disaccharide synthase C_crescentus 1 18.0 True True agree
LpxC — UDP-GlcNAc deacetylase C_crescentus 0 15.0 False True PB_miss
LpxD — UDP-3-O-N-acyltransferase C_crescentus 0 15.0 False True PB_miss
LpxK — tetraacyldisaccharide kinase C_crescentus 0 18.0 False True PB_miss
MsbA — LPS flippase / ABC C_crescentus 3 0.0 True False NCBI_miss
PBP1A / MrcA — class A penicillin-binding C_crescentus 8 54.0 True True agree
Polysialyltransferase C_crescentus 0 0.0 False False agree
sphingosine kinase (sphk) C_crescentus 2 0.0 True False NCBI_miss
spt — serine palmitoyltransferase C_crescentus 2 7.0 True True agree
C. crescentus families: 21, agree=11, PaperBLAST missed (NCBI says present) = 7, NCBI missed = 3
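In the table above, bcerS is labelled NCBI_miss and the late acyltransferase row is labelled agree, although in both cases the NCBI count is NaN (a failed query), not 0. A classification that separates these cases could look like the following sketch. `classify` and the `NCBI_unknown` label are hypothetical additions, and the example counts are C. crescentus values copied from the table above.

```python
import math


def classify(paperblast_n, ncbi_n):
    """Compare two hit counts; a NaN or None NCBI count yields 'NCBI_unknown'."""
    if ncbi_n is None or (isinstance(ncbi_n, float) and math.isnan(ncbi_n)):
        return "NCBI_unknown"
    pb_present = paperblast_n > 0
    nc_present = ncbi_n > 0
    if pb_present == nc_present:
        return "agree"
    return "PB_miss" if nc_present else "NCBI_miss"


# (paperblast_n, ncbi_n) for C. crescentus, copied from the table above
examples = {
    "bcerS": (1, float("nan")),
    "LpxA": (0, 11),
    "MsbA": (3, 0),
    "LptC": (1, 11),
}
labels = {gene: classify(pb, nc) for gene, (pb, nc) in examples.items()}
print(labels)
```

Under this scheme only MsbA and sphingosine kinase would remain as NCBI misses for C. crescentus, with bcerS deferred until its query is rerun.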
Headline question — does NB06b confirm the original NB06 sphingolipid + ChvI uniqueness claims?
SPHINGO = ['spt — serine palmitoyltransferase',
'bcerS — bacterial ceramide synthase',
'cerR — ceramide reductase',
'sphingosine kinase (sphk)']
CHV = ['ChvG — sensor kinase', 'ChvI — response regulator']
LPT = ['LpxA — UDP-GlcNAc acyltransferase','LpxC — UDP-GlcNAc deacetylase','LpxD — UDP-3-O-N-acyltransferase',
'LpxB — disaccharide synthase','LpxK — tetraacyldisaccharide kinase',
'MsbA — LPS flippase','LptA — periplasmic LPS carrier','LptB — ATPase','LptC','LptD — OM assembly','LptE','LptFG (any)']
def arm_view(loc_list, label):
    sub = binary.loc[[f for f in loc_list if f in binary.index]]
    print(f'\n=== {label} ===')
    print(sub.to_string())
arm_view(SPHINGO, 'Sphingolipid biosynthesis')
arm_view(CHV, 'ChvG-ChvI envelope-stress TCS')
arm_view(LPT, 'Lipid A biosynthesis + Lpt apparatus (false-negative check)')
=== Sphingolipid biosynthesis ===
C_crescentus A_baumannii N_meningitidis M_catarrhalis
family
spt — serine palmitoyltransferase 1 0 0 0
bcerS — bacterial ceramide synthase 0 0 0 0
cerR — ceramide reductase 1 0 0 0
sphingosine kinase (sphk) 0 1 0 0
=== ChvG-ChvI envelope-stress TCS ===
C_crescentus A_baumannii N_meningitidis M_catarrhalis
family
ChvG — sensor kinase 1 0 0 0
ChvI — response regulator 1 0 0 0
=== Lipid A biosynthesis + Lpt apparatus (false-negative check) ===
C_crescentus A_baumannii N_meningitidis M_catarrhalis
family
LpxA — UDP-GlcNAc acyltransferase 1 1 1 1
LpxC — UDP-GlcNAc deacetylase 1 1 1 1
LpxD — UDP-3-O-N-acyltransferase 1 1 1 1
LpxB — disaccharide synthase 1 1 1 1
LpxK — tetraacyldisaccharide kinase 1 1 1 1
MsbA — LPS flippase 0 1 1 1
LptA — periplasmic LPS carrier 0 1 1 1
LptB — ATPase 1 1 1 1
LptC 1 1 1 1
LptD — OM assembly 1 1 1 1
LptE 1 1 1 1
LptFG (any) 1 0 1 1
# Final scorecard delta vs NB06
print('=== NB06 vs NB06b key findings (post-review) ===\n')
# Sphingolipid arm
print('Sphingolipid arm — was claimed Caulobacter-unique:')
for fam in SPHINGO:
    if fam in binary.index:
        pat = ''.join('1' if v else '0' for v in binary.loc[fam])
        print(f' {fam:55s} NCBI pattern {pat}')
print()
# ChvG-ChvI
print('ChvG-ChvI arm — was claimed alphaproteobacterial-only:')
for fam in CHV:
    if fam in binary.index:
        pat = ''.join('1' if v else '0' for v in binary.loc[fam])
        print(f' {fam:55s} NCBI pattern {pat}')
print()
# False negative check on essential Caulobacter genes
print('Essential lipid A / Lpt apparatus genes — false-negative diagnostic for C. crescentus:')
for fam in LPT:
    if fam in binary.index:
        cc_present = binary.loc[fam, 'C_crescentus']
        if not cc_present:
            print(f' ✗ {fam:55s} NCBI also reports ABSENT in C. crescentus')
        else:
            # NOTE: this branch tests only the NCBI result. It does not check that PaperBLAST
            # scored 0, so the message overstates for LpxB and LptC (PaperBLAST count 1 each).
            print(f' ✓ {fam:55s} NCBI reports PRESENT (PaperBLAST had false negative)')
=== NB06 vs NB06b key findings (post-review) ===

Sphingolipid arm — was claimed Caulobacter-unique:
 spt — serine palmitoyltransferase NCBI pattern 1000
 bcerS — bacterial ceramide synthase NCBI pattern 0000
 cerR — ceramide reductase NCBI pattern 1000
 sphingosine kinase (sphk) NCBI pattern 0100

ChvG-ChvI arm — was claimed alphaproteobacterial-only:
 ChvG — sensor kinase NCBI pattern 1000
 ChvI — response regulator NCBI pattern 1000

Essential lipid A / Lpt apparatus genes — false-negative diagnostic for C. crescentus:
 ✓ LpxA — UDP-GlcNAc acyltransferase NCBI reports PRESENT (PaperBLAST had false negative)
 ✓ LpxC — UDP-GlcNAc deacetylase NCBI reports PRESENT (PaperBLAST had false negative)
 ✓ LpxD — UDP-3-O-N-acyltransferase NCBI reports PRESENT (PaperBLAST had false negative)
 ✓ LpxB — disaccharide synthase NCBI reports PRESENT (PaperBLAST had false negative)
 ✓ LpxK — tetraacyldisaccharide kinase NCBI reports PRESENT (PaperBLAST had false negative)
 ✗ MsbA — LPS flippase NCBI also reports ABSENT in C. crescentus
 ✗ LptA — periplasmic LPS carrier NCBI also reports ABSENT in C. crescentus
 ✓ LptB — ATPase NCBI reports PRESENT (PaperBLAST had false negative)
 ✓ LptC NCBI reports PRESENT (PaperBLAST had false negative)
 ✓ LptD — OM assembly NCBI reports PRESENT (PaperBLAST had false negative)
 ✓ LptE NCBI reports PRESENT (PaperBLAST had false negative)
 ✓ LptFG (any) NCBI reports PRESENT (PaperBLAST had false negative)
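The scorecard prints "PaperBLAST had false negative" for every family that NCBI reports as present, but the comparison table shows that PaperBLAST did score some of them (for example LpxB, with count 1). A stricter check would require both conditions. The sketch below assumes the counts from the C. crescentus comparison table; `paperblast_false_negative` is a hypothetical helper, not part of the notebook.

```python
def paperblast_false_negative(paperblast_n, ncbi_n):
    """True only when PaperBLAST found nothing and NCBI annotation found hits."""
    return paperblast_n == 0 and ncbi_n > 0


# (paperblast_n, ncbi_n) for C. crescentus lipid A enzymes, from the comparison table
lipid_a = {
    "LpxA": (0, 11),
    "LpxB": (1, 18),
    "LpxC": (0, 15),
    "LpxD": (0, 15),
    "LpxK": (0, 18),
}
false_negs = [gene for gene, (pb, nc) in lipid_a.items()
              if paperblast_false_negative(pb, nc)]
print(false_negs)
```

With these counts, four of the five lipid A enzymes qualify as PaperBLAST false negatives, and LpxB is excluded because PaperBLAST had already detected it.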