NMDC Multi-omics

nmdc_arkin

Tenant: NMDC · Snapshot 2026-04-29T21:35:53.838483+00:00

Domain

Schema status: discovered

Curation status: curated

Source: berdl-spark-connect://metrics.berdl.kbase.us:443

Philosophy

Enable integrated microbiome analysis across multiple omics layers. Combine metabolomics, proteomics, and metagenomics with standardized annotations and embeddings for comprehensive sample characterization.

Data sources: NMDC, COG, KEGG, MetaCyc, GO

Citation & Attribution

Provider: NMDC

Website: https://microbiomedata.org/

Scale

48 studies
3M+ metabolomics_records
1.4M+ lipidomics_records
60+ tables

Schema Browser

Key Tables

Table                     Description                                            Rows
annotation_terms_unified  Unified annotation terms (COG, EC, GO, KEGG, MetaCyc)  67,353
metabolomics_gold         Metabolomics measurements                              3,129,061
lipidomics_gold           Lipidomics measurements                                1,395,867
embeddings_v1             256-dimensional sample embeddings                      5,316
trait_features            Microbial trait profiles (90+ traits)                  (not listed)
study_table               Study definitions                                      48
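
One way to check that a live session matches this snapshot is to compare COUNT(*) results against the row counts listed in Key Tables. The sketch below is a minimal illustration in Python, not part of BERDL itself: `LISTED_ROWS`, `count_query`, `compare_counts`, and the `run_scalar` callable are hypothetical names introduced here, and the counts are copied from the listing.

```python
from typing import Callable, Dict, Tuple

NAMESPACE = "nmdc_arkin"

# Row counts copied from the Key Tables listing (snapshot 2026-04-29).
# trait_features is omitted because the listing gives no count for it.
LISTED_ROWS: Dict[str, int] = {
    "annotation_terms_unified": 67_353,
    "metabolomics_gold": 3_129_061,
    "lipidomics_gold": 1_395_867,
    "embeddings_v1": 5_316,
    "study_table": 48,
}


def count_query(table: str) -> str:
    """Build a COUNT(*) query for one table in the collection namespace."""
    return f"SELECT COUNT(*) AS n FROM {NAMESPACE}.{table}"


def compare_counts(
    run_scalar: Callable[[str], int],
) -> Dict[str, Tuple[int, int, bool]]:
    """Return {table: (listed, observed, matches)} for every listed table.

    run_scalar is a hypothetical callable supplied by the caller: it takes
    a SQL string and returns the single integer that the query produces.
    """
    report = {}
    for table, listed in LISTED_ROWS.items():
        observed = int(run_scalar(count_query(table)))
        report[table] = (listed, observed, observed == listed)
    return report
```

In a notebook with an active Spark session named `spark` (an assumption about the JupyterHub environment), `run_scalar` could be `lambda sql: spark.sql(sql).collect()[0][0]`. A mismatch does not necessarily indicate a problem, since the collection may have been updated after the snapshot.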

Sample Queries

Get NMDC studies

-- study_table holds 48 study definitions, so an unbounded SELECT is small.
SELECT *
FROM nmdc_arkin.study_table

Get metabolomics data

-- metabolomics_gold holds 3,129,061 rows; keep the LIMIT when previewing.
SELECT *
FROM nmdc_arkin.metabolomics_gold
LIMIT 20
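
The same preview pattern applies to any table in the collection. The following is a minimal sketch in Python of a helper that builds such a query while guarding against unbounded reads on the larger tables. The names `preview_query` and `MAX_PREVIEW_ROWS`, and the cap of 1000 rows, are illustrative assumptions rather than BERDL settings.

```python
import re

NAMESPACE = "nmdc_arkin"
MAX_PREVIEW_ROWS = 1000  # illustrative cap chosen for this sketch, not a BERDL limit

# Accept only plain identifiers so a table name cannot smuggle in extra SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def preview_query(table: str, limit: int = 20) -> str:
    """Build a bounded SELECT * preview query for a table in nmdc_arkin."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"not a plain table name: {table!r}")
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise ValueError("limit must be an integer")
    if not 1 <= limit <= MAX_PREVIEW_ROWS:
        raise ValueError(f"limit must be between 1 and {MAX_PREVIEW_ROWS}")
    return f"SELECT *\nFROM {NAMESPACE}.{table}\nLIMIT {limit}"


if __name__ == "__main__":
    # Reproduces the metabolomics sample query shown on this page.
    print(preview_query("metabolomics_gold"))
```

Validating the table name before interpolating it keeps the helper safe to call with user-supplied input, at the cost of rejecting quoted or otherwise unusual identifiers.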

Related Collections

Projects Using This Collection

ENIGMA Carbon Census 1

For 83 groundwater- and necromass-derived carbon compounds proposed for community enrichment and isolate phenotyping, wh...

BERDL Data Atlas — Inventory, Topic Map, and Cross-Reference Synergies

What data is available in BERDL (across tenants, agencies, and programs), what biological topics does it cover, and wher...

Harvard Forest Long-Term Warming — DNA vs RNA Functional Response

After ~25 years of +5°C experimental soil warming at the Harvard Forest Barre Woods plot, does the functional transcript...

Gene Function Ecological Agora

Across the prokaryotic tree (GTDB r214; 293,059 genomes / 27,690 species), build a multi-resolution innovation + acqui...

Plant Microbiome Ecotypes

What is the genomic basis for plant-microbe associations across different plant compartments (rhizosphere, root, phyllos...

Environmental Resistome at Pangenome Scale

Do antimicrobial resistance gene profiles differ between ecological niches across 27,000 bacterial species? Using 83K AM...

Functional Dark Matter — Experimentally Prioritized Novel Genetic Systems

Which genes of unknown function across 48 bacteria have strong fitness phenotypes, and can biogeographic patterns, pathw...

Community Metabolic Ecology via NMDC × Pangenome Integration

Do the GapMind-predicted pathway completeness profiles of community resident taxa predict or correlate with observed met...

Prophage Gene Modules and Terminase-Defined Lineages Across Bacterial Phylogeny and Environmental Gradients

How are prophage gene modules and terminase-defined prophage lineages distributed across bacterial phylogeny and environ...

Polyhydroxybutyrate Granule Formation Pathways: Distribution Across Clades and Environmental Selection

How are polyhydroxybutyrate (PHB) granule-forming pathways distributed across bacterial clades and environments, and doe...

Atlas Pages

conflict

Ecotype labels versus translational leakage

Ecotype labels are reusable stratification products, but translational target lists can collapse when labels and outcomes share leaked or confounded features.

meta

BERDL Data Atlas

Entry point for BERDL tenants, collections, data types, derived products, join recipes, reuse patterns, and missing complementary data.

data type

Multi-Omics, Embeddings, and Molecular Profiles

Metabolomics, proteomics, trait profiles, embeddings, and other matrix-style summaries that create reusable sample or organism representations.

derived product

Ecotype Assignments

Reusable within-species or community ecotype labels that support environmental validation, microbiome stratification, and downstream hypothesis tests.

derived product

Environment Harmonization Labels

Reusable environment category and coordinate-quality labels that make cross-collection ecology joins safer.

data collection

NMDC Multi-omics

Multi-omics analysis data (annotations, embeddings, metabolomics, proteomics, traits)

opportunity

Plant Microbiome Function Validation

Validate whether plant microbiome functional signals persist across ecotype labels, pangenome context, and environmental metadata.

topic

Microbial Ecotypes, Environment, and Field Validation

Synthesis of species-level ecotypes, environmental embeddings, lab-field validation, ENIGMA ecology, and metadata limitations.

topic

Plant Microbiome Function and Agriculture

Synthesis of plant-associated microbial function, beneficial/pathogenic duality, compartment structure, PGP markers, and pangenome ecology.

topic

Metabolic Capability, Dependency, and Community Design

Synthesis of GapMind capability, fitness dependency, metabolic models, community ecology, and design-ready derived data.

Atlas Reuse

Start Exploring

Access the full NMDC Multi-omics data through BERDL JupyterHub.
