Multi-Omics, Embeddings, and Molecular Profiles

Why This Lens Exists

Tenant and database boundaries do not match how scientists or agents plan analyses. This data-type lens groups collections by the kind of evidence they provide and the questions they make easier to ask.

Collections In This Lens

nmdc_arkin
kbase_ke_pangenome
kescience_webofmicrobes
enigma_coral

Best Uses

Use this lens to find reusable source data before choosing a specific collection. It is especially useful for agents that need to identify complementary data, select join keys, or explain why a derived product is reusable across projects.

Recent project use: harvard_forest_warming compares DNA, RNA, taxonomy, KEGG functions, and metabolite IDs in one long-term warming experiment. Its main Atlas lesson is that omics-layer interpretation depends on design balance, horizon, processing status, and time scale.

Metrics To Watch

Evidence traces: which projects already used these collections.
Reuse: whether derived products exist beyond a single project.
Under-explored combinations: collection pairs with obvious scientific value but few projects.
Caveat load: schema gaps, identifier mismatches, sampling bias, and staging status.
Omics-layer comparability: whether DNA, RNA, metabolite, proteome, or embedding matrices share the same sample design and confound structure.

Caveats

Do not treat collection co-membership as proof that a join is valid. A join recipe should name stable identifiers, filtering rules, and failure modes before it supports a claim.

Source Projects

Harvard Forest Long-Term Warming — DNA vs RNA Functional Response

Collections

NMDC Multi-omics Nmdc Metadata Nmdc Results Pangenome Collection Kescience Webofmicrobes ENIGMA CORAL

Related Atlas Pages

Review Routes

Chris Mungall