Multi-Omics, Embeddings, and Molecular Profiles
Metabolomics, proteomics, trait profiles, embeddings, and other matrix-style summaries that create reusable sample or organism representations.
Multi-Omics, Embeddings, and Molecular Profiles
Why This Lens Exists
Tenant and database boundaries do not match how scientists or agents plan analyses. This data-type lens groups collections by the kind of evidence they provide and the questions they make easier to ask.
Collections In This Lens
nmdc_arkinkbase_ke_pangenomekescience_webofmicrobesenigma_coral
Best Uses
Use this lens to find reusable source data before choosing a specific collection. It is especially useful for agents that need to identify complementary data, select join keys, or explain why a derived product is reusable across projects.
Recent project use: harvard_forest_warming compares DNA, RNA, taxonomy, KEGG functions, and metabolite IDs in one long-term warming experiment. Its main Atlas lesson is that omics-layer interpretation depends on design balance, horizon, processing status, and time scale.
Metrics To Watch
- Evidence traces: which projects already used these collections.
- Reuse: whether derived products exist beyond a single project.
- Under-explored combinations: collection pairs with obvious scientific value but few projects.
- Caveat load: schema gaps, identifier mismatches, sampling bias, and staging status.
- Omics-layer comparability: whether DNA, RNA, metabolite, proteome, or embedding matrices share the same sample design and confound structure.
Caveats
Do not treat collection co-membership as proof that a join is valid. A join recipe should name stable identifiers, filtering rules, and failure modes before it supports a claim.