Genes, Proteins, and Annotations
Gene and protein identifiers, functional annotation, literature coverage, orthology, and controlled vocabulary layers used to connect raw genomes to interpretable biology.
Opportunity Hooks
Dark Gene Structure Prioritization
Prioritize dark gene families for mechanistic review by joining fitness, cofitness, annotation novelty, and AlphaFold structure signals.
Low-Confidence Collection Curation
Reduce Atlas caveat load by upgrading high-value low-confidence collection pages with schemas, reuse examples, and missing-data labels.
Why This Lens Exists
Tenant and database boundaries do not match how scientists or agents plan analyses. This data-type lens groups collections by the kind of evidence they provide and the questions they make easier to ask.
Collections In This Lens
- kbase_ke_pangenome
- kbase_genomes
- kbase_uniprot
- kbase_uniref50
- kbase_uniref90
- kbase_uniref100
- kbase_ontology_source
- kescience_paperblast
Best Uses
Use this lens to find reusable source data before choosing a specific collection. It is especially useful for agents that need to identify complementary data, select join keys, or explain why a derived product is reusable across projects.
Metrics To Watch
- Evidence traces: which projects already used these collections.
- Reuse: whether derived products exist beyond a single project.
- Under-explored combinations: collection pairs with obvious scientific value but few projects.
- Caveat load: schema gaps, identifier mismatches, sampling bias, and staging status.
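As a rough illustration of the "under-explored combinations" metric, the sketch below counts how many projects used each pair of collections and lists the pairs at or below a threshold. It assumes evidence traces are available as a mapping from project name to the collections it used. The project names, the `max_projects` threshold, and both function names are hypothetical; only the collection names come from this lens.

```python
from collections import Counter
from itertools import combinations


def pair_usage(project_collections):
    """Count how many projects used each unordered pair of collections."""
    counts = Counter()
    for collections in project_collections.values():
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(collections)), 2):
            counts[pair] += 1
    return counts


def under_explored(lens_collections, project_collections, max_projects=1):
    """Return lens collection pairs used by at most `max_projects` projects."""
    counts = pair_usage(project_collections)
    return [
        (pair, counts[pair])
        for pair in combinations(sorted(lens_collections), 2)
        if counts[pair] <= max_projects
    ]


# Hypothetical evidence traces: project name -> collections it used.
projects = {
    "project_a": ["kbase_ke_pangenome", "kbase_uniprot"],
    "project_b": ["kbase_ke_pangenome", "kbase_uniprot", "kescience_paperblast"],
    "project_c": ["kbase_uniref90"],
}
lens = ["kbase_ke_pangenome", "kbase_uniprot", "kescience_paperblast"]

rare = under_explored(lens, projects, max_projects=1)
for pair, n in rare:
    print(pair, n)
```

A low count only says that a pair is rarely used. Whether the pair has "obvious scientific value" still needs a human or agent judgment.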
Caveats
Do not treat collection co-membership as proof that a join is valid. A join recipe should name stable identifiers, filtering rules, and failure modes before it supports a claim.
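One way to make this caveat checkable is to record each join recipe as a small structure and refuse to rely on it until all three parts are filled in. The sketch below is an assumption about how such a record could look, not an Atlas schema: the `JoinRecipe` class, its field names, and the example key `uniprot_accession` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class JoinRecipe:
    """Hypothetical record of how two collections are meant to be joined."""
    left: str
    right: str
    join_keys: list = field(default_factory=list)      # stable identifiers
    filters: list = field(default_factory=list)        # filtering rules
    failure_modes: list = field(default_factory=list)  # known ways the join breaks

    def missing_parts(self):
        """List the required parts that have not been stated yet."""
        required = {
            "join_keys": self.join_keys,
            "filters": self.filters,
            "failure_modes": self.failure_modes,
        }
        return [name for name, value in required.items() if not value]

    def supports_claims(self):
        """A recipe is usable only when no required part is missing."""
        return not self.missing_parts()


# Co-membership in a lens alone: nothing is stated, so the recipe is not usable.
bare = JoinRecipe(left="kbase_uniprot", right="kescience_paperblast")

# A filled-in recipe; the key, filter, and failure mode are illustrative only.
stated = JoinRecipe(
    left="kbase_uniprot",
    right="kescience_paperblast",
    join_keys=["uniprot_accession"],
    filters=["drop rows with a null accession"],
    failure_modes=["obsolete or merged accessions fail to match"],
)

print(bare.missing_parts())
print(stated.supports_claims())
```

Listing the missing parts by name, rather than returning a bare boolean, makes it easier to see which gap is blocking a claim.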