Converge Team | May 27, 2026
We recently announced a $2.5 million grant from the Gates Foundation to build something the field of agricultural genomics has not yet had: a multimodal, long-context foundation model purpose-built for crops. The goal is to compress breeding cycles for climate-resilient varieties, and to do it by changing how causal genes are discovered in the first place.
The stakes are concrete. Climate change is putting pressure on staple crops faster than traditional breeding can respond. A new wheat or sorghum variety can take a decade to develop. The world does not have a decade for every trait that matters.
Why crop genomes break conventional approaches
Crop genomes are enormous and profoundly understudied. Function in these systems emerges from interactions spanning millions of base pairs: long-range regulatory elements, chromatin architecture, co-expression networks, epistatic effects. Yet most computational approaches still analyze these systems in fragments, looking at SNPs, genes, or regions in isolation and patching the results together after the fact.
Complex agronomic traits like heat resilience or yield are almost never the work of a single gene. They emerge from coordination across megabases. Asking "is this SNP statistically significant?" is the wrong shape of question. The right one is: given a genomic region and its expression profile, which variant is most likely causally responsible for the phenotype we observe?
That question requires a model that can hold the full context in view at once.
Pretrain broadly. Fine-tune precisely.
Our approach mirrors the paradigm that transformed natural language and protein modeling, now adapted to the scale and structure of plant genomes.
We are assembling large crop genomic and transcriptomic datasets and harmonizing them into a unified training corpus. During pretraining, the objective is not task-specific prediction but representation learning. Through self-supervised objectives adapted to nucleotide sequences and expression profiles, the model learns the statistical and structural regularities of plant genomes before it is asked any downstream question.
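To make "self-supervised" concrete, here is a minimal sketch of one widely used objective, masked-token prediction, applied to a nucleotide string. It is an illustration only: this post does not specify which objectives the model uses, and the `MASK` symbol, the 15% mask rate, and the function name `mask_sequence` are all assumptions made for the example.

```python
import random
from typing import List, Tuple

MASK = "#"  # hypothetical mask symbol; a real tokenizer would define its own


def mask_sequence(seq: str, mask_rate: float, rng: random.Random) -> Tuple[str, List[int]]:
    """Hide a random subset of bases and return the corrupted sequence
    together with the positions that were hidden."""
    if not 0.0 <= mask_rate <= 1.0:
        raise ValueError("mask_rate must be between 0 and 1")
    # Each position is masked independently with probability mask_rate.
    positions = [i for i in range(len(seq)) if rng.random() < mask_rate]
    chars = list(seq)
    for i in positions:
        chars[i] = MASK
    return "".join(chars), positions


# Toy example: the training signal is the original base at each hidden position.
original = "ACGT" * 25
rng = random.Random(0)  # seeded so the example is reproducible
masked, positions = mask_sequence(original, mask_rate=0.15, rng=rng)
targets = [original[i] for i in positions]
```

A model trained this way sees `masked` and is scored on how well it recovers `targets`. No phenotype labels are involved, which is why this kind of objective can run over unlabeled genomes at scale.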
Only after pretraining do we fine-tune for the work that matters to breeders: causal SNP prioritization within QTL windows, multimodal genotype-phenotype mapping, and candidate gene ranking. We are evaluating and extending state-of-the-art long-context DNA architectures so the model can reason simultaneously across entire QTL regions, regulatory sequences, gene neighborhoods, and matched transcriptomic signals.
Interpretability as a design constraint, not an afterthought
In biology, prediction without interpretability is not enough. A model that ranks a gene without explaining why is a black box that breeders cannot act on and biologists cannot test.
We treat explainability as a first-class design constraint. Attention mechanisms and embedding-level analyses let us trace predictions back to specific sequence segments, surface the regulatory regions that drove a decision, and expose the expression features that influenced prioritization. The result is a model that functions as a scientific collaborator: breeders can understand why a gene was prioritized, biologists can generate testable mechanistic hypotheses, and experimental teams can design CRISPR edits grounded in model-derived insights.
What success looks like
The target outcome is measurable. The fine-tuned model will accept a QTL region paired with a matched transcriptomic profile, rank candidate causal SNPs and genes, and place the true causal variant among the top-ranked candidates with high consistency across benchmarks.
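One simple way to score "the true causal variant among the top-ranked candidates" is top-k recall: the fraction of benchmark regions where the known causal variant lands in the model's top k. The sketch below is an assumption about how such a metric could be computed, not our evaluation code. The function names, region IDs, and SNP IDs are hypothetical.

```python
from typing import Dict, List


def top_k_hit(ranked_variants: List[str], causal_variant: str, k: int) -> bool:
    """True if the known causal variant is among the first k ranked candidates."""
    return causal_variant in ranked_variants[:k]


def top_k_recall(rankings: Dict[str, List[str]], truth: Dict[str, str], k: int) -> float:
    """Fraction of benchmark regions whose causal variant is ranked in the top k.
    Regions with no ranking at all count as misses."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not truth:
        raise ValueError("truth must contain at least one region")
    hits = sum(
        top_k_hit(rankings.get(region, []), causal, k)
        for region, causal in truth.items()
    )
    return hits / len(truth)


# Hypothetical benchmark: three QTL regions, each with one known causal SNP.
rankings = {
    "qtl_A": ["snp3", "snp1", "snp7"],
    "qtl_B": ["snp9", "snp2", "snp4"],
    "qtl_C": ["snp5", "snp8", "snp6"],
}
truth = {"qtl_A": "snp1", "qtl_B": "snp4", "qtl_C": "snp5"}

recall_at_1 = top_k_recall(rankings, truth, k=1)  # only qtl_C hits
recall_at_2 = top_k_recall(rankings, truth, k=2)  # qtl_A and qtl_C hit
```

Reporting the metric at several values of k, and separately per benchmark, is one way to check that performance is consistent rather than driven by a single dataset.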
This is a translational shift: from correlation-based mapping toward AI-driven causal prioritization operating at genomic scale. When it works, breeding cycles compress. Climate-resilient varieties get to farmers faster. The decade becomes a few years.

The partnership
The $2.5 million grant from the Gates Foundation funds the foundational science underneath this effort: data assembly and harmonization, pretraining at scale, fine-tuning for causal prioritization, and the interpretability work that makes the model useful to breeders and biologists in the field. Climate adaptation in agriculture is exactly the kind of problem foundation models should be aimed at, and exactly the kind of problem that does not get solved without aligned partners willing to fund long-horizon research. We are grateful to the Gates Foundation for backing it. More to come.


