A Founder’s Story of Gene Expression Optimization

Iddo Weiner | May 1, 2025

Like many projects in molecular biology, this one started with a bottleneck.

During my MSc, I was working on a synthetic enzyme whose expression was too low for us to measure its downstream effects. The design made sense and the cloning worked, but the protein stubbornly refused to express at usable levels. It was a frustrating experience, one that will feel familiar to anyone who has tried to push a synthetic construct beyond what the host cell seems willing to tolerate.

In trying to increase expression, I was drawn into the broader problem of gene expression optimization. What initially felt like a practical hurdle quickly became a scientific question in its own right: what actually governs how well a gene expresses, and how can we model it? Before long, my focus had shifted. I found myself just as engaged with the mechanisms of transcription, translation, and sequence-level regulation as I was with the protein I had originally set out to study.

Between 2018 and 2020, my colleagues and I published seven papers on modeling gene expression and improving the yields of synthetic genes. Some of this work introduced mechanistic and statistical models; other papers included graphical user interfaces that let researchers apply gene expression optimization directly. All of these tools are still used by the community today, and the seven publications have together drawn more than 200 citations.

Looking back, two principles from that period continue to shape how I think about computational biology.

First, every hypothesis and every model we proposed was tested empirically in the lab. Computational predictions were not treated as answers, but as hypotheses that required experimental validation. This step remains essential. As software tools in biology become more sophisticated, it is increasingly important to ground them in real measurements rather than theoretical performance.

Second, all of our models were explicitly hypothesis-driven. We encoded our biological intuition (our understanding of sequence features, regulatory effects, and molecular mechanisms) directly into the structure of the models. At the time, this approach was not optional. Data was limited, and mechanistic insight was the only way to make progress.

Today, that constraint is rapidly fading.

At Converge Bio, we are revisiting many of the same questions I encountered during my academic work, but with access to generative models and data-driven approaches that were simply not available a few years ago. These methods allow us to move beyond purely hand-crafted hypotheses, while still maintaining the same commitment to empirical validation that guided our early work.

This is the context in which ConvergeGEO came to life.

ConvergeGEO emerged as a natural continuation of years spent thinking about how gene sequences translate into expression outcomes, and how that relationship can be learned, modeled, and optimized. Rather than replacing biological understanding, our goal has been to complement it: using modern AI methods to explore sequence space more effectively, while remaining tightly coupled to experimental reality.

For me, there has been something deeply satisfying about returning to a problem that once felt intractable, armed with better tools but the same scientific rigor.

And for those curious about that original enzyme from my master’s project: using the models we eventually developed, we improved its expression roughly 20-fold. What began as a failed experiment ultimately shaped much of my research, and laid the groundwork for what we are building today.
