
Noam Azulay
|
Oct 2, 2025
Among the many difficulties in developing an intelligent system for biology, especially when its inner workings remain hidden, is interpreting how or why it arrives at certain results. In classic NLP, if a model says “here’s a summary of your doc” or “roses are red, LLMs are blue; Elon and Sam will control both me and you,” an intelligent English speaker can easily judge whether it’s coherent or just hallucinating. But what happens when your model outputs something like “ACGGTCAG”? Is it nonsense, or a revolutionary biological insight? 🤔
That’s where explainability and interpretability step in. They let us see how a model arrives at its decisions and open up its “black box.” One striking example in large language models is the so-called “Golden Gate Bridge” feature that Anthropic discovered in Claude 3 Sonnet using sparse autoencoders. This feature responds strongly to mentions of the Golden Gate Bridge, so much so that amplifying it makes the model repeatedly invoke the bridge, even claiming to be the Golden Gate Bridge itself. This underscores how manipulating a single internal feature can dramatically shape a model’s behavior.
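To make the idea of “amplifying” a feature concrete, here is a minimal sketch of activation steering: adding a scaled feature direction to a model’s hidden states before the rest of the forward pass continues. The direction vector, layer, and scale here are hypothetical stand-ins, not Anthropic’s actual setup.

```python
import torch

def amplify_feature(hidden: torch.Tensor, direction: torch.Tensor, scale: float = 10.0) -> torch.Tensor:
    """Push hidden activations along a chosen feature direction."""
    direction = direction / direction.norm()   # unit-length feature direction
    return hidden + scale * direction          # broadcast over batch and sequence

# hidden: [batch, seq_len, d_model] activations captured at some layer (stand-in values here)
hidden = torch.randn(1, 16, 1024)
golden_gate_dir = torch.randn(1024)            # hypothetical decoder column for the feature
steered = amplify_feature(hidden, golden_gate_dir)
```

In practice this kind of edit is applied via a forward hook at a chosen layer, so every subsequent token is generated from the steered representation.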
Recently, Liam Bai and Etowah Adams applied similar mechanistic interpretability methods to protein language models. They trained sparse autoencoders (SAEs) on hidden activations, reconstructing the model’s internal states while penalizing the number of active components. This forces the representation to be expressed through fewer, more interpretable “feature” directions. By examining these features, they uncovered one that captures a transmembrane beta-barrel pattern: it highlights every other residue along each beta strand, forming a distinctive crisscross arrangement (see image below). Not every feature was so clear, hinting at undiscovered biological pathways or simply reflecting dataset biases. 🤯
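For readers who want to see what an SAE actually looks like, here is a minimal sketch: reconstruct a hidden activation through a wide, non-negative bottleneck while an L1 penalty keeps most features silent. The class names, dimensions, and coefficients are illustrative assumptions, not taken from Bai and Adams’ code.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        # Encoder maps a hidden activation to a much wider feature vector.
        self.encoder = nn.Linear(d_model, d_features)
        # Decoder reconstructs the original activation from the sparse features.
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, h: torch.Tensor):
        f = torch.relu(self.encoder(h))   # non-negative feature activations
        h_hat = self.decoder(f)           # reconstruction of the hidden state
        return h_hat, f

def sae_loss(h, h_hat, f, l1_coeff: float = 1e-3):
    # Reconstruction error keeps the features faithful to the model's state;
    # the L1 penalty pushes most features to zero, so each input lights up
    # only a handful of (hopefully interpretable) directions.
    recon = (h - h_hat).pow(2).mean()
    sparsity = f.abs().mean()
    return recon + l1_coeff * sparsity

# Usage sketch: h would be a batch of residue-level hidden states collected
# from one layer of a protein language model (random values stand in here).
sae = SparseAutoencoder(d_model=1280, d_features=16384)
h = torch.randn(8, 1280)
h_hat, f = sae(h)
loss = sae_loss(h, h_hat, f)
loss.backward()
```

Once trained, each learned feature is a direction in activation space; inspecting which inputs activate it (e.g., which residues in which proteins) is what surfaces patterns like the alternating beta-barrel residues described above.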
Bottom Line:
When your model spits out cryptic sequences, be it proteins, DNA, or single-cell data, you no longer have to guess whether it’s meaningful. Interpretability gives you a front-row seat to the model’s thought process, unveiling hidden signals and paving the way for new discoveries.
At Converge Bio, we build models across multiple biological modalities and harness our AI and NLP expertise to provide an extra layer of explainability. You get not just predictions but the confidence of knowing why they arise.



