Are binding antibody-antigen affinity values in public datasets reproducible?

Iddo Weiner | Aug 19, 2025

We ran a small validation assay recently, and the results were unexpectedly concerning.

Background

Last week, we shared insights on the true positive and negative rates in public antibody databases (link to post in comments). These rates, while informative, are far from perfect. This prompted a shift in thinking: instead of training binary classification models (binding vs. no-binding), why not train regression models to predict continuous binding affinity values (e.g., KD)? The rationale is that affinity values provide a more nuanced signal, allowing models to distinguish between no binding and weak binding. However, this approach is limited by the relative scarcity of publicly available affinity measurements. That led us to ask a foundational question:
How reproducible are the binding affinity values themselves?
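
A brief aside on the regression framing above: affinities span many orders of magnitude, so KD is typically modeled on a log scale rather than as a raw value. A minimal, hypothetical sketch of that target transformation (the numbers are made up for illustration):

```python
import numpy as np

# Hypothetical KD values in molar units (1 nM, 50 nM, 200 nM).
kd_molar = np.array([1e-9, 5e-8, 2e-7])

# Affinities span orders of magnitude, so regression targets are usually
# log-transformed: pKD = -log10(KD). Higher pKD means tighter binding.
pkd = -np.log10(kd_molar)
print(pkd)  # approximately [9.0, 7.3, 6.7]
```

A sequence-based model would then be trained to predict pKD rather than a binary bind / no-bind label.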

The Experiment

We selected 10 antibodies from a widely used public database. All came from the same patent and had KD values measured by surface plasmon resonance (SPR).
We replicated the setup precisely: same antibodies, same antigen, same SPR setup.
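
For context on how a KD comes out of an SPR experiment, one common route (alongside kinetic fitting) is a steady-state fit of the 1:1 Langmuir model, Req = Rmax * C / (KD + C). A minimal sketch with made-up response data (the numbers and names are purely illustrative, not our assay output):

```python
import numpy as np
from scipy.optimize import curve_fit

def langmuir_1to1(conc, rmax, kd):
    # Steady-state 1:1 Langmuir binding: equilibrium response vs. analyte concentration.
    return rmax * conc / (kd + conc)

conc_nM = np.array([1.0, 3.0, 10.0, 30.0, 100.0, 300.0])   # analyte concentrations (nM)
resp_RU = np.array([8.0, 20.0, 45.0, 70.0, 88.0, 96.0])    # made-up equilibrium responses (RU)

(rmax_fit, kd_fit), _ = curve_fit(langmuir_1to1, conc_nM, resp_RU, p0=[100.0, 10.0])
print(f"Rmax ~ {rmax_fit:.0f} RU, KD ~ {kd_fit:.0f} nM")
```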

The Results

While we didn’t expect perfect alignment between the two sets of values, we were surprised to find no correlation at all.
Neither the absolute KD values nor their relative rankings showed any agreement between our measurements and the public data. Additionally, the KD values reported in the database were consistently much lower than those we measured (i.e., they imply substantially tighter binding than we observed).
On the bright side, all antibodies exhibited clear 1:1 Langmuir behavior, confirming that they are indeed true binders.
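
For concreteness, this is the kind of comparison we mean: agreement on absolute values (on a log scale) and agreement on rank order. The arrays below are placeholders, not our actual data:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Placeholder paired KD values (nM): database-reported vs. re-measured by SPR.
kd_database = np.array([0.5, 1.2, 2.0, 4.5, 8.0, 10.0, 15.0, 22.0, 30.0, 50.0])
kd_measured = np.array([40.0, 12.0, 90.0, 25.0, 300.0, 60.0, 150.0, 35.0, 500.0, 80.0])

r, _ = pearsonr(np.log10(kd_database), np.log10(kd_measured))  # absolute agreement on log scale
rho, _ = spearmanr(kd_database, kd_measured)                   # rank (ordering) agreement
print(f"Pearson r on log10(KD): {r:.2f}, Spearman rho: {rho:.2f}")
```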

Implications for Model Building

To be clear: this is a small sample - just 10 antibodies, from one patent, against a single antigen. It’s too early to generalize.
And of course, the biological sciences are no strangers to noise and variability. Perfect reproducibility is a high bar.
Still, the complete absence of correlation, even in trend, is troubling.
From a modeling perspective, this reinforces the importance of robust strategies for handling noisy or erroneous data. Without them, even the best architectures may struggle to learn from or generalize across inconsistent affinity measurements.
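
As one example of what we mean (and only one of many possible tactics), a loss that down-weights outlier labels, such as the Huber loss, can limit the damage from a fraction of badly mis-reported affinities. The features and labels below are synthetic stand-ins, not a real training set:

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                                   # stand-in antibody features
y = X @ rng.normal(size=8) + rng.normal(scale=0.2, size=200)    # clean pKD-like targets
y[:20] += rng.normal(loc=5.0, scale=2.0, size=20)               # simulate grossly wrong labels

ols = LinearRegression().fit(X, y)
huber = HuberRegressor(max_iter=1000).fit(X, y)                 # Huber loss down-weights outliers

clean = slice(20, None)  # evaluate only on the uncorrupted portion
print("OLS   MAE:", round(mean_absolute_error(y[clean], ols.predict(X[clean])), 3))
print("Huber MAE:", round(mean_absolute_error(y[clean], huber.predict(X[clean])), 3))
```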

This experience serves as a reminder: as we move toward increasingly sophisticated models, the quality and consistency of underlying data will be just as critical as model architecture. It might be time for the field to revisit assumptions about what we treat as "ground truth."