Assessing the Reproducibility of Antibody Developability Data

Iddo Weiner | Aug 11, 2025

Are antibody developability values in public datasets reproducible?
We ran a few validation assays recently, and the results were encouraging.

Background

A few weeks ago, we shared observations about antibody-antigen binding reproducibility from public antibody databases (link to previous posts in the comments). This time, we turned our focus to developability traits.

Developability is arguably one of the most overlooked aspects in AI-driven antibody design. Model builders often focus on generating new binders, while seasoned drug developers are quick to ask, "But what’s the developability profile?" If we optimize only for binding, we risk producing molecules that are unusable downstream. To deliver real value, the antibodies we design must also be manufacturable and stable.

This brought us to a fundamental question:
How reproducible are solubility and thermal stability values in public datasets? Can we treat them as reliable ground truth for model training?

The Experiment

One of the biggest challenges here was finding labeled data. Our aim was to sample representative, not just prominent, antibodies from public datasets. However, for developability, data scarcity makes this very difficult.

Thus, we ended up selecting known antibodies with developability metrics from Jain et al., 2017, a well-regarded paper in the field published in PNAS, which measured developability profiles for 137 antibodies. From this set, we sampled 12 antibodies and ran two assays:
* HIC (hydrophobic interaction chromatography) for solubility
* DSF (differential scanning fluorimetry) for thermal stability

The Results

After seeing poor reproducibility for binding affinity measurements in our previous experiment, we were pleased to find that both solubility and thermal stability were highly reproducible across the selected antibodies.
The absolute values varied slightly, as expected, but the correlations were extremely high. Notably, HIC-based solubility rankings were *perfectly* consistent, which is extremely rare.

Implications for Model Building

These results were surprising in the best way. In hindsight, several factors may explain them:
* Since we worked with well-characterized antibodies, lot-to-lot variation was likely lower, which may have contributed to the reproducibility.
* Unlike antibody-antigen binding data, developability data is single-modality: it depends only on the antibody, not on the antigen. This likely reduces noise.
* Our source was a peer-reviewed, highly cited publication, which likely contributed to data reliability.

With these caveats in mind, this pilot suggests a few hypotheses for model builders:
* Publicly available developability data, while scarce, may be more reproducible than expected.
* Data sourced from peer-reviewed literature may be more trustworthy than other public sources.
* Known antibodies likely offer more reliable benchmarks than de novo sequences with limited experimental history.

Finally, data quality matters. Before defining your “ground truth,” be sure it’s grounded. Better yet, validate it.

Learn more about our antibody design and screening solution