Introducing ConvergeCELL

A unified patient
foundation model.

20M cells

4,479 patients

350+ diseases

ConvergeCELL is an end-to-end AI platform for generating therapeutic hypotheses from transcriptomics data. It is powered by a patient-level foundation model that bridges single-cell and bulk RNA-seq.

20M cells in training · From public single-cell atlases

4,479 donor samples · Across 350+ diseases

SOTA performance · Outperforms PaSCient

The scientific
challenge

Gene expression data has become one of the richest sources of insight in biology and medicine. Researchers can now measure the activity of thousands of genes across millions of cells in patient samples. Yet turning that data into therapeutic hypotheses remains fragmented and slow.


The core difficulty is identifying which genes are truly connected to disease: the ones that could serve as biomarkers, explain mechanisms, or become therapeutic targets. Today this relies on a patchwork of differential expression, manual curation, expert interpretation, and custom machine-learning pipelines. Even together, these methods often miss non-linear patterns, struggle with cellular heterogeneity, and rarely transfer cleanly across diseases. The result is weeks of expert curation per disease, with insights that often remain elusive.

ConvergeCELL collapses that workflow into a single platform.

Figure 1 · The ConvergeCELL architecture, from the bioRxiv preprint.

The Model

ConvergeCELL is a patient-level foundation model trained on 20 million cells across 4,479 donor samples spanning more than 350 diseases.

Rather than learning from cells in isolation, the model represents entire patient samples in a single embedding space. Through supervised contrastive learning organized at the disease-family level — oncological, immune-inflammatory, metabolic-vascular, and others — ConvergeCELL learns shared pathophysiological axes that transfer to held-out conditions. The result is a model that arrives at a new patient cohort with biologically informed priors, not a blank slate.
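
For readers who want a concrete picture of the training objective, here is a minimal NumPy sketch of a supervised contrastive loss in which the label is the disease family. It assumes the widely used SupCon formulation with a temperature of 0.1. The exact loss, temperature, and batching used in ConvergeCELL are not stated here, so treat the function name and defaults as illustrative only.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, family_labels, temperature=0.1):
    """SupCon-style loss: pull samples from the same disease family together.

    Assumes at least one family appears twice in the batch; otherwise there
    are no positive pairs and the result is undefined (NaN).
    """
    z = np.asarray(embeddings, dtype=float)
    labels = np.asarray(family_labels)
    # L2-normalise so dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # Similarity logits, with each sample's similarity to itself masked out.
    logits = np.where(not_self, z @ z.T / temperature, -np.inf)
    # Numerically stable log-softmax over all other samples in the batch.
    row_max = logits.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - row_max - log_norm
    # Positives are the other samples that share the anchor's family label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    # Average over positives per anchor, then over anchors that have positives.
    return float((-pos_log_prob[has_pos] / n_pos[has_pos]).mean())

# Toy batch: four patient-sample embeddings from two hypothetical families.
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
families = ["immune-inflammatory", "immune-inflammatory", "oncological", "oncological"]
print(supervised_contrastive_loss(toy, families))
```

The loss is low when samples from the same family sit close together and high when they are scattered, which is what pushes the embedding space toward shared disease-family axes.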


A knowledge distillation module extends the same representation to bulk RNA-seq, the format most clinical cohorts and retrospective studies actually live in. One unified representation, two transcriptomic data types, validated on both.
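
One common way to set up this kind of distillation is sketched below, under stated assumptions: single-cell counts are collapsed into a pseudobulk profile, and a bulk-side "student" is trained so that its embedding matches the single-cell "teacher" embedding of the same sample. The function names, the CPM/log1p normalisation, and the mean-squared-error objective are illustrative choices, not a description of the ConvergeCELL module.

```python
import numpy as np

def pseudobulk(cell_counts):
    """Collapse a cells x genes count matrix into one log-normalised profile.

    Assumes the sample has at least one non-zero count.
    """
    totals = np.asarray(cell_counts, dtype=float).sum(axis=0)
    counts_per_million = totals / totals.sum() * 1e6
    return np.log1p(counts_per_million)

def distillation_loss(student_embedding, teacher_embedding):
    """Mean squared error between student and (frozen) teacher embeddings."""
    student = np.asarray(student_embedding, dtype=float)
    teacher = np.asarray(teacher_embedding, dtype=float)
    return float(np.mean((student - teacher) ** 2))

# Toy example with random stand-ins for real data and real models.
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(500, 50))      # 500 cells x 50 genes
teacher = rng.normal(size=8)                   # stand-in teacher embedding
student_weights = rng.normal(size=(50, 8))     # hypothetical linear student
student = pseudobulk(counts) @ student_weights
print(distillation_loss(student, teacher))
```

In a real training loop the student weights would be updated to drive this loss down across many samples, so that bulk profiles land in the same embedding space as their single-cell counterparts.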

NOW OPEN ON HUGGING FACE

The patient representation engine behind ConvergeCELL is available open source, as a resource for the scientific community. The full science is published on bioRxiv.

Three diseases.
Zero-shot.
State of the art.

We validated ConvergeCELL across three independent disease cohorts the model had never seen during training. Each was chosen to stress-test a different dimension of the platform: tissue, modality, and biology. In each case, ConvergeCELL was applied zero-shot, with no fine-tuning, no retraining, and no disease-specific adaptation.

DISEASE 01

TISSUE STRESS-TEST

ZERO-SHOT

Systemic Lupus Erythematosus (SLE)

261 donors · PBMCs · Single-cell RNA-seq

ConvergeCELL classified disease versus healthy at AUROC 0.87, compared to 0.67 for Genentech's PaSCient.


The FDA-approved target of belimumab (TNFSF13B) was ranked 26 out of ~13,000 genes, in the top 0.2% of the gene list. Belimumab is the first monoclonal antibody approved by the FDA for SLE, and ConvergeCELL surfaced its target without any prior knowledge of approved drugs.
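
The "top 0.2%" figure follows directly from the rank and the size of the gene list. The short sketch below shows the arithmetic, plus one plausible way to turn per-gene attribution scores into a 1-based rank. The helper names and the toy scores are hypothetical, and the list size of 13,000 is the approximate figure quoted above.

```python
def rank_of(gene, scores):
    """1-based rank of a gene when genes are sorted by score, highest first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(gene) + 1

def rank_percentile(rank, n_genes):
    """Rank expressed as a percentage of the ranked list (lower is better)."""
    if not 1 <= rank <= n_genes:
        raise ValueError("rank must be between 1 and n_genes")
    return 100.0 * rank / n_genes

# Hypothetical attribution scores, only to show how a rank is derived.
toy_scores = {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.7}
print(rank_of("GENE_C", toy_scores))           # 2

# TNFSF13B: rank 26 out of roughly 13,000 genes.
print(f"{rank_percentile(26, 13_000):.2f}%")   # 0.20%
```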

AUROC: 0.87 (vs. PaSCient: 0.67)

Belimumab target rank: 26 / ~13,000 (TNFSF13B · top 0.2% of genes)

DISEASE 02

BIOLOGY STRESS-TEST

ZERO-SHOT

Multiple myeloma

84 samples · Bone marrow · Single-cell RNA-seq

ConvergeCELL distinguished active myeloma from precursor states (MGUS and SMM) at AUROC 0.72, where standard ML collapsed to 0.41 and PaSCient sat at 0.50.


The FDA-approved target of belantamab mafodotin (TNFRSF17 / BCMA) was ranked 3 out of ~11,000 genes. PaSCient ranked the same target at 6,315. This is the harder evaluation: separating MM from its precursor states, where clonal plasma cells blur the disease boundary. Standard methods classify disease-vs-healthy easily; within-disease staging is where they fail.
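
To put the AUROC values in context: AUROC is the probability that a randomly chosen positive sample (here, active myeloma) receives a higher score than a randomly chosen negative one (a precursor state). A value of 0.50 is what random scoring would give, and a value below 0.50 means the ranking is worse than chance on that cohort. The sketch below computes AUROC from that pairwise definition. It is a generic illustration with made-up scores, not the evaluation code behind the numbers above.

```python
import numpy as np

def auroc(labels, scores):
    """AUROC as P(positive score > negative score), counting ties as half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    positives, negatives = scores[labels], scores[~labels]
    if len(positives) == 0 or len(negatives) == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Compare every positive score against every negative score.
    diff = positives[:, None] - negatives[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

# Toy labels: 1 = active disease, 0 = precursor state; scores are made up.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
print(auroc(toy_labels, toy_scores))
```

The pairwise form is quadratic in cohort size, which is fine for cohorts of a few hundred samples. For larger datasets a rank-based implementation is the usual choice.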

AUROC: 0.72 (vs. PaSCient: 0.50 · Standard ML: 0.41)

Belantamab mafodotin target rank: 3 / ~11,000 (TNFRSF17 / BCMA · PaSCient ranked it 6,315)

DISEASE 03

MODALITY STRESS-TEST

ZERO-SHOT

Sepsis

264 whole-blood samples · Bulk RNA-seq

Applied to bulk data without retraining, ConvergeCELL recovered the three canonical arms of the bacterial sepsis host response: neutrophil activation, type-I interferon signalling, and adaptive humoral immunity.


Six of the model's top 10 attributed genes are individually established sepsis biomarkers, including IFI27 — the basis for a commercially available diagnostic. The sepsis result demonstrates the platform's most practical capability: trained where the science is rich (single-cell), deployed where the data is routine (bulk).

Top-10 attributed genes: 6/10 are known sepsis biomarkers

Modality transfer: scRNA → bulk (trained where science is rich, deployed where data is routine)

Across all three cohorts

ConvergeCELL ranked clinically validated drug targets within the top 0.3% of its rankings — orders of magnitude better than every baseline tested.

From genes
to hypotheses.

Identifying disease-associated genes is one part of the workflow — and a readout of what the model has learned. ConvergeCELL also includes a hypothesis generation agent that connects a large language model to biomedical knowledge bases: PubMed, Open Targets, ClinicalTrials.gov.


For each candidate gene, the agent classifies the strength of existing evidence, then generates a structured mechanism-of-action hypothesis covering the gene's pathway context, mechanistic role in disease biology, and potential therapeutic implications.

Direct

Clinical evidence linking gene to disease.

Indirect

Pathway-level or analogous evidence.

No evidence

Novel candidate — flag for in-vitro validation.

The end output is not a gene list. It is a hypothesis card that translational scientists can act on, ready to inform target prioritization, in-vitro experiments, or partnership conversations.
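
As a rough illustration of what such a card could look like as structured data, here is a small Python sketch. The three evidence tiers and the three hypothesis sections come from the description above. The class names, field names, and the simple rule in `classify_evidence` are assumptions made for illustration and do not describe the agent's actual logic or output schema.

```python
from dataclasses import dataclass, field
from enum import Enum

class Evidence(Enum):
    DIRECT = "Direct"          # clinical evidence linking gene to disease
    INDIRECT = "Indirect"      # pathway-level or analogous evidence
    NONE = "No evidence"       # novel candidate

def classify_evidence(has_clinical_link, has_pathway_link):
    """Hypothetical rule: the strongest available evidence decides the tier."""
    if has_clinical_link:
        return Evidence.DIRECT
    if has_pathway_link:
        return Evidence.INDIRECT
    return Evidence.NONE

@dataclass
class HypothesisCard:
    disease: str
    gene: str
    rank: int
    evidence: Evidence
    pathway_context: str
    mechanistic_role: str
    therapeutic_implications: str
    sources: dict = field(default_factory=dict)

    def needs_in_vitro_validation(self):
        """Novel candidates with no prior evidence are flagged for the bench."""
        return self.evidence is Evidence.NONE

# Example values taken from the SLE result; the text fields are abbreviated.
card = HypothesisCard(
    disease="SLE",
    gene="TNFSF13B",
    rank=26,
    evidence=classify_evidence(has_clinical_link=True, has_pathway_link=True),
    pathway_context="B-cell survival axis",
    mechanistic_role="Sustains autoreactive B-cell survival",
    therapeutic_implications="Clinically validated by belimumab",
)
print(card.evidence.value, card.needs_in_vitro_validation())
```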

Hypothesis card

SLE · Rank 26

Direct evidence

Candidate gene

TNFSF13B

aka BAFF, BLyS

Pathway context

B-cell survival axis. Encodes BAFF, a cytokine ligand for BAFF-R, TACI, and BCMA. Drives mature B-cell survival, class-switch recombination, and plasma-cell differentiation.

Mechanistic role

Overexpressed in SLE PBMCs. Sustains autoreactive B-cell survival, driving production of pathogenic anti-nuclear antibodies. The model attributes signal predominantly to monocytes and DCs — consistent with their role as the primary BAFF producers.

Therapeutic implications

Clinically validated by FDA-approved belimumab. Model output suggests responder stratification by baseline serum BAFF and a B-cell-signature index in mild-to-moderate SLE — testable in retrospective bulk RNA-seq cohorts.

Sources

PubMed 34

Open Targets ★★★

ClinicalTrials 12

A foundation model
for the research community.

By releasing the patient representation engine open source, we are empowering the research community to build on this foundation and accelerate translational research across the field.


For pharma and biotech R&D teams working on transcriptomics-driven discovery: ConvergeCELL is the workflow that today takes weeks of expert curation per disease, automated end-to-end. If you are working in this space, we would like to talk.
