Making Open Source Work: How We Built a Practical Internal Tool

Eyal Sol | Feb 12, 2025

How we took Boltz-1, a public open-source model, and built internal tools around it to deliver real value for our bioinformatics teams.

Boltz-1 prediction on a protein small-molecule complex.

Introduction

As an engineer without a biological background, I found it fascinating to see how Boltz-1 works: from a raw sequence to an actual 3D representation.
Compared to other tools like AlphaFold2, it's night and day:

  • Super easy to set up and use.

  • Excellent documentation and a supportive community.

But there’s a gap between running it locally for a few samples and having a production-ready solution — one that scales, requires no setup for the end user, and “just works”.

So, this is how we started: with a great open-source model we wanted to include in our company’s toolbox, keeping a few key considerations in mind:

  • It had to be simple for the end user.

  • It had to be cost-efficient (or at least not too expensive).

  • It had to be hosted on our company cloud infrastructure.

Solution Overview

As mentioned earlier, we wanted an easy-to-use solution that requires no setup from the end user. The idea was to let users “trigger a job and forget about it,” receiving results via email or Slack when the process was complete.

We also wanted the solution to be based on the AWS offering, so we decided to go with the following AWS services:

Inference layer — SageMaker

  • SageMaker Endpoint
    Takes our customized Boltz-1 model container image, which includes inference logic that meets AWS's requirements for custom inference containers.
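To make those requirements concrete: at its core, a custom SageMaker container has to answer GET /ping (health check) and POST /invocations (inference requests) over HTTP on port 8080. Here is a minimal sketch of that contract; the run_boltz helper is a hypothetical stand-in for the real Boltz-1 prediction call, not our actual serving code.

    import json
    from flask import Flask, Response, request

    app = Flask(__name__)

    def run_boltz(sequence: str) -> dict:
        # Hypothetical placeholder for the real Boltz-1 prediction call,
        # which would return the predicted structure (e.g. mmCIF text).
        raise NotImplementedError

    @app.route("/ping", methods=["GET"])
    def ping():
        # SageMaker calls this to verify the container is healthy.
        return Response(status=200)

    @app.route("/invocations", methods=["POST"])
    def invocations():
        # SageMaker forwards the request payload (our input .json file) here.
        payload = json.loads(request.data)
        results = {seq_id: run_boltz(seq) for seq_id, seq in payload.items()}
        return Response(json.dumps(results), status=200, mimetype="application/json")

    if __name__ == "__main__":
        # SageMaker expects the model server to listen on port 8080.
        app.run(host="0.0.0.0", port=8080)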

Orchestration layer — Step Functions & Lambda

  • Step Functions
    A workflow that orchestrates the entire flow, from processing user inputs and invoking the SageMaker endpoint to sending results once finished.

  • Lambda(s)
    The functions that do the “heavy lifting”; I'll dive into each Lambda's responsibility shortly.

State layer — S3 bucket

  • S3
    Stores the state of each job in a file named status.json, which is checked and updated between steps.
    Every job a user triggers gets its own status file,
    e.g. /jobs/<job_name>/status.json
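To give a feel for how this file is read and updated between steps, here is a rough sketch using boto3. The bucket name and the schema shown in the comment are assumptions for illustration, not our exact production format.

    import json
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "our-boltz-jobs-bucket"  # hypothetical bucket name

    def read_status(job_name: str) -> dict:
        # Fetch the job's status file, e.g. jobs/<job_name>/status.json
        obj = s3.get_object(Bucket=BUCKET, Key=f"jobs/{job_name}/status.json")
        return json.loads(obj["Body"].read())

    def write_status(job_name: str, status: dict) -> None:
        # Overwrite the status file with the job's updated state.
        s3.put_object(
            Bucket=BUCKET,
            Key=f"jobs/{job_name}/status.json",
            Body=json.dumps(status, indent=2).encode("utf-8"),
        )

    # Assumed shape of the file, for illustration only:
    # {
    #   "job_name": "please_work",
    #   "slack_email": "someone@company.com",
    #   "sequences": {
    #     "seq-1": {"input_s3_uri": "s3://.../seq-1.json",
    #               "output_s3_uri": "s3://.../output/abc123.out",
    #               "status": "IN_PROGRESS"}
    #   }
    # }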

How does the architecture look?

Personally, I find it hard to understand this kind of workflow without seeing a visual representation of it.
Here's a simplified version of the entire workflow.

Let's go over what happens from the moment a bioinformatician wants to generate a structure for a protein sequence.
It goes as follows:

  1. User Input:

A bioinformatician uses a helper script (sketched after this list) to invoke the Step Functions workflow with the following inputs:

  • A list of key-value pairs (e.g. "<id>": "<raw_sequence>").

  • The user’s Slack email.

  • A job name (e.g. "please_work").

2. The Step Functions workflow drives the entire flow and handles failures:

  • Lambda-1 is responsible for the following (a sketch of its core logic appears after this list):
    - Processes each raw sequence and saves it as a valid .json input file in S3
    - Invokes the SageMaker async endpoint with that .json file as input
    - Creates a status.json file that holds the status of the entire job: the processing status of each sequence, its S3 input path (the .json file), the SageMaker endpoint's output response, and other metadata

  • Lambda-2 checks every 5 minutes whether the job status has changed.
    It reads the status of the job and of each sequence from the status.json file and updates it if needed,
    until all sequences have been processed (successfully or unsuccessfully).

  • Lambda-3 gathers all the results, zips them, and sends them as a Slack DM to the requester.
    It checks which sequences succeeded or failed and formats the message accordingly.

  • Lambda-4 handles any failure within the Step Functions workflow and notifies the user via Slack.
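To make step 1 concrete, here is a minimal sketch of what such a helper script can look like. The state machine ARN, the sequence, and the input field names are illustrative assumptions, not our exact interface.

    import json
    import boto3

    sfn = boto3.client("stepfunctions")

    # Hypothetical ARN of the deployed Step Functions state machine.
    STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:boltz1-jobs"

    def submit_job(job_name: str, sequences: dict, slack_email: str) -> str:
        """Kick off a prediction job and return its execution ARN."""
        response = sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            name=job_name,  # doubles as the execution name for easy lookup
            input=json.dumps({
                "job_name": job_name,
                "sequences": sequences,      # {"<id>": "<raw_sequence>", ...}
                "slack_email": slack_email,  # where the results DM should go
            }),
        )
        return response["executionArn"]

    if __name__ == "__main__":
        submit_job(
            job_name="please_work",
            sequences={"my_protein": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"},
            slack_email="someone@company.com",
        )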
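For step 2, here is a sketch of the heart of Lambda-1: it uploads each sequence as an input file, invokes the SageMaker async endpoint with that file's S3 location, and records the returned output location in status.json. The endpoint name, bucket, and field names are assumptions for illustration.

    import json
    import boto3

    s3 = boto3.client("s3")
    runtime = boto3.client("sagemaker-runtime")

    BUCKET = "our-boltz-jobs-bucket"         # hypothetical
    ENDPOINT_NAME = "boltz1-async-endpoint"  # hypothetical

    def handler(event, context):
        job_name = event["job_name"]
        status = {"job_name": job_name,
                  "slack_email": event["slack_email"],
                  "sequences": {}}

        for seq_id, sequence in event["sequences"].items():
            # 1. Save the processed sequence as a valid .json input file in S3.
            input_key = f"jobs/{job_name}/inputs/{seq_id}.json"
            s3.put_object(Bucket=BUCKET, Key=input_key,
                          Body=json.dumps({seq_id: sequence}).encode("utf-8"))

            # 2. Invoke the async endpoint; it returns immediately with an
            #    OutputLocation where the prediction will eventually land.
            response = runtime.invoke_endpoint_async(
                EndpointName=ENDPOINT_NAME,
                InputLocation=f"s3://{BUCKET}/{input_key}",
                ContentType="application/json",
            )

            status["sequences"][seq_id] = {
                "input_s3_uri": f"s3://{BUCKET}/{input_key}",
                "output_s3_uri": response["OutputLocation"],
                "status": "IN_PROGRESS",
            }

        # 3. Persist status.json so later steps can poll and update it.
        s3.put_object(Bucket=BUCKET, Key=f"jobs/{job_name}/status.json",
                      Body=json.dumps(status, indent=2).encode("utf-8"))
        return {"job_name": job_name}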

A successful result looks like this:
(A screenshot of Slack DM containing results)


Some honorable mentions

I must mention a few implementation details that made this setup super easy to use, set up, and debug.

  • Step Functions workflow
    Having one place where everything is “connected” made it much easier to understand the data flow between steps, and to debug it when something breaks.
    Highly recommended!

  • SageMaker Endpoint
    Handles everything from serving the model for inference to scaling down when there are no more requests.
    I'd recommend looking at the SageMaker async endpoint configuration with autoscaling, to scale up and down based on usage (a sketch of such a configuration appears after this list).
    Highly recommended!

Our Implementation of SageMaker Endpoint Autoscaling.

  • Using S3 to store state
    Instead of using Step Functions’ internal state mechanism, we opted to store a “state” file in S3 for better traceability and auditability.
    With S3, it was straightforward to locate the status.json file for each job based on its folder structure (e.g. /jobs/<job_name>/status.json).
    This approach allowed us to easily track job statuses and debug issues.
    Highly recommended!

jobs/random_id/status.json
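For the autoscaling mentioned above, this is roughly the kind of configuration meant: a target-tracking policy on the async endpoint's request backlog, registered through Application Auto Scaling. The endpoint and variant names and the capacity numbers are illustrative, not our production values.

    import boto3

    autoscaling = boto3.client("application-autoscaling")

    ENDPOINT_NAME = "boltz1-async-endpoint"  # hypothetical
    RESOURCE_ID = f"endpoint/{ENDPOINT_NAME}/variant/AllTraffic"

    # Register the endpoint variant's instance count as a scalable target;
    # async endpoints are allowed to scale all the way down to 0 instances.
    autoscaling.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=RESOURCE_ID,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=0,
        MaxCapacity=2,
    )

    # Scale in/out based on how many queued requests each instance has.
    autoscaling.put_scaling_policy(
        PolicyName="boltz1-backlog-scaling",
        ServiceNamespace="sagemaker",
        ResourceId=RESOURCE_ID,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 5.0,
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": ENDPOINT_NAME}],
                "Statistic": "Average",
            },
            "ScaleInCooldown": 300,
            "ScaleOutCooldown": 60,
        },
    )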

Key Takeaways

What do I take from this?
That open-source models are great, but building the right tooling around them is necessary to make them usable at scale.


