Benchmarking AI accelerators and optimization methods

Sep 23, 2024 | 5 min read

At Converge Bio, besides benchmarking the best models for different types of biological data and building new models where no good alternative exists, we benchmark AI accelerator chips on FLOPS per dollar.
Cost-effective compute is a long-term goal for our company, and we see it as crucial to leading our field.
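
To make the metric concrete, here is a minimal sketch of one way to compute it, assuming "FLOPS per dollar" means the floating-point operations a dollar buys at a measured training throughput. The class name, chip labels, throughput figures, and prices are all hypothetical placeholders, not measurements from our benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceleratorRun:
    name: str                # hypothetical label, not a real chip
    achieved_tflops: float   # measured training throughput in TFLOP/s
    price_per_hour: float    # instance price in USD per hour

    def flop_per_dollar(self) -> float:
        """Floating-point operations obtained for one dollar of rental."""
        flop_per_hour = self.achieved_tflops * 1e12 * 3600
        return flop_per_hour / self.price_per_hour


def rank_by_value(runs):
    """Sort runs from best to worst value for money."""
    return sorted(runs, key=lambda r: r.flop_per_dollar(), reverse=True)


# Placeholder numbers, chosen only to illustrate the arithmetic.
runs = [
    AcceleratorRun("chip_a", achieved_tflops=200.0, price_per_hour=4.0),
    AcceleratorRun("chip_b", achieved_tflops=300.0, price_per_hour=9.0),
]

for run in rank_by_value(runs):
    print(f"{run.name}: {run.flop_per_dollar():.3e} FLOP per dollar")
```

In this toy comparison the slower chip wins because its price is proportionally lower, which is why the sketch uses achieved throughput and actual price rather than peak datasheet numbers.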

As a bonus, this approach lets us practice first-principles thinking and test various machine learning paradigms. While evaluating Intel Corporation's Gaudi 2, we trained ESM-2 with classification layers on a proprietary dataset of protein-protein interactions and compared training only the classification layers with training the entire model. These experiments surfaced some interesting insights:
1. As expected, bigger models give better results, but they take longer to train and use more memory. That can force training onto expensive accelerators with a lot of memory, so the cost of training bigger models does not scale linearly. Luckily, on a machine with 8 Gaudi 2 accelerators, each with 100 GB of memory, and 1 TB of RAM, this is not a problem.
2. A more surprising result came from comparing full-model training with training the classification layers only. The gap is large: training only the classification layers does not come close to training the entire model (see the attached image from Weights & Biases; all the top results are full training). So although full training demands more computational resources, it delivers significantly better results and a more favorable balance between cost and performance.
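
The memory side of both points can be estimated with a rough back-of-the-envelope sketch. It assumes fp32 weights and gradients with the Adam optimizer (two fp32 moment estimates per trainable parameter) and ignores activations, so it is a lower bound rather than a measurement. The parameter counts are hypothetical and do not describe the models we trained.

```python
FP32_BYTES = 4


def training_state_bytes(total_params: int, trainable_params: int) -> int:
    """Lower bound on memory for weights, gradients, and Adam state.

    Assumes fp32 everywhere; activation memory is not included.
    """
    weights = total_params * FP32_BYTES             # every parameter is stored
    gradients = trainable_params * FP32_BYTES       # only trainable ones get gradients
    adam_moments = trainable_params * 2 * FP32_BYTES  # first and second moments
    return weights + gradients + adam_moments


# Hypothetical sizes: a large backbone plus a small classification head.
BACKBONE_PARAMS = 650_000_000
HEAD_PARAMS = 1_000_000
total = BACKBONE_PARAMS + HEAD_PARAMS

full = training_state_bytes(total, trainable_params=total)
head_only = training_state_bytes(total, trainable_params=HEAD_PARAMS)

print(f"full fine-tuning: {full / 1e9:.2f} GB")
print(f"head only:        {head_only / 1e9:.2f} GB")
```

Under these assumptions full training needs about 16 bytes per parameter, while head-only training needs little more than the 4 bytes per parameter for the frozen weights, which is where the extra hardware demand of full training comes from.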

Of course, there's a middle ground. Parameter-efficient fine-tuning (PEFT) techniques such as LoRA can offer a more balanced approach, improving performance without substantially increasing computational cost. We'll be doing more benchmarking and will share the results.
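
To see why LoRA sits in the middle, here is a small sketch of its parameter arithmetic. LoRA freezes a weight matrix of shape d_out x d_in and trains two low-rank factors of shapes d_out x r and r x d_in instead. The layer width and rank below are illustrative assumptions, not settings from our experiments.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for the two LoRA factors (d_out x r and r x d_in)."""
    return rank * (d_in + d_out)


# Hypothetical square projection layer and LoRA rank.
D_MODEL = 1280
RANK = 8

dense = full_params(D_MODEL, D_MODEL)
low_rank = lora_params(D_MODEL, D_MODEL, RANK)

print(f"full update: {dense:,} trainable parameters")
print(f"LoRA update: {low_rank:,} trainable parameters")
print(f"fraction:    {low_rank / dense:.2%}")
```

The frozen weights still have to be held in memory, so the savings come from the gradients and optimizer state of the trainable part.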
We're now working with Amazon Web Services (AWS) SageMaker to test NVIDIA GPUs and AWS Trainium chips. Stay tuned for more insights as we continue to explore the most efficient paths to maximize FLOPS per dollar across different hardware platforms.



