7 Comments
Alex Merwin

Great read! You've tackled some complex topics with a lot of depth. Just a few thoughts:

1. On progress in the last century, haven’t we made meaningful advancements in interventional cardiology, pharmacology, oncology, and other areas? Acknowledging these could add more balance to your argument about the rate of progress.

2. Regarding LLMs, the data they're trained on isn't random but comes from a structured and validated corpus - the internet. Letters are grouped into words, and words are grouped into sentences, all based on laws of grammar. This contrasts with the challenges LBMs face due to the lack of a comprehensive 'grammar' in biological processes. There is so much we don’t yet understand. You did hint at this at the end, but I think it’s a nuanced and vital point that could be worth exploring in future articles.

3. Your analogy between LLMs and LBMs, centered around accumulating structured biological data, is super intriguing! You do a great job highlighting the potential of LBMs if we can develop a rich and detailed enough training corpus.

I appreciate your insights and look forward to seeing how your ideas evolve!

Michael Geer

Alex, great feedback! Super useful. I'll give some quick replies now, but we should dig in deeper and maybe podcast this soon.

1. I definitely agree that my statement on progress is overly binary (a bit for effect, to be honest). However, I would push back on the rate of progress in the detection and treatment of cancer, as one major example. I think our bar has been set way too low on that. It is important for me to emphasize here that there is a difference between a lack of sufficient progress (my opinion) and questioning people's immense efforts and good intentions (something I am certainly not doing).

2. Agreed, the differences between language and biological data are a great topic for the next post. I would, however, already make the possibly bold prediction that we will find that biological data does indeed have its own grammar and structure (think signaling pathways, feedback loops, etc.; love that way of looking at it, btw!).

3. Thank you for the kind words! I think it is important at this point that we start small and build, as we did with LLMs. The LLM-focused techniques for getting more power from less training data (new ones are coming out every few weeks now) will also accelerate things in LBMs as we go.

Grigory Sapunov

Good topic! LLMs showed us the possibility and potential, now it's time to build foundation models for other domains.

I personally believe in world models as a next step from LLMs, and I deeply believe in large biological models which cover/model different levels from cell behavior to the whole organism, maybe then ecosystems (and here it will merge with world models) :)

Michael Geer

Indeed! Agreed. My main additional thought is that I think we will see larger and quicker gains towards our goal of making humans healthier if, from now on, we focus more attention on the whole organism: measurements at the whole-organism level and the effects of interventions on the whole organism. We can then work our way down to cell and intra-cell measurements and effects as we go.

My analogy for this reallocation of current resources is that we were able to build skyscrapers with little or no understanding of quantum physics, relying instead on our understanding of statics and particle dynamics. So the feeling that we need to know the minutiae of intra- and inter-cell interactions (where the lion's share of funding and talent in AI health modeling is at the moment) to understand how to make people healthier is most likely incorrect.

Nina Patrick

Great article Michael.

What the internet did for LLMs is incredible: it let them be trained on endless internet content (user-generated, public, never-ending). How can we achieve that with biological data? What we do have for biological data is held by companies, health insurers, doctors, etc. It's scattered, fragmented, and not standardized. Plus, it is sensitive data, requiring anonymization. Biological data sometimes lacks context too, or data is missing. What's the first step to unlocking all these existing vaults of data and making it usable?

Natalia Simonenko

Powerful idea and great mission, as always, Michael!

Saasha Celestial-One

I check into humanity every day, still trying to crack the aging calculation on my own! ;-)
