Beware the Ides of March
Mark this day, the Ides of March, as a turning point in our communal fight to stay healthy. Thousands of careers and trillions of dollars have been spent trying to win our war against chronic disease. Our loved ones and heroes have first lost the joy in their lives and then life itself. My beloved dad James Geer died on this day in 2019. Every year around this time I unhealthily repeat to myself for weeks, “Beware the Ides of March”. I don’t like that. I don’t want to do it anymore. I intend, on this forsaken day, to shine a powerful spotlight on an idea that many are already working on, one that I believe is the turning point we have all been waiting for.
100 years of failure
Unfortunately, most objective measures show we haven’t made much of a dent in chronic disease. Sure, we pat ourselves on the back for “breakthrough” after “breakthrough”, but we don’t see our loved ones losing life at any slower rate. As most of us know from one article or another, other than our major win against infection with antibiotics nearly a century ago, and some admittedly clever surgical techniques developed over the years that save a small percentage of us from major birth defects or physical damage, we have not lengthened our healthspans much at all in over a century of earnest, full-on scientific trying.
The turning point
But, and yes there luckily is a but, this is the beauty of turning points. They are not overnight successes. They come from years of innovation and belief and failure and, yes, a decent amount of dumb luck (or is it fate?). It is my strong belief that the massive first domino has finally fallen with the mainstream success of, and massive funding now piling into, LLMs (Large Language Models; one very popular example, ChatGPT, is an application built on top of OpenAI's LLMs). And what I propose today, on the Ides of March, is this…
I propose we focus, adjust and, in some lovely cases, merely continue many of our efforts toward building a new variant of Large Model: the Large Biological Model, aka an LBM. Whereas an LLM is trained on large and varied datasets of words (L for Language), an LBM is trained on large and varied datasets of biological measurement values (B for Biological).
Coining the term Large Biological Model - LBM
So hopefully at this point, I have piqued some geeky and visceral intellectual curiosity in you. I am also fairly certain you now have many more questions than answers. Let me then take this moment to set your expectations correctly. I write you today without all the answers. I would also be remiss if I led you to believe that I invented any of this. There are other, more traditional terms that have been used to describe similar things, like “foundation/foundational models”. I am, however, here to coin the term Large Biological Model, and I am doing so because I believe the name is fundamental to the massive leap we are about to take in health. We should closely follow the path of LLMs, and naming these models LBMs is quite key for that to materialize. Let me explain why following the LLM path, with all its nuanced and evolving genius, is so important.
LLMs went against all conventional wisdom
LLMs actually represent a much larger departure from status quo thinking than most truly appreciate. It is also probably important to add that most of us, myself included until very recently, didn’t quite understand what LLMs were under the hood.
Side note: I will not attempt to fully explain here how LLMs are made and what makes them work, but I will put a link to an amazing (I do not use that word lightly) explanation video made by Andrej Karpathy in the comments - or search “1hr talk andrej karpathy”. It is actually well worth 40 minutes of your time, and I feel it hits that magical balance of being useful for both highly technical and “I can just about handle joining a Zoom call” folks.
LLMs at their base are large files of parameters (weightings) capturing how words are related to each other. They are built by training neural networks on massive amounts of more-or-less random words from the web pages of the Internet. The important concept here is that they are not trained on specific information or specific websites. They are trained on fairly arbitrary information and websites, just a lot of it. This is NOT how almost any other AI/ML (Artificial Intelligence/Machine Learning) algorithms/models have been trained in the past. The status quo thinking was always “garbage in → garbage out”. But something magical happened with LLMs.
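To make the "parameters of how words relate" idea concrete, here is a deliberately tiny toy sketch (nothing like a real neural network, and the corpus is made up): it derives "weightings" between words purely from arbitrary raw text, with no curation of what that text says.

```python
from collections import defaultdict

# Toy illustration only: derive "parameters" that capture how words
# relate to each other, purely from arbitrary raw text.
corpus = (
    "the patient felt well . the patient slept well . "
    "the doctor read the chart ."
)

# Count how often each word follows another -- a crude stand-in for
# the billions of learned weightings inside a real LLM parameter file.
counts = defaultdict(lambda: defaultdict(int))
tokens = corpus.split()
for prev, nxt in zip(tokens, tokens[1:]):
    counts[prev][nxt] += 1

def next_word(word):
    """Return the most likely next word under the toy 'model'."""
    followers = counts.get(word)
    return max(followers, key=followers.get) if followers else None

print(next_word("the"))  # "patient" (seen most often after "the")
```

The point of the sketch: nobody told the "model" anything specific; useful structure emerged just from counting lots of unfiltered text.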
It’s alive!
The LLM parameter files appeared, at first, to be fairly useless, as conventional wisdom would assume. Just a very large file of weightings between randomly found words on the Internet. Thanks a lot for that (← sarcasm). However, when the teams then added in some finetuning using some specific words (e.g. some coding-language samples, or some “properly” written business emails, etc.), something eerily magical happened. It was so eerie that many thought the thing had come alive. The massive LLM parameter file went from being useless to almost all-knowing. Cutting to the chase, the patterns in those parameters held a near-complete understanding of all the meaning in the English language. But this only became visible when the model was directly applied to particular tasks with a small amount of finetuning data.
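The finetuning step itself is conceptually simple, and a stripped-down numeric sketch may help: start from "pretrained" parameters learned on broad data, then nudge them with a handful of task-specific examples. All the numbers here are invented for illustration; a real finetune adjusts billions of weights, not two.

```python
# Assumed "pretrained" parameters of a one-weight model (illustrative).
pretrained_w, pretrained_b = 0.8, 0.1

# A tiny task-specific dataset: the task behaves roughly like y = 2x.
task_data = [(1.0, 2.1), (2.0, 4.0), (3.0, 6.2)]

# Finetuning: a few passes of plain gradient descent on the small dataset.
w, b = pretrained_w, pretrained_b
lr = 0.01
for _ in range(2000):
    for x, y in task_data:
        err = (w * x + b) - y  # prediction error on one example
        w -= lr * err * x      # nudge the weight toward the task
        b -= lr * err          # nudge the bias toward the task

print(w)  # w has moved from 0.8 to close to 2, the task's true slope
```

A small amount of specific data redirects generic parameters toward a particular task; that, in miniature, is what made the "useless" parameter files suddenly useful.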
So what should we do?
So hopefully by now you are starting to get an inkling of what I am about to say, but fear not, I will spell it out for you either way. Simply put, we have done language and now we must do biological measurements. Any non-word data that measures the human body in any way. And now we have the path laid out so neatly in front of us. We know now that we don’t need specific and specially curated data. It doesn’t matter. We need massive amounts of diverse biological measurement data, no matter where we can find it. We need to start creating these Large Biological Model parameter files and releasing them for everyone in the world to build upon. By training LBM parameter files, we will bring to life an almost magically complete understanding of human biology. And, importantly, just like LLMs have become a massive accelerating force in the world over the last 24 months alone, LBMs will likely do exactly the same in the health and science sectors.
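One plausible analogue of "predict the next word" for biological data is "predict the masked measurement": hide one value in each record and learn to recover it from the rest. The sketch below is a guess at the flavor of such pretraining, using synthetic data and a linear model in place of a real neural network; the marker names and relationships are entirely made up.

```python
import random
random.seed(0)

# Synthetic "biobank": each row is (marker_a, marker_b, marker_c),
# with marker_c loosely determined by the other two (plus noise).
rows = []
for _ in range(500):
    a = random.uniform(4.0, 7.0)     # e.g. a glucose-like value
    b = random.uniform(60.0, 100.0)  # e.g. a weight-like value
    c = 0.5 * a + 0.1 * b + random.gauss(0, 0.1)
    rows.append((a, b, c))

# "Pretraining": mask marker_c and fit parameters that recover it
# from the unmasked markers, via plain gradient descent.
w1 = w2 = bias = 0.0
lr = 1e-4
for _ in range(200):
    for a, b, c in rows:
        err = (w1 * a + w2 * b + bias) - c
        w1 -= lr * err * a
        w2 -= lr * err * b
        bias -= lr * err

a, b, c = rows[0]
print(abs((w1 * a + w2 * b + bias) - c))  # small reconstruction error
```

No one labeled anything here; the "model" learned the relationship between markers purely from the measurements themselves, which is the self-supervised trick that made LLM pretraining scale.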
Great! You first.
So I am not just coining a term and saying vaya con dios. We are far from the first, but we are putting our money where our mouth is on this. Humanity (the longevity company I co-founded with Pete Ward) spent considerable time and resources developing an innovative blood model that allows every blood test taken on earth to deliver a Biological Age to the patient/customer. We had the honor of having that model peer reviewed and published in a Nature journal. We, however, did something not very status quo. We didn’t just publish our findings, we actually published all the parameters between all the blood markers measured on 300,000 people in the UK Biobank. Thus was born the seed of Humanity’s first LBM! We are now actively looking for collaborators, and already have a few ready to go, who will help us expand our training set and make our first LBM more and more robust.
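For readers wondering what "published parameters" buy you in practice, here is a purely illustrative sketch of the general shape of a blood-based Biological Age model: an age-like score computed as a weighted combination of blood markers. The marker names, weights, and baseline below are hypothetical stand-ins, NOT the published Humanity / UK Biobank parameters.

```python
# Hypothetical weights for illustration only -- not the real model.
HYPOTHETICAL_WEIGHTS = {
    "albumin_g_per_L": -0.30,
    "crp_mg_per_L": 0.50,
    "hba1c_mmol_per_mol": 0.20,
}
BASELINE_YEARS = 52.0  # hypothetical intercept

def biological_age(markers):
    """Map a dict of blood-marker values to an age-like score."""
    score = BASELINE_YEARS
    for name, weight in HYPOTHETICAL_WEIGHTS.items():
        score += weight * markers[name]
    return score

print(biological_age(
    {"albumin_g_per_L": 45.0, "crp_mg_per_L": 1.0, "hba1c_mmol_per_mol": 36.0}
))  # 52.0 - 13.5 + 0.5 + 7.2 ≈ 46.2 with these made-up numbers
```

The point is that once the parameters are openly published, anyone with any blood panel can apply them, and anyone with new data can extend them.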
A call to action
So there is so, so much more to say. For the sake of this article being readable in one sitting, I will end here and write about more aspects of this over the coming weeks. Thank you so much to those who have already shared their feedback. Hopefully it has made this clearer and less rambling.
My call to action is this. I know many of you have been working on parts of this for years (in my follow-up posts, I want to highlight some of your great work, so please reach out). I know others of you reading this may not feel there is any way for you to directly contribute. I say to both groups of folks, and everyone in between: this is all of our work now. Think about how you can contribute a small part of this. That might be intellectually, by helping us think through the nuances of this. That may be by opening up biological datasets that can be trained on to build up the LBM parameter file, whether Humanity’s LBM or others that will surely be released. It may even be that you have experience training LLM parameter files and want to apply those skills to training an LBM.
There are a myriad of areas here where I personally do not have the experience or skills to anticipate the pitfalls or propose the solutions. Case in point: there are obvious differences between words and biological measurement data, which will no doubt complicate, or at least change, some of the process of training the base LBM parameter file. What I do know is that one or several of you dear readers will have those and other relevant skills, and together we will solve each issue.
Now disease can beware the Ides of March
All our LBM are belong to you. I am done being wary of the Ides of March. I think it’s disease’s turn to worry.
Great read! You've tackled some complex topics with a lot of depth. Just a few thoughts:
1. On progress in the last century, haven’t we made meaningful advancements in interventional cardiology, pharmacology, oncology, and other areas? Acknowledging these could add more balance to your argument about the rate of progress.
2. Regarding LLMs, the data they're trained on isn't random but comes from a structured and validated corpus - the internet. Letters are grouped into words, and words are grouped into sentences, all based on laws of grammar. This contrasts with the challenges LBMs face due to the lack of a comprehensive 'grammar' in biological processes. There is so much we don’t yet understand. You did hint at this at the end, but I think it’s a nuanced and vital point that could be worth exploring in future articles.
3. Your analogy between LLMs and LBMs, centered around accumulating structured biological data, is super intriguing! You do a great job highlighting the potential of LBMs if we can develop a rich and detailed enough training corpus.
I appreciate your insights and am looking forward to seeing how your ideas evolve!
Good topic! LLMs showed us the possibility and potential, now it's time to build foundation models for other domains.
I personally believe world models are the next step from LLMs, and I deeply believe in large biological models that cover/model different levels, from cell behavior to the whole organism, and maybe then ecosystems (where it will merge with world models) :)