Large Biological Models inherently know Cause and Effect
Bold prediction that changes EVERYTHING: Large Biological Models will vastly outperform Large Language Models. Here's why:
Large Biological Models will decipher cause and effect from observational datasets—the holy grail health researchers and clinicians have pursued for decades. LBMs will achieve this breakthrough by training exclusively on time-stamped, location-specific data. All biological measurement data (blood tests, genetics, wearable data, etc.) and intervention data (which pills are taken, which procedures are done, which lifestyle actions are taken, etc.) are both timestamped (sometimes to the second, and at least to the day) and tied to a specific location (the person on whom the measurement was taken or who received the intervention).
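To make the data format concrete, here is a minimal sketch of what such records might look like. The field names, units, and example values are all hypothetical illustrations, not an actual LBM training schema; the point is simply that every record carries the two fields the post argues for: a timestamp and a "location" (the person it belongs to).

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Measurement:
    person_id: str       # location: who the value was measured on
    timestamp: datetime  # when it was measured (ideally to the second)
    name: str            # hypothetical, e.g. "fasting_glucose_mg_dl"
    value: float

@dataclass
class Intervention:
    person_id: str       # location: who took the action
    timestamp: datetime  # when the action happened
    action: str          # hypothetical, e.g. "metformin_500mg" or "30min_walk"

# One person's timeline interleaves measurements and interventions in time
# order, giving a model the before/after structure it needs to reason about
# effects.
timeline = sorted(
    [
        Measurement("p1", datetime(2024, 5, 1, 8, 0), "fasting_glucose_mg_dl", 140.0),
        Intervention("p1", datetime(2024, 5, 1, 9, 0), "metformin_500mg"),
        Measurement("p1", datetime(2024, 5, 2, 8, 0), "fasting_glucose_mg_dl", 118.0),
    ],
    key=lambda r: r.timestamp,
)
```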
Through this simple but revolutionary training improvement, they will inherently contain the fundamental elements humans naturally use to understand cause and effect in the world around us—a capability we develop instinctively from birth and of which, frankly, we are still the world champions. Not for long.
Let's break this down. Humans naturally and efficiently grasp cause and effect through observing three key elements in the world around us:
- Temporal proximity: Events happening close in time
- Local proximity: Events occurring side by side
- Observing counterfactuals: Seeing outcomes with and without the specific events occurring
Consider this concrete example that illustrates the importance of these three elements.
Picture this: A baseball bat hitting a ball and the ball changing directions.
- Temporal proximity: The baseball bat hitting the ball happens right before the ball changes directions.
- Local proximity: The baseball bat and ball are touching/right next to each other before the ball changes directions.
- Observing counterfactuals: You have seen hundreds or thousands of balls flying through the air in different situations, and whenever they didn't have a baseball bat or some other object next to them, they never drastically changed direction. Another key thing to notice is that we can observe these balls in very different situations and don't need to base our counterfactual observations on baseballs in baseball games alone. Put another way, patterns observed in a different context can often still serve as counterfactuals (LLMs already show us the power of this).
These simple observations allow you to immediately understand that the bat hitting the ball CAUSED the EFFECT of the ball changing directions. All this without an RCT (Randomized Controlled Trial) no less!
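The three elements above can be sketched as a toy scoring routine. This is a deliberately naive illustration of the reasoning pattern, not how an LBM would actually learn causality: given a log of timestamped events per "location" (here, individual balls), it checks temporal and local proximity (did the effect follow the candidate cause, on the same entity, within a short window?) and counterfactuals (how often does the effect appear for entities never exposed to the candidate cause?). All event names and the window size are made up for the example.

```python
from datetime import datetime, timedelta

def follows_within(events, cause, effect, window, entity):
    """Temporal + local proximity: did `effect` happen on the same
    entity within `window` after `cause`?"""
    cause_times = [t for t, e_id, ev in events if e_id == entity and ev == cause]
    effect_times = [t for t, e_id, ev in events if e_id == entity and ev == effect]
    return any(
        0 < (et - ct).total_seconds() <= window.total_seconds()
        for ct in cause_times
        for et in effect_times
    )

def causal_evidence(events, cause, effect, window):
    """Counterfactuals: compare how often `effect` follows `cause` on
    exposed entities vs. how often it occurs on never-exposed ones."""
    entities = {e_id for _, e_id, _ in events}
    exposed = {e_id for _, e_id, ev in events if ev == cause}
    with_cause = [follows_within(events, cause, effect, window, e) for e in exposed]
    without = [
        any(ev == effect for _, e_id, ev in events if e_id == e)
        for e in entities - exposed
    ]
    rate = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return rate(with_cause), rate(without)

# Hypothetical log: two balls get hit and change direction; one just flies.
t0 = datetime(2024, 1, 1, 12, 0, 0)
events = [
    (t0, "ball1", "bat_contact"),
    (t0 + timedelta(seconds=1), "ball1", "direction_change"),
    (t0, "ball2", "bat_contact"),
    (t0 + timedelta(seconds=1), "ball2", "direction_change"),
    (t0, "ball3", "in_flight"),  # no bat, no direction change
]

with_rate, without_rate = causal_evidence(
    events, "bat_contact", "direction_change", timedelta(seconds=5)
)
# with_rate is high and without_rate is low, suggesting bat contact
# causes the direction change.
```

The gap between `with_rate` and `without_rate` is the counterfactual signal; the time window encodes temporal proximity, and grouping by entity encodes local proximity.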
Large Language Models (LLMs and applications built on them like ChatGPT) already perform remarkable feats. We all have been amazed. Yet they accomplish all this with two out of three hands tied behind their backs. They lack time and location data in their training, creating a significant handicap in understanding cause and effect relationships. So is it much of a stretch to imagine how Large Biological Models will be 100x more powerful when direct cause and effect comprehension is unleashed?
The future of AI in general might not lie in processing more text, but in mimicking how humans ourselves learn—through observing events in time and space. Predicting the next word is nice, but knowing cause and effect is nicer. Let’s untie our hands.
As always, I welcome your feedback, additions, and disagreements (backed by a logical argument or evidence, so they lead to learning and discussion).
For more context on open Large Biological Models, check out the first post in this series here: