From Word Models to World Models - Part 1
How the Same Revolution Behind ChatGPT Is Leading to the Matrix (without the whole human battery thing)
Act I - The Hidden Patterns
Imagine the world not as chaos, but as interwoven patterns. An intricate, unfolding dance of matter, energy, and spacetime. From atoms to actions, there are patterns underneath it all. Language, nature, human biology: none of them are random, and all of them are made up of simple patterns. They play out in front of us, if only we knew how to see them.
For decades, we groped around these patterns with intuition and fragmented knowledge. Then, we gave machines the task of learning the patterns for us.
And they started seeing what we couldn't.
Act II - The Transformers: Knobs That Tune Themselves
In 2017, a quiet revolution arrived in the form of a single world-changing paper - Attention Is All You Need. The Transformer architecture handed down to humans in that paper (eerily, still in use today almost fully intact) changed everything.
What is a transformer and what does it do? Imagine a vast control board with billions of tiny knobs. You feed it raw text - oceans of it - and the knobs adjust themselves, no human required. No grammar rules. No dictionaries. Just strings of words flowing through the board, nudging the knobs into place.
What emerged from this process was startling: the system didn't just mimic language, it understood it - and the world the language described. At least well enough to write essays, answer questions, even joke.
The lesson? If you show a transformer enough of a world - words, images, numbers - it can internalize its structure. The world is modeled by all the specific knob settings inside that transformer.
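The "knobs" are just numeric parameters, and the mechanism that wires them together is attention. As a rough illustration only - toy dimensions, random inputs, nothing close to a full Transformer - here is a minimal sketch of the scaled dot-product attention at the heart of the architecture:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core Transformer operation: each token's output is a weighted
    blend of every token's value, with weights set by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # similarity of every token pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax: rows sum to 1
    return weights @ V                                   # mix the values accordingly

# Toy self-attention: 3 tokens with 4-dimensional embeddings (sizes are illustrative)
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)              # self-attention: Q = K = V
print(out.shape)                                         # -> (3, 4)
```

Training is then just nudging the surrounding weight matrices (the knobs) so that outputs like these predict the next word better.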
Act III - From Words to Worlds
Words were just the gateway drug. Feed it images, and it learns to see. Feed it audio, and it learns to listen. Feed it tabular data from medicine, finance, physics - and it starts to reason in those domains too.
Each of these models becomes a kind of model of the world - a world model - an engine trained not just to recite facts or map out logic, but to simulate how that world works. Like compressed blueprints of entire domains, captured inside a transformer's knob settings (its learned weights).
You can already download these world snapshots today. Increasingly, Hugging Face won't just host models. It will host miniature realities.
Act IV - The Eternal Question About to be Answered
We humans have always asked the same question:
"If I do this… what will happen?"
That question is the engine of intelligence. And until now, we mostly answered it by guessing, debating, or experimenting.
But world models give us a new and final method: simulate it. Feed in the situation you have questions about, and the model plays the world forward using learned patterns and an emergent understanding of cause and effect. Like a flight simulator, but for all of reality.
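As a loose illustration of "playing the world forward" - a toy Markov model learned from made-up event sequences, deliberately nothing like a real transformer world model - the idea looks like this:

```python
from collections import Counter, defaultdict

# Hypothetical observed "world": a sequence of events (names are invented)
history = ["rain", "wet_road", "slow_traffic", "rain", "wet_road", "accident",
           "sun", "dry_road", "fast_traffic", "rain", "wet_road", "slow_traffic"]

# Learn transition patterns: how often does each state follow another?
transitions = defaultdict(Counter)
for now, nxt in zip(history, history[1:]):
    transitions[now][nxt] += 1

def simulate(state, steps):
    """Answer 'if I do this, what happens?' by repeatedly taking
    the most likely next state under the learned model."""
    trajectory = [state]
    for _ in range(steps):
        if state not in transitions:
            break
        state = transitions[state].most_common(1)[0][0]
        trajectory.append(state)
    return trajectory

print(simulate("rain", 2))  # -> ['rain', 'wet_road', 'slow_traffic']
```

A real world model replaces the counting table with billions of learned weights and the event labels with rich observations, but the question it answers is the same one.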