LeCun’s $1B bet against LLMs: world models, AMI Labs, and what intelligence actually requires
“The fact that something works doesn’t mean it’s the right path. Horses worked. That didn’t mean we shouldn’t have built cars.” — Yann LeCun
TL;DR
Yann LeCun raised $1.03 billion for AMI Labs — Europe’s largest seed round — to build world models that understand physical causality. His argument: LLMs manipulate symbols without grounding, and scaling will not fix that. The counter-evidence is real — Othello-GPT shows language models develop internal world representations, GPT-4 handles scenarios LeCun said were impossible. For practitioners: both sides are right about different things. The brittleness at the edges of current agents IS the world model gap. But LLMs accomplish extraordinary things within their horizon, and AMI Labs is years from shipping. For the static planning patterns where this gap shows up most, see planning and decomposition.

What is AMI Labs building?
AMI Labs is Yann LeCun’s answer to a question he has been asking publicly for three years: what comes after large language models?
The company raised $1.03 billion in March 2026 at a $3.5 billion pre-money valuation — the largest seed round in European history. Investors include Bezos Expeditions, Eric Schmidt, Mark Cuban, Xavier Niel, and Tim Berners-Lee. The team is drawn from Meta’s FAIR lab: Michael Rabbat as VP of World Models (former FAIR research director), Saining Xie as Chief Science Officer (formerly Google DeepMind and Meta), Pascale Fung as Chief Research and Innovation Officer (HKUST Chair Professor, former FAIR Senior Director), and Laurent Solly as COO (Meta’s former VP for Europe). Alexandre LeBrun, who previously founded medical AI startup Nabla, runs the company as CEO.
The mission: build AI systems that understand physical and social reality through persistent internal world models, enabling safer autonomous agents.
The bet is explicit. LeCun is not arguing that LLMs are bad at what they do. He is arguing that what they do is insufficient for human-level intelligence, and that a different architecture — one that learns causal structure from observation rather than statistical patterns from text — is required.
What is LeCun’s technical argument?
The core claim has three parts.
LLMs lack grounding. Language models learn statistical associations between tokens. They can produce text that reads like an explanation of gravity. They cannot predict where a ball will land. The representation is linguistic, not physical. LeCun’s pointed framing: “a house cat forms better causal understanding than any current LLM.”
Predicting the next token is the wrong objective. Autoregressive training optimizes for surface-level plausibility of the next symbol. It does not optimize for building an internal model of how the world works. A model that predicts “the ball falls” after “you drop the ball” is not simulating gravity — it is completing a pattern it has seen in training data.
Scaling will not fix this. LeCun argues that the limitation is architectural, not one of scale. A transformer trained on more text data will learn more statistical associations but will not spontaneously develop causal reasoning. “Even with a trillion parameters and a quadrillion tokens of training data, a text-only model will not understand that pushing a table moves the book sitting on it” — a prediction he made publicly and that GPT-4 arguably disproved.
His alternative is JEPA (Joint Embedding Predictive Architecture). Instead of predicting the next token in sequence space, JEPA predicts missing parts of the input in abstract embedding space. The system learns what is semantically relevant while ignoring surface variability. I-JEPA does this for images, V-JEPA for video, VL-JEPA for vision-language tasks. The approach resembles how humans develop world models: by observing outcomes and building abstract representations of causality, not by memorizing descriptions.
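To make the contrast with next-token prediction concrete, here is a minimal, hypothetical sketch of the JEPA training idea in PyTorch: encode the visible context, encode the hidden region with a slowly updated target encoder, and train a predictor to match the target in embedding space rather than in pixel or token space. This is an illustration of the concept, not AMI Labs' or Meta's actual I-JEPA code; the module sizes, masking scheme, and EMA rate are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of the JEPA idea: predict the *embedding* of the masked
# region from the embedding of the visible context. Shapes are placeholders.
dim = 256
context_encoder = nn.Sequential(nn.Linear(784, dim), nn.GELU(), nn.Linear(dim, dim))
target_encoder = nn.Sequential(nn.Linear(784, dim), nn.GELU(), nn.Linear(dim, dim))
predictor = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

# Target encoder is a no-gradient, slowly updated copy of the context encoder.
target_encoder.load_state_dict(context_encoder.state_dict())
for p in target_encoder.parameters():
    p.requires_grad = False

opt = torch.optim.AdamW(
    list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-4
)

def jepa_step(batch, mask):
    """batch: (B, 784) flattened images; mask: (B, 784) boolean, True = hidden."""
    visible = batch * (~mask)            # context the model is allowed to see
    hidden = batch * mask                # region whose representation we predict

    z_context = context_encoder(visible)
    with torch.no_grad():
        z_target = target_encoder(hidden)    # abstract target, not raw pixels

    z_pred = predictor(z_context)
    loss = F.mse_loss(z_pred, z_target)      # loss lives in embedding space

    opt.zero_grad()
    loss.backward()
    opt.step()

    # EMA update keeps the target encoder a slow-moving copy of the context encoder.
    with torch.no_grad():
        for pt, pc in zip(target_encoder.parameters(), context_encoder.parameters()):
            pt.mul_(0.996).add_(0.004 * pc)
    return loss.item()
```

The key detail is that the loss compares embeddings, so the model is free to discard surface detail that is irrelevant to prediction, which is the "ignore surface variability" property described above.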
Where is the counter-evidence?
LeCun’s argument has a credibility problem: his previous predictions about LLM limitations have been consistently wrong.
He predicted LLMs could never understand that pushing a table moves the book on it. GPT-4 handles this. He predicted LLMs could not do multi-step reasoning. The o-series from OpenAI does it routinely. He predicted text-only models hit fundamental ceilings. Each new scaling step pushed the ceiling higher than he expected.
But the stronger counter-evidence is mechanistic, not anecdotal.
Othello-GPT is a language model trained purely on sequences of Othello moves — no board state, no rules, no supervision beyond next-move prediction. Researchers at Harvard found that the model spontaneously learns an internal representation of the board state. Error rates drop from 26.2% with random initialization to 1.7% after training. The representations are linearly decodable — simple vector arithmetic can read and manipulate the model’s internal board understanding. The model learned a world model from pure sequence prediction.
This directly challenges LeCun’s claim that predicting the next token cannot produce world understanding. It did produce world understanding — in a domain where the “world” is simple enough to verify. Whether this transfers to physical reality, where the world is vastly more complex and the training signal is text rather than actions, remains an open question.
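The "linearly decodable" claim can be made concrete with a probing setup: freeze the sequence model, collect its hidden activations for many game prefixes, and fit one linear classifier per board square. The sketch below is a generic illustration of that procedure with made-up shapes and random stand-in data, not the Harvard team's actual code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical probing setup. Assume we already ran an Othello sequence model
# over N game prefixes and saved its hidden states plus the true board labels.
#   hidden_states: (N, d_model) activations at the final move position
#   board_labels:  (N, 64) each square labeled 0=empty, 1=mine, 2=theirs
N, d_model = 10_000, 512
hidden_states = np.random.randn(N, d_model)            # stand-in for real activations
board_labels = np.random.randint(0, 3, size=(N, 64))   # stand-in for real board states

probes = []
for square in range(64):
    # One linear classifier per square: if the board state is readable with a
    # linear map, the model has encoded it explicitly in its activations.
    probe = LogisticRegression(max_iter=1000)
    probe.fit(hidden_states[:8000], board_labels[:8000, square])
    acc = probe.score(hidden_states[8000:], board_labels[8000:, square])
    probes.append((square, acc))

# With real activations, high held-out accuracy here is the evidence that
# next-move prediction induced an internal board representation.
print(sorted(probes, key=lambda t: t[1], reverse=True)[:5])
```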
The more nuanced view: LLMs trained on multimodal data (images, video, audio, code, text) develop richer internal representations than text-only models. GPT-4, Claude, and Gemini process visual input and demonstrate spatial reasoning that LeCun's predictions about text-only models did not anticipate. The open question is whether these representations are sufficient for reliable causal reasoning, or whether they remain sophisticated pattern matching that breaks on novel scenarios.
Where do current agents fail in ways LeCun predicts?
The strongest evidence for LeCun’s position comes from agent failure patterns that are consistent with a missing world model.
Physical reasoning. Agents hallucinate solutions to physical tasks — proposing “click submit” without verifying form completion, or generating plans that violate spatial constraints they should understand. These failures are consistent with a system that lacks a persistent representation of the current state of the world.
State tracking across long tasks. When an agent hands off context between steps (or between sub-agents), critical state information is lost. The next step operates from a partial snapshot. This is exactly what a persistent world model would prevent — maintaining continuous state representation regardless of context window boundaries.
Causal confusions. Agents frequently confuse correlation with causation in their reasoning. They observe that A happened before B and conclude A caused B. Without an internal model of mechanisms, every temporal sequence looks potentially causal.
A taxonomy of multi-agent failures (MAST) identified 14 unique failure modes across system design, inter-agent misalignment, and task verification gaps. Many of these — coordination breakdowns, conflicting assumptions, state inconsistencies — would be mitigated by a shared world model that all agents can query.
These are the failure modes that LeCun points to. They are real. The question is whether world models are the only solution or whether explicit engineering (state management, verification loops, tool grounding) can compensate. Current production agents use the engineering approach. AMI Labs is betting on the architectural one.
graph TD
subgraph "LeCun's Argument"
A[LLMs predict tokens] --> B[No causal model]
B --> C[Fails on novel physics]
B --> D[Loses state across tasks]
B --> E[Cannot plan in physical world]
end
subgraph "Counter-Evidence"
F[Othello-GPT learns board state] --> G[Next-token prediction CAN<br/>produce world representations]
H[GPT-4 handles physical<br/>scenarios LeCun predicted<br/>would fail] --> I[Scaling keeps<br/>surprising]
end
subgraph "Practitioner Reality"
J[Current agents DO fail<br/>at edges] --> K[Engineer around limits:<br/>state mgmt, verification,<br/>tool grounding]
L[AMI Labs years from<br/>shipping] --> K
end
What does this mean for building agents in 2026?
LeCun is right about the limitation and wrong about the timeline — and for practitioners, the timeline matters more than the philosophy.
AMI Labs has $1.03 billion and a team of world-class researchers. They do not have a model that outperforms GPT-4 on any practical benchmark. JEPA exists as a research framework, not a product. The earliest you might see AMI Labs technology integrated into a commercial agent is 2028, and that is optimistic.
Meanwhile, LLM-based agents are being deployed at scale. The brittleness at the edges is real but manageable. The engineering mitigations work:
Explicit state management. Instead of relying on the LLM’s internal representation of state, maintain an external state store that the agent queries and updates. State management and checkpoints covers the patterns.
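A minimal sketch of what "external state store" means in practice, with hypothetical task fields: the point is that the agent reads and writes a structure it does not have to keep in its context window, so nothing is silently dropped across handoffs.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical external state store: the agent's working memory lives here,
# not in the LLM's context window, so state survives handoffs and restarts.
@dataclass
class TaskState:
    goal: str
    completed_steps: list = field(default_factory=list)
    open_questions: list = field(default_factory=list)
    artifacts: dict = field(default_factory=dict)   # e.g. file paths, record IDs

    def checkpoint(self, path: str) -> None:
        with open(path, "w") as f:
            json.dump(asdict(self), f, indent=2)

    @classmethod
    def restore(cls, path: str) -> "TaskState":
        with open(path) as f:
            return cls(**json.load(f))

# Each agent step receives a compact summary of this state plus its sub-task,
# and writes its results back before the next handoff.
state = TaskState(goal="migrate billing service to new API")
state.completed_steps.append("inventoried existing endpoints")
state.checkpoint("run_042_state.json")
```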
Tool-grounded reasoning. Rather than asking the LLM to reason about physics, give it tools that model physics. A calculator for math. A database for state queries. A physics engine for spatial reasoning. The LLM orchestrates; the tools provide ground truth.
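The same idea in code: the model never computes the physics or the arithmetic itself; it emits a tool call, and the tool's output is treated as ground truth. The tool names and dispatch shape below are illustrative, not any specific framework's API.

```python
import math

# Hypothetical tool-grounding dispatch: the LLM chooses the tool and the
# arguments; deterministic code supplies the answer.
def calculator(expression: str) -> float:
    # Restricted eval for arithmetic only; a real system would use a parser.
    return float(eval(expression, {"__builtins__": {}}, {}))

def projectile_range(v0: float, angle_deg: float, g: float = 9.81) -> float:
    # Ground-truth physics instead of asking the model to "imagine" the arc.
    return v0 ** 2 * math.sin(2 * math.radians(angle_deg)) / g

TOOLS = {"calculator": calculator, "projectile_range": projectile_range}

def execute_tool_call(call: dict) -> str:
    """call is the JSON the model emitted, e.g.
    {"tool": "projectile_range", "args": {"v0": 12.0, "angle_deg": 30}}."""
    fn = TOOLS[call["tool"]]
    result = fn(**call["args"])
    # The result goes back into the conversation as an observation the model
    # must condition on, rather than a fact it is asked to recall.
    return f"TOOL_RESULT {call['tool']}: {result}"

print(execute_tool_call({"tool": "projectile_range", "args": {"v0": 12.0, "angle_deg": 30}}))
```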
Verification loops. Self-reflection and critique patterns catch many of the failures that a world model would prevent. Not all, but enough for production viability.
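A verification loop in its simplest form: generate, check against an external validator, feed the failure back, and retry a bounded number of times. In the sketch below, `llm_generate` is a placeholder for whatever model client a real system uses, and the toy validator stands in for tests, schema checks, or a simulator.

```python
import json

# Hypothetical verify-and-retry loop. `llm_generate` stands in for a real
# model call; the validator here is a concrete (if toy) external check.
def llm_generate(prompt: str) -> str:
    raise NotImplementedError("call your model provider here")

def validate(candidate: str) -> tuple[bool, str]:
    # Example external check: output must be JSON with the fields we need.
    try:
        data = json.loads(candidate)
    except json.JSONDecodeError as e:
        return False, f"not valid JSON: {e}"
    missing = {"action", "target"} - data.keys()
    return (not missing), (f"missing fields: {sorted(missing)}" if missing else "ok")

def solve_with_verification(task: str, max_attempts: int = 3) -> str:
    feedback = ""
    for attempt in range(max_attempts):
        prompt = task if not feedback else f"{task}\n\nPrevious attempt failed: {feedback}\nFix it."
        candidate = llm_generate(prompt)
        ok, report = validate(candidate)   # external check, not the model grading itself
        if ok:
            return candidate
        feedback = report                  # the critique feeds the next attempt
    raise RuntimeError(f"no verified solution after {max_attempts} attempts")
```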
The honest assessment: LLMs with good engineering are 90% solutions. The missing 10% is the world model gap. For most production use cases, 90% is sufficient. For autonomous agents in safety-critical physical domains — robotics, autonomous driving, medical procedures — it is not. That is where AMI Labs' work matters most, and where LeCun's $1.03 billion bet rests on its strongest thesis.
Key takeaways
- AMI Labs raised $1.03B — Europe’s largest seed — to build world models that understand causality. Team from Meta FAIR, Paris-based, shipping timeline unclear.
- LeCun’s argument is technically specific. LLMs lack persistent world models. The limitation is architectural, not one of scale. JEPA is the proposed alternative.
- Othello-GPT challenges the thesis. Next-token prediction does produce internal world representations — at least in simple domains. Whether this scales to physical reality is open.
- Current agent failures are consistent with both views. Physical reasoning failures and state tracking bugs look like a world model gap. They also look like engineering problems with engineering solutions.
- For 2026 practitioners: engineer around it. Explicit state management, tool grounding, and verification loops compensate for the missing world model. AMI Labs’ work is intellectually important and years from practical impact.
Further reading
- World models for agents — the technical foundations of agent world models
- State management and checkpoints — the engineering approach to persistent agent state
- Self-reflection and critique — verification patterns that compensate for missing world models