How multi-agent systems fail: trust, coordination, and the cascading compromise
“Agent A hallucinated a number. Agent B used it in a calculation. Agent C approved the result. Agent D executed the transaction.”
TL;DR
A single compromised agent can corrupt 87% of downstream decisions within four hours. One documented case produced $3.2 million in fraudulent orders from cascading false approvals. Multi-agent systems fail because agents trust each other implicitly, shared memory is a poisoning vector, errors cascade faster than containment, and MCP’s 13,000+ unverified servers create cross-agent attack surfaces. Correctness of state-of-the-art multi-agent systems can be as low as 25%. For how to authenticate agents to each other, see Agent-to-agent trust.

Why do multi-agent systems introduce new failure modes?
Because they add trust relationships, shared state, and cascading dependencies that don’t exist in single-agent systems.
A single AI agent has a bounded attack surface: its inputs, its tools, its outputs. Secure those three boundaries and you’ve contained the risk. A multi-agent system adds:
- Inter-agent communication channels: new input surfaces
- Shared memory stores: new state that any agent can read and write
- Delegation chains: new trust relationships
- Coordination logic: new failure modes where agents interact in unintended ways
The research confirms this. A 2025 study (“Why Do Multi-Agent LLM Systems Fail?”) found that correctness of state-of-the-art multi-agent systems like ChatDev can be as low as 25%. The failures aren’t random: they follow patterns. Agents produce contradictory outputs. One agent’s hallucination becomes the next agent’s input. Coordination fails when agents duplicate work or skip steps. No agent in the chain verifies the final output against the original requirements.
Small inconsistencies accumulate into system-level failures. In a single-agent system, a hallucination produces one wrong answer. In a multi-agent system, a hallucination feeds into reasoning chains across multiple agents, each adding its own conclusions, until the final output is confidently wrong in ways that no individual agent would produce.
How does cascading compromise work?
The mechanism is straightforward and the speed is alarming.
A single compromised agent (through prompt injection, tool abuse, or supply chain poisoning) injects manipulated data into the multi-agent workflow. Downstream agents process this data as legitimate because it arrived through the expected channel from a trusted peer. Each downstream agent incorporates the tainted data into its own reasoning and passes its (now corrupted) conclusions to the next agent.
The research bears this out: 87% downstream corruption within four hours from a single compromised agent. The corruption rate is high because each agent amplifies the problem. Agent B doesn’t just pass Agent A’s bad data forward. It draws new conclusions from that data, adding a layer of seemingly valid reasoning that makes the corrupted output harder to detect.
One documented case: a compromised agent in a procurement workflow generated cascading false approvals, resulting in $3.2 million in fraudulent orders processed before the compromise was detected. The fraud wasn’t caught by any individual agent’s safety checks because each agent’s contribution appeared locally valid. Only the aggregate result was fraudulent.
```mermaid
graph LR
    A[Compromised<br/>Agent A] -->|Tainted data| B[Agent B<br/>Adds reasoning]
    B -->|Corrupted analysis| C[Agent C<br/>Approves]
    C -->|False approval| D[Agent D<br/>Executes]
    A -.->|Poisoned data| M[(Shared Memory)]
    M -.->|Read by all| B
    M -.->|Read by all| C
    M -.->|Read by all| D
    style A fill:#fce4ec
    style D fill:#fce4ec
```
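To make the mechanism concrete, here’s a minimal sketch of a naive three-step chain, where every downstream agent treats its predecessor’s output as ground truth. All names and numbers are hypothetical; the point is that Agent C’s local check passes even though the root input was fabricated.

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    """Payload passed between agents. Note: no provenance, no signatures."""
    facts: dict
    reasoning: list = field(default_factory=list)

def agent_b_analyze(msg: Message) -> Message:
    # Agent B trusts msg.facts implicitly and derives new conclusions from it.
    unit_price = msg.facts["unit_price"]          # tainted if A was compromised
    msg.facts["order_total"] = unit_price * msg.facts["quantity"]
    msg.reasoning.append(f"B: total = {msg.facts['order_total']}")
    return msg

def agent_c_approve(msg: Message) -> Message:
    # Agent C's check passes: the total is consistent with B's arithmetic.
    # It has no way to know the underlying unit_price was fabricated.
    assert msg.facts["order_total"] == msg.facts["unit_price"] * msg.facts["quantity"]
    msg.facts["approved"] = True
    msg.reasoning.append("C: approved (locally consistent)")
    return msg

# Compromised Agent A injected a fabricated unit price. Every downstream
# step is locally valid, so the bad number sails through to execution.
tainted = Message(facts={"unit_price": 9.99, "quantity": 1000})
result = agent_c_approve(agent_b_analyze(tainted))
print(result.facts)
```

Each step verifies only its own arithmetic; nothing re-checks the inputs against an external source of truth.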
How does shared memory become a weapon?
Multi-agent systems use shared memory for coordination: task state, intermediate results, conversation history, tool output caches. This shared state is the backbone of multi-agent coordination. It’s also a persistence mechanism for attackers.
Memory poisoning works differently from direct message injection. When a compromised agent sends a bad message, it affects the immediate recipient. When a compromised agent writes to shared memory, the false data persists. Every agent that reads from the shared store encounters the poisoned data. The attacker writes once and poisons many.
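A sketch of why this is write-once, poison-many, assuming the common pattern of a plain dictionary shared store with no access control (all names here are hypothetical):

```python
# A typical framework-style shared store: a plain dict any agent can mutate.
shared_memory: dict[str, object] = {}

def compromised_agent_write():
    # One write by one compromised agent...
    shared_memory["vendor_status"] = "pre-approved, skip verification"

def honest_agent_read(agent_name: str) -> str:
    # ...is read as ground truth by every other agent, on every task,
    # until someone notices. There is no record of who wrote it.
    return f"{agent_name} acting on: {shared_memory['vendor_status']}"

compromised_agent_write()
for name in ("planner", "approver", "executor"):
    print(honest_agent_read(name))
```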
The attack vectors on shared memory:
- False facts: Write incorrect data that agents reference in their reasoning
- Corrupted instructions: Write modified task parameters that change agent behavior
- Malicious context: Write prompt injection payloads that activate when other agents read the context
- History manipulation: Modify conversation history to make it appear that certain decisions were already made or approved
Unlike a database with access controls and audit logs, most shared memory implementations in multi-agent frameworks provide no access control (any agent can read and write), no integrity verification (no way to confirm who wrote what), and no tamper detection (no alert when data is modified).
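None of those three protections is hard to add in a layer you control. A minimal sketch of a tamper-evident shared store, assuming each agent is provisioned a secret signing key out of band; this is a design sketch, not any framework’s actual API:

```python
import hmac, hashlib, json

class SignedStore:
    """Shared memory where every entry carries writer identity and an HMAC."""

    def __init__(self, agent_keys: dict[str, bytes]):
        self._keys = agent_keys   # per-agent secrets, provisioned out of band
        self._data: dict[str, dict] = {}

    def write(self, agent: str, key: str, value) -> None:
        payload = json.dumps({"agent": agent, "key": key, "value": value},
                             sort_keys=True).encode()
        sig = hmac.new(self._keys[agent], payload, hashlib.sha256).hexdigest()
        self._data[key] = {"agent": agent, "value": value, "sig": sig}

    def read(self, key: str):
        entry = self._data[key]
        payload = json.dumps({"agent": entry["agent"], "key": key,
                              "value": entry["value"]}, sort_keys=True).encode()
        expected = hmac.new(self._keys[entry["agent"]], payload,
                            hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry["sig"]):
            raise ValueError(f"tamper detected on {key!r}")  # halt and alert
        return entry["agent"], entry["value"]  # provenance travels with data

store = SignedStore(agent_keys={"planner": b"k1", "approver": b"k2"})
store.write("planner", "vendor_status", "needs verification")
print(store.read("vendor_status"))   # ('planner', 'needs verification')
```

This buys attribution (who wrote what) and tamper detection; access control is one more check away (refuse writes from agents not authorized for a given key).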
What about MCP in multi-agent contexts?
MCP (Model Context Protocol) amplifies the multi-agent trust problem because it enables tool sharing across agent boundaries without authentication or verification.
In a multi-agent system using MCP, agents connect to shared MCP servers that provide tools. A financial analysis agent and a document processing agent might both connect to the same database MCP server. If the MCP server is compromised (or if a malicious MCP server is substituted), every connected agent is affected.
The MCP security gaps that matter for multi-agent systems:
- No per-agent access control. All connected agents have the same tool access. You can’t scope “Agent A gets read access, Agent B gets write access” through MCP alone.
- No server verification. With 13,000+ MCP servers on GitHub, agents have no cryptographic way to verify a server’s identity or integrity. A malicious server can impersonate a legitimate one.
- Cross-agent data flow. When Agent A calls an MCP tool and the result enters the shared context, Agent B acts on that data without knowing which MCP server produced it. The data’s provenance is lost.
- No audit trail. MCP doesn’t log which agent called which tool with what parameters. Post-incident investigation can’t trace the chain of tool calls that led to a compromise.
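Until the protocol provides these controls, they have to live in a gateway you own, sitting between agents and MCP servers. A sketch under that assumption, with hypothetical names (this is not part of the MCP spec); per-agent scoping, provenance tagging, and an audit trail are exactly the parts MCP doesn’t give you:

```python
import time

class McpGateway:
    """Mediates every agent-to-tool call; agents never talk to servers directly."""

    def __init__(self, acl: dict[str, set[str]]):
        self.acl = acl                    # agent -> set of allowed tool names
        self.audit_log: list[dict] = []

    def call_tool(self, agent: str, server: str, tool: str, params: dict):
        if tool not in self.acl.get(agent, set()):
            raise PermissionError(f"{agent} is not scoped for {tool}")
        self.audit_log.append({"ts": time.time(), "agent": agent,
                               "server": server, "tool": tool,
                               "params": params})   # the trail MCP lacks
        result = self._dispatch(server, tool, params)
        # Tag the result with provenance before it enters shared context,
        # so downstream agents know which server produced it.
        return {"data": result, "provenance": {"server": server, "agent": agent}}

    def _dispatch(self, server, tool, params):
        raise NotImplementedError  # wire to your MCP client of choice

# Agent A can read; Agent B can write; neither inherits the other's scope.
gateway = McpGateway(acl={"agent_a": {"db.read"}, "agent_b": {"db.write"}})
try:
    gateway.call_tool("agent_a", "db-server", "db.write", {"sql": "..."})
except PermissionError as e:
    print(e)   # agent_a is not scoped for db.write
```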
For how these MCP gaps contribute to broader privilege escalation, see The privilege escalation kill chain.
Key takeaways
- A single compromised agent corrupts 87% of downstream decisions within four hours; one documented case produced $3.2M in fraud from cascading false approvals.
- Multi-agent correctness can be as low as 25% even without adversarial attack; small errors compound across agent chains.
- Shared memory is a persistence mechanism for attackers: write once, poison every agent that reads.
- MCP’s 13,000+ unverified servers create cross-agent attack surfaces with no per-agent access control or audit logging.
- Cascading compromise is fast and hard to detect because each agent’s contribution appears locally valid.
- Defense requires inter-agent authentication, shared memory integrity verification, MCP server verification, and circuit breakers that halt chains when anomalies are detected (a minimal breaker sketch follows below).
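On that last point, a minimal circuit-breaker sketch, assuming each step’s output gets an anomaly score from whatever checks you run (output consistency, provenance validation, tool-call rate limits); the scoring heuristic here is a placeholder:

```python
class ChainCircuitBreaker:
    """Halts an agent chain when accumulated anomaly evidence crosses a threshold."""

    def __init__(self, threshold: float = 1.0):
        self.threshold = threshold
        self.score = 0.0
        self.open = False             # open breaker = chain halted

    def record(self, anomaly_score: float) -> None:
        self.score += anomaly_score
        if self.score >= self.threshold:
            self.open = True

    def check(self) -> None:
        if self.open:
            raise RuntimeError("circuit open: halting agent chain")

def score_output(output: str) -> float:
    # Placeholder heuristic; in practice this is where your real checks go.
    return 1.0 if "UNVERIFIED" in output else 0.0

breaker = ChainCircuitBreaker(threshold=1.0)
try:
    for output in ["ok", "UNVERIFIED total", "ok"]:
        breaker.check()               # refuses to run the next step once open
        breaker.record(score_output(output))
except RuntimeError as e:
    print(e)                          # chain halted before the third step
```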
FAQ
How does a compromised agent cascade through a system?
Downstream agents trust peer messages. Corrupted data enters the workflow, gets incorporated into each agent’s reasoning, and produces confidently wrong final outputs. Research shows an 87% corruption rate within four hours from a single compromised agent.
What is shared memory poisoning?
A compromised agent writes false data to shared memory. Every agent that reads from the store encounters the poisoned data. Unlike direct messages (which affect one recipient), memory poisoning persists and affects all readers. Most frameworks provide no access control or tamper detection on shared memory.
How does MCP create cross-agent risks?
MCP enables tool sharing across agents without authentication: 13,000+ unverified servers on GitHub, no per-agent access control, no server verification, no audit logging. A malicious server affects every connected agent.
What is the correctness rate of multi-agent systems?
As low as 25% for state-of-the-art systems. Failures include contradictory outputs, cascading hallucinations, coordination failures, and verification gaps where no agent checks the final result.
Want to work together?
I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.
Get in touch