OrgAgent: what happens when you organize multi-agent systems like a company
“A company with three people who know their roles outperforms a crowd of fifty who don’t.”
TL;DR
OrgAgent (arXiv 2604.01020) structures multi-agent systems with governance, execution, and compliance layers — like a functional company, not a flat swarm. On SQuAD 2.0, this achieves 102.73% performance improvement and 74.52% token reduction versus unstructured multi-agent baselines. The token savings alone make this worth studying. For the runtime governance infrastructure that manages agent cost and permissions, see Paperclip.

Why do flat multi-agent systems waste tokens?
Flat multi-agent systems — where every agent has equal authority and communicates freely — have a coordination problem. Agents duplicate work, contradict each other, and spend tokens debating rather than solving. The MAST research taxonomy (UC Berkeley, 1,642 annotated traces across 7 MAS frameworks) identified 14 unique failure modes across three categories: system design issues (5 modes, 44.2% of failures), inter-agent misalignment (6 modes, 32.3%), and task verification gaps (3 modes, 23.5%). Step repetition (15.7%) and reasoning-action mismatch (13.2%) are the most common individual failure modes.
Gartner measured 1,445% growth in multi-agent system inquiries from Q1 2024 to Q2 2025. Most teams building their first multi-agent system default to flat architecture because it maps to how they think about collaboration: everyone talks to everyone. In production, this creates the “bag of agents” anti-pattern where unstructured parallelism amplifies errors instead of reducing them.
The insight behind OrgAgent is that coordination problems in multi-agent systems mirror coordination problems in human organizations. And human organizations solved this centuries ago: hierarchy, specialization, and separation of concerns.
```mermaid
graph TD
  subgraph "Flat multi-agent (default)"
    A1[Agent A] <--> A2[Agent B]
    A2 <--> A3[Agent C]
    A1 <--> A3
    A1 <--> A4[Agent D]
    A2 <--> A4
    A3 <--> A4
  end
  subgraph "OrgAgent (structured)"
    G[Governance layer<br/>Strategic oversight] --> E1[Execution Agent 1]
    G --> E2[Execution Agent 2]
    E1 --> C[Compliance layer<br/>Quality verification]
    E2 --> C
    C --> G
  end
```
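The topology difference is easy to quantify. A fully connected flat system with n agents maintains n(n-1)/2 bidirectional channels, while OrgAgent's layered design needs roughly one link per execution agent plus the compliance-to-governance feedback loop. This channel-counting model is my simplification for illustration, not a formula from the paper:

```python
def flat_channels(n_agents: int) -> int:
    """Fully connected: every pair of agents shares a bidirectional channel."""
    return n_agents * (n_agents - 1) // 2

def orgagent_channels(n_execution: int) -> int:
    """Layered: governance -> each execution agent, each execution agent ->
    compliance, plus the compliance -> governance feedback link."""
    return 2 * n_execution + 1

# Four agents in the flat diagram vs. the equivalent structured layout:
print(flat_channels(4))                           # 6 channels, grows quadratically
print(orgagent_channels(2))                       # 5 channels, grows linearly
print(flat_channels(10), orgagent_channels(10))   # 45 vs. 21
```

The gap widens quadratically: every agent added to a flat system opens a channel to every existing agent, while OrgAgent adds a constant number of links per execution agent.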
What are the three layers and what does each do?
OrgAgent decomposes multi-agent reasoning into three distinct layers, each with different authority, responsibility, and communication patterns.
Governance layer. This is the strategic brain. It receives the task, decomposes it into sub-tasks, allocates resources, and makes decisions about which execution agents handle which parts. In a company analogy, this is the executive team deciding what to build and who builds it. The governance layer does not execute work directly. It plans and delegates.
Execution layer. Agents in this layer do the actual work — answering questions, generating content, running computations. Each execution agent has a focused scope defined by the governance layer. They do not communicate with each other directly. All coordination flows through governance. This eliminates the crosstalk that wastes tokens in flat architectures.
Compliance layer. After execution agents produce outputs, the compliance layer verifies quality. Does the output meet the task requirements? Are the facts correct? Is the response complete? Failed compliance checks route back to governance for re-assignment or correction. This is the quality gate that flat systems lack.
| Layer | Responsibility | Authority | Communication |
|---|---|---|---|
| Governance | Task decomposition, resource allocation | High — sets strategy | Bidirectional with compliance, downward to execution |
| Execution | Task completion, focused work | Low — follows assignments | Upward to compliance only |
| Compliance | Quality verification, error detection | Medium — can reject and escalate | Upward to governance, receives from execution |
How does this produce a 102% performance improvement?
On SQuAD 2.0 using GPT-OSS-120B, OrgAgent achieved 102.73% performance improvement over flat multi-agent collaboration. The improvement comes from two sources.
Eliminated coordination waste. In flat systems, agents spend tokens negotiating who does what, reconciling contradictory outputs, and repeating work already done by another agent. OrgAgent eliminates this by making governance responsible for task allocation. Execution agents never talk to each other. They receive assignments and produce outputs. The 74.52% token reduction reflects how much of flat multi-agent token usage goes to coordination overhead rather than actual work.
Structured verification. Flat systems often lack a dedicated quality check. One agent produces an answer, another might disagree, and the system averages or votes. OrgAgent’s compliance layer applies focused verification, catching errors that execution agents miss. This is the difference between peer review (inconsistent, often skipped) and a dedicated QA process (systematic, always applied).
The 102.73% improvement means OrgAgent’s structured approach produces answers that are more than twice as good as the flat baseline on the same task with the same underlying model. The 74.52% token reduction means it does this while spending roughly one quarter the tokens.
When should you use OrgAgent versus a single agent?
The answer depends on whether your task actually needs multiple agents. Research (arXiv 2604.02460) shows that single-agent systems match or outperform multi-agent systems on sequential reasoning when token budgets are equalized.
OrgAgent earns its complexity in two scenarios.
Tasks with natural authority separation. If your pipeline has planning, execution, and verification as genuinely distinct operations — not just different prompts for the same model — the three-layer structure prevents the planning agent from self-grading its own work. The separation of governance from compliance is the same principle as separating the developer who writes the code from the reviewer who approves it.
Tasks where verification catches real errors. If execution agents produce outputs that need checking — factual claims, calculations, code that must compile — the compliance layer provides a systematic quality gate. Without it, errors pass through uncaught. With it, the governance layer can re-route failed tasks before they reach the user. For multi-agent architectures where verification is less important, simpler designs like hub-and-spoke orchestration may be sufficient.
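The payoff of that quality gate is the re-routing loop: failed outputs go back through governance instead of reaching the user. A hedged sketch of the retry path — the bounded-retry policy and the feedback mechanism are my assumptions, not specified by the paper:

```python
def reroute(task, worker, checker, max_retries=2):
    """Run a task through execution and compliance. On a failed check,
    governance re-assigns the task with the checker's feedback attached,
    up to max_retries additional attempts."""
    feedback = None
    for _ in range(max_retries + 1):
        output = worker(task, feedback)     # execution layer
        ok, feedback = checker(output)      # compliance layer
        if ok:
            return output                   # passed the quality gate
    raise RuntimeError(f"task failed compliance after {max_retries + 1} attempts")

# Toy worker that fixes its output once it receives feedback:
def worker(task, feedback):
    return task.upper() if feedback else task

def checker(output):
    ok = output.isupper()
    return ok, None if ok else "response must be uppercase"

print(reroute("verify this claim", worker, checker))  # VERIFY THIS CLAIM
```

The key design choice is that the retry decision lives outside both the worker and the checker: execution does the work, compliance judges it, and only the governance loop decides what happens next.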
Key takeaways
- Structure beats headcount. OrgAgent’s three layers outperform flat multi-agent by 102% while using 75% fewer tokens.
- Governance, execution, compliance. Three layers with distinct authority and communication patterns mirror successful human organizations.
- Execution agents never talk to each other. All coordination flows through governance. This eliminates the crosstalk that wastes tokens in flat architectures.
- Compliance is the missing layer. Most multi-agent systems lack dedicated verification. Adding it catches errors that self-grading misses.
- Not always necessary. Single agents win on sequential reasoning. OrgAgent earns its complexity when tasks need authority separation and systematic verification.
Further reading
- Paperclip: the org chart your AI agents are missing — runtime governance infrastructure for multi-agent cost and permission control
- Multi-agent architectures — coordination patterns including hub-and-spoke
- Scaling multi-agent systems — production considerations for multi-agent deployments
Want to work together?
I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.
Get in touch