OrgAgent: what happens when you organize multi-agent systems like a company
“A company with three people who know their roles outperforms a crowd of fifty who don’t.”
TL;DR
OrgAgent (arXiv 2604.01020) structures multi-agent systems with governance, execution, and compliance layers — like a functional company, not a flat swarm. On SQuAD 2.0, this achieves 102.73% performance improvement and 74.52% token reduction versus unstructured multi-agent baselines. The token savings alone make this worth studying. For the runtime governance infrastructure that manages agent cost and permissions, see Paperclip.

Why do flat multi-agent systems waste tokens?
Flat multi-agent systems — where every agent has equal authority and communicates freely — have a coordination problem. Agents duplicate work, contradict each other, and spend tokens debating rather than solving. The MAST research taxonomy (UC Berkeley, 1,642 annotated traces across 7 MAS frameworks) identified 14 unique failure modes across three categories: system design issues (5 modes, 44.2% of failures), inter-agent misalignment (6 modes, 32.3%), and task verification gaps (3 modes, 23.5%). Step repetition (15.7%) and reasoning-action mismatch (13.2%) are the most common individual failure modes.
Gartner measured 1,445% growth in multi-agent system inquiries from Q1 2024 to Q2 2025. Most teams building their first multi-agent system default to flat architecture because it maps to how they think about collaboration: everyone talks to everyone. In production, this creates the “bag of agents” anti-pattern where unstructured parallelism amplifies errors instead of reducing them.
The insight behind OrgAgent is that coordination problems in multi-agent systems mirror coordination problems in human organizations. And human organizations solved this centuries ago: hierarchy, specialization, and separation of concerns.
```mermaid
graph TD
  subgraph "Flat multi-agent (default)"
    A1[Agent A] <--> A2[Agent B]
    A2 <--> A3[Agent C]
    A1 <--> A3
    A1 <--> A4[Agent D]
    A2 <--> A4
    A3 <--> A4
  end
  subgraph "OrgAgent (structured)"
    G[Governance layer<br/>Strategic oversight] --> E1[Execution Agent 1]
    G --> E2[Execution Agent 2]
    E1 --> C[Compliance layer<br/>Quality verification]
    E2 --> C
    C --> G
  end
```
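The topology difference is easy to quantify. A fully connected flat system with n agents maintains n(n-1)/2 bidirectional channels, while OrgAgent's layered design needs roughly one link per execution agent plus the compliance-to-governance feedback loop. This channel-counting model is my simplification for illustration, not a formula from the paper:

```python
def flat_channels(n_agents: int) -> int:
    """Fully connected: every pair of agents shares a bidirectional channel."""
    return n_agents * (n_agents - 1) // 2

def orgagent_channels(n_execution: int) -> int:
    """Layered: governance -> each execution agent, each execution agent ->
    compliance, plus the compliance -> governance feedback link."""
    return 2 * n_execution + 1

# Four agents in the flat diagram vs. the equivalent structured layout:
print(flat_channels(4))                           # 6 channels, grows quadratically
print(orgagent_channels(2))                       # 5 channels, grows linearly
print(flat_channels(10), orgagent_channels(10))   # 45 vs. 21
```

The gap widens quadratically: every agent added to a flat system opens a channel to every existing agent, while OrgAgent adds a constant number of links per execution agent.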
What are the three layers and what does each do?
OrgAgent decomposes multi-agent reasoning into three distinct layers, each with different authority, responsibility, and communication patterns.
Governance layer. This is the strategic brain. It receives the task, decomposes it into sub-tasks, allocates resources, and makes decisions about which execution agents handle which parts. In a company analogy, this is the executive team deciding what to build and who builds it. The governance layer does not execute work directly. It plans and delegates.
Execution layer. Agents in this layer do the actual work — answering questions, generating content, running computations. Each execution agent has a focused scope defined by the governance layer. They do not communicate with each other directly. All coordination flows through governance. This eliminates the crosstalk that wastes tokens in flat architectures.
Compliance layer. After execution agents produce outputs, the compliance layer verifies quality. Does the output meet the task requirements? Are the facts correct? Is the response complete? Failed compliance checks route back to governance for re-assignment or correction. This is the quality gate that flat systems lack.
| Layer | Responsibility | Authority | Communication |
|---|---|---|---|
| Governance | Task decomposition, resource allocation | High — sets strategy | Bidirectional with compliance, downward to execution |
| Execution | Task completion, focused work | Low — follows assignments | Upward to compliance only |
| Compliance | Quality verification, error detection | Medium — can reject and escalate | Upward to governance, receives from execution |
How does this produce a 102% performance improvement?
On SQuAD 2.0 using GPT-OSS-120B, OrgAgent achieved 102.73% performance improvement over flat multi-agent collaboration. The improvement comes from two sources.
Eliminated coordination waste. In flat systems, agents spend tokens negotiating who does what, reconciling contradictory outputs, and repeating work already done by another agent. OrgAgent eliminates this by making governance responsible for task allocation. Execution agents never talk to each other. They receive assignments and produce outputs. The 74.52% token reduction reflects how much of flat multi-agent token usage goes to coordination overhead rather than actual work.
Structured verification. Flat systems often lack a dedicated quality check. One agent produces an answer, another might disagree, and the system averages or votes. OrgAgent’s compliance layer applies focused verification, catching errors that execution agents miss. This is the difference between peer review (inconsistent, often skipped) and a dedicated QA process (systematic, always applied).
The 102.73% improvement means OrgAgent’s structured approach produces answers that are more than twice as good as the flat baseline on the same task with the same underlying model. The 74.52% token reduction means it does this while spending roughly one quarter the tokens.
When should you use OrgAgent versus a single agent?
The answer depends on whether your task actually needs multiple agents. Research (arXiv 2604.02460) shows that single-agent systems match or outperform multi-agent systems on sequential reasoning when token budgets are equalized.
OrgAgent earns its complexity in two scenarios.
Tasks with natural authority separation. If your pipeline has planning, execution, and verification as genuinely distinct operations — not just different prompts for the same model — the three-layer structure prevents the planning agent from self-grading its own work. The separation of governance from compliance is the same principle as separating the developer who writes the code from the reviewer who approves it.
Tasks where verification catches real errors. If execution agents produce outputs that need checking — factual claims, calculations, code that must compile — the compliance layer provides a systematic quality gate. Without it, errors pass through uncaught. With it, the governance layer can re-route failed tasks before they reach the user. For multi-agent architectures where verification is less important, simpler designs like hub-and-spoke orchestration may be sufficient.
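The payoff of that quality gate is the re-routing loop: failed outputs go back through governance instead of reaching the user. A hedged sketch of the retry path — the bounded-retry policy and the feedback mechanism are my assumptions, not specified by the paper:

```python
def reroute(task, worker, checker, max_retries=2):
    """Run a task through execution and compliance. On a failed check,
    governance re-assigns the task with the checker's feedback attached,
    up to max_retries additional attempts."""
    feedback = None
    for _ in range(max_retries + 1):
        output = worker(task, feedback)     # execution layer
        ok, feedback = checker(output)      # compliance layer
        if ok:
            return output                   # passed the quality gate
    raise RuntimeError(f"task failed compliance after {max_retries + 1} attempts")

# Toy worker that fixes its output once it receives feedback:
def worker(task, feedback):
    return task.upper() if feedback else task

def checker(output):
    ok = output.isupper()
    return ok, None if ok else "response must be uppercase"

print(reroute("verify this claim", worker, checker))  # VERIFY THIS CLAIM
```

The key design choice is that the retry decision lives outside both the worker and the checker: execution does the work, compliance judges it, and only the governance loop decides what happens next.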
Key takeaways
- Structure beats headcount. OrgAgent’s three layers outperform flat multi-agent by 102% while using 75% fewer tokens.
- Governance, execution, compliance. Three layers with distinct authority and communication patterns mirror successful human organizations.
- Execution agents never talk to each other. All coordination flows through governance. This eliminates the crosstalk that wastes tokens in flat architectures.
- Compliance is the missing layer. Most multi-agent systems lack dedicated verification. Adding it catches errors that self-grading misses.
- Not always necessary. Single agents win on sequential reasoning. OrgAgent earns its complexity when tasks need authority separation and systematic verification.
Further reading
- Paperclip: the org chart your AI agents are missing — runtime governance infrastructure for multi-agent cost and permission control
- Multi-agent architectures — coordination patterns including hub-and-spoke
- Scaling multi-agent systems — production considerations for multi-agent deployments
Want to work together?
I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.
Get in touch