Securing agent orchestration: patterns and controls for production multi-agent systems
“We secured each agent individually. We forgot to secure the space between them.”
TL;DR
Multi-agent orchestration frameworks provide coordination but not security. Production systems need five controls: process-level sandboxing, capability scoping per agent role, authenticated delegation with signed requests, HITL gates at trust boundaries, and cross-agent audit logging. Most deployments implement zero of these. For the threat model these controls defend against, see How multi-agent systems fail.

What does the orchestration landscape look like?
Three frameworks dominate multi-agent orchestration, each with a different architectural model.
LangGraph (LangChain) uses graph-based orchestration: agents are nodes, communication channels are edges, execution follows conditional paths through the graph. Strong for complex workflows with branching logic. Security-relevant: the graph structure defines which agents can communicate with which, providing a natural enforcement point for communication policies.
CrewAI uses role-based orchestration: agents are defined with specific roles, goals, and backstories. Agents collaborate like a team with a manager. Security-relevant: the role model maps naturally to capability scoping. A “researcher” agent shouldn’t have the same tool access as a “deployer” agent.
AutoGen (Microsoft) uses conversational orchestration: agents communicate through natural language messages with dynamic role-playing. Flexible but harder to constrain because the communication protocol is unstructured text.
All three provide coordination primitives. None provides security primitives out of the box. Sandboxing, authentication, capability scoping, and audit logging are left to the deployer. Enterprise deployments can layer Azure Entra Agent ID with RBAC for agent-level access control and VNet integration for network isolation, but these are infrastructure additions, not framework features.
Control 1: How do you sandbox agents?
Each agent runs in its own isolated environment that it cannot escape or modify.
Container isolation. Run each agent in a separate container (Docker, Podman). No shared filesystem between agents. No host filesystem access. Read-only root filesystem. Dropped Linux capabilities (no CAP_NET_RAW, no CAP_SYS_ADMIN). Resource limits on CPU, memory, and execution time to prevent resource exhaustion attacks.
Stronger isolation for high-risk agents. gVisor (application kernel) or Kata Containers (micro-VM) provide stronger boundaries than standard Docker containers. gVisor intercepts system calls and implements them in a user-space kernel, limiting the attack surface. Kata Containers run each container in a lightweight virtual machine with its own kernel.
Network egress control. Restrict outbound network access to an allowlist of destinations. An agent that needs to query an API gets access to that API’s domain. It doesn’t get unrestricted internet access. Claude Code’s CVE-2025-55284 was exploitable because ping was on the allowlist. Network egress control prevents DNS-based exfiltration, C2 callbacks, and unauthorized API calls.
Out-of-process enforcement. The sandbox must be enforced by the infrastructure, not by the agent. An agent can’t modify its own container configuration. An agent can’t disable its network restrictions. The enforcement layer runs outside the agent’s process and cannot be influenced by prompt injection.
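The constraints above can be expressed as container launch options. A minimal sketch using the `docker` Python SDK (docker-py), where the image name, resource limits, network name, and egress allowlist are illustrative assumptions, not a hardened reference configuration:

```python
# Sketch: building hardened launch options for an agent container with
# docker-py. The network name "agent-egress" is assumed to be a custom
# network whose proxy/firewall enforces the allowlist out-of-process.

def hardened_run_kwargs(image: str, allowed_hosts: list[str]) -> dict:
    """Build docker-py `containers.run` kwargs for an isolated agent."""
    return {
        "image": image,
        "read_only": True,               # read-only root filesystem
        "cap_drop": ["ALL"],             # drop every Linux capability
        "network_mode": "agent-egress",  # egress restricted to allowed_hosts
        "mem_limit": "512m",             # resource limits to prevent
        "nano_cpus": 1_000_000_000,      # exhaustion (1 CPU)
        "pids_limit": 128,
        "security_opt": ["no-new-privileges:true"],
        "labels": {"egress-allowlist": ",".join(allowed_hosts)},
        "detach": True,
    }

kwargs = hardened_run_kwargs("research-agent:latest", ["api.example.com"])
# client = docker.from_env(); client.containers.run(**kwargs)
```

The agent process inside the container never sees this configuration, which is the point: the sandbox is defined and enforced by the launcher, not by anything prompt injection can reach.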
Control 2: How do you scope capabilities?
Positive allowlists per agent role. Never negative blocklists.
Define capability sets per role. Map each agent’s role to the minimum set of tools it needs:
| Agent Role | Allowed Tools | Explicitly Denied |
|---|---|---|
| Research agent | Web search, document reader | Shell, network, file write |
| Writing agent | File create, file read | Shell, network, database |
| Analysis agent | Database read, calculator | Database write, shell, network |
| Deployment agent | CI/CD API, config read | Database, arbitrary shell |
Per-task, not per-session. When the agent’s task changes, the capability set changes. An agent that needs shell access for one specific step gets it for that step and loses it afterward. Use JIT (Just-In-Time) capability grants with automatic revocation after task completion.
Enforce at the tool layer. The tool execution layer checks the agent’s current capability set before executing any tool call. If the agent requests a tool outside its allowlist, the call is rejected and logged. The agent can’t bypass this check because it runs outside the agent’s process.
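A minimal sketch of the tool-layer check, assuming role names and tool names from the table above; the broker API (`grant_for_task`, `check`, `revoke`) is illustrative:

```python
# Sketch: per-task capability grants enforced outside the agent process.
import time

ROLE_ALLOWLISTS = {
    "research": {"web_search", "document_reader"},
    "writing":  {"file_create", "file_read"},
}

class CapabilityBroker:
    """Runs in the tool execution layer; agents cannot modify grants."""
    def __init__(self):
        self._grants = {}   # agent_id -> (allowed tools, expiry)

    def grant_for_task(self, agent_id, role, extra=(), ttl_s=300):
        # JIT grant: role baseline plus task-specific extras, auto-expiring.
        tools = ROLE_ALLOWLISTS[role] | set(extra)
        self._grants[agent_id] = (tools, time.time() + ttl_s)

    def revoke(self, agent_id):
        self._grants.pop(agent_id, None)

    def check(self, agent_id, tool):
        tools, expiry = self._grants.get(agent_id, (set(), 0))
        # Expired grant or out-of-allowlist tool: reject (and log it).
        return time.time() <= expiry and tool in tools

broker = CapabilityBroker()
broker.grant_for_task("agent-7", "research")
```

Note the default is denial: an agent with no grant, an expired grant, or a tool outside its allowlist gets a rejection, never a fallback.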
Control 3: How do you authenticate delegation?
When Agent A delegates a task to Agent B, the delegation must be verifiable, scoped, and non-replayable.
Signed requests. Every inter-agent request carries a cryptographic signature from the sending agent. The receiving agent verifies the signature before processing. IETF HTTP Message Signatures (RFC 9421) provides a standard for this. Without signatures, any process that can send messages on the inter-agent communication channel can impersonate any agent.
Short-lived delegation tokens. When Agent A delegates to Agent B, the delegation includes a token that specifies: which agent is delegating, what task is being delegated, what tools Agent B can use for this task, and when the token expires. Tokens expire in minutes, not hours. Per-hop validation: Agent B validates the token before acting, and if Agent B delegates to Agent C, a new scoped token is issued.
Anti-replay. Include nonces or timestamps in signed requests so that intercepted delegation requests can’t be replayed. A delegation token that was valid five minutes ago shouldn’t be valid now.
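The three properties together, as a sketch. This uses a shared HMAC key for brevity; a production system would use asymmetric signatures (per-agent keys, along the lines of HTTP Message Signatures) so receivers never hold signing secrets:

```python
# Sketch: short-lived, signed, non-replayable delegation tokens.
import hmac, hashlib, json, time, secrets

KEY = secrets.token_bytes(32)   # held by the out-of-process broker
SEEN_NONCES = set()

def issue_token(from_agent, to_agent, task, tools, ttl_s=300):
    claims = {
        "from": from_agent, "to": to_agent, "task": task,
        "tools": sorted(tools),
        "exp": time.time() + ttl_s,        # minutes, not hours
        "nonce": secrets.token_hex(16),    # anti-replay
    }
    body = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    return body, sig

def validate_token(body, sig):
    expected = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None                        # forged or tampered
    claims = json.loads(body)
    if time.time() > claims["exp"] or claims["nonce"] in SEEN_NONCES:
        return None                        # expired or replayed
    SEEN_NONCES.add(claims["nonce"])
    return claims
```

Per-hop validation falls out of this shape: Agent B calls `validate_token` before acting, and if it delegates onward, the broker issues a fresh token with a narrower `tools` set.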
For the full cryptographic identity infrastructure needed for agent-to-agent trust, see Cryptographic capability binding.
Control 4: When do you require HITL gates?
HITL (Human-In-The-Loop) gates add latency. Use them where the cost of a wrong action exceeds the cost of the delay.
Trust boundary crossings. Any action that reaches outside the multi-agent system: sending emails, posting to external APIs, triggering webhooks, creating external resources. The agent chain operates within its sandbox. Actions that leave the sandbox require human approval.
Irreversible actions. Deleting data, financial transactions, publishing content, modifying production configurations. Anything that can’t be undone with a simple rollback.
Sensitive data access. Querying PII, accessing financial records, reading authentication credentials. The HITL gate verifies that the data access is legitimate for the current task.
Cross-organizational boundaries. When one organization’s agent delegates to another organization’s agent through A2A or MCP, a human should approve the delegation. Cross-organizational trust is harder to verify and harder to revoke.
The implementation: the HITL gate presents the pending action with full context (which agent, what action, why, what data is involved). The human approves, denies, or modifies the action. Denials are logged with reasons for post-incident analysis.
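The approval flow above can be sketched in a few lines. `ask_human` is a stand-in for whatever approval channel you use (Slack, a web UI, a ticket queue); the field names are illustrative:

```python
# Sketch: a minimal HITL gate that blocks trust-boundary actions until
# a human approves, and logs every decision including denials.
import time

DECISION_LOG = []

def hitl_gate(agent_id, action, context, ask_human):
    request = {
        "ts": time.time(),
        "agent": agent_id,
        "action": action,    # e.g. "send_email"
        "context": context,  # which data, why, for what task
    }
    decision, reason = ask_human(request)  # ("approve"|"deny", reason)
    DECISION_LOG.append({**request, "decision": decision, "reason": reason})
    return decision == "approve"

# Example: a deny-all policy standing in for a human reviewer.
deny_all = lambda req: ("deny", "external email not expected for this task")
allowed = hitl_gate("writer-1", "send_email",
                    {"to": "user@example.com", "task": "draft report"},
                    deny_all)
```

The gate sits in the tool execution layer, so an agent cannot skip it by phrasing the action differently; it either gets an approval or the call never executes.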
Control 5: What audit logging do you need?
Every tool call across every agent with enough detail for post-incident reconstruction.
Per-tool-call logging. For each tool invocation: timestamp, agent ID, tool name, input parameters, output response, execution duration, and the upstream request chain (which agent triggered this call and why).
Cross-agent request tracing. Assign correlation IDs at the entry point of every user request. Propagate the correlation ID through every inter-agent delegation and tool call. Post-incident, you can reconstruct the entire execution path from user request to final action by filtering on the correlation ID.
Anomaly baseline. Establish baseline patterns for each agent: typical tool call frequency, typical data volumes, typical delegation patterns. Alert when an agent deviates: sudden spike in tool calls, unusual tool combinations, first-time use of a tool the agent hasn’t called before.
Immutable storage. Write audit logs to append-only storage that agents can’t modify or delete. If an agent is compromised, the compromise is recorded in logs the agent can’t tamper with.
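Pulling the logging requirements together, a sketch of the record shape and correlation-ID tracing. The in-memory list stands in for real append-only (WORM) storage; field names follow the list above:

```python
# Sketch: per-tool-call audit records with correlation IDs for
# cross-agent request tracing.
import time, uuid

class AuditLog:
    def __init__(self):
        self._records = []   # append-only: no update/delete exposed

    def record(self, *, correlation_id, agent_id, tool, params,
               response, duration_ms, parent_agent=None):
        self._records.append({
            "ts": time.time(),
            "correlation_id": correlation_id,
            "agent_id": agent_id,
            "tool": tool,
            "params": params,
            "response": response,
            "duration_ms": duration_ms,
            "parent_agent": parent_agent,  # upstream request chain
        })

    def trace(self, correlation_id):
        """Reconstruct one request's full cross-agent execution path."""
        return [r for r in self._records
                if r["correlation_id"] == correlation_id]

log = AuditLog()
cid = str(uuid.uuid4())  # assigned at the user-request entry point
log.record(correlation_id=cid, agent_id="planner", tool="web_search",
           params={"q": "quarterly numbers"}, response="ok", duration_ms=142)
log.record(correlation_id=cid, agent_id="analyst", tool="db_read",
           params={"table": "sales"}, response="ok", duration_ms=58,
           parent_agent="planner")
```

`trace(cid)` returns the ordered execution path for one user request, which is exactly the view you want during post-incident reconstruction, and the anomaly baselines above are computed over these same records.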
Key takeaways
- Orchestration frameworks (LangGraph, CrewAI, AutoGen) provide coordination but not security. All five controls must be added by the deployer.
- Sandbox each agent in its own container with no host access, restricted network egress, and out-of-process enforcement
- Scope capabilities with positive allowlists per agent role, granted per-task not per-session
- Authenticate delegation with signed requests, short-lived scoped tokens, and anti-replay protection
- Require HITL gates at trust boundary crossings, irreversible actions, sensitive data access, and cross-organizational delegation
- Log every tool call with cross-agent correlation IDs in immutable storage for post-incident reconstruction
FAQ
What sandboxing do agents need?
Separate containers per agent with no host filesystem, restricted network egress, read-only root, dropped capabilities, and resource limits. gVisor or Kata Containers for high-risk agents. Enforcement is out-of-process: the agent cannot modify its own sandbox.
How do you scope capabilities?
Positive allowlists per role. Each agent gets only the tools its current task requires. Grants are per-task with automatic revocation. Enforcement at the tool layer, outside the agent’s process. Never use negative blocklists.
When should HITL gates fire?
Trust boundary crossings (external actions), irreversible operations (deletes, transactions), sensitive data access (PII, credentials), and cross-organizational delegation. The gate shows full context and logs all decisions including denials.
What audit logging is needed?
Every tool call with agent ID, parameters, response, timestamp, and upstream chain. Cross-agent correlation IDs for request tracing. Anomaly detection against baselines. Immutable append-only storage.
Want to work together?
I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.
Get in touch