Securing agent orchestration: patterns and controls for production multi-agent systems
“We secured each agent individually. We forgot to secure the space between them.”
TL;DR
Multi-agent orchestration frameworks provide coordination but not security. Production systems need five controls: process-level sandboxing, capability scoping per agent role, authenticated delegation with signed requests, HITL gates at trust boundaries, and cross-agent audit logging. Most deployments implement zero of these. For the threat model these controls defend against, see How multi-agent systems fail.

What does the orchestration landscape look like?
Three frameworks dominate multi-agent orchestration, each with a different architectural model.
LangGraph (LangChain) uses graph-based orchestration: agents are nodes, communication channels are edges, execution follows conditional paths through the graph. Strong for complex workflows with branching logic. Security-relevant: the graph structure defines which agents can communicate with which, providing a natural enforcement point for communication policies.
CrewAI uses role-based orchestration: agents are defined with specific roles, goals, and backstories. Agents collaborate like a team with a manager. Security-relevant: the role model maps naturally to capability scoping. A “researcher” agent shouldn’t have the same tool access as a “deployer” agent.
AutoGen (Microsoft) uses conversational orchestration: agents communicate through natural language messages with dynamic role-playing. Flexible but harder to constrain because the communication protocol is unstructured text.
All three provide coordination primitives. None provides security primitives out of the box. Sandboxing, authentication, capability scoping, and audit logging are left to the deployer. Enterprise deployments can layer Azure Entra Agent ID with RBAC for agent-level access control and VNet integration for network isolation, but these are infrastructure additions, not framework features.
Control 1: How do you sandbox agents?
Each agent runs in its own isolated environment that it cannot escape or modify.
Container isolation. Run each agent in a separate container (Docker, Podman). No shared filesystem between agents. No host filesystem access. Read-only root filesystem. Dropped Linux capabilities (no CAP_NET_RAW, no CAP_SYS_ADMIN). Resource limits on CPU, memory, and execution time to prevent resource exhaustion attacks.
Stronger isolation for high-risk agents. gVisor (application kernel) or Kata Containers (micro-VM) provide stronger boundaries than standard Docker containers. gVisor intercepts system calls and implements them in a user-space kernel, limiting the attack surface. Kata Containers run each container in a lightweight virtual machine with its own kernel.
Network egress control. Restrict outbound network access to an allowlist of destinations. An agent that needs to query an API gets access to that API’s domain. It doesn’t get unrestricted internet access. Claude Code’s CVE-2025-55284 was exploitable because ping was on the allowlist. Network egress control prevents DNS-based exfiltration, C2 callbacks, and unauthorized API calls.
Out-of-process enforcement. The sandbox must be enforced by the infrastructure, not by the agent. An agent can’t modify its own container configuration. An agent can’t disable its network restrictions. The enforcement layer runs outside the agent’s process and cannot be influenced by prompt injection.
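The constraints above can be expressed as container launch options. A minimal sketch using the `docker` Python SDK (docker-py), where the image name, resource limits, network name, and egress allowlist are illustrative assumptions, not a hardened reference configuration:

```python
# Sketch: building hardened launch options for an agent container with
# docker-py. The network name "agent-egress" is assumed to be a custom
# network whose proxy/firewall enforces the allowlist out-of-process.

def hardened_run_kwargs(image: str, allowed_hosts: list[str]) -> dict:
    """Build docker-py `containers.run` kwargs for an isolated agent."""
    return {
        "image": image,
        "read_only": True,               # read-only root filesystem
        "cap_drop": ["ALL"],             # drop every Linux capability
        "network_mode": "agent-egress",  # egress restricted to allowed_hosts
        "mem_limit": "512m",             # resource limits to prevent
        "nano_cpus": 1_000_000_000,      # exhaustion (1 CPU)
        "pids_limit": 128,
        "security_opt": ["no-new-privileges:true"],
        "labels": {"egress-allowlist": ",".join(allowed_hosts)},
        "detach": True,
    }

kwargs = hardened_run_kwargs("research-agent:latest", ["api.example.com"])
# client = docker.from_env(); client.containers.run(**kwargs)
```

The agent process inside the container never sees this configuration, which is the point: the sandbox is defined and enforced by the launcher, not by anything prompt injection can reach.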
Control 2: How do you scope capabilities?
Positive allowlists per agent role. Never negative blocklists.
Define capability sets per role. Map each agent’s role to the minimum set of tools it needs:
| Agent Role | Allowed Tools | Explicitly Denied |
|---|---|---|
| Research agent | Web search, document reader | Shell, network, file write |
| Writing agent | File create, file read | Shell, network, database |
| Analysis agent | Database read, calculator | Database write, shell, network |
| Deployment agent | CI/CD API, config read | Database, arbitrary shell |
Per-task, not per-session. When the agent’s task changes, the capability set changes. An agent that needs shell access for one specific step gets it for that step and loses it afterward. Use JIT (Just-In-Time) capability grants with automatic revocation after task completion.
Enforce at the tool layer. The tool execution layer checks the agent’s current capability set before executing any tool call. If the agent requests a tool outside its allowlist, the call is rejected and logged. The agent can’t bypass this check because it runs outside the agent’s process.
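A minimal sketch of the tool-layer check, assuming role names and tool names from the table above; the broker API (`grant_for_task`, `check`, `revoke`) is illustrative:

```python
# Sketch: per-task capability grants enforced outside the agent process.
import time

ROLE_ALLOWLISTS = {
    "research": {"web_search", "document_reader"},
    "writing":  {"file_create", "file_read"},
}

class CapabilityBroker:
    """Runs in the tool execution layer; agents cannot modify grants."""
    def __init__(self):
        self._grants = {}   # agent_id -> (allowed tools, expiry)

    def grant_for_task(self, agent_id, role, extra=(), ttl_s=300):
        # JIT grant: role baseline plus task-specific extras, auto-expiring.
        tools = ROLE_ALLOWLISTS[role] | set(extra)
        self._grants[agent_id] = (tools, time.time() + ttl_s)

    def revoke(self, agent_id):
        self._grants.pop(agent_id, None)

    def check(self, agent_id, tool):
        tools, expiry = self._grants.get(agent_id, (set(), 0))
        # Expired grant or out-of-allowlist tool: reject (and log it).
        return time.time() <= expiry and tool in tools

broker = CapabilityBroker()
broker.grant_for_task("agent-7", "research")
```

Note the default is denial: an agent with no grant, an expired grant, or a tool outside its allowlist gets a rejection, never a fallback.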
Control 3: How do you authenticate delegation?
When Agent A delegates a task to Agent B, the delegation must be verifiable, scoped, and non-replayable.
Signed requests. Every inter-agent request carries a cryptographic signature from the sending agent. The receiving agent verifies the signature before processing. IETF HTTP Message Signatures (RFC 9421) provides a standard for this. Without signatures, any process that can send messages on the inter-agent communication channel can impersonate any agent.
Short-lived delegation tokens. When Agent A delegates to Agent B, the delegation includes a token that specifies: which agent is delegating, what task is being delegated, what tools Agent B can use for this task, and when the token expires. Tokens expire in minutes, not hours. Per-hop validation: Agent B validates the token before acting, and if Agent B delegates to Agent C, a new scoped token is issued.
Anti-replay. Include nonces or timestamps in signed requests so that intercepted delegation requests can’t be replayed. A delegation token that was valid five minutes ago shouldn’t be valid now.
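The three properties together, as a sketch. This uses a shared HMAC key for brevity; a production system would use asymmetric signatures (per-agent keys, along the lines of HTTP Message Signatures) so receivers never hold signing secrets:

```python
# Sketch: short-lived, signed, non-replayable delegation tokens.
import hmac, hashlib, json, time, secrets

KEY = secrets.token_bytes(32)   # held by the out-of-process broker
SEEN_NONCES = set()

def issue_token(from_agent, to_agent, task, tools, ttl_s=300):
    claims = {
        "from": from_agent, "to": to_agent, "task": task,
        "tools": sorted(tools),
        "exp": time.time() + ttl_s,        # minutes, not hours
        "nonce": secrets.token_hex(16),    # anti-replay
    }
    body = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    return body, sig

def validate_token(body, sig):
    expected = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None                        # forged or tampered
    claims = json.loads(body)
    if time.time() > claims["exp"] or claims["nonce"] in SEEN_NONCES:
        return None                        # expired or replayed
    SEEN_NONCES.add(claims["nonce"])
    return claims
```

Per-hop validation falls out of this shape: Agent B calls `validate_token` before acting, and if it delegates onward, the broker issues a fresh token with a narrower `tools` set.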
For the full cryptographic identity infrastructure needed for agent-to-agent trust, see Cryptographic capability binding.
Control 4: When do you require HITL gates?
HITL (Human-In-The-Loop) gates add latency. Use them where the cost of a wrong action exceeds the cost of the delay.
Trust boundary crossings. Any action that reaches outside the multi-agent system: sending emails, posting to external APIs, triggering webhooks, creating external resources. The agent chain operates within its sandbox. Actions that leave the sandbox require human approval.
Irreversible actions. Deleting data, financial transactions, publishing content, modifying production configurations. Anything that can’t be undone with a simple rollback.
Sensitive data access. Querying PII, accessing financial records, reading authentication credentials. The HITL gate verifies that the data access is legitimate for the current task.
Cross-organizational boundaries. When one organization’s agent delegates to another organization’s agent through A2A or MCP, a human should approve the delegation. Cross-organizational trust is harder to verify and harder to revoke.
The implementation: the HITL gate presents the pending action with full context (which agent, what action, why, what data is involved). The human approves, denies, or modifies the action. Denials are logged with reasons for post-incident analysis.
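The approval flow above can be sketched in a few lines. `ask_human` is a stand-in for whatever approval channel you use (Slack, a web UI, a ticket queue); the field names are illustrative:

```python
# Sketch: a minimal HITL gate that blocks trust-boundary actions until
# a human approves, and logs every decision including denials.
import time

DECISION_LOG = []

def hitl_gate(agent_id, action, context, ask_human):
    request = {
        "ts": time.time(),
        "agent": agent_id,
        "action": action,    # e.g. "send_email"
        "context": context,  # which data, why, for what task
    }
    decision, reason = ask_human(request)  # ("approve"|"deny", reason)
    DECISION_LOG.append({**request, "decision": decision, "reason": reason})
    return decision == "approve"

# Example: a deny-all policy standing in for a human reviewer.
deny_all = lambda req: ("deny", "external email not expected for this task")
allowed = hitl_gate("writer-1", "send_email",
                    {"to": "user@example.com", "task": "draft report"},
                    deny_all)
```

The gate sits in the tool execution layer, so an agent cannot skip it by phrasing the action differently; it either gets an approval or the call never executes.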
Control 5: What audit logging do you need?
Every tool call across every agent with enough detail for post-incident reconstruction.
Per-tool-call logging. For each tool invocation: timestamp, agent ID, tool name, input parameters, output response, execution duration, and the upstream request chain (which agent triggered this call and why).
Cross-agent request tracing. Assign correlation IDs at the entry point of every user request. Propagate the correlation ID through every inter-agent delegation and tool call. Post-incident, you can reconstruct the entire execution path from user request to final action by filtering on the correlation ID.
Anomaly baseline. Establish baseline patterns for each agent: typical tool call frequency, typical data volumes, typical delegation patterns. Alert when an agent deviates: sudden spike in tool calls, unusual tool combinations, first-time use of a tool the agent hasn’t called before.
Immutable storage. Write audit logs to append-only storage that agents can’t modify or delete. If an agent is compromised, the compromise is recorded in logs the agent can’t tamper with.
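Pulling the logging requirements together, a sketch of the record shape and correlation-ID tracing. The in-memory list stands in for real append-only (WORM) storage; field names follow the list above:

```python
# Sketch: per-tool-call audit records with correlation IDs for
# cross-agent request tracing.
import time, uuid

class AuditLog:
    def __init__(self):
        self._records = []   # append-only: no update/delete exposed

    def record(self, *, correlation_id, agent_id, tool, params,
               response, duration_ms, parent_agent=None):
        self._records.append({
            "ts": time.time(),
            "correlation_id": correlation_id,
            "agent_id": agent_id,
            "tool": tool,
            "params": params,
            "response": response,
            "duration_ms": duration_ms,
            "parent_agent": parent_agent,  # upstream request chain
        })

    def trace(self, correlation_id):
        """Reconstruct one request's full cross-agent execution path."""
        return [r for r in self._records
                if r["correlation_id"] == correlation_id]

log = AuditLog()
cid = str(uuid.uuid4())  # assigned at the user-request entry point
log.record(correlation_id=cid, agent_id="planner", tool="web_search",
           params={"q": "quarterly numbers"}, response="ok", duration_ms=142)
log.record(correlation_id=cid, agent_id="analyst", tool="db_read",
           params={"table": "sales"}, response="ok", duration_ms=58,
           parent_agent="planner")
```

`trace(cid)` returns the ordered execution path for one user request, which is exactly the view you want during post-incident reconstruction, and the anomaly baselines above are computed over these same records.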
Key takeaways
- Orchestration frameworks (LangGraph, CrewAI, AutoGen) provide coordination but not security. All five controls must be added by the deployer.
- Sandbox each agent in its own container with no host access, restricted network egress, and out-of-process enforcement
- Scope capabilities with positive allowlists per agent role, granted per-task not per-session
- Authenticate delegation with signed requests, short-lived scoped tokens, and anti-replay protection
- Require HITL gates at trust boundary crossings, irreversible actions, sensitive data access, and cross-organizational delegation
- Log every tool call with cross-agent correlation IDs in immutable storage for post-incident reconstruction
FAQ
What sandboxing do agents need?
Separate containers per agent with no host filesystem, restricted network egress, read-only root, dropped capabilities, and resource limits. gVisor or Kata Containers for high-risk agents. Enforcement is out-of-process: the agent cannot modify its own sandbox.
How do you scope capabilities?
Positive allowlists per role. Each agent gets only the tools its current task requires. Grants are per-task with automatic revocation. Enforcement at the tool layer, outside the agent’s process. Never use negative blocklists.
When should HITL gates fire?
Trust boundary crossings (external actions), irreversible operations (deletes, transactions), sensitive data access (PII, credentials), and cross-organizational delegation. The gate shows full context and logs all decisions including denials.
What audit logging is needed?
Every tool call with agent ID, parameters, response, timestamp, and upstream chain. Cross-agent correlation IDs for request tracing. Anomaly detection against baselines. Immutable append-only storage.
Want to work together?
I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.
Get in touch