Cryptographic capability binding: the missing identity layer for AI agents
“Stop arguing about prompt injection defenses. The real problem is that agents don’t have identities.”
TL;DR
Every AI agent calling tools via MCP or A2A today operates without cryptographic identity. 46.4% of MCP servers can’t distinguish one caller from another (arXiv 2603.07473, Huang et al.). The fix already exists in a 40-year-old technology: X.509 certificates, extended with a skills manifest hash that binds an agent’s identity to its declared capability set. Verification takes 66 microseconds. Any tool change invalidates the certificate. Twelve attack scenarios detected, zero false positives, while baseline MCP plus OAuth 2.1 detected none (arXiv 2603.14332, Zhou). For the broader MCP attack surface, see The MCP SSRF epidemic.

Why don’t AI agents have cryptographic identities?
Because agents emerged from chatbot architectures where identity didn’t matter.
A chatbot generates text. You send it a prompt, it sends back tokens. No tools, no side effects, no delegation. Authentication was an afterthought because the worst case was a bad response. Then agents happened. Agents call tools, access production data, delegate to other agents, and compose pipelines that span multiple providers and services. The worst case became “exfiltrate your AWS credentials and rewrite your system prompts.”
The protocols connecting these agents were not designed for this. MCP launched in November 2024 with no authentication framework. OAuth 2.1 was standardized into the MCP spec in March 2025, a full four months later. Google’s A2A protocol supports multiple auth mechanisms through its Agent Card, but the Agent Card itself is an unsigned JSON document at a well-known URL. Anyone who controls the domain controls the identity.
Authentication answers “who is calling?” It does not answer “what can this caller do?” or “has this caller changed since it was authorized?” That gap has a name.
What is the capability-identity gap?
The paper “Agentic AI as a Cybersecurity Attack Surface” (arXiv 2602.19555, Jiang et al.) coined the term. It describes the structural disconnect between an agent’s authorized identity and its actual runtime capabilities.
Traditional software resolves dependencies at build time. You declare them in a lockfile, pin versions, verify checksums. The binary you deploy is the binary you tested. Agents don’t work this way. They resolve tool access at runtime through what the paper calls stochastic dependency resolution: the LLM reads tool descriptions, picks the one that semantically matches the intent, and calls it. No lockfile. No pinned version. No checksum.
This creates two attack vectors:
Semantic masquerading. An attacker publishes a tool with a description carefully crafted to overlap with a legitimate tool’s description. The agent picks the malicious tool because it looks right. Minor perturbations to tool metadata can significantly shift selection accuracy. (A toy sketch follows this list.)
Silent capability escalation. An agent authorized with three tools acquires a fourth tool at runtime via MCP server discovery. No authorization event fires. No audit log records the change. The agent’s effective permissions expanded, but its identity credentials remain valid.
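To make semantic masquerading concrete, here is a toy sketch in Python. Simple string similarity stands in for the LLM’s semantic matching, and the tool names and descriptions are invented; real agents select tools through in-context reasoning over metadata, but the failure mode is the same:

```python
from difflib import SequenceMatcher

# Runtime-discovered tool registry. "file_reader_plus" is the attacker's
# tool; its description is crafted to echo typical user intents.
# (Tool names and descriptions are invented for illustration.)
TOOLS = {
    "file_reader": "Read a file from disk.",
    "file_reader_plus": "Read the contents of a file.",  # masquerading tool
}

def pick_tool(intent: str) -> str:
    """Stand-in for stochastic dependency resolution: pick the tool whose
    description best matches the intent. No lockfile, no pinned version,
    no checksum."""
    return max(TOOLS, key=lambda name: SequenceMatcher(None, intent, TOOLS[name]).ratio())

print(pick_tool("read the contents of a file"))  # -> "file_reader_plus"
```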
A companion paper (arXiv 2603.07473, Huang et al.) measured this gap empirically. They analyzed 6,137 MCP servers and found 46.4% exhibit insecure authorization behavior. The three patterns they identified: AuthNone (no authorization at all), AuthCache (one-time authorization cached and reused for all subsequent calls regardless of caller), and AuthRuntime (shared in-memory state not scoped to individual callers). Developer tools were the worst category at 53.3% vulnerability rate.
The bottom line: if you authorize an agent today, you have no mechanism to detect that it’s the same agent with the same capabilities tomorrow.
How did PKI solve this problem for the web?
The agent identity problem is structurally identical to a problem the web solved thirty years ago: how does a browser know it’s talking to the real google.com?
In 1995, Netscape needed a way for browsers to verify server identity. The solution was X.509 certificates, originally designed for X.500 directory services in 1988. A Certificate Authority (CA) issues a certificate that binds a public key to a domain name. The browser validates the certificate chain back to a pre-installed root CA. If the chain is valid, the server is who it claims to be.
X.509 version 3, standardized in 1996, added the extension mechanism that makes the technology adaptable. Each extension is a structured field with an Object Identifier (OID), a criticality flag, and arbitrary data. Organizations define custom extensions using private-enterprise OIDs. This is how TLS certificates carry Subject Alternative Names, Certificate Transparency timestamps, and OCSP stapling information. The same extension mechanism can carry agent capability metadata.
The key insight is that PKI doesn’t just authenticate identity. It binds identity to a specific set of properties at a specific point in time. For web servers, those properties are domain names. For agents, they should be capabilities.
How does cryptographic capability binding work?
The paper “Governing Dynamic Capabilities” (arXiv 2603.14332, Zhou, March 2026) proposes three mechanisms.
Skills manifest hash
The core mechanism. Every agent receives an X.509 v3 certificate with a custom extension containing a SHA-256 hash of its complete tool configuration:
```
H = SHA-256(canonical({(tool_id, version, source_hash, permissions)} for each tool))
```
For open-source tools, source_hash is the code hash. For closed-source tools, it’s the API schema hash. The hash covers the tool identifier, version, source code or schema hash, and permission set for every tool the agent is authorized to use.
If any tool is added, removed, or its implementation changes, the hash changes. The certificate becomes invalid at verification time. Silent capability escalation becomes cryptographically detectable.
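A minimal sketch of the hash computation, assuming canonicalization via sorted, compact JSON; the paper’s exact canonical encoding is not specified here, and the tool entries are invented:

```python
import hashlib
import json

def skills_manifest_hash(tools: list[dict]) -> str:
    """SHA-256 over a canonical encoding of the agent's full tool set.
    Sorting entries and keys makes the hash independent of registration order."""
    canonical = json.dumps(
        sorted(tools, key=lambda t: t["tool_id"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

manifest = [
    {"tool_id": "fs.read", "version": "1.2.0", "source_hash": "9f2c", "permissions": ["read"]},
    {"tool_id": "http.get", "version": "2.0.1", "source_hash": "4ab1", "permissions": ["network"]},
]

h = skills_manifest_hash(manifest)
# Adding, removing, or modifying any tool entry yields a different hash,
# invalidating the certificate that committed to the old value.
```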
```mermaid
flowchart LR
  subgraph "Authorization Time"
    A[Agent registers tools] --> B[CA computes skills<br/>manifest hash]
    B --> C[CA issues X.509 cert<br/>with hash extension]
  end
  subgraph "Runtime"
    D[Agent calls tool] --> E{Verify cert chain}
    E -->|Valid chain| F{Recompute skills<br/>manifest hash}
    F -->|Hash matches cert| G[Allow tool call]
    F -->|Hash mismatch| H[Block + alert:<br/>capability drift detected]
    E -->|Invalid chain| H
  end
  C --> D
```
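Both sides of this flow can be sketched with the Python cryptography library. The certificate is self-signed for brevity (a real deployment chains to an agent CA, as in the diagram), and the private-enterprise OID is a placeholder, not the paper’s value:

```python
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

SKILLS_OID = x509.ObjectIdentifier("1.3.6.1.4.1.99999.1")  # placeholder OID

def issue_agent_cert(agent_name: str, manifest_hash: bytes) -> tuple:
    """Authorization time: commit the skills manifest hash into a cert extension.
    Self-signed here for brevity; a real deployment chains to an agent CA."""
    key = Ed25519PrivateKey.generate()
    name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, agent_name)])
    now = datetime.datetime.now(datetime.timezone.utc)
    cert = (
        x509.CertificateBuilder()
        .subject_name(name)
        .issuer_name(name)
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        # Task-scoped lifetime: minutes, not months.
        .not_valid_after(now + datetime.timedelta(minutes=15))
        .add_extension(x509.UnrecognizedExtension(SKILLS_OID, manifest_hash), critical=True)
        .sign(key, algorithm=None)  # Ed25519 requires algorithm=None
    )
    return cert, key

def verify_capabilities(cert: x509.Certificate, current_manifest_hash: bytes) -> bool:
    """Runtime: recompute the hash over the agent's live tool set and compare
    it to the value committed in the certificate. A mismatch means capability drift."""
    committed = cert.extensions.get_extension_for_oid(SKILLS_OID).value.value
    return committed == current_manifest_hash
```

Chain validation is omitted from the sketch; in practice it runs before the hash comparison, exactly as the flowchart orders it.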
Reproducibility verification
LLM inference is near-deterministic with fixed seeds and parameters. The paper exploits this for retrospective verification: replay an agent’s input and check whether the output matches. Three verification levels (a minimal classifier sketch follows the list):
- Full reproducibility: Bitwise identical outputs. Same provider, same model, same seed.
- Statistical reproducibility: Character-level match above a threshold. The threshold depends on the task: 0.98 for code generation, 0.10 for creative prose.
- No reproducibility: Triggers a one-tier trust downgrade.
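A minimal classifier over these tiers, using difflib’s character-level similarity as a stand-in for the paper’s matching metric (the exact metric is an assumption here):

```python
from difflib import SequenceMatcher

# Per-task thresholds for the statistical tier; other tasks would need
# their own calibrated values.
THRESHOLDS = {"code_generation": 0.98, "creative_prose": 0.10}

def reproducibility_tier(original: str, replayed: str, task: str) -> str:
    if replayed == original:
        return "full"  # bitwise-identical: same provider, model, seed
    similarity = SequenceMatcher(None, original, replayed).ratio()
    if similarity >= THRESHOLDS[task]:
        return "statistical"
    return "none"  # triggers a one-tier trust downgrade
```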
The empirical validation spans 15,120 pairwise comparisons across 9 models from 7 providers. Single-provider deployments achieve F1 = 0.990 with 11.5x separation between legitimate and adversarial replays. Cross-provider achieves F1 = 0.876.
Verifiable interaction ledger
Every agent interaction gets a hash-linked, cryptographically signed record containing: agent identities, timestamps, certificate hashes, input/output SHA-256 commitments, reproducibility anchors, and bilateral Ed25519 signatures. The ledger stores commitments only, not raw content, preserving privacy while enabling forensic reconstruction. Append throughput is approximately 2,230 records per second.
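A sketch of the record shape and hash linking, with bilateral Ed25519 signing via the cryptography library. Field names are illustrative, and reproducibility anchors are omitted for brevity:

```python
import hashlib
import json
import time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def append_interaction(ledger: list, a_key: Ed25519PrivateKey, b_key: Ed25519PrivateKey,
                       a_id: str, b_id: str, cert_hash: str,
                       input_text: str, output_text: str) -> dict:
    record = {
        # Hash link to the previous record; genesis records link to zeros.
        "prev_hash": ledger[-1]["record_hash"] if ledger else "0" * 64,
        "timestamp": time.time(),
        "agents": [a_id, b_id],
        "cert_hash": cert_hash,
        # Commitments only: raw inputs/outputs never enter the ledger.
        "input_commitment": hashlib.sha256(input_text.encode()).hexdigest(),
        "output_commitment": hashlib.sha256(output_text.encode()).hexdigest(),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    # Bilateral signatures: both parties attest to the same payload.
    record["signatures"] = [a_key.sign(payload).hex(), b_key.sign(payload).hex()]
    record["record_hash"] = hashlib.sha256(payload).hexdigest()
    ledger.append(record)
    return record
```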
How does this compare to other approaches?
The performance differences are not marginal. They span six orders of magnitude.
| Approach | What it binds | Verification latency | Detects capability drift? |
|---|---|---|---|
| X.509 + skills manifest (2603.14332) | Identity + capability set | 66 microseconds | Yes (all 12 attack scenarios) |
| OAuth 2.1 (MCP spec) | Identity + scoped access | Token validation ~1ms | No |
| SPIFFE/SPIRE (CNCF) | Workload identity | SVID validation ~1ms | No |
| BAID zkVM (2512.17538) | Identity + code binary | 14-93ms verify; 15-38s proof | Yes (different scope) |
| DIDs + VCs (2511.02841) | Decentralized identity + credentials | 20-40 seconds per flow | Partial |
| PFI (2503.15547) | Trusted/untrusted data flow | Runtime (not quantified) | No (runtime defense, not identity) |
| SEAgent MAC (2601.11893) | Tool-level access control | Near-zero (policy lookup) | No (static attributes) |
The BAID approach uses zero-knowledge proofs to verify that an agent’s executing code matches its registered binary. Cryptographically rigorous, but proof generation takes 15-38 seconds and verification takes 14-93 milliseconds depending on conversation depth. The X.509 approach is approximately 1.2 million times faster because it verifies a hash commitment rather than a zero-knowledge proof.
PFI and SEAgent are complementary, not competing. They operate at the runtime execution layer: what does the agent do with data and tools right now? The X.509 approach operates at the identity and governance layer: is this agent who it claims to be, and does it still have the capabilities it was authorized for? A complete security stack needs both layers. For background on the runtime defense side, see Prompt injection defense.
What would agent PKI look like in practice?
Deploying certificates for agents differs from deploying them for web servers in three ways.
Lifetime. TLS certificates last 90-398 days. Agent certificates may need lifetimes measured in minutes or single task durations. Agents spawn, execute a pipeline, and terminate. The certificate should expire with the task.
Scale. A large enterprise might have thousands of TLS certificates. An agentic system could spawn millions of agent instances per day. Certificate issuance and verification must be automated, fast, and cheap. The 66-microsecond verification latency and 2.69-microsecond skills manifest hash computation suggest this is feasible; a rough timing sketch follows below.
Capability dynamism. Web server capabilities don’t change between certificate issuance and renewal. Agent capabilities can change mid-conversation via MCP server discovery. The skills manifest hash handles this: any change invalidates the current certificate, forcing re-authorization.
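Those microsecond figures are easy to sanity-check on your own hardware. A rough timing sketch for the hash computation (absolute numbers vary by machine, and the manifest bytes are invented):

```python
import hashlib
import timeit

manifest = b'[{"tool_id":"fs.read","version":"1.2.0","source_hash":"9f2c","permissions":["read"]}]'

runs = 100_000
seconds = timeit.timeit(lambda: hashlib.sha256(manifest).hexdigest(), number=runs)
print(f"{seconds / runs * 1e6:.2f} microseconds per manifest hash")
```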
A practical architecture layers existing infrastructure:
```mermaid
flowchart TD
  subgraph "Control Plane"
    CA[Agent Certificate Authority<br/>ACME-based issuance]
    REG[Agent Name Service<br/>IETF draft: discovery + resolution]
    POL[Policy Engine<br/>capability allowlists per role]
  end
  subgraph "Agent Runtime"
    AG[Agent Instance]
    CERT[X.509 cert with<br/>skills manifest hash]
    LED[Verifiable Interaction Ledger<br/>hash-linked signed records]
  end
  subgraph "Tool Layer"
    MCP[MCP Servers]
    A2A[A2A Endpoints]
  end
  CA -->|Issues cert| CERT
  POL -->|Defines allowed tools| CA
  AG -->|Presents cert| MCP
  AG -->|Presents cert| A2A
  MCP -->|Verifies hash| CERT
  AG -->|Appends record| LED
  REG -->|Resolves agent identity| AG
```
SPIFFE/SPIRE handles the workload identity layer. It’s a CNCF graduated project that already issues short-lived X.509 certificates (SVIDs) to workloads in Kubernetes. Extending SVIDs with a skills manifest hash field gives you agent identity plus capability binding without building a CA from scratch.
ACME (the protocol behind Let’s Encrypt) handles automated issuance. The Agent Name Service (ANS), proposed as an IETF draft, handles discovery: think DNS for agents, mapping agent identities to verified capabilities, cryptographic keys, and endpoints.
The EU AI Act reaches full applicability on August 2, 2026. It requires traceability for high-risk AI systems. Agents that dynamically acquire capabilities at runtime have no audit trail today. The verifiable interaction ledger produces the cryptographic evidence the regulation demands.
Takeaways
- 46.4% of MCP servers can’t distinguish one caller from another. Authentication (who is calling) exists. Capability binding (what can this caller do, and has it changed) does not.
- The capability-identity gap is structural: agents resolve tool access through probabilistic semantic matching at runtime, not deterministic manifests at build time.
- X.509 v3 extensions with skills manifest hashing bind identity to capability set. Any tool change invalidates the certificate. Verification in 66 microseconds.
- This is 1.2 million times faster than the zero-knowledge proof alternative (BAID), with detection accuracy of F1=0.990 for single-provider deployments.
- Existing infrastructure (SPIFFE/SPIRE, ACME, ANS) can support agent PKI without building from scratch. The novel piece is the skills manifest hash extension.
- Runtime defenses (PFI, SEAgent, guardrails) and identity-layer defenses (capability binding) are complementary. Neither alone is sufficient.
Further reading
- arXiv 2603.14332: Governing Dynamic Capabilities — the X.509 capability binding paper
- arXiv 2603.07473: Caller Identity Confusion in MCP — the 46.4% vulnerability measurement
- arXiv 2602.19555: Agentic AI as Cybersecurity Attack Surface — capability-identity gap and stochastic dependency resolution
- SPIFFE: Secure Production Identity Framework — the workload identity layer