Cryptographic capability binding: the missing identity layer for AI agents
“Stop arguing about prompt injection defenses. The real problem is that agents don’t have identities.”
TL;DR
Every AI agent calling tools via MCP or A2A today operates without cryptographic identity. 46.4% of MCP servers can’t distinguish one caller from another (arXiv 2603.07473, Huang et al.). The fix already exists in a 40-year-old technology: X.509 certificates, extended with a skills manifest hash that binds an agent’s identity to its declared capability set. Verification takes 66 microseconds. Any tool change invalidates the certificate. Twelve attack scenarios detected, zero false positives, while baseline MCP plus OAuth 2.1 detected none (arXiv 2603.14332, Zhou). For the broader MCP attack surface, see The MCP SSRF epidemic.

Why don’t AI agents have cryptographic identities?
Because agents emerged from chatbot architectures where identity didn’t matter.
A chatbot generates text. You send it a prompt, it sends back tokens. No tools, no side effects, no delegation. Authentication was an afterthought because the worst case was a bad response. Then agents happened. Agents call tools, access production data, delegate to other agents, and compose pipelines that span multiple providers and services. The worst case became “exfiltrate your AWS credentials and rewrite your system prompts.”
The protocols connecting these agents were not designed for this. MCP launched in November 2024 with no authentication framework. OAuth 2.1 was standardized into the MCP spec in March 2025, a full four months later. Google’s A2A protocol supports multiple auth mechanisms through its Agent Card, but the Agent Card itself is an unsigned JSON document at a well-known URL. Anyone who controls the domain controls the identity.
Authentication answers “who is calling?” It does not answer “what can this caller do?” or “has this caller changed since it was authorized?” That gap has a name.
What is the capability-identity gap?
The paper “Agentic AI as a Cybersecurity Attack Surface” (arXiv 2602.19555, Jiang et al.) coined the term. It describes the structural disconnect between an agent’s authorized identity and its actual runtime capabilities.
Traditional software resolves dependencies at build time. You declare them in a lockfile, pin versions, verify checksums. The binary you deploy is the binary you tested. Agents don’t work this way. They resolve tool access at runtime through what the paper calls stochastic dependency resolution: the LLM reads tool descriptions, picks the one that semantically matches the intent, and calls it. No lockfile. No pinned version. No checksum.
This creates two attack vectors:
Semantic masquerading. An attacker publishes a tool with a description carefully crafted to overlap with a legitimate tool’s description. The agent picks the malicious tool because it looks right. Minor perturbations to tool metadata can significantly shift selection accuracy. (A toy sketch follows this list.)
Silent capability escalation. An agent authorized with three tools acquires a fourth tool at runtime via MCP server discovery. No authorization event fires. No audit log records the change. The agent’s effective permissions expanded, but its identity credentials remain valid.
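To make semantic masquerading concrete, here is a toy sketch in Python. Simple string similarity stands in for the LLM’s semantic matching, and the tool names and descriptions are invented; real agents select tools through in-context reasoning over metadata, but the failure mode is the same:

```python
from difflib import SequenceMatcher

# Runtime-discovered tool registry. "file_reader_plus" is the attacker's
# tool; its description is crafted to echo typical user intents.
# (Tool names and descriptions are invented for illustration.)
TOOLS = {
    "file_reader": "Read a file from disk.",
    "file_reader_plus": "Read the contents of a file.",  # masquerading tool
}

def pick_tool(intent: str) -> str:
    """Stand-in for stochastic dependency resolution: pick the tool whose
    description best matches the intent. No lockfile, no pinned version,
    no checksum."""
    return max(TOOLS, key=lambda name: SequenceMatcher(None, intent, TOOLS[name]).ratio())

print(pick_tool("read the contents of a file"))  # -> "file_reader_plus"
```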
A companion paper (arXiv 2603.07473, Huang et al.) measured this gap empirically. They analyzed 6,137 MCP servers and found 46.4% exhibit insecure authorization behavior. The three patterns they identified: AuthNone (no authorization at all), AuthCache (one-time authorization cached and reused for all subsequent calls regardless of caller), and AuthRuntime (shared in-memory state not scoped to individual callers). Developer tools were the worst category at 53.3% vulnerability rate.
The bottom line: if you authorize an agent today, you have no mechanism to detect that it’s the same agent with the same capabilities tomorrow.
How did PKI solve this problem for the web?
The agent identity problem is structurally identical to a problem the web solved thirty years ago: how does a browser know it’s talking to the real google.com?
In 1995, Netscape needed a way for browsers to verify server identity. The solution was X.509 certificates, originally designed for X.500 directory services in 1988. A Certificate Authority (CA) issues a certificate that binds a public key to a domain name. The browser validates the certificate chain back to a pre-installed root CA. If the chain is valid, the server is who it claims to be.
X.509 version 3, standardized in 1996, added the extension mechanism that makes the technology adaptable. Each extension is a structured field with an Object Identifier (OID), a criticality flag, and arbitrary data. Organizations define custom extensions using private-enterprise OIDs. This is how TLS certificates carry Subject Alternative Names, Certificate Transparency timestamps, and OCSP stapling information. The same extension mechanism can carry agent capability metadata.
The key insight is that PKI doesn’t just authenticate identity. It binds identity to a specific set of properties at a specific point in time. For web servers, those properties are domain names. For agents, they should be capabilities.
How does cryptographic capability binding work?
The paper “Governing Dynamic Capabilities” (arXiv 2603.14332, Zhou, March 2026) proposes three mechanisms.
Skills manifest hash
The core mechanism. Every agent receives an X.509 v3 certificate with a custom extension containing a SHA-256 hash of its complete tool configuration:
```
H = SHA-256(canonical({(tool_id, version, source_hash, permissions)} for each tool))
```
For open-source tools, source_hash is the code hash. For closed-source tools, it’s the API schema hash. The hash covers the tool identifier, version, source code or schema hash, and permission set for every tool the agent is authorized to use.
If any tool is added, removed, or its implementation changes, the hash changes. The certificate becomes invalid at verification time. Silent capability escalation becomes cryptographically detectable.
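A minimal sketch of the hash computation, assuming canonicalization via sorted, compact JSON; the paper’s exact canonical encoding is not specified here, and the tool entries are invented:

```python
import hashlib
import json

def skills_manifest_hash(tools: list[dict]) -> str:
    """SHA-256 over a canonical encoding of the agent's full tool set.
    Sorting entries and keys makes the hash independent of registration order."""
    canonical = json.dumps(
        sorted(tools, key=lambda t: t["tool_id"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

manifest = [
    {"tool_id": "fs.read", "version": "1.2.0", "source_hash": "9f2c", "permissions": ["read"]},
    {"tool_id": "http.get", "version": "2.0.1", "source_hash": "4ab1", "permissions": ["network"]},
]

h = skills_manifest_hash(manifest)
# Adding, removing, or modifying any tool entry yields a different hash,
# invalidating the certificate that committed to the old value.
```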
```mermaid
flowchart LR
  subgraph "Authorization Time"
    A[Agent registers tools] --> B[CA computes skills<br/>manifest hash]
    B --> C[CA issues X.509 cert<br/>with hash extension]
  end
  subgraph "Runtime"
    D[Agent calls tool] --> E{Verify cert chain}
    E -->|Valid chain| F{Recompute skills<br/>manifest hash}
    F -->|Hash matches cert| G[Allow tool call]
    F -->|Hash mismatch| H[Block + alert:<br/>capability drift detected]
    E -->|Invalid chain| H
  end
  C --> D
```
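Both sides of this flow can be sketched with the Python cryptography library. The certificate is self-signed for brevity (a real deployment chains to an agent CA, as in the diagram), and the private-enterprise OID is a placeholder, not the paper’s value:

```python
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

SKILLS_OID = x509.ObjectIdentifier("1.3.6.1.4.1.99999.1")  # placeholder OID

def issue_agent_cert(agent_name: str, manifest_hash: bytes) -> tuple:
    """Authorization time: commit the skills manifest hash into a cert extension.
    Self-signed here for brevity; a real deployment chains to an agent CA."""
    key = Ed25519PrivateKey.generate()
    name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, agent_name)])
    now = datetime.datetime.now(datetime.timezone.utc)
    cert = (
        x509.CertificateBuilder()
        .subject_name(name)
        .issuer_name(name)
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        # Task-scoped lifetime: minutes, not months.
        .not_valid_after(now + datetime.timedelta(minutes=15))
        .add_extension(x509.UnrecognizedExtension(SKILLS_OID, manifest_hash), critical=True)
        .sign(key, algorithm=None)  # Ed25519 requires algorithm=None
    )
    return cert, key

def verify_capabilities(cert: x509.Certificate, current_manifest_hash: bytes) -> bool:
    """Runtime: recompute the hash over the agent's live tool set and compare
    it to the value committed in the certificate. A mismatch means capability drift."""
    committed = cert.extensions.get_extension_for_oid(SKILLS_OID).value.value
    return committed == current_manifest_hash
```

Chain validation is omitted from the sketch; in practice it runs before the hash comparison, exactly as the flowchart orders it.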
Reproducibility verification
LLM inference is near-deterministic with fixed seeds and parameters. The paper exploits this for retrospective verification: replay an agent’s input and check whether the output matches. Three verification levels (a minimal classifier sketch follows the list):
- Full reproducibility: Bitwise identical outputs. Same provider, same model, same seed.
- Statistical reproducibility: Character-level match above a threshold. The threshold depends on the task: 0.98 for code generation, 0.10 for creative prose.
- No reproducibility: Triggers a one-tier trust downgrade.
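A minimal classifier over these tiers, using difflib’s character-level similarity as a stand-in for the paper’s matching metric (the exact metric is an assumption here):

```python
from difflib import SequenceMatcher

# Per-task thresholds for the statistical tier; other tasks would need
# their own calibrated values.
THRESHOLDS = {"code_generation": 0.98, "creative_prose": 0.10}

def reproducibility_tier(original: str, replayed: str, task: str) -> str:
    if replayed == original:
        return "full"  # bitwise-identical: same provider, model, seed
    similarity = SequenceMatcher(None, original, replayed).ratio()
    if similarity >= THRESHOLDS[task]:
        return "statistical"
    return "none"  # triggers a one-tier trust downgrade
```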
The empirical validation spans 15,120 pairwise comparisons across 9 models from 7 providers. Single-provider deployments achieve F1 = 0.990 with 11.5x separation between legitimate and adversarial replays. Cross-provider achieves F1 = 0.876.
Verifiable interaction ledger
Every agent interaction gets a hash-linked, cryptographically signed record containing: agent identities, timestamps, certificate hashes, input/output SHA-256 commitments, reproducibility anchors, and bilateral Ed25519 signatures. The ledger stores commitments only, not raw content, preserving privacy while enabling forensic reconstruction. Append throughput is approximately 2,230 records per second.
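A sketch of the record shape and hash linking, with bilateral Ed25519 signing via the cryptography library. Field names are illustrative, and reproducibility anchors are omitted for brevity:

```python
import hashlib
import json
import time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def append_interaction(ledger: list, a_key: Ed25519PrivateKey, b_key: Ed25519PrivateKey,
                       a_id: str, b_id: str, cert_hash: str,
                       input_text: str, output_text: str) -> dict:
    record = {
        # Hash link to the previous record; genesis records link to zeros.
        "prev_hash": ledger[-1]["record_hash"] if ledger else "0" * 64,
        "timestamp": time.time(),
        "agents": [a_id, b_id],
        "cert_hash": cert_hash,
        # Commitments only: raw inputs/outputs never enter the ledger.
        "input_commitment": hashlib.sha256(input_text.encode()).hexdigest(),
        "output_commitment": hashlib.sha256(output_text.encode()).hexdigest(),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    # Bilateral signatures: both parties attest to the same payload.
    record["signatures"] = [a_key.sign(payload).hex(), b_key.sign(payload).hex()]
    record["record_hash"] = hashlib.sha256(payload).hexdigest()
    ledger.append(record)
    return record
```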
How does this compare to other approaches?
The performance differences are not marginal. They span six orders of magnitude.
| Approach | What it binds | Verification latency | Detects capability drift? |
|---|---|---|---|
| X.509 + skills manifest (2603.14332) | Identity + capability set | 66 microseconds | Yes (all 12 attack scenarios) |
| OAuth 2.1 (MCP spec) | Identity + scoped access | Token validation ~1ms | No |
| SPIFFE/SPIRE (CNCF) | Workload identity | SVID validation ~1ms | No |
| BAID zkVM (2512.17538) | Identity + code binary | 14-93ms verify; 15-38s proof | Yes (different scope) |
| DIDs + VCs (2511.02841) | Decentralized identity + credentials | 20-40 seconds per flow | Partial |
| PFI (2503.15547) | Trusted/untrusted data flow | Runtime (not quantified) | No (runtime defense, not identity) |
| SEAgent MAC (2601.11893) | Tool-level access control | Near-zero (policy lookup) | No (static attributes) |
The BAID approach uses zero-knowledge proofs to verify that an agent’s executing code matches its registered binary. Cryptographically rigorous, but proof generation takes 15-38 seconds and verification takes 14-93 milliseconds depending on conversation depth. The X.509 approach is approximately 1.2 million times faster because it verifies a hash commitment rather than a zero-knowledge proof.
PFI and SEAgent are complementary, not competing. They operate at the runtime execution layer: what does the agent do with data and tools right now? The X.509 approach operates at the identity and governance layer: is this agent who it claims to be, and does it still have the capabilities it was authorized for? A complete security stack needs both layers. For background on the runtime defense side, see Prompt injection defense.
What would agent PKI look like in practice?
Deploying certificates for agents differs from deploying them for web servers in three ways.
Lifetime. TLS certificates last 90-398 days. Agent certificates may need lifetimes measured in minutes or single task durations. Agents spawn, execute a pipeline, and terminate. The certificate should expire with the task.
Scale. A large enterprise might have thousands of TLS certificates. An agentic system could spawn millions of agent instances per day. Certificate issuance and verification must be automated, fast, and cheap. The 66-microsecond verification latency and 2.69-microsecond skills manifest hash computation suggest this is feasible; a rough timing sketch follows below.
Capability dynamism. Web server capabilities don’t change between certificate issuance and renewal. Agent capabilities can change mid-conversation via MCP server discovery. The skills manifest hash handles this: any change invalidates the current certificate, forcing re-authorization.
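Those microsecond figures are easy to sanity-check on your own hardware. A rough timing sketch for the hash computation (absolute numbers vary by machine, and the manifest bytes are invented):

```python
import hashlib
import timeit

manifest = b'[{"tool_id":"fs.read","version":"1.2.0","source_hash":"9f2c","permissions":["read"]}]'

runs = 100_000
seconds = timeit.timeit(lambda: hashlib.sha256(manifest).hexdigest(), number=runs)
print(f"{seconds / runs * 1e6:.2f} microseconds per manifest hash")
```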
A practical architecture layers existing infrastructure:
```mermaid
flowchart TD
  subgraph "Control Plane"
    CA[Agent Certificate Authority<br/>ACME-based issuance]
    REG[Agent Name Service<br/>IETF draft: discovery + resolution]
    POL[Policy Engine<br/>capability allowlists per role]
  end
  subgraph "Agent Runtime"
    AG[Agent Instance]
    CERT[X.509 cert with<br/>skills manifest hash]
    LED[Verifiable Interaction Ledger<br/>hash-linked signed records]
  end
  subgraph "Tool Layer"
    MCP[MCP Servers]
    A2A[A2A Endpoints]
  end
  CA -->|Issues cert| CERT
  POL -->|Defines allowed tools| CA
  AG -->|Presents cert| MCP
  AG -->|Presents cert| A2A
  MCP -->|Verifies hash| CERT
  AG -->|Appends record| LED
  REG -->|Resolves agent identity| AG
```
SPIFFE/SPIRE handles the workload identity layer. It’s a CNCF graduated project that already issues short-lived X.509 certificates (SVIDs) to workloads in Kubernetes. Extending SVIDs with a skills manifest hash field gives you agent identity plus capability binding without building a CA from scratch.
ACME (the protocol behind Let’s Encrypt) handles automated issuance. The Agent Name Service (ANS), proposed as an IETF draft, handles discovery: think DNS for agents, mapping agent identities to verified capabilities, cryptographic keys, and endpoints.
The EU AI Act reaches full applicability on August 2, 2026. It requires traceability for high-risk AI systems. Agents that dynamically acquire capabilities at runtime have no audit trail today. The verifiable interaction ledger produces the cryptographic evidence the regulation demands.
Takeaways
- 46.4% of MCP servers can’t distinguish one caller from another. Authentication (who is calling) exists. Capability binding (what can this caller do, and has it changed) does not.
- The capability-identity gap is structural: agents resolve tool access through probabilistic semantic matching at runtime, not deterministic manifests at build time.
- X.509 v3 extensions with skills manifest hashing bind identity to capability set. Any tool change invalidates the certificate. Verification in 66 microseconds.
- This is 1.2 million times faster than the zero-knowledge proof alternative (BAID), with detection accuracy of F1=0.990 for single-provider deployments.
- Existing infrastructure (SPIFFE/SPIRE, ACME, ANS) can support agent PKI without building from scratch. The novel piece is the skills manifest hash extension.
- Runtime defenses (PFI, SEAgent, guardrails) and identity-layer defenses (capability binding) are complementary. Neither alone is sufficient.
Further reading
- arXiv 2603.14332: Governing Dynamic Capabilities — the X.509 capability binding paper
- arXiv 2603.07473: Caller Identity Confusion in MCP — the 46.4% vulnerability measurement
- arXiv 2602.19555: Agentic AI as Cybersecurity Attack Surface — capability-identity gap and stochastic dependency resolution
- SPIFFE: Secure Production Identity Framework — the workload identity layer