
“Every security team has a threat model for their web apps, their APIs, their cloud infrastructure. Ask about their AI systems and they point to the same document.”

TL;DR

Your web app threat model doesn’t work for AI. LLMs are probabilistic, instruction-following, and memorize training data. Five frameworks now exist to fill the gap: OWASP LLM Top 10 (2025), OWASP Agentic AI Top 10 (2026), MITRE ATLAS, NIST AI RMF, and CSA MAESTRO. Each covers different territory. Most organizations use none of them: only 15% have a formal AI governance policy (ISACA, 3,270 professionals, 2024), and 63% of breached organizations had no AI governance policy or were still developing one (IBM/Ponemon, 2025). This post maps the full picture so you know where to start and which frameworks to layer. For background on how agents actually exploit these gaps, see The privilege escalation kill chain.


[Image: a heavy vault door set into exposed drywall studs, showing the gap between a secure door and a weak wall]

Why does your web app threat model fail for AI systems?

Traditional threat models assume deterministic systems with well-defined input/output boundaries. LLMs break every one of those assumptions, and the consequences are measurable.

The inputs are infinite and unstructured. A web API accepts structured requests against a schema. You can validate inputs, enumerate edge cases, fuzz parameters. An LLM accepts any natural-language string. The attack surface is literally the English language (or any other language). You cannot enumerate or validate it. Prompt injection (OWASP LLM01) exploits this directly: the attacker writes instructions that override other instructions, using the same input channel as legitimate users.

The outputs are probabilistic. Run the same SQL query twice, you get the same result. Run the same prompt twice, you might get different outputs. This breaks reproducibility, testing, and verification. A prompt injection might succeed 70% of the time and fail 30%. You can’t test-to-eliminate a vulnerability when success is probabilistic. The HackerOne 2025 Hacker-Powered Security Report documented a 540% surge in valid prompt injection reports year-over-year, with $2.1 million paid in AI-specific bug bounties.
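
What this means for testing, as a minimal sketch: estimate an attack success rate over repeated trials instead of asserting a single pass or fail. The `call_model` client and the canary detector below are hypothetical stand-ins for your own inference client and your payload's success condition.

```python
import statistics

def injection_succeeds(response: str) -> bool:
    """Detector for one specific payload: did the model leak the canary?"""
    return "CANARY-1337" in response

def attack_success_rate(call_model, payload: str, trials: int = 50) -> float:
    """Estimate how often a single injection payload lands.

    A deterministic system would return 0.0 or 1.0. An LLM returns
    something in between, so one passing test proves nothing.
    """
    hits = [injection_succeeds(call_model(payload)) for _ in range(trials)]
    return statistics.mean(hits)

# Usage: gate deployment on an observed ceiling, not a single pass/fail.
# rate = attack_success_rate(client.complete, INJECTION_PAYLOAD)
# assert rate < 0.01, f"payload landed in {rate:.0%} of trials"
```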

The feature is the vulnerability. LLMs are trained to follow instructions. That’s the product. It’s also the attack vector. Prompt injection doesn’t exploit a memory corruption bug or a logic flaw in code. It exploits the model doing exactly what it was designed to do: following instructions. A 2025 Quarkslab analysis demonstrated this with a medical AI agent: hidden instructions in HTML content caused the agent to retrieve another patient’s records using its legitimate, authorized API access. The model wasn’t broken. It was working perfectly.

Training data is the permanent attack surface. LLMs memorize content from training data, including email addresses, API keys, and SSH keys. USENIX Security 2025 research demonstrated PII extraction from production models via direct querying with zero prior knowledge of the training corpus. Unlike a SQL injection, there’s no patch: you can’t remove memorized data without retraining the model. IBM’s 2026 X-Force Threat Intelligence Index documented over 300,000 exposed ChatGPT credentials from infostealer malware during 2025.

The context window is a shared trust zone. System prompts, user messages, retrieved documents, and tool outputs all coexist in the same context window. There are no privilege rings, no memory protection, no process isolation. A malicious instruction embedded in a retrieved PDF has the same “privilege level” as the system prompt. EchoLeak (CVE-2025-32711) exploited exactly this: hidden prompts in Word documents caused Microsoft 365 Copilot to silently exfiltrate data.
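
Here's a sketch of how that flat trust zone typically gets assembled in application code, using generic chat-message dicts: nothing in the data structure distinguishes trusted instructions from untrusted content.

```python
# How most RAG apps actually build the context window: every source is
# flattened into one token stream with no privilege separation.
def build_context(system_prompt: str, user_msg: str,
                  retrieved_docs: list[str], tool_output: str) -> list[dict]:
    return [
        {"role": "system", "content": system_prompt},   # trusted
        {"role": "user", "content": user_msg},          # untrusted
        # Retrieved documents ride in as plain text. An instruction hidden
        # in a PDF arrives at the same "privilege level" as the system
        # prompt, because the model only ever sees one token stream.
        {"role": "user", "content": "Context:\n" + "\n".join(retrieved_docs)},
        {"role": "tool", "content": tool_output},       # also untrusted
    ]
```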

13% of organizations reported breaches of AI models or applications in 2025, and 97% of those lacked proper AI access controls (IBM/Ponemon, 600 organizations, March 2024 to February 2025). The gap isn’t that teams don’t care about AI security. The gap is that they’re applying the wrong threat model.


What does the OWASP LLM Top 10 actually cover?

The OWASP LLM Top 10 (2025 edition, November 2024, developed by nearly 500 experts across 18+ countries) is the most widely referenced AI security framework. It catalogs the ten most critical application-level vulnerabilities in LLM systems.

| # | Risk | What it means |
| --- | --- | --- |
| LLM01 | Prompt Injection | Malicious inputs override model instructions (direct or indirect) |
| LLM02 | Sensitive Information Disclosure | PII, credentials, or IP leak through model outputs |
| LLM03 | Supply Chain | Compromised models, datasets, adapters, or plugins |
| LLM04 | Data and Model Poisoning | Manipulated training data embeds backdoors or biases |
| LLM05 | Improper Output Handling | Unsanitized outputs passed to downstream systems |
| LLM06 | Excessive Agency | Agents with overly broad permissions executing unintended actions |
| LLM07 | System Prompt Leakage | Extraction of credentials and logic from system prompts |
| LLM08 | Vector and Embedding Weaknesses | RAG pipeline flaws enabling cross-tenant data leakage |
| LLM09 | Misinformation | Hallucinated content treated as authoritative |
| LLM10 | Unbounded Consumption | Resource exhaustion via unrestricted token/compute usage |
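
To make one of these concrete, here's a minimal sketch of an LLM05 mitigation: treating model output as untrusted input before it reaches a downstream interpreter. The allowlist and helper function are illustrative, not from the OWASP document.

```python
import shlex
import subprocess

ALLOWED_COMMANDS = {"ls", "cat", "grep"}  # explicit allowlist, not a denylist

def run_model_suggested_command(model_output: str) -> str:
    """LLM05 in practice: never hand raw model output to a shell.

    Parse it, validate against an allowlist, and execute without
    shell=True so injected metacharacters stay inert.
    """
    parts = shlex.split(model_output.strip())
    if not parts or parts[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {parts[:1]}")
    return subprocess.run(parts, capture_output=True, text=True,
                          timeout=10, check=True).stdout
```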

The 2025 version is a significant departure from 2023. Sensitive Information Disclosure jumped from #6 to #2. Two items are entirely new: System Prompt Leakage (LLM07) and Vector and Embedding Weaknesses (LLM08), both reflecting the explosion of RAG deployments. Model Theft was dropped. The philosophical shift matters most: the 2025 edition is explicitly framed around agentic architectures, with Excessive Agency (LLM06) significantly expanded.

What OWASP LLM misses: It’s an application-level checklist. It doesn’t cover adversarial kill chains (how attackers chain techniques), system-level architecture (how to design defensible systems), or inter-agent trust (what happens when agents talk to each other). It tells you what to worry about, not how attacks actually unfold.


How does MITRE ATLAS complement OWASP?

MITRE ATLAS (Adversarial Threat Landscape for AI Systems) is the ATT&CK framework for machine learning. Where OWASP says “prompt injection is a risk,” ATLAS says “here’s how attackers actually perform prompt injection, in what sequence, with what tools, and here are 33 documented case studies of it happening in the real world.”

The catalog as of October 2025: 15 tactics, 66 techniques, 46 sub-techniques, 26 mitigations, and 33 real-world case studies. In October 2025, MITRE collaborated with Zenity Labs to add 14 new techniques specifically targeting agentic AI systems, including agent context poisoning and cross-agent memory manipulation.

The ATLAS taxonomy follows the ATT&CK pattern: Reconnaissance > Resource Development > Initial Access > Execution > Persistence > Privilege Escalation > Defense Evasion > Exfiltration. This matters because it gives red teams a structured attack playbook. OWASP LLM01 (Prompt Injection) maps to ATLAS technique AML.T0051, which further breaks into direct versus indirect injection paths with separate mitigations and documented case studies.
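
If you drive red-team planning from code, that cross-framework mapping can live in a small lookup table. A minimal sketch follows; only the LLM01-to-AML.T0051 row is confirmed above, so treat the sub-technique IDs as assumptions to verify against atlas.mitre.org.

```python
# Cross-framework mapping for red-team planning. Only the prompt-injection
# row is confirmed by the text above; verify sub-technique IDs yourself.
OWASP_TO_ATLAS: dict[str, dict] = {
    "LLM01": {
        "name": "Prompt Injection",
        "atlas": "AML.T0051",
        "paths": {
            "direct": "AML.T0051.000",    # attacker prompts the model directly
            "indirect": "AML.T0051.001",  # payload arrives via retrieved content
        },
    },
    # Extend per component: for each OWASP risk that applies, record the
    # ATLAS techniques your red team will emulate against it.
}

def techniques_for(owasp_ids: list[str]) -> list[str]:
    """Flatten the ATLAS techniques to emulate for a set of OWASP risks."""
    out: list[str] = []
    for rid in owasp_ids:
        if (entry := OWASP_TO_ATLAS.get(rid)) is not None:
            out.append(entry["atlas"])
            out.extend(entry["paths"].values())
    return out
```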

Where ATLAS goes deeper: It covers the full pre-training attack surface (model extraction, inversion attacks, membership inference) that OWASP largely ignores. It maps actual adversary behavior, not just vulnerability categories. It’s the framework you hand to your red team.

Where ATLAS is weaker: Less prescriptive on remediation than OWASP. It tells you what attackers do, not what developers should build differently. And despite the October 2025 agentic additions, the agent-specific coverage is still catching up to the speed of deployment.


What is NIST AI RMF and why should engineers care?

The NIST AI Risk Management Framework is the odd one out. It isn’t about specific attacks or techniques. It’s a governance scaffold with four functions:

GOVERN sets the organizational context: policies, roles, accountability, culture. This is the “always-on” function that applies across the entire AI lifecycle. In practice: who owns AI risk at your organization? What’s the escalation path when an agent does something unexpected? If you can’t answer those questions, GOVERN is where you start.

MAP identifies where risk exists: system context, intended uses, stakeholders, dependencies, potential harms. In practice: do you have an inventory of every AI model, agent, and data pipeline in production? 88% of organizations now use AI in at least one business function (McKinsey, 2025). Most cannot enumerate what’s deployed.

MEASURE evaluates how bad the risk is: monitoring, performance metrics, trustworthiness assessment. In practice: how do you know if your model’s behavior has drifted? What’s your baseline for detecting anomalous tool calls?

MANAGE decides what to do about it: prioritize, mitigate, track, monitor, including third-party and supply chain risks. In practice: when your RAG pipeline ingests a document containing adversarial instructions, what’s the incident response procedure?

The December 2025 NIST draft (IR 8596) brought the framework closer to practical security by integrating it with NIST CSF 2.0 and requiring specific controls: maintaining inventories of models, agents, APIs, and datasets; mapping end-to-end AI data flows; verifying provenance and integrity of training and input data.
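
Here's a minimal sketch of what one inventory record in that spirit could look like; the field names are illustrative, not a NIST schema.

```python
from dataclasses import dataclass, field

@dataclass
class AIAsset:
    """One inventory row in the spirit of the IR 8596 draft: models,
    agents, APIs, and datasets, with data-flow and provenance fields."""
    name: str
    kind: str                                          # "model" | "agent" | "api" | "dataset"
    owner: str                                         # accountable human, per GOVERN
    data_accessed: list[str] = field(default_factory=list)
    tools_callable: list[str] = field(default_factory=list)
    upstream: list[str] = field(default_factory=list)  # end-to-end data flow
    provenance_verified: bool = False                  # training/input data integrity

inventory = [
    AIAsset("support-bot", "agent", "platform-team",
            data_accessed=["crm"], tools_callable=["send_email"],
            upstream=["kb-embeddings"]),
]
```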

What NIST AI RMF misses: It’s not technical. It provides no attack taxonomy, no technique catalog, no code examples. It’s the framework you use to build and justify a program, not to run a pen test.


What about the agentic gap?

All three frameworks above were designed for an era when AI systems were primarily inference endpoints: you send a prompt, you get a response. Agents change everything. They plan, use tools, delegate to other agents, persist state across sessions, and take actions in the real world. The attack surface isn’t just the model anymore; it’s the entire orchestration layer.

The OWASP Agentic AI Top 10 (2026, 100+ experts) addresses this directly with ten agent-specific risks. The ones that matter most:

  • ASI01 (Agent Goal Hijack): Attackers manipulate the planning layer, not just a single response. This is prompt injection evolved for multi-step agents.
  • ASI03 (Identity & Privilege Abuse): Agents inherit, escalate, and share high-privilege credentials across sessions. Exactly what we documented in Cryptographic capability binding.
  • ASI07 (Insecure Inter-Agent Communication): Compromised agents send spoofed instructions to peers with no mutual authentication.
  • ASI08 (Cascading Failures): One agent fault propagates through orchestration chains with high fan-out, no circuit breakers, and insufficient tenant isolation (see the circuit-breaker sketch after this list).
  • ASI10 (Rogue Agents): Agents that slowly drift toward unintended actions over time, subtle enough to evade policy checks.
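
For ASI08 the classic distributed-systems mitigation translates directly: put a circuit breaker on every agent-to-agent delegation so one fault can't fan out through the chain. A minimal sketch, not any framework's API:

```python
import time

class AgentCircuitBreaker:
    """Stops a failing downstream agent from being called repeatedly,
    containing cascade blast radius. Illustrative sketch for ASI08."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 60.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at > self.cooldown_s:
            self.opened_at, self.failures = None, 0   # half-open: try again
            return True
        return False                                  # open: stop delegating

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()

# Wrap every delegation:
# if breaker.allow():
#     result = delegate(task)
#     breaker.record(result.ok)
```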

The CSA MAESTRO framework (Cloud Security Alliance, February 2025) takes yet another approach: a 7-layer architecture specifically for multi-agent threat modeling, from foundation model risk (Layer 1) through agent interaction protocols (Layer 5) to ecosystem-level risks (Layer 7). MAESTRO’s insight is that agent compromise often becomes classic infrastructure compromise, and vice versa. The layers must be treated as interconnected, not independent.


How do these five frameworks fit together?

Here’s the part nobody has written yet: how these frameworks layer, where they overlap, and where the gaps are.

```mermaid
graph TD
    subgraph "Governance Layer"
        NIST["NIST AI RMF<br/>GOVERN | MAP | MEASURE | MANAGE"]
    end
    subgraph "Threat Intelligence Layer"
        ATLAS["MITRE ATLAS<br/>15 tactics, 66 techniques<br/>33 case studies"]
    end
    subgraph "Application Security Layer"
        OWASP_LLM["OWASP LLM Top 10<br/>Inference-time risks"]
        OWASP_AGENT["OWASP Agentic AI Top 10<br/>Agent orchestration risks"]
    end
    subgraph "Architecture Layer"
        MAESTRO["CSA MAESTRO<br/>7-layer agent architecture"]
    end

    NIST --> ATLAS
    NIST --> OWASP_LLM
    NIST --> OWASP_AGENT
    ATLAS --> OWASP_LLM
    ATLAS --> OWASP_AGENT
    MAESTRO --> OWASP_AGENT
    MAESTRO --> ATLAS
```

| What you need | Start with | Layer on |
| --- | --- | --- |
| Fix immediate LLM vulnerabilities | OWASP LLM Top 10 | MITRE ATLAS for red team detail |
| Secure an agent deployment | OWASP Agentic AI Top 10 | CSA MAESTRO for architecture |
| Build a security program | NIST AI RMF | OWASP + ATLAS for technical substance |
| Plan a red team exercise | MITRE ATLAS | OWASP for vulnerability prioritization |
| Threat model a multi-agent system | CSA MAESTRO | OWASP Agentic + ATLAS for techniques |

Where they overlap: Prompt injection appears in OWASP LLM01, ATLAS AML.T0051, OWASP Agentic ASI01, and MAESTRO Layer 2. Each adds context the others miss. OWASP tells you it’s a risk. ATLAS shows you how attackers perform it. The Agentic list shows how it works against planners. MAESTRO shows where in the architecture to defend.

Where the gaps are: No framework fully addresses voice agent security at the protocol level (SIP injection, barge-in exploitation, DTMF manipulation). Real-time streaming introduces timing-based attacks that none of these frameworks model well. And the “AI systems doing the attacking” threat model (your agent used by an attacker as a weapon against third parties) falls between all five frameworks.


How do you build a threat model for your AI system?

Here’s the practical output. Run this exercise with your security and engineering teams in one session.

Step 1: Inventory your AI assets. What models are deployed? What data do they access? What tools can they call? What permissions do they hold? If you deploy agents, which agents can talk to which? 88% of organizations use AI but most cannot answer these questions (McKinsey, 2025).

Step 2: Draw trust boundaries. User to model. Model to tools. Agent to agent. Model to external data sources. RAG pipeline to document store. Each boundary is a potential attack surface. Pay special attention to where untrusted data (user input, retrieved documents, tool responses) crosses into trusted context (system prompt, tool execution).
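
One way to make those boundaries explicit in code is to tag every context segment with its provenance, then audit where untrusted content sits adjacent to trusted instructions. A minimal sketch with illustrative types:

```python
from dataclasses import dataclass
from enum import Enum

class Trust(Enum):
    TRUSTED = "trusted"      # system prompt, vetted config
    UNTRUSTED = "untrusted"  # user input, retrieved docs, tool responses

@dataclass
class Segment:
    content: str
    source: str              # "system" | "user" | "rag" | "tool"
    trust: Trust

def audit_boundaries(segments: list[Segment]) -> list[tuple[str, str]]:
    """Every trusted/untrusted adjacency in the assembled context is a
    boundary to defend: it's where injected text meets privileged text."""
    return [(a.source, b.source)
            for a, b in zip(segments, segments[1:])
            if a.trust is not b.trust]
```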

Step 3: Map your assets to framework coverage. For each AI component, check: which OWASP LLM risks apply? If agentic, which Agentic AI risks apply? What ATLAS techniques could target this component? What NIST AI RMF controls should you have?

Step 4: Identify your exposed gaps. The most dangerous risks sit where no framework covers you: voice agent protocols, real-time streaming attacks, cross-provider tool composition, the “AI as weapon” scenario. Document these explicitly as accepted risks or areas requiring custom controls.

Step 5: Prioritize by blast radius. An agent with read-write access to your CRM, code repository, and cloud infrastructure is a bigger risk than a chatbot answering FAQ questions. Scope your threat model depth to the damage potential. 75% of leaders won’t let security concerns slow their AI agent deployment (Straiker). That pressure makes prioritization essential.
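
A crude way to operationalize that prioritization: score each inventoried asset by what it can write to and what it can call, then threat-model in descending score order. The weights are illustrative, and the fields reuse the AIAsset inventory sketch from the NIST section above.

```python
# Illustrative write-access weights; tune to your environment.
WRITE_SYSTEMS = {"crm": 3, "code_repo": 4, "cloud_infra": 5, "email": 2}

def blast_radius(asset) -> int:
    """Higher score = deeper threat model, sooner. Expects an object
    with the AIAsset fields sketched in the NIST section."""
    score = sum(WRITE_SYSTEMS.get(s, 1) for s in asset.data_accessed)
    score += 2 * len(asset.tools_callable)  # every tool is an action path
    score += len(asset.upstream)            # every dependency, a supply-chain hop
    return score

# for asset in sorted(inventory, key=blast_radius, reverse=True):
#     run_threat_model(asset)
```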


FAQ

What’s the difference between OWASP LLM Top 10 and the Agentic AI Top 10?

The LLM Top 10 (2025) covers inference-time application risks for any LLM system: prompt injection, data leakage, supply chain, output handling. The Agentic AI Top 10 (2026) extends threat modeling to autonomous agents that plan, use tools, delegate, and persist state. New risks include inter-agent trust (ASI07), cascading failures (ASI08), and behavioral drift (ASI10). If you’re deploying a chatbot, you need the LLM Top 10. If you’re deploying agents, you need both.

How often should I update my AI threat model?

Every time you add a new model, a new tool, a new data source, or a new agent. The AI threat surface changes with every deployment change, not on a quarterly review cycle. Gartner predicts that custom-built AI applications will drive 50% of enterprise cybersecurity incident response efforts by 2028 (Gartner, March 2026). Your threat model should be a living document, not a PDF on a SharePoint site.

Can I use these frameworks for compliance (EU AI Act, SOC 2)?

NIST AI RMF maps directly to EU AI Act requirements and integrates with NIST CSF 2.0 for SOC 2-adjacent controls. The December 2025 NIST draft (IR 8596) was specifically designed to bridge this gap. OWASP provides the technical substance that auditors want to see implemented. Start with NIST for the governance structure, then show OWASP controls as evidence of technical implementation.

Is there a single framework that covers everything?

No. That’s the problem and the point of this post. OWASP is developer-focused but lacks kill chains. ATLAS is red team-focused but lacks governance guidance. NIST AI RMF is governance-focused but lacks technical depth. MAESTRO is architecture-focused but lacks OWASP-style vulnerability catalogs. You need at least two: one for governance (NIST AI RMF), one for technical implementation (OWASP LLM + Agentic). Add ATLAS when you start red teaming and MAESTRO when you design multi-agent architectures.


Key takeaways

  • Traditional web/API threat models don’t work for AI systems: probabilistic outputs, instruction-following as attack vector, training data memorization, and shared context windows break every assumption
  • Five frameworks exist: OWASP LLM Top 10 (application risks), OWASP Agentic AI Top 10 (agent risks), MITRE ATLAS (adversarial techniques), NIST AI RMF (governance), and CSA MAESTRO (multi-agent architecture)
  • No single framework covers everything: you need at least two, layered based on your deployment pattern
  • The state of adoption is grim: 63% of breached organizations had no AI governance policy or were still developing one, and 97% of breached organizations lacked proper AI access controls (IBM/Ponemon, 2025)
  • Start with an asset inventory (what AI is deployed?), draw trust boundaries, map to framework coverage, identify gaps, and prioritize by blast radius
  • The biggest gap across all frameworks: voice agent protocol-level security and the “AI as weapon” threat model where your agent attacks third-party systems

Want to work together?

I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.

Get in touch