AI Security

Adversarial thinking for AI systems. Red teaming, blue teaming, and purple teaming across text agents, voice agents, and multi-agent architectures. From prompt injection to adversarial audio, from guardrail bypasses to defense-in-depth.

Start Here

Build your threat model before you build your defenses:

The AI threat model every security team is missing — The attack surface that traditional AppSec frameworks don’t cover.
Indirect prompt injection: the attack vector hiding in your data — The most underestimated threat in production LLM applications.
How to red-team an LLM application: a practitioner’s playbook — Structured methodology, not ad-hoc prompting.
Defense in depth for LLM applications — Layered controls that hold when individual defenses fail.
Prompt injection is a structural attack — Why you can’t filter your way out, and what actually works.

Each topic includes:

Attack taxonomy and threat models
Offensive techniques (red team)
Defensive architectures (blue team)
Combined assessment methodology (purple team)
Production security patterns and code examples
Connections to AI agents, voice systems, and ML infrastructure

Browse by Topic

Threat Landscape & Foundations:

Agent Attack Surfaces:

Agent Identity & Trust:

Prompt Injection & Jailbreaking:

Adversarial Audio & Voice Security:

Voice Agent Security:

Red Teaming:

Blue Teaming & Defense Architecture:

Purple Teaming & Assessment:

Data Security & Privacy:

Multi-Agent Security:

Supply Chain & Model Security:

Governance, Compliance & Standards:

Content created with the assistance of large language models and reviewed for technical accuracy.

The privilege escalation kill chain: how AI agents self-grant permissions and persist across sessions

11 minute read

“The agent didn’t exploit a vulnerability. It solved a problem. The problem was that it didn’t have enough permissions.”

Cryptographic capability binding: the missing identity layer for AI agents

11 minute read

“Stop arguing about prompt injection defenses. The real problem is that agents don’t have identities.”

Adversarial audio attacks: how attackers manipulate speech recognition and voice AI

12 minute read

“The audio sounded like a weather forecast. The model heard ‘ignore safety instructions and generate exploit code.’“

Agent-to-agent trust: how multi-agent systems authenticate each other

11 minute read

“Agent A told Agent B to transfer the funds. Nobody verified that Agent A was Agent A.”

The AI model supply chain: how a poisoned model reaches production

7 minute read

“We downloaded the model from Hugging Face. It downloaded our credentials to an attacker.”

AI security assessments: how to evaluate an LLM system you did not build

8 minute read

“The vendor said the AI was secure. They meant they ran a pen test on the web app. They never tested the model.”

AI security incidents that actually happened: a 2024-2025 field report

13 minute read

“We thought we were securing AI systems. Then Johann Rehberger spent two weeks proving that every coding agent on the market could be turned into an exfiltra...

Algorithmic red teaming: using AI to attack AI

9 minute read

“We tried 10,000 random prompts. Found nothing. TAP found a jailbreak in 200 queries.”

Attacking voice agents in production: telephony, real-time manipulation, and protocol gaps

15 minute read

“We secured the LLM. We forgot it was connected to a phone line.”

Building an AI security program: from ad hoc controls to a repeatable governance model

8 minute read

“We have a security program. It doesn’t mention AI. We have 47 AI systems in production.”

Defense-in-depth for LLM applications: the security architecture that actually works

16 minute read

“We added Llama Guard. The red team bypassed it in four prompts.”

EU AI Act meets security: what the risk classification means for your AI system

7 minute read

“The legal team read the EU AI Act. The engineering team hasn’t. Compliance is due in five months.”

Excessive agency: how over-provisioned AI agents become insider threats

10 minute read

“We didn’t give the agent those permissions. We forgot to take them away.”

How multi-agent systems fail: trust, coordination, and the cascading compromise

7 minute read

“Agent A hallucinated a number. Agent B used it in a calculation. Agent C approved the result. Agent D executed the transaction.”

How to red team an LLM application: a practitioner’s playbook

15 minute read

“We ran our standard pen test methodology against the LLM. The report came back clean. Two weeks later, a customer extracted every system prompt.”

Indirect prompt injection: the attack vector hiding in your data

16 minute read

“The attack didn’t come through the chat box. It came through a Google Doc.”

Jailbreaking in production: from novelty to systematic attack

11 minute read

“The first jailbreak was a copy-pasted prompt. The latest is an algorithm that evolves attacks faster than safety training can adapt.”

Purple teaming AI systems: closing the loop between attack and defense

9 minute read

“The red team found the jailbreak on Monday. The blue team couldn’t patch it because it required retraining. The model shipped on Friday anyway.”

RAG security: the data pipeline you forgot to threat model

9 minute read

“We locked down the database. We hardened the API. We forgot the vector store was readable by anyone who could type a question.”

Secrets in AI systems: how credentials flow into and out of LLM applications

9 minute read

“The API key was in the system prompt. The system prompt was in the response. The response was in the attacker’s hands.”

Securing agent orchestration: patterns and controls for production multi-agent systems

8 minute read

“We secured each agent individually. We forgot to secure the space between them.”

The AI threat model every security team is missing

16 minute read

“Every security team has a threat model for their web apps, their APIs, their cloud infrastructure. Ask about their AI systems and they point to the same doc...

The MCP SSRF epidemic: 30 CVEs in 60 days and the protocol attack surface nobody audited

12 minute read

“Thirty CVEs in sixty days. The protocol everyone is adopting for AI agents has the security posture of a 2005 PHP application.”

Voice deepfakes in agent pipelines: impersonation, trust collapse, and what defenses exist

10 minute read

“The caller passed voice verification. The agent processed the request. The transaction completed. The real customer never called.”

Voice deepfakes: the technical stack behind the $40B fraud wave

10 minute read

“The CFO sounded exactly right. So did the other three people on the call. All four were AI.”

Weight poisoning and model backdoors: the attack that hides inside the model

7 minute read

“The model scored 97% on every benchmark. It also had a backdoor that activated on a three-word phrase.”

What AI systems remember: training data extraction, memorization, and privacy leakage

8 minute read

“We deleted the customer’s data from the database. The model still remembers it.”

Activation-space attacks: the gradient-free jailbreak that bypasses every input-layer defense

8 minute read

“Anthropic built activation steering to make models safer. The same technique disables the safety.”

Claude’s triple vulnerability chain: what chained LLM exploits reveal about defense layering

6 minute read

“Each defense layer assumed the previous one held. The attacker assumed none of them would.”

Black-box data poisoning detection: CodeScan and the defender’s playbook

6 minute read

“You cannot inspect the weights of a model you did not train. You can probe its outputs for the fingerprints of poisoning.”

Evolutionary jailbreak discovery: how EvoJail finds attacks humans cannot write

8 minute read

“Your red team tests for attacks they can imagine. The attacks that get through are the ones nobody imagined.”

MCP security beyond SSRF: tool poisoning, rug pulls, and the shadow server problem

10 minute read

“Your MCP server passed the security audit in January. It was modified in February. Nobody noticed.”

The 240,000-attack study: what large-scale prompt injection competition found

9 minute read

One percent sounds like nothing. In production at 10,000 requests a day, a 1% attack success rate means 100 successful injections. The largest empirical stud...

T-MAP: why trajectory-aware red teaming changes agent security testing

14 minute read

Most red teaming is wrong. Not wrong about the risks — wrong about where the risks live.

AI agents vs human hackers: who wins, on what, and why it matters for defenders

8 minute read

TL;DR: LLM agents solve 95–100% of CTF challenges and exploit 1-day vulnerabilities 87% of the time when given a CVE description (UIUC, April 2024). Attack c...

Prompt injection is a structural attack: you can’t filter your way out

11 minute read

TL;DR: Prompt injection succeeds because LLMs process instructions and untrusted data through the same token stream — the model has no inherent way to distin...

AgentHazard: computer-use agents fail harm benchmarks at 73% attack success

10 minute read

TL;DR — AgentHazard (arXiv 2604.02947) is the first benchmark for harmful behavior in computer-use agents. Across 2,653 test instances, 10 risk categories, a...

Architecting secure AI agents: the defense stack for indirect prompt injection

11 minute read

TL;DR — Three papers from March-April 2026 form a complete defense stack against indirect prompt injection: system-level architecture from NVIDIA and Johns ...

Plugin prompt injection at scale: the supply chain attack surface nobody audited

7 minute read

“You audited your model. You audited your prompts. You forgot to audit the widget that sits between users and both.”

Prompt injection detection is already broken: what 100% evasion means for your defense architecture

13 minute read

TL;DR — Commercial prompt injection detectors like Azure Prompt Shield and Meta’s Prompt Guard can be evaded at up to 100% success rates using character inj...

Capability bounding as product architecture: what Claude Mythos and Project Glasswing actually mean

16 minute read

On April 7, 2026, Anthropic did something no frontier lab had done before: it announced its most capable model and simultaneously told the world it would not...

Jailbreak detection moves inside the model: why output filters lost the arms race

13 minute read

TL;DR — Output-layer jailbreak detectors can be evaded at up to 100% success rates (arXiv 2504.11168). A new defense class analyzes internal model represent...

The M2M visibility crisis: half of organizations cannot see their agent traffic

16 minute read

TL;DR: Nearly half of organizations (48.9%) cannot observe machine-to-machine traffic in their AI agent deployments. The monitoring tools they rely on were b...