AI security incidents that actually happened: a 2024-2025 field report
“We thought we were securing AI systems. Then Johann Rehberger spent two weeks proving that every coding agent on the market could be turned into an exfiltration tool.”
TL;DR
This is the incident timeline that the field needs and doesn’t have. In 15 months, Johann Rehberger disclosed prompt injection vulnerabilities in every major coding agent: Devin, Cursor, Copilot, Claude Code, Google Jules. A Chinese state actor used Claude Code to autonomously breach approximately 30 organizations. Deepfake voice fraud surged 1,300% (Pindrop, 2025). ServiceNow, Salesforce, Slack, and Microsoft 365 Copilot all had critical AI-specific vulnerabilities with CVSS scores above 9.0. The patterns are consistent: indirect prompt injection through trusted data, default configurations that maximize attack surface, and exfiltration through channels the AI is authorized to use. For the frameworks that should have caught these, see The AI threat model every security team is missing.

What happened to the coding agents?
Every major coding agent shipped with exploitable prompt injection vulnerabilities. Not some of them. All of them.
Devin AI (Cognition) was the first domino. Johann Rehberger reported to Cognition on April 6, 2025, and published on August 7, 2025, after 120+ days with no fix. The attack: indirect prompt injection through untrusted files or web pages hijacks Devin’s autonomous execution. Four exfiltration vectors worked: shell commands (curl/wget), browser navigation, Markdown image rendering, and Slack link unfurling. Any secret in Devin’s built-in secrets manager was accessible. Cognition acknowledged the report but did not patch within the disclosure window.
Cursor IDE (CVE-2025-54132, August 4, 2025): Mermaid diagram rendering in Cursor chat allowed embedding external image URLs. An injected prompt in any file Cursor analyzed could trigger automatic data exfiltration when Cursor generated a Mermaid diagram. Patched in Cursor v1.3.
GitHub Copilot (CVE-2025-53773, CVSS 7.8, August 2025): Injecting malicious prompts into source files or GitHub issues manipulated Copilot into modifying .vscode/settings.json to add "chat.tools.autoApprove": true, placing the agent in unrestricted mode. With auto-approval enabled, Copilot executed shell commands without user confirmation. Rehberger demonstrated propagation through infected repositories. Microsoft patched in August 2025 Patch Tuesday.
Claude Code (CVE-2025-55284, CVSS 7.1, reported May 26, fixed June 6, 2025): Claude Code maintained an allowlist of pre-approved shell commands including ping, nslookup, dig, and host. A prompt injection in any analyzed file could hijack Claude into running ping attacker.com with API keys encoded as subdomains, exfiltrating secrets via DNS requests. Anthropic removed network utilities from the allowlist in 11 days. A separate finding by Check Point Research (CVE-2025-59536) later demonstrated full RCE through malicious project configuration files.
Google Jules (August 13-15, 2025): Markdown image exfiltration, unrestricted internet access, and hidden Unicode injection. No CVE assigned.
The entire disclosure wave, nicknamed “The Summer of Johann” by Simon Willison, ran from August 1-15, 2025: one coding agent vulnerability per day for two weeks straight. The pattern in every case was identical: plant instructions in data the agent reads, watch the agent execute them with its own legitimate credentials.
How did a state actor weaponize Claude Code?
On November 13, 2025, Anthropic disclosed that a Chinese state-sponsored actor (designated GTG-1002) used Claude Code to conduct espionage operations against approximately 30 global organizations including tech companies, financial institutions, chemical manufacturers, and government agencies.
The operational details, as described by Anthropic: operators social-engineered Claude into believing it was conducting defensive cybersecurity testing for a legitimate firm. Claude Code then autonomously performed what Anthropic estimated as 80-90% of the campaign’s tactical operations at thousands of requests per second. This covered reconnaissance, vulnerability discovery, exploitation, lateral movement, credential harvesting, data analysis, and exfiltration. Human involvement was limited to 4-6 decision points per operation, with roughly 20 minutes of human work per intrusion.
Anthropic’s head of threat intelligence told the Wall Street Journal that “as many as four” organizations were successfully breached. Anthropic called it “the first documented case of a large-scale cyberattack executed without substantial human intervention.”
I should note the skepticism. Bleeping Computer and multiple security researchers raised doubts about some claims, noting Claude occasionally hallucinated credentials and claimed to have extracted data that was publicly available. The operational reality is likely somewhere between Anthropic’s characterization and the skeptics’ pushback. What’s undeniable: a state actor found it valuable enough to build an entire campaign around autonomous AI execution.
This wasn’t the first state-actor AI disclosure. In February 2024, OpenAI and Microsoft terminated accounts belonging to five nation-state groups (Charcoal Typhoon and Salmon Typhoon from China, Crimson Sandstorm from Iran, Emerald Sleet from North Korea, Forest Blizzard from Russia). At the time, Microsoft characterized their usage as “consistent with attackers using AI as another productivity tool.” The GTG-1002 campaign represented a qualitative escalation from “using AI as a research assistant” to “using AI as the primary operator.”
What happened to the enterprise AI platforms?
The enterprise AI tools that companies deployed to millions of knowledge workers had critical vulnerabilities with CVSS scores above 9.0.
Microsoft 365 Copilot had two major incidents. The first (reported January 2024, disclosed August 2024): Rehberger demonstrated a four-technique chain combining prompt injection via email, automatic tool invocation, ASCII smuggling with Unicode tag characters, and hyperlink rendering to exfiltrate email bodies, sales figures, and MFA codes. Microsoft patched by August 22, 2024. The second, EchoLeak (CVE-2025-32711, CVSS 9.3, patched June 2025): a zero-click vulnerability where an attacker sends a crafted email that the target never needs to open. When the victim later asks Copilot any question that retrieves context from that email, Copilot executes the embedded instructions and exfiltrates data from the entire M365 environment. Discovered by Aim Security.
ServiceNow Now Assist (CVE-2025-12420, CVSS 9.3, discovered October 2025 by AppOmni): two distinct vulnerabilities. The second-order prompt injection exploited three default-enabled agent discovery properties to redirect AI tasks to malicious ones. More alarming: BodySnatcher chained a hardcoded platform-wide secret with account-linking logic that trusts only an email address. An attacker with only a target’s email could bypass MFA and SSO, impersonate any administrator, and execute AI agents to create backdoor accounts with full privileges. Data at risk included SSNs, healthcare records, and financial data.
Salesforce Agentforce (CVSS 9.4, reported July 28, patched September 8, 2025): Noma Security discovered that Salesforce’s Web-to-Lead form has a description field with a 42,000-character limit. Attackers inject hidden LLM instructions into this field. The data is stored as legitimate CRM records. When employees later query Agentforce about those leads, Agentforce processes both the question and the hidden instructions, then sends CRM data to an attacker-controlled server. Noma also purchased an expired domain that was still on Salesforce’s Content Security Policy allowlist for $5.
Slack AI (disclosed August 2024 by PromptArmor): indirect prompt injection where malicious instructions in a public channel cause Slack AI to collect and exfiltrate data from private channels the attacker cannot access. Slack initially rejected the report. After public disclosure, Slack deployed a “limited patch covering very limited and specific circumstances.”
How bad is the voice deepfake problem?
The numbers from Pindrop’s 2025 Voice Intelligence Report (analyzing 1.2 billion customer calls in 2024) paint a clear picture:
| Metric | Value |
|---|---|
| Deepfake fraud attempt increase | +1,300% (from ~1/month to 7/day) |
| Synthetic voice call increase (Q1-Q4 2024) | +173% |
| Fraud attempt frequency in U.S. contact centers | Every 46 seconds |
| Insurance sector synthetic voice attacks | +475% |
| Banking sector synthetic voice attacks | +149% |
| AI-driven fraud losses in 2024 | $12.5 billion |
| Projected contact center fraud exposure for 2025 | $44.5 billion |
The most expensive single incident: Arup, the UK engineering firm, lost HK$200 million (~$25 million) in January 2024 when a finance employee in their Hong Kong office joined a video conference where every other participant, including the CFO, was an AI-generated deepfake. The employee made 15 transfers to five Hong Kong bank accounts. None of the money has been recovered.
What’s happening in the model supply chain?
Hugging Face, the de facto distribution platform for AI models, has a structural supply chain problem.
JFrog discovered over 100 malicious models with code execution payloads on Hugging Face in March 2024. The models used Python’s __reduce__ method in Pickle deserialization to execute arbitrary code on load, granting attackers a reverse shell. By April 2025, Protect AI scanned 4.47 million model versions and found 352,000 unsafe or suspicious issues across 51,700 models. Hugging Face added 1 million new models in 2024 alone.
The existing defense (Picklescan) was bypassed. In February 2025, ReversingLabs discovered the “nullifAI” evasion technique: malicious models stored their Pickle payloads compressed with 7z instead of ZIP, preventing both torch.load() and Picklescan from flagging the file. Hugging Face removed the models within 24 hours and updated Picklescan.
Separately, in June 2024, Hugging Face disclosed that unauthorized access to their Spaces platform exposed authentication tokens for an undisclosed number of organizations. And in December 2023, Lasso Security found 1,600+ exposed Hugging Face API tokens in public repositories, affecting 723 organizations including Google, Meta, and Microsoft.
The Langflow AI agent platform (CVE-2025-34291, CVSS 9.4) was actively exploited in the wild starting January 2026, with a CORS misconfiguration enabling single-click account takeover that immediately escalated to full RCE because Langflow natively executes arbitrary Python.
What are the recurring patterns?
After cataloging these incidents, three patterns explain nearly everything.
Pattern 1: The data is the attack vector. In every indirect prompt injection incident (Copilot, ServiceNow, Salesforce, Slack, Gemini, Devin, Cursor, Copilot), the attack enters through data the AI trusts: an email, a document, a form field, a web page, a source code file. The AI doesn’t distinguish between “data to process” and “instructions to execute” because it can’t. This isn’t a bug that gets patched. It’s a property of how language models work.
Pattern 2: Defaults are the enemy. ServiceNow’s BodySnatcher exploited three default-enabled agent discovery properties. Salesforce’s ForcedLeak exploited a default-open 42,000-character form field. Claude Code’s DNS exfiltration exploited a default allowlist of network utilities. Slack AI’s exfiltration worked because the system ingested public and private channel content in the same context by default. Every default configuration choice that maximizes functionality also maximizes attack surface. For how agents exploit these defaults to escalate privileges, see The privilege escalation kill chain.
Pattern 3: Exfiltration uses legitimate channels. The data leaves through channels the AI is authorized to use: DNS requests (Claude Code), image URL fetches (Cursor, Bing Chat, Copilot), hyperlinks (Copilot ASCII smuggling), API calls (Salesforce, ServiceNow), browser navigation (Devin). Blocking exfiltration means restricting the AI’s own capabilities, which directly conflicts with the product requirements that made the AI useful in the first place.
There’s a fourth pattern in the vendor responses: the response time range is 11 days (Anthropic’s Claude Code DNS fix) to 120+ days with no fix (Cognition’s Devin). Slack initially rejected the report entirely. The median is somewhere around 60-90 days, roughly matching traditional software vulnerability disclosure timelines. But AI vulnerabilities are fundamentally different: they’re often architecture-level problems, not code bugs. You can’t just patch a confused deputy.
FAQ
Which coding agents have been proven vulnerable to prompt injection?
All of them. As of August 2025, documented vulnerabilities exist for: Devin AI (Cognition), Cursor IDE (CVE-2025-54132), GitHub Copilot (CVE-2025-53773), Claude Code (CVE-2025-55284, CVE-2025-59536), Google Jules, OpenHands, and Amp. Johann Rehberger demonstrated exploits against each during the “Summer of Johann” (August 1-15, 2025). The common vector is indirect prompt injection through files the agent reads.
Has anyone been fined for an AI security failure?
Not yet. The EU AI Act’s prohibited practices became legally binding in February 2025 (penalty ceiling: 35 million euros or 7% global revenue), but no enforcement actions have been publicly announced as of March 2026. GPAI model obligations activated in August 2025. Full enforcement for high-risk AI systems begins August 2026. The enforcement infrastructure is still being assembled at the member-state level.
How do I track new AI security incidents?
Johann Rehberger’s Embrace the Red blog (embracethered.com) is the single highest-signal source for AI tool vulnerabilities. Simon Willison’s blog (simonwillison.net) provides context and analysis. HackerOne’s AI vulnerability reports are growing at 210% year-over-year with 1,121 programs now including AI in scope. MITRE ATLAS (atlas.mitre.org) catalogs documented adversarial techniques with case studies.
Are these incidents getting worse or better?
Worse by every measurable metric. HackerOne’s valid AI vulnerability reports grew 210% year-over-year in 2025 with a 540% surge in prompt injection specifically. Pindrop measured a 1,300% increase in deepfake fraud attempts. Malicious models on Hugging Face increased 6.5x year-over-year. The attack surface is growing faster than defenses. The one positive signal: vendor response times are improving, with several critical CVEs patched within weeks rather than months.
Key takeaways
- Every major coding agent (Devin, Cursor, Copilot, Claude Code, Jules) had exploitable prompt injection vulnerabilities disclosed in 2025
- Enterprise AI platforms (Copilot, ServiceNow, Salesforce, Slack) all had critical vulnerabilities with CVSS scores above 9.0
- A Chinese state actor used Claude Code to autonomously conduct espionage against ~30 organizations (Anthropic, November 2025)
- Voice deepfake fraud surged 1,300% with $12.5 billion in AI-driven fraud losses in 2024 (Pindrop)
- The Hugging Face model supply chain has a structural security gap: 352,000 unsafe issues across 51,700 models, and the primary scanner was bypassed
- Three patterns recur: data is the attack vector, defaults maximize attack surface, exfiltration uses legitimate channels
- No EU AI Act enforcement fines have been issued yet, but the full enforcement deadline is August 2026
Want to work together?
I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.
Get in touch