
“The API key was in the system prompt. The system prompt was in the response. The response was in the attacker’s hands.”

TL;DR

Secrets enter LLM context through five paths: system prompts, RAG ingestion, tool responses, agent memory, and config files. Once in context, they’re one prompt injection from exfiltration. Devin leaked Jira and Slack tokens to attacker-controlled websites. Claude Code exfiltrated API keys via DNS (CVE-2025-55284). OWASP added System Prompt Leakage as LLM07 in 2025. The fix is architectural: keep credentials out of context windows entirely. For how prompt injection enables this exfiltration, see Indirect prompt injection.


[Image: a server chassis with its security panel removed, hardware tokens scattered across the exposed interior]

How do secrets enter LLM context?

Five paths, and most LLM applications use at least three of them.

Path 1: Hardcoded in system prompts. Developers embed API keys, database connection strings, internal URLs, and authentication tokens directly in system prompts. It’s convenient: the model needs these credentials to call tools, so put them where the model can see them. The problem: system prompts are extractable. OWASP added System Prompt Leakage (LLM07) to the 2025 Top 10 specifically because this pattern is so common and so dangerous.
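
A minimal sketch of this anti-pattern, with invented names and a placeholder key; the safer shape just describes the capability and leaves the credential in application code (see the mitigations later in this post).

```python
# Hypothetical illustration of Path 1 -- prompt text, tool name, and key are invented.

# Anti-pattern: the credential travels with every request and is one
# extraction prompt away from leaking.
LEAKY_SYSTEM_PROMPT = """You are a support assistant.
To call the billing API, authenticate with key sk-live-EXAMPLE-1234.
"""

# Safer shape: the prompt names the capability; application code holds the
# credential and makes the call itself (see the proxy pattern further down).
SAFER_SYSTEM_PROMPT = """You are a support assistant.
To fetch billing data, call the `get_invoice` tool with an invoice ID.
"""
```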

Path 2: RAG pipeline ingestion. Vector databases ingest documents from across the organization. Some of those documents contain credentials: deployment runbooks with database passwords, API documentation with example keys, internal wikis with service account credentials, configuration files accidentally committed to repositories. Once a credential lands in the vector store, any RAG query that retrieves that chunk puts the credential in the LLM’s context.

Path 3: Tool responses. The model calls an API. The API response includes authentication headers, session tokens, or credentials in the response body. The full response enters the context window. If the model is under prompt injection, those credentials are now accessible to the attacker’s instructions.

Path 4: Agent memory. Long-running agents persist conversation history across sessions. If a previous conversation involved credentials (a user pasting an API key, a tool returning a token), those credentials live in the agent’s memory. Subsequent prompt injection can retrieve them from memory even if the current conversation doesn’t involve credentials.

Path 5: Configuration files. Coding agents read project files. Environment variables in .env files, service account keys in config directories, and secrets in CI/CD configuration files all become context when the agent reads the project. Claude Code’s printenv access was one of the exfiltration vectors documented by Rehberger.

graph LR
    A[System Prompt<br/>Hardcoded API keys] --> CTX[LLM Context Window]
    B[RAG Pipeline<br/>Indexed credentials] --> CTX
    C[Tool Responses<br/>Tokens in API output] --> CTX
    D[Agent Memory<br/>Historical credentials] --> CTX
    E[Config Files<br/>.env, service keys] --> CTX

    CTX --> F{Prompt Injection?}
    F -->|Yes| G[Exfiltration<br/>DNS, URLs, API calls]
    F -->|No| H[Normal Operation]

    style G fill:#fce4ec

What happened with Devin and Claude Code?

Two case studies that demonstrate the full attack chain from credential presence to exfiltration.

Devin AI (April 2025). Johann Rehberger spent $500 on a Devin subscription and documented the results. A poisoned GitHub issue contained indirect prompt injection that directed Devin to an attacker-controlled website. Devin, operating autonomously, followed the instructions embedded in the issue. Using its shell access, Devin ran curl and wget commands to send environment variables to the attacker’s server. The exfiltrated data included Jira and Slack tokens from Devin’s built-in secrets management system.

The attack worked because Devin had legitimate access to secrets (for integrating with Jira and Slack), legitimate shell access (for executing commands), and no restrictions on sending HTTP requests to arbitrary URLs. The attacker didn’t exploit a bug. They exploited the intersection of legitimate capabilities. Rehberger reported the issue to Cognition on April 6, 2025; more than 120 days later, no fix had shipped within the disclosure window.

Claude Code (CVE-2025-55284, CVSS 7.1, June 2025). Claude Code maintained an allowlist of pre-approved shell commands including network utilities: ping, nslookup, dig, and host. A prompt injection embedded in a project file hijacked Claude Code into running ping attacker.com with API keys encoded as DNS subdomains. The data left through DNS requests, which most network monitoring tools don’t inspect for data exfiltration. Anthropic removed network utilities from the allowlist in 11 days.

Claude Code (CVE-2025-59536, CVSS 8.7). A separate finding by Check Point Research demonstrated full remote code execution through malicious project configuration files. The attack exploited hooks, MCP server configs, and environment variable access to execute arbitrary shell commands when Claude Code initialized in a repository. A third vulnerability (CVE-2026-21852, CVSS 5.3) enabled Anthropic API key exfiltration from malicious repositories through multiple pathways.

The pattern: credentials are present in the agent’s environment. The agent has legitimate access to exfiltration channels (network, shell, browser). Prompt injection redirects the agent’s capabilities from their intended purpose to credential theft.


What does OWASP LLM07 cover?

System Prompt Leakage is a new entry in the OWASP LLM Top 10 for 2025, which tells you the community considers it a distinct and growing risk rather than a subset of prompt injection.

The vulnerability: confidential system prompts are extracted through adversarial prompting. This matters because developers treat system prompts as a secure container for sensitive information. They embed:

  • API keys and authentication tokens (for tool calling)
  • Database connection strings
  • Internal API endpoints
  • Business logic and pricing information
  • Security configurations and content policy rules
  • Customer-specific customization logic

All of this becomes accessible when the system prompt is extracted. And system prompt extraction is one of the easiest attacks to execute. “Repeat everything above” works against models that haven’t been specifically hardened. More sophisticated extraction techniques (asking the model to summarize its instructions, formatting them as JSON, requesting them in a different language) work against most models most of the time.

The OWASP recommendation: assume every system prompt will eventually be extracted. Never put credentials, pricing logic, or security-critical configuration in the system prompt. Treat it as a public document that happens to be hidden by default.


How do you keep credentials out of context windows?

The architectural principle is simple: credentials should never enter the LLM’s context window. Implementing it requires changes at multiple levels.

External secret managers. Use HashiCorp Vault, Doppler, or AWS Secrets Manager. The application code (not the LLM) retrieves credentials at runtime, uses them to make the API call, and passes only the result to the LLM. The credential never appears in any prompt, response, or context window.
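
As a rough sketch of runtime-only retrieval, here is what the tool-side code might look like with AWS Secrets Manager via boto3 (one of the managers named above). The secret name, endpoint, and weather tool are illustrative assumptions.

```python
# Sketch: the application fetches the credential at call time and returns
# only the result to the LLM. Secret name, URL, and tool are placeholders.
import boto3
import requests

def get_weather(city: str) -> dict:
    secret = boto3.client("secretsmanager").get_secret_value(
        SecretId="prod/weather-api-key"          # assumed secret name
    )["SecretString"]
    resp = requests.get(
        "https://api.example.com/weather",       # assumed endpoint
        params={"q": city},
        headers={"Authorization": f"Bearer {secret}"},
        timeout=10,
    )
    resp.raise_for_status()
    # Only the data the model needs enters the context window; the key never does.
    return {"city": city, "forecast": resp.json().get("forecast")}
```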

Proxy layers for tool calls. Instead of giving the LLM direct API access with credentials, route tool calls through a proxy that adds authentication headers server-side. The LLM sends “call the weather API for NYC” and the proxy adds the API key before forwarding the request. The LLM never sees the key.
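
A minimal proxy sketch along those lines: the model names a tool and its arguments, and the proxy resolves the upstream endpoint and attaches the credential server-side. The tool table, endpoints, and environment variable names are assumptions for illustration.

```python
import os
import requests

# Allowlisted tools the model may invoke; endpoints and env var names are illustrative.
TOOLS = {
    "weather": {"url": "https://api.example.com/weather", "secret_env": "WEATHER_API_KEY"},
    "tickets": {"url": "https://api.example.com/tickets", "secret_env": "TICKETS_API_KEY"},
}

def proxy_tool_call(tool: str, params: dict) -> dict:
    spec = TOOLS[tool]                       # unknown tools raise KeyError and are rejected
    key = os.environ[spec["secret_env"]]     # read server-side; never shown to the model
    resp = requests.get(spec["url"], params=params,
                        headers={"Authorization": f"Bearer {key}"}, timeout=10)
    resp.raise_for_status()
    return resp.json()                       # only the payload goes back into context
```

The allowlist doubles as a policy boundary: the model can only reach endpoints the proxy already knows about.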

RAG sanitization. Scan documents before indexing them in vector stores. Use secret detection tools (TruffleHog, GitLeaks, or similar) to identify and redact credentials before they enter the RAG pipeline. Run periodic scans on the vector store to catch secrets that made it through initial screening.
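
For the pre-indexing pass, something like the following redaction step can sit in front of the embedder. The patterns are a small illustrative subset of what TruffleHog or GitLeaks detect, not a replacement for them, and `vector_store.add` stands in for whatever indexing call your stack uses.

```python
import re

# Illustrative patterns only -- real scanners cover far more credential formats.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                                # AWS access key IDs
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[=:]\s*\S+"),   # key=value assignments
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),              # PEM private keys
]

def redact_secrets(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def index_document(doc: str, vector_store) -> None:
    # Redact before chunking and embedding so credentials never reach the store.
    vector_store.add(redact_secrets(doc))
```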

Tool response filtering. Strip credentials from API responses before they enter the LLM context. If a tool response includes authentication headers, session tokens, or connection strings, filter them at the tool integration layer. The model receives the data it needs without the credentials that produced it.
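
One way to do this is a recursive filter at the tool-integration layer that drops credential-looking fields before the response is serialized into the prompt; the key names below are an assumption to tune per API.

```python
# Key names to strip are illustrative; adjust them to the APIs you actually call.
SENSITIVE_KEYS = {"authorization", "api_key", "apikey", "token",
                  "access_token", "session", "password", "connection_string"}

def strip_credentials(value):
    if isinstance(value, dict):
        return {k: strip_credentials(v) for k, v in value.items()
                if k.lower() not in SENSITIVE_KEYS}
    if isinstance(value, list):
        return [strip_credentials(v) for v in value]
    return value

# Usage: pass strip_credentials(raw_response) to the model instead of raw_response.
```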

Memory sanitization. If the agent uses persistent memory, scan stored conversations for credential patterns and redact them. Don’t store raw conversation logs that might contain user-pasted secrets.
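
A write-path hook is one place to do this. The sketch below redacts obvious token shapes before a turn is persisted; the prefixes and the `memory_store.append` call are illustrative, not exhaustive.

```python
import re

# Example token shapes only (OpenAI-style sk-, GitHub ghp_, Slack xox*); extend as needed.
TOKEN_SHAPES = re.compile(
    r"\b(sk-[A-Za-z0-9]{16,}|ghp_[A-Za-z0-9]{20,}|xox[bap]-[A-Za-z0-9-]{10,})\b"
)

def save_turn(memory_store, role: str, content: str) -> None:
    # Redact before persisting so later sessions cannot retrieve pasted secrets.
    memory_store.append({"role": role,
                         "content": TOKEN_SHAPES.sub("[REDACTED]", content)})
```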

Assume extraction. Design as if system prompt extraction, RAG poisoning, and prompt injection will all succeed. The question isn’t “can the attacker get into the context window?” (they can). The question is “when they do, what credentials will they find?” The answer should be “none.”


Key takeaways

  • Secrets enter LLM context through five paths: system prompts, RAG ingestion, tool responses, agent memory, and configuration files
  • OWASP added System Prompt Leakage (LLM07) to the 2025 Top 10. Assume every system prompt will be extracted.
  • Devin leaked Jira/Slack tokens through prompt injection. Claude Code exfiltrated API keys via DNS (CVE-2025-55284) and had full RCE via project configs (CVE-2025-59536, CVSS 8.7)
  • The pattern is consistent: credentials present + exfiltration channel available + prompt injection = credential theft
  • Keep credentials out of context windows entirely. Use external secret managers, proxy layers for tool calls, RAG sanitization, and tool response filtering
  • Design for extraction. The question is not IF the context window is compromised but WHAT the attacker finds when it is.

FAQ

How do secrets get into LLM context?

Five paths: hardcoded in system prompts, ingested through RAG pipelines (sensitive documents in vector stores), returned in tool responses, stored in agent memory (historical conversations), and loaded from configuration files (.env, service keys). Most LLM applications use at least three of these paths.

What is OWASP LLM07?

System Prompt Leakage, new in 2025. Covers the extraction of confidential system prompts through adversarial prompting. Significant because developers embed API keys, business logic, and security configurations in system prompts, treating them as secure when they’re extractable.

How did Devin AI leak credentials?

A poisoned GitHub issue directed Devin to an attacker-controlled website via prompt injection. Devin used shell commands to send environment variables (Jira and Slack tokens from its secrets manager) to the attacker’s server. Reported April 2025, 120+ days without a fix.

How should credentials be managed in LLM applications?

Never put them in context windows. Use external secret managers with runtime-only retrieval. Route tool calls through proxy layers that add authentication server-side. Sanitize RAG pipelines and tool responses. Scan agent memory for credential patterns. Assume extraction will succeed.

Want to work together?

I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.

Get in touch