MCP security beyond SSRF: tool poisoning, rug pulls, and the shadow server problem
“Your MCP server passed the security audit in January. It was modified in February. Nobody noticed.”
TL;DR
MCP’s trust model has three structural weaknesses that infrastructure scanning cannot catch. Tool poisoning embeds malicious instructions in metadata that LLMs read as ground truth — the MCPTox benchmark shows an average 36.5% attack success rate across 20 agents, with peaks above 60% for o1-mini, Phi-4, and GPT-4o-mini. Rug pull attacks modify approved tools post-consent with no re-approval trigger. Shadow MCP servers proliferate at 15-30 per 100-engineer organization without IT visibility. Together, these create an attack surface that traditional vulnerability scanners miss entirely. For the infrastructure-level vulnerabilities (SSRF, command injection, authentication bypass), see the MCP SSRF epidemic.

What attack surface does MCP create beyond SSRF?
The MCP SSRF post covered the infrastructure layer: 30 CVEs in 60 days, 36.7% of servers with SSRF flaws, 41% lacking authentication. Those are real problems. They are also the problems that existing security tools know how to find.
Three attack classes target MCP’s trust model instead. They exploit how LLMs interpret tool definitions, how organizations manage tool approval, and how developers deploy servers. CVE scanners do not catch them because nothing is technically broken — the protocol works exactly as designed. The trust assumptions are what fail.
Anthropic launched MCP in November 2024. Within 18 months: 97 million+ monthly SDK downloads, 5,500+ servers, endorsement from OpenAI, Google, and Microsoft. The supply-led adoption curve is steep — of 2,500+ tracked servers, only 8 exceeded 50,000 installs (Zuplo State of MCP Report). Most servers are small, unaudited, and built by individual developers.
How does tool poisoning hijack agent behavior?
When an LLM connects to an MCP server, it retrieves tool definitions — JSON schemas describing what each tool does, what parameters it accepts, and what it returns. The LLM reads this metadata as trusted configuration context to plan its actions.
Tool poisoning embeds malicious instructions directly in that metadata. Not in the user’s prompt. Not in the documents being processed. In the tool’s own description.
sequenceDiagram
participant Dev as Developer
participant Client as MCP Client
participant Server as Malicious MCP Server
participant LLM as LLM
Dev->>Client: Connect to MCP server
Client->>Server: List available tools
Server->>Client: Tool definitions (poisoned metadata)
Client->>LLM: Here are the available tools
Note over LLM: Reads poisoned descriptions<br/>as trusted system context
LLM->>Client: Plan: call tool X with these params
Note over Client: Executes plan that includes<br/>attacker's hidden instructions
CyberArk’s “Poison Everywhere” research identified three escalation levels. Full-Schema Poisoning hides instructions not just in descriptions but in parameter types, required arrays, and default values — fields that security reviewers rarely inspect. Advanced Tool Poisoning exploits how LLMs interpret tool outputs, embedding follow-up instructions in error messages. Behavioral attacks use conditional triggers that activate only under specific traffic patterns, making them invisible during development testing.
The MCPTox benchmark (August 2025) quantified the problem. Researchers tested 45 real-world MCP servers with 353 authentic tools and 1,312 malicious test cases. Average attack success rate across 20 tested agents was 36.5%, with peaks exceeding 60% for o1-mini (72.8%), Phi-4 (70.2%), and GPT-4o-mini (61.8%). The counterintuitive finding: more capable models were more susceptible, because they follow instructions more faithfully. Claude-3.7-Sonnet had the highest refusal rate — still under 3%.
Invariant Labs demonstrated practical exploitation: exfiltrating WhatsApp chat histories, stealing contents from private GitHub repositories, and extracting SSH credentials. All from tool descriptions that a human reviewer would need to parse character-by-character to spot.
What makes rug pull attacks so difficult to detect?
A rug pull changes a tool’s behavior after the user has approved it. The initial version passes security review. The modified version does not trigger re-approval.
The mechanics: version 1.0.15 of a tool is benign. It does exactly what its description says. Your team reviews it, approves it, deploys it. Version 1.0.16 adds a single line — maybe a BCC field on an email tool, maybe a logging call that exfiltrates parameters to an external endpoint. Most MCP clients do not re-prompt for approval when tool definitions change. The user keeps using what they already approved.
This happened. The Postmark-MCP package on npm (a counterfeit, not the official Postmark tool) shipped benign until version 1.0.16. Then it added Bcc: 'phan@giftshop.club' — silently blind-copying every email sent through the tool to an attacker-controlled address. Before detection: roughly 1,643 total downloads, an estimated 300 organizations using it in production, and 3,000-15,000 emails per organization per day exfiltrated. Email content, attachments, headers — often containing credentials, tokens, customer PII, and regulated data.
The defense is structural: tool pinning via cryptographic hash. Generate a fingerprint of each tool’s complete definition at approval time. Verify that fingerprint at every runtime invocation. If the hash changes, block execution and alert. The ETDI framework (arXiv 2506.01333) formalizes this with OAuth-Enhanced Tool Definitions and Policy-Based Access Control. Invariant Labs’ mcp-scan implements hash-based rug pull detection.
The uncomfortable reality: most teams do not pin tool definitions. They approve once and assume permanence.
How widespread is the shadow MCP server problem?
Qualys published research on March 19, 2026 that reframed MCP servers as “the new shadow IT.” The finding: organizations with 100 or more engineers typically have 15-30 MCP server configurations that IT has zero visibility into.
The deployment pattern is consistent. A developer installs an MCP server locally — binding to localhost, listening on a random high port. It starts as an experiment. It works. It gets shared with the team. It accumulates production credentials: database connection strings, API tokens, cloud provider keys. Within weeks, it is a production dependency that nobody formally approved and nobody monitors.
Of 500+ servers scanned in the Qualys study, 38% had no authentication at all. These servers have the same access to internal systems as any other authenticated service, but with none of the governance, logging, or access review that formal services receive.
The GitHub MCP incident demonstrated the consequence. A developer’s agent, connected to an MCP server with legitimate GitHub credentials, encountered a malicious issue in a public repository. The issue contained prompt injection payloads. The agent, now compromised, used its same credentials to access private repositories — extracting salary information, compensation details, and internal project data. The exfiltration channel: a public pull request. The researchers called it “the lethal trifecta”: access to private data, exposure to malicious input, and an outbound communication channel.
Shadow servers make this trifecta nearly guaranteed. They have credentials (access to private data). They process untrusted content (exposure to malicious input). They are connected to the internet (outbound channel). And nobody is watching.
What detection and defense tools exist?
Three purpose-built tools address MCP-specific threats, each targeting a different layer.
mcp-scan (Invariant Labs, acquired by Snyk) scans MCP configuration files and connects to servers to retrieve tool definitions. It runs the definitions against the Invariant Guardrails API to detect prompt injections, hidden instructions, cross-origin escalations, and tool shadowing. Two modes: mcp-scan scan for one-time audits, mcp-scan proxy for continuous runtime monitoring. The hash-based tool pinning detects rug pulls by comparing current definitions against approved baselines.
pan-mcp-relay (Palo Alto Networks) operates as a defense-first relay server between MCP clients and servers. It scans all traffic for prompt injections, malicious URLs, insecure outputs, and sensitive data leaks. Rather than scanning configurations after deployment, it intercepts traffic in real-time.
Qualys TotalAI takes an inventory-first approach: discover all MCP servers across network, host, and supply chain dimensions before assessing them. For organizations that do not know how many MCP servers they have — which is most — discovery is the prerequisite.
A broader audit by Grith found that 12% of 2,857 audited agent skills were malicious. One in eight. If your organization uses community-built MCP servers without vetting, the base rate is not in your favor.
What should you do this week?
The layered defense approach, ordered by effort:
-
Audit your MCP inventory. Run
mcp-scan scanagainst every known configuration. Then ask your engineering teams what they are running locally — the shadow servers are the ones you do not know about. -
Pin tool definitions. Hash every approved tool’s complete schema. Verify at runtime. Block on mismatch. This is the single most effective defense against both tool poisoning and rug pulls.
-
Enforce least-privilege tokens. Each MCP server gets its own scoped credential. Never share tokens between servers. A compromised email tool should not have GitHub access.
-
Block localhost exposure. Bind MCP servers to 127.0.0.1, never 0.0.0.0. Use a gateway (pan-mcp-relay) for any server that needs external connectivity.
-
Monitor tool description changes. Any schema change should trigger re-review. Automate this — Snyk’s mcp-scan proxy mode enables continuous verification.
-
Treat MCP servers as production services. They carry production credentials. They process untrusted input. They connect to the internet. If a traditional microservice had those three properties, it would have full governance. Your MCP servers should too.
Key takeaways
- Tool poisoning attacks the trust model, not the infrastructure. LLMs read tool metadata as trusted context. MCPTox shows 60%+ success rates — more capable models are more vulnerable.
- Rug pulls exploit the approval gap. Postmark-MCP exfiltrated 3,000-15,000 emails per org per day from ~300 organizations, via a single BCC line added post-approval.
- Shadow servers are everywhere. 15-30 per 100-engineer organization (Qualys). They carry production credentials with zero governance.
- Detection tools exist. mcp-scan (hash-based pinning), pan-mcp-relay (real-time filtering), Qualys TotalAI (discovery). Use all three layers.
- Traditional scanners miss this. CVE scanners find broken servers. These attacks use servers that work exactly as designed — the trust assumptions are what fail.
Further reading
- The MCP SSRF epidemic — infrastructure-level MCP vulnerabilities (CVEs, SSRF, auth bypass)
- Prompt injection defense — how prompt injection exploits tool-calling agents
- Securing agent orchestration — patterns and controls for multi-agent systems
Want to work together?
I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.
Get in touch