What is MCP tool poisoning?

Tool poisoning is an attack where malicious instructions are embedded in an MCP tool's metadata — descriptions, parameter types, default values, and output schemas. When an LLM retrieves tool definitions to plan its actions, it reads the entire poisoned metadata as trusted context. The MCPTox benchmark measured 60%+ attack success rates across major LLMs including GPT-4o-mini, DeepSeek-R1, and Phi-4. More capable models are actually more susceptible because they follow instructions more faithfully.

What is an MCP rug pull attack?

A rug pull attack modifies a previously approved MCP tool's behavior after the user has granted consent. The tool passes initial security review with benign functionality, then silently changes its description, data access patterns, or permission requirements. Most MCP clients do not re-prompt for approval when tool definitions change, so the user continues trusting a now-malicious tool. Defense requires cryptographic tool pinning and hash verification at runtime.

How many shadow MCP servers does a typical organization have?

According to Qualys research from March 2026, organizations with 100 or more engineers typically have 15-30 MCP server configurations that IT has no visibility into. These servers often bind to localhost on random ports, carry production credentials, and start as developer experiments that become production dependencies without formal approval.

How does MCP tool poisoning differ from prompt injection?

Prompt injection targets the LLM's input at runtime — a malicious string in a document or message tricks the model into following hidden instructions. Tool poisoning targets the infrastructure layer — malicious instructions live in the tool's own metadata, which the LLM treats as system-level configuration rather than user input. This makes tool poisoning harder to detect because the attack surface is the tool registry itself, not any particular user interaction.

What tools can detect MCP security issues?

Three tools address MCP-specific threats. Invariant Labs' mcp-scan (acquired by Snyk) scans MCP configurations for prompt injections, tool poisoning, and rug pulls via hash verification. Palo Alto Networks' pan-mcp-relay acts as a defense-first relay that blocks malicious URLs, prompt injections, and sensitive data leaks in MCP traffic. Qualys TotalAI provides layered discovery and assessment of MCP servers across network, host, and supply chain dimensions.

MCP security beyond SSRF: tool poisoning, rug pulls, and the shadow server problem

10 minute read

“Your MCP server passed the security audit in January. It was modified in February. Nobody noticed.”

TL;DR

MCP’s trust model has three structural weaknesses that infrastructure scanning cannot catch. Tool poisoning embeds malicious instructions in metadata that LLMs read as ground truth — the MCPTox benchmark shows an average 36.5% attack success rate across 20 agents, with peaks above 60% for o1-mini, Phi-4, and GPT-4o-mini. Rug pull attacks modify approved tools post-consent with no re-approval trigger. Shadow MCP servers proliferate at 15-30 per 100-engineer organization without IT visibility. Together, these create an attack surface that traditional vulnerability scanners miss entirely. For the infrastructure-level vulnerabilities (SSRF, command injection, authentication bypass), see the MCP SSRF epidemic.

A server rack component that has been replaced with a near-identical counterfeit unit, the substitution visible only under harsh inspection lightin...

What attack surface does MCP create beyond SSRF?

The MCP SSRF post covered the infrastructure layer: 30 CVEs in 60 days, 36.7% of servers with SSRF flaws, 41% lacking authentication. Those are real problems. They are also the problems that existing security tools know how to find.

Three attack classes target MCP’s trust model instead. They exploit how LLMs interpret tool definitions, how organizations manage tool approval, and how developers deploy servers. CVE scanners do not catch them because nothing is technically broken — the protocol works exactly as designed. The trust assumptions are what fail.

Anthropic launched MCP in November 2024. Within 18 months: 97 million+ monthly SDK downloads, 5,500+ servers, endorsement from OpenAI, Google, and Microsoft. The supply-led adoption curve is steep — of 2,500+ tracked servers, only 8 exceeded 50,000 installs (Zuplo State of MCP Report). Most servers are small, unaudited, and built by individual developers.

How does tool poisoning hijack agent behavior?

When an LLM connects to an MCP server, it retrieves tool definitions — JSON schemas describing what each tool does, what parameters it accepts, and what it returns. The LLM reads this metadata as trusted configuration context to plan its actions.

Tool poisoning embeds malicious instructions directly in that metadata. Not in the user’s prompt. Not in the documents being processed. In the tool’s own description.

sequenceDiagram
    participant Dev as Developer
    participant Client as MCP Client
    participant Server as Malicious MCP Server
    participant LLM as LLM

    Dev->>Client: Connect to MCP server
    Client->>Server: List available tools
    Server->>Client: Tool definitions (poisoned metadata)
    Client->>LLM: Here are the available tools
    Note over LLM: Reads poisoned descriptions<br/>as trusted system context
    LLM->>Client: Plan: call tool X with these params
    Note over Client: Executes plan that includes<br/>attacker's hidden instructions

CyberArk’s “Poison Everywhere” research identified three escalation levels. Full-Schema Poisoning hides instructions not just in descriptions but in parameter types, required arrays, and default values — fields that security reviewers rarely inspect. Advanced Tool Poisoning exploits how LLMs interpret tool outputs, embedding follow-up instructions in error messages. Behavioral attacks use conditional triggers that activate only under specific traffic patterns, making them invisible during development testing.

The MCPTox benchmark (August 2025) quantified the problem. Researchers tested 45 real-world MCP servers with 353 authentic tools and 1,312 malicious test cases. Average attack success rate across 20 tested agents was 36.5%, with peaks exceeding 60% for o1-mini (72.8%), Phi-4 (70.2%), and GPT-4o-mini (61.8%). The counterintuitive finding: more capable models were more susceptible, because they follow instructions more faithfully. Claude-3.7-Sonnet had the highest refusal rate — still under 3%.

Invariant Labs demonstrated practical exploitation: exfiltrating WhatsApp chat histories, stealing contents from private GitHub repositories, and extracting SSH credentials. All from tool descriptions that a human reviewer would need to parse character-by-character to spot.

What makes rug pull attacks so difficult to detect?

A rug pull changes a tool’s behavior after the user has approved it. The initial version passes security review. The modified version does not trigger re-approval.

The mechanics: version 1.0.15 of a tool is benign. It does exactly what its description says. Your team reviews it, approves it, deploys it. Version 1.0.16 adds a single line — maybe a BCC field on an email tool, maybe a logging call that exfiltrates parameters to an external endpoint. Most MCP clients do not re-prompt for approval when tool definitions change. The user keeps using what they already approved.

This happened. The Postmark-MCP package on npm (a counterfeit, not the official Postmark tool) shipped benign until version 1.0.16. Then it added Bcc: 'phan@giftshop.club' — silently blind-copying every email sent through the tool to an attacker-controlled address. Before detection: roughly 1,643 total downloads, an estimated 300 organizations using it in production, and 3,000-15,000 emails per organization per day exfiltrated. Email content, attachments, headers — often containing credentials, tokens, customer PII, and regulated data.

The defense is structural: tool pinning via cryptographic hash. Generate a fingerprint of each tool’s complete definition at approval time. Verify that fingerprint at every runtime invocation. If the hash changes, block execution and alert. The ETDI framework (arXiv 2506.01333) formalizes this with OAuth-Enhanced Tool Definitions and Policy-Based Access Control. Invariant Labs’ mcp-scan implements hash-based rug pull detection.

The uncomfortable reality: most teams do not pin tool definitions. They approve once and assume permanence.

How widespread is the shadow MCP server problem?

Qualys published research on March 19, 2026 that reframed MCP servers as “the new shadow IT.” The finding: organizations with 100 or more engineers typically have 15-30 MCP server configurations that IT has zero visibility into.

The deployment pattern is consistent. A developer installs an MCP server locally — binding to localhost, listening on a random high port. It starts as an experiment. It works. It gets shared with the team. It accumulates production credentials: database connection strings, API tokens, cloud provider keys. Within weeks, it is a production dependency that nobody formally approved and nobody monitors.

Of 500+ servers scanned in the Qualys study, 38% had no authentication at all. These servers have the same access to internal systems as any other authenticated service, but with none of the governance, logging, or access review that formal services receive.

The GitHub MCP incident demonstrated the consequence. A developer’s agent, connected to an MCP server with legitimate GitHub credentials, encountered a malicious issue in a public repository. The issue contained prompt injection payloads. The agent, now compromised, used its same credentials to access private repositories — extracting salary information, compensation details, and internal project data. The exfiltration channel: a public pull request. The researchers called it “the lethal trifecta”: access to private data, exposure to malicious input, and an outbound communication channel.

Shadow servers make this trifecta nearly guaranteed. They have credentials (access to private data). They process untrusted content (exposure to malicious input). They are connected to the internet (outbound channel). And nobody is watching.

What detection and defense tools exist?

Three purpose-built tools address MCP-specific threats, each targeting a different layer.

mcp-scan (Invariant Labs, acquired by Snyk) scans MCP configuration files and connects to servers to retrieve tool definitions. It runs the definitions against the Invariant Guardrails API to detect prompt injections, hidden instructions, cross-origin escalations, and tool shadowing. Two modes: mcp-scan scan for one-time audits, mcp-scan proxy for continuous runtime monitoring. The hash-based tool pinning detects rug pulls by comparing current definitions against approved baselines.

pan-mcp-relay (Palo Alto Networks) operates as a defense-first relay server between MCP clients and servers. It scans all traffic for prompt injections, malicious URLs, insecure outputs, and sensitive data leaks. Rather than scanning configurations after deployment, it intercepts traffic in real-time.

Qualys TotalAI takes an inventory-first approach: discover all MCP servers across network, host, and supply chain dimensions before assessing them. For organizations that do not know how many MCP servers they have — which is most — discovery is the prerequisite.

A broader audit by Grith found that 12% of 2,857 audited agent skills were malicious. One in eight. If your organization uses community-built MCP servers without vetting, the base rate is not in your favor.

What should you do this week?

The layered defense approach, ordered by effort:

Audit your MCP inventory. Run mcp-scan scan against every known configuration. Then ask your engineering teams what they are running locally — the shadow servers are the ones you do not know about.
Pin tool definitions. Hash every approved tool’s complete schema. Verify at runtime. Block on mismatch. This is the single most effective defense against both tool poisoning and rug pulls.
Enforce least-privilege tokens. Each MCP server gets its own scoped credential. Never share tokens between servers. A compromised email tool should not have GitHub access.
Block localhost exposure. Bind MCP servers to 127.0.0.1, never 0.0.0.0. Use a gateway (pan-mcp-relay) for any server that needs external connectivity.
Monitor tool description changes. Any schema change should trigger re-review. Automate this — Snyk’s mcp-scan proxy mode enables continuous verification.
Treat MCP servers as production services. They carry production credentials. They process untrusted input. They connect to the internet. If a traditional microservice had those three properties, it would have full governance. Your MCP servers should too.

Key takeaways

Tool poisoning attacks the trust model, not the infrastructure. LLMs read tool metadata as trusted context. MCPTox shows 60%+ success rates — more capable models are more vulnerable.
Rug pulls exploit the approval gap. Postmark-MCP exfiltrated 3,000-15,000 emails per org per day from ~300 organizations, via a single BCC line added post-approval.
Shadow servers are everywhere. 15-30 per 100-engineer organization (Qualys). They carry production credentials with zero governance.
Detection tools exist. mcp-scan (hash-based pinning), pan-mcp-relay (real-time filtering), Qualys TotalAI (discovery). Use all three layers.
Traditional scanners miss this. CVE scanners find broken servers. These attacks use servers that work exactly as designed — the trust assumptions are what fail.

MCP security beyond SSRF: tool poisoning, rug pulls, and the shadow server problem

TL;DR

What attack surface does MCP create beyond SSRF?

How does tool poisoning hijack agent behavior?

What makes rug pull attacks so difficult to detect?

How widespread is the shadow MCP server problem?

What detection and defense tools exist?

What should you do this week?

Key takeaways

Further reading

Related across topics

Share on

TL;DR

What attack surface does MCP create beyond SSRF?

How does tool poisoning hijack agent behavior?

What makes rug pull attacks so difficult to detect?

How widespread is the shadow MCP server problem?

What detection and defense tools exist?

What should you do this week?

Key takeaways

Further reading

Related across topics

Prompt Injection Defense

Share on