MCP in production: what 97 million monthly downloads actually looks like
97 million monthly SDK downloads. 10,000+ active servers. MCP is infrastructure now — not a feature, not an integration pattern, infrastructure. The question is no longer whether to use it. It’s how to structure it.
Most writing about MCP is still in tutorial mode: here’s what it is, here’s a hello world, here’s how to connect Claude to your database. That material served its purpose. But the teams who went to production first are staring at a different set of questions: how do you propagate user identity across a tool chain? What breaks when an upstream API goes down and the agent doesn’t know? How do you handle 500ms latency creep before customers notice?
A March 2026 paper — arXiv:2603.13417, “Bridging Protocol and Production: Design Patterns for Deploying AI Agents with Model Context Protocol” — documents exactly these problems from a real enterprise deployment. This post translates that paper and the broader production ecosystem into a practitioner guide. Three archetypes, the failure modes that don’t make the demos, and why GitHub’s first-party server changes the integration calculation for developer tooling.
TL;DR: MCP hit 97M monthly SDK downloads and 10,000+ active servers in 16 months. The tutorials got you to hello world. This post covers what breaks in production: the three deployment archetypes teams are converging on, the failure modes that don’t make the demos, and why GitHub’s first-party MCP server changes the economics of developer tool integration.

What the 10,000-server moment actually means
The number that matters isn’t 97 million downloads. It’s that all five major AI providers — Anthropic, OpenAI, Google DeepMind, Microsoft, AWS — now support MCP natively. That shift changed MCP from “the protocol Anthropic uses” into a standard the entire industry is coordinating around.
React took three years to hit 100 million monthly npm downloads. MCP hit comparable scale in 16 months, growing 4,750% from roughly 2 million downloads per month at its November 2024 launch. Agent builders needed a common interface layer badly enough that the moment a workable option with broad platform buy-in appeared, they ran at it.
The ecosystem breakdown tells you where production deployments are concentrating: developer tools lead with 1,200+ servers, followed by business applications (950+), web and search (600+), and AI automation (450+). Those categories map directly to where agents need external context most urgently.
The move to Linux Foundation custody in December 2025 was the final signal. MCP is now governed like HTTP or OAuth, not like a vendor SDK. That governance change made enterprise procurement conversations much simpler. You’re not adopting Anthropic’s protocol. You’re adopting a neutral open standard.
At 10,000 servers, coverage is no longer the problem. Most business integration scenarios have an existing server to start from. The question has shifted from “does an MCP server exist for this?” to “which one is production-grade?”
Three production archetypes
The community has converged on three deployment patterns. They look similar on the surface — all involve MCP servers, all expose tools to agents — but they have different failure profiles and different infrastructure requirements.
```
┌─────────────────────────────────────────────────────────────────────┐
│                   THREE MCP PRODUCTION ARCHETYPES                   │
├──────────────────┬──────────────────────┬───────────────────────────┤
│ API WRAPPER      │ WORKFLOW ORCHESTRATOR│ STATEFUL AGENT BACKEND    │
├──────────────────┼──────────────────────┼───────────────────────────┤
│ Stateless        │ Stateless tools,     │ Session-aware, persistent │
│ tool exposure    │ stateful chains      │ context across turns      │
├──────────────────┼──────────────────────┼───────────────────────────┤
│ 1 tool per call  │ 3-8 sequential calls │ N calls over N turns      │
├──────────────────┼──────────────────────┼───────────────────────────┤
│ What breaks:     │ What breaks:         │ What breaks:              │
│ Silent failures  │ Timeout cascades     │ Session drift             │
│ Context bleed    │ Retry storms         │ State corruption          │
│ No observability │ No error semantics   │ Load balancer fights      │
├──────────────────┼──────────────────────┼───────────────────────────┤
│ Use when:        │ Use when:            │ Use when:                 │
│ Simple reads/    │ Business process     │ Multi-turn agents,        │
│ writes, 1 system │ automation, ETL      │ long-running tasks        │
└──────────────────┴──────────────────────┴───────────────────────────┘
```
MCP as API wrapper
This is where most teams start. You have an existing API — internal or third-party — and you wrap it in an MCP server so agents can call it. Each tool call is independent, the server holds no session state, and the agent decides what to call and when.
At small scale this works well. The failure mode that kills teams in production is invisibility. MCP doesn’t standardize health checks or observability. When a network policy blocks your server’s outbound traffic, the agent still receives responses — they’re just empty. The paper documents a two-day silent failure: the agent confidently told users they had no cloud projects, because the server couldn’t reach the upstream API and returned empty arrays instead of errors. Monitoring showed green. No alerts fired.
Build /health and /ready endpoints into every MCP server before you ship. /health checks liveness; /ready checks whether the server can actually reach its upstreams. Instrument per-request metrics — tool name, latency, status, error code, output size — from day one. The paper’s taxonomy (p50/p95/p99 per tool, success rates, per-tenant rates) is the right starting schema.
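A minimal sketch of both endpoints plus per-request metric emission, assuming a FastAPI-based server; the upstream URL and metric field names are illustrative, not anything the protocol mandates:

```python
# Liveness vs. readiness for an MCP server process (FastAPI assumed).
import time
import httpx
from fastapi import FastAPI, Response

app = FastAPI()
UPSTREAM_URL = "https://api.example.com/ping"  # hypothetical upstream dependency

@app.get("/health")
def health() -> dict:
    # Liveness only: the process is up and serving requests.
    return {"status": "ok"}

@app.get("/ready")
async def ready(response: Response) -> dict:
    # Readiness: verify the server can actually reach its upstream.
    # This is what catches the "green but returning empty arrays" failure.
    try:
        async with httpx.AsyncClient(timeout=2.0) as client:
            r = await client.get(UPSTREAM_URL)
            r.raise_for_status()
    except httpx.HTTPError as exc:
        response.status_code = 503
        return {"status": "degraded", "upstream_error": str(exc)}
    return {"status": "ready"}

def record_tool_metrics(tool: str, started: float, status: str,
                        error_code: str | None, output_bytes: int) -> None:
    # Per-request, per-tool metric emission; swap print() for your pipeline.
    print({
        "tool": tool,
        "latency_ms": round((time.monotonic() - started) * 1000, 1),
        "status": status,
        "error_code": error_code,
        "output_bytes": output_bytes,
    })
```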
The second failure mode for this archetype is context bleed. When multiple users route through a shared server, their identities need to travel with each request. MCP’s JSON-RPC protocol has no native mechanism for this. Teams default to passing user context as request parameters, which works until it doesn’t — parameters can be omitted and can’t be audited consistently. The paper proposes a Context-Aware Broker Protocol (CABP): a six-stage broker layer that extracts JWT claims, validates them, and injects identity into every request before the server sees it. You need some version of this the moment you have more than one user.
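You don’t need the full six-stage broker to get the core benefit. A sketch of the extract-validate-inject path, assuming PyJWT; the claim names, request shape, and `_broker_context` key are illustrative, and this is not the paper’s CABP implementation:

```python
# Broker-style identity injection: validate the caller's JWT and attach
# identity to the tool call before the MCP server ever sees it.
import jwt  # PyJWT

SECRET = "replace-me"  # use your real key material / JWKS in practice

def broker_inject(request: dict, bearer_token: str) -> dict:
    # Stages 1-2: extract and validate JWT claims (raises on bad tokens).
    claims = jwt.decode(bearer_token, SECRET, algorithms=["HS256"])
    # Stages 3-4: resolve identity and inject it into the request, so the
    # agent can never omit it the way a plain parameter can be omitted.
    request["params"]["_broker_context"] = {
        "user_id": claims["sub"],
        "tenant_id": claims.get("tenant"),
        "scopes": claims.get("scopes", []),
    }
    # Stage 6: audit every brokered call. (Stage 5, response sanitization,
    # happens on the way back and is omitted here.)
    audit_log(request["params"]["_broker_context"], request.get("method"))
    return request

def audit_log(context: dict, method: str | None) -> None:
    print({"event": "tool_call", "method": method, **context})
```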
Use it for reading or writing to a single external system where calls are independent — search APIs, database lookups, CRM reads, file storage. Keep it stateless, add observability, and put the broker layer in before you have multi-tenant needs, not after.
MCP as workflow orchestrator
The step up from API wrapper is orchestration: a chain of sequential tool calls where the output of one feeds the input of the next. The paper’s reference deployment is a cloud resource management agent running FetchResources → FetchServices → FetchUsageLimits → CreateLimitRequest, with a ~1,550ms total turn budget.
Two production failure modes show up here that tutorials never cover.
The first is timeout cascades. Static timeout allocation fails because tools have different latency distributions. If you budget 400ms per tool for a four-tool chain, one slow database call wipes out the budget for everything downstream. The paper’s Adaptive Timeout Budget Allocation (ATBA) algorithm fixes this by allocating proportionally to each tool’s p99 latency — faster tools get tighter budgets, variance-heavy tools get more headroom, with a 10% reserve for planner overhead. Surplus time from fast tools redistributes to remaining calls. The paper estimates this cuts chain timeout failures by 40% or more compared to uniform allocation.
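In the spirit of ATBA (the paper’s exact algorithm may differ in detail), a sketch of p99-proportional allocation with surplus redistribution; the tool names come from the paper’s chain, but the p99 values are invented for illustration:

```python
# Allocate a chain's turn budget proportionally to each tool's p99 latency,
# holding back a reserve for planner overhead.

def allocate_budgets(p99_ms: dict[str, float], turn_budget_ms: float,
                     reserve: float = 0.10) -> dict[str, float]:
    usable = turn_budget_ms * (1 - reserve)
    total_p99 = sum(p99_ms.values())
    return {tool: usable * p99 / total_p99 for tool, p99 in p99_ms.items()}

def redistribute(budgets: dict[str, float], finished_tool: str,
                 actual_ms: float) -> dict[str, float]:
    # Surplus from a tool that finished early is spread across the
    # tools still waiting to run.
    remaining = {t: b for t, b in budgets.items() if t != finished_tool}
    surplus = budgets[finished_tool] - actual_ms
    if surplus > 0 and remaining:
        bonus = surplus / len(remaining)
        remaining = {t: b + bonus for t, b in remaining.items()}
    return remaining

# The paper's four-tool chain with its ~1,550ms turn budget.
budgets = allocate_budgets(
    {"FetchResources": 180, "FetchServices": 150,
     "FetchUsageLimits": 420, "CreateLimitRequest": 300},
    turn_budget_ms=1550,
)
budgets = redistribute(budgets, "FetchResources", actual_ms=90)
```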
The second is the retry storm. When an upstream API returns an error, the agent retries — often three times in rapid succession. When the upstream recovers, all three retries succeed, and you get triplicate side effects. I’ve seen this create duplicate records, duplicate payments, and duplicate API resource creation. The fix is structured error semantics: the tool response needs to tell the agent whether an error is retryable, when to retry, and what alternatives to try. A generic “Internal server error” string leaves the agent guessing. A structured response with retryable: false and suggested_action: "escalate to user" makes the recovery deterministic.
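A sketch of what a structured error payload can look like; the field names are illustrative rather than an MCP standard:

```python
# Structured error semantics: tell the agent whether, when, and how to recover.
from dataclasses import dataclass, asdict

@dataclass
class ToolError:
    code: str                 # machine-readable, e.g. "upstream_unavailable"
    message: str              # human-readable summary for logs
    retryable: bool           # may the agent retry at all?
    retry_after_ms: int | None = None    # when to retry, if retryable
    suggested_action: str | None = None  # what to do instead, if not

# Transient failure: the agent should back off, not hammer the upstream.
transient = ToolError("upstream_unavailable", "Billing API timed out",
                      retryable=True, retry_after_ms=2000)

# Terminal failure: deterministic recovery instead of a retry storm.
terminal = ToolError("quota_exceeded", "Tenant is over its limit",
                     retryable=False, suggested_action="escalate to user")

error_response = {"isError": True, "structured": asdict(terminal)}
```

The agent, or the client harness wrapping it, can then branch on retryable deterministically instead of pattern-matching an error string.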
Use it for business process automation where multiple systems need to be touched in sequence: onboarding flows, ETL pipelines, approval workflows. Budget timeouts proportionally, design every tool response with structured error semantics, and test failure injection before you ship.
MCP as stateful agent backend
This is the hardest pattern to run well. Stateful agents maintain session context across multiple turns. Each session runs in a dedicated execution context, the server persists state between interactions using an Mcp-Session-Id header, and the agent can reference earlier steps in the conversation.
This is where MCP’s protocol gaps hit hardest. Stateful sessions fight with load balancers by design — session affinity requirements mean horizontal scaling needs workarounds (sticky sessions, external state stores, or microVM isolation). AWS Bedrock AgentCore Runtime added native stateful MCP server support in March 2026 specifically because this pattern was too painful to implement manually at scale.
The production architecture: connect your MCP server to an external state store (Postgres and Redis are the community defaults) rather than holding state in-process. Session context persists in the store; the server is stateless at the process level. This lets you scale server instances horizontally without session affinity requirements.
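A sketch of the pattern with Redis; the key layout and TTL are illustrative:

```python
# Process-stateless session persistence: any replica can serve any session
# because state lives in the store, keyed by the Mcp-Session-Id value.
import json
import redis

store = redis.Redis(host="localhost", port=6379, decode_responses=True)
SESSION_TTL_S = 3600  # expire abandoned sessions rather than leak them

def load_session(session_id: str) -> dict:
    raw = store.get(f"mcp:session:{session_id}")
    return json.loads(raw) if raw else {}

def save_session(session_id: str, state: dict) -> None:
    store.set(f"mcp:session:{session_id}", json.dumps(state), ex=SESSION_TTL_S)
```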
The MCP 2026 roadmap acknowledges the gap directly: evolving the transport and session model so servers can scale horizontally without holding state is listed as a top priority. The current approach works but requires deliberate engineering effort; the roadmap suggests it will become a first-class protocol feature eventually.
For operations that exceed 10 seconds — file processing, batch API calls, anything with variable completion time — use MCP Tasks instead of synchronous calls. Synchronous calls block the turn budget. Tasks let the agent poll for completion asynchronously.
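The handoff pattern in miniature; this is a generic submit-then-poll sketch, not the MCP Tasks wire format:

```python
# Long-running work returns a task id immediately; the agent polls later
# instead of blocking its turn budget on a synchronous call.
import threading
import time
import uuid

_tasks: dict[str, dict] = {}

def submit_task(fn, *args) -> str:
    task_id = str(uuid.uuid4())
    _tasks[task_id] = {"status": "working", "result": None}

    def run():
        _tasks[task_id]["result"] = fn(*args)
        _tasks[task_id]["status"] = "completed"

    threading.Thread(target=run, daemon=True).start()
    return task_id  # handed back to the agent within the turn budget

def poll_task(task_id: str) -> dict:
    return _tasks[task_id]

# Usage: a 12-second batch job no longer blocks the turn.
tid = submit_task(lambda: (time.sleep(12), "done")[1])
```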
The full stateful flow, two turns end to end:

```mermaid
sequenceDiagram
  participant User
  participant Agent
  participant Broker
  participant MCPServer
  participant StateStore
  participant UpstreamAPI
  User->>Agent: Turn 1: "start task"
  Agent->>Broker: tool_call + JWT
  Broker->>Broker: validate JWT, inject identity
  Broker->>MCPServer: tool_call + broker_context
  MCPServer->>StateStore: read session state
  MCPServer->>UpstreamAPI: API call
  UpstreamAPI-->>MCPServer: response
  MCPServer->>StateStore: write session state
  MCPServer-->>Broker: structured response
  Broker->>Broker: audit log emission
  Broker-->>Agent: response
  Agent-->>User: Turn 1 result
  User->>Agent: Turn 2: "continue task"
  Agent->>Broker: tool_call + JWT + Mcp-Session-Id
  Broker->>MCPServer: tool_call + broker_context + session
  MCPServer->>StateStore: read session state (has Turn 1 context)
  MCPServer->>UpstreamAPI: API call with prior context
  UpstreamAPI-->>MCPServer: response
  MCPServer->>StateStore: write updated state
  MCPServer-->>Broker: structured response
  Broker-->>Agent: response
  Agent-->>User: Turn 2 result (context-aware)
```
Use it for multi-turn agents that need to remember earlier steps, for long-running task automation, and for any flow where the agent’s later actions depend on its earlier ones. Budget real engineering effort for state management, build session isolation into your security model from the start, and use an external state store. In-process state doesn’t survive horizontal scaling.
What failure modes look like at scale
The paper documents three concrete failure vignettes; the broader community has added a fourth.
The phantom tool. An agent consistently ignores a tool that works perfectly. In the paper’s deployment, a tool named get_usage_info with description “Returns usage information” was reliably skipped. The agent planner selects tools based on name and description alone — the code behind it is invisible. Renaming to FetchUsageLimits with an expanded four-sentence description (what it does, when to call it, what it returns, side effects) fixed it immediately. No code changes required. This sounds minor. It’s often the first production incident teams hit, and it makes a point that holds up across every deployment: tool descriptions are more important than tool code.
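The before-and-after, sketched with the Python SDK’s FastMCP decorator (the decorator signature is assumed; the descriptions carry the fix):

```python
# Tool descriptions are the planner's only view of a tool.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("cloud-resources")

# Skipped by the planner: the name and description say almost nothing.
@mcp.tool(description="Returns usage information")
def get_usage_info(project_id: str) -> dict:
    ...

# Reliably selected: what it does, when to call it, what it returns, side effects.
@mcp.tool(
    name="FetchUsageLimits",
    description=(
        "Fetches current usage limits for a cloud project. "
        "Call this before creating or modifying a limit increase request. "
        "Returns per-service quotas with current consumption. "
        "Read-only; no side effects."
    ),
)
def fetch_usage_limits(project_id: str) -> dict:
    ...
```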
The silent egress failure. A network policy change, a firewall rule update, a dependency going down — any of these can cause your MCP server to return empty or incorrect responses while appearing fully healthy to monitoring. The paper’s deployment ran silently broken for two days, confidently telling users they had no resources. Without /health endpoints that verify upstream connectivity, there is no signal. You can’t alert on a 200 status code that returns empty data.
The retry storm. Documented above in the orchestrator section: an upstream failure causes the agent to retry aggressively. When the upstream recovers, every retry succeeds. For write operations with side effects, that’s a data integrity problem. Structured error semantics with a retryable boolean is the fix, but it requires every tool author in your ecosystem to implement it consistently. That’s a coordination problem, not purely an engineering one.
The context window explosion. At 10,000+ servers, the average production server exposes many tools. One team documented a GitHub MCP server with 90+ tools. Every tool you expose serializes its full schema into the agent’s context on every request. Expose 90 tools, and you’re spending significant context on schema overhead before a single token of agent reasoning happens. Configure toolset filtering aggressively and only expose the groups your agent actually needs. The GitHub MCP Server’s toolset model (repos, issues, pull_requests, actions, code_security as separate groups) exists precisely for this.
The monitoring failure underneath all four patterns is the same: teams instrument at the server level rather than the tool level. You need p99 latency per individual tool, not per server. A server averaging 200ms can hide a single tool consistently running at 800ms, silently consuming your chain’s entire budget.
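Per-tool percentile tracking is a few lines; a rolling-window sketch (field names and window size are illustrative):

```python
# Track latency per tool, not per server, so one slow tool can't hide
# behind a healthy-looking server average.
from collections import defaultdict, deque

_latencies: dict[str, deque[float]] = defaultdict(lambda: deque(maxlen=1000))

def observe(tool: str, latency_ms: float) -> None:
    _latencies[tool].append(latency_ms)

def p99(tool: str) -> float:
    window = sorted(_latencies[tool])
    if not window:
        return 0.0
    return window[min(len(window) - 1, int(len(window) * 0.99))]

# A server averaging 200ms can hide one tool pinned at 800ms;
# alert on p99("slow_tool"), not on the server-wide mean.
```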
The GitHub MCP server changes the calculation for developer tooling
GitHub’s official MCP server has 28,300+ stars as of March 2026. Before it existed, connecting an agent to GitHub meant building OAuth flows, implementing GitHub’s REST API, handling pagination and rate limiting, and maintaining the integration as GitHub’s API evolved. Now you configure a server and describe what you want.
The January 2026 update added three things that matter for production use: OAuth scope filtering (the server automatically hides tools your token doesn’t have permission to use), GitHub Enterprise Server support with HTTP mode, and an Insiders mode for experimental features. The scope filtering directly addresses the context window explosion problem. Rather than exposing 90+ tools and requiring you to filter manually, the server filters to what your credentials actually permit.
The part people are sleeping on is the maintenance contract. A third-party GitHub MCP integration is your problem to maintain when GitHub changes an API. GitHub’s official server is GitHub’s problem. When Anthropic, GitHub, Google, and Microsoft ship official servers for their own platforms, community-built alternatives need to offer something the official version doesn’t — specific use cases, different auth patterns, extended toolsets. Generic coverage doesn’t hold up long.
For teams building on developer platforms: audit your existing MCP servers against official alternatives. The 500+ MCP clients across Claude, ChatGPT, Cursor, VS Code, and Replit create network effects that make official servers safer long-term bets than community forks. The GitHub server’s integration with GitHub Copilot across JetBrains, Visual Studio, Eclipse, and the Copilot CLI signals where it’s headed: the canonical agentic interface to GitHub across every developer environment, not just Claude Desktop.
FAQ
What are the three main MCP production archetypes?
API wrapper (stateless tool exposure for reads/writes to a single system), workflow orchestrator (multi-step sequential chains where output feeds input), and stateful agent backend (session-aware agents with persistent context across turns). Each has a different failure profile. Choose based on whether your agent needs state, not on what’s easiest to set up.
What is the biggest production failure mode in MCP deployments?
Silent failures. Without health check endpoints that verify upstream connectivity, an MCP server can appear green while returning empty or wrong responses. One documented team ran 60+ failed API calls over 48 hours before catching it manually. The protocol doesn’t standardize observability, so you have to build it yourself: /health, /ready, and per-tool metrics from day one.
Does the GitHub MCP Server expose too many tools for production use?
Out of the box, yes — 90+ tools will bloat every context window. The move is toolset filtering: configure only the groups your agent actually needs (repos, issues, pull_requests, actions, code_security). GitHub’s January 2026 OAuth scope filtering helps by hiding tools your token lacks permission to use. For most developer tool agents, two or three toolsets is the right number.
What is the Context-Aware Broker Protocol and do I need it?
CABP (proposed in arXiv:2603.13417) is a six-stage broker layer that handles identity propagation MCP doesn’t natively support: JWT extraction, claim validation, ACL resolution, context injection, response sanitization, and audit logging. You need some version of it the moment you have multiple users or tenants. Without it, you’re passing user context as request parameters — easy to omit, impossible to audit consistently, and invisible to security tooling.
When should I use MCP Tasks instead of synchronous tool calls?
Any operation that regularly exceeds 10 seconds. Synchronous MCP calls block the agent’s turn budget for the full duration of the call. MCP Tasks let the agent hand off the operation and poll for completion asynchronously. Use Tasks for file processing, batch API operations, or any step with variable and potentially long completion times.
The paper behind this post: Vasundra Srinivasan, “Bridging Protocol and Production: Design Patterns for Deploying AI Agents with Model Context Protocol,” arXiv:2603.13417 (March 2026). 23 pages, 5 figures. The field lessons section is worth reading in full if you’re architecting a production MCP deployment.