
[Image: four incompatible electrical plug types trying to connect to one power strip, representing SDK interoperability failure]

TL;DR — OpenAI, Google, Anthropic, and Microsoft shipped agent orchestration SDKs within 90 days. They are not interoperable, and they bet on three different paradigms: event-driven, graph-based, and orchestrator-as-code. The A2A vs MCP post covered protocol-level interoperability. This post covers SDK-level architectural lock-in — the decision that shapes your agent system for years.


Four SDKs in 90 days, zero interoperability

Between January and April 2026, every major AI lab shipped a multi-agent orchestration SDK:

  • OpenAI Agents SDK — event-driven, reactive agent coordination
  • Google Agent Development Kit (ADK) — graph-based workflow orchestration with Scion testbed
  • Anthropic Agent SDK — code-first orchestrator-as-code with minimal framework
  • Microsoft Agent Framework — enterprise-grade graph orchestration with Azure integration

Each SDK defines its own agent format, communication protocol, state management approach, and tool integration pattern. An agent built with OpenAI’s SDK cannot be orchestrated by Google’s ADK. The SDKs do not share a common agent definition, a common state format, or a common handoff protocol.

This is not the protocol layer — A2A and MCP address that. A2A bridges agent-to-agent communication across frameworks. MCP standardizes tool integration. But the orchestration architecture — how agents are defined, composed, coordinated, and supervised — is SDK-specific. Picking an SDK is picking a paradigm.

The three paradigms

The four SDKs collapse into three architectural paradigms. Understanding the paradigm matters more than understanding the SDK, because the paradigm determines how you think about agent coordination.

Paradigm 1: Event-driven (OpenAI)

OpenAI’s Agents SDK treats agents as reactive event handlers. An agent receives an event (user message, tool result, another agent’s output), processes it, and emits new events. Coordination is emergent — agents react to each other’s outputs rather than following a predefined plan.

Mental model: Microservices architecture. Each agent is a service that processes messages and produces messages. No central coordinator. Coordination happens through the event stream.

Strengths: Natural fit for chatbots, customer service, and conversational flows where the interaction path is not predictable. Low framework overhead. Easy to add new agents without changing the coordination logic.

Weaknesses: Hard to enforce sequential workflows. Difficult to reason about system behavior when coordination is emergent. Debugging multi-agent interactions requires tracing event chains across agents.
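As a rough sketch of the paradigm (not OpenAI's actual SDK API), agents can be modeled as handlers on a queue-backed event bus. The agent names, event types, and hardcoded replies below are invented for illustration; a real agent would call a model instead:

```python
from collections import defaultdict, deque

# Minimal event-driven sketch: agents subscribe to event types, react, and
# emit new events. There is no central plan; coordination is emergent.
class EventBus:
    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def emit(self, event_type, payload):
        self.queue.append((event_type, payload))

    def run(self):
        while self.queue:
            event_type, payload = self.queue.popleft()
            for handler in self.handlers[event_type]:
                for new_event in handler(payload):
                    self.emit(*new_event)

# Hypothetical agents: an LLM call would replace the hardcoded logic.
def triage_agent(message):
    if "refund" in message:
        yield ("refund.requested", message)
    else:
        yield ("reply.ready", "forwarded to general support")

def refund_agent(message):
    yield ("reply.ready", "refund initiated")

replies = []

def responder(text):
    replies.append(text)
    return ()  # terminal handler: emits no further events

bus = EventBus()
bus.subscribe("user.message", triage_agent)
bus.subscribe("refund.requested", refund_agent)
bus.subscribe("reply.ready", responder)
bus.emit("user.message", "I want a refund for order 123")
bus.run()
```

Note how the refund agent was added without touching the triage agent's logic — the "easy to add agents" strength — and how tracing why `replies` holds what it holds already requires following a chain of events, which is the debugging weakness.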

Paradigm 2: Graph-based (Google ADK, Microsoft)

Google’s ADK and Microsoft’s Agent Framework model agent coordination as a directed graph. Nodes are agents or processing steps. Edges define the flow of data and control. The graph is defined declaratively and executed by a runtime engine.

Mental model: Apache Airflow or LangGraph. Define the workflow as a DAG (directed acyclic graph), and the runtime handles execution, retries, and state management.

Strengths: Explicit control flow makes debugging straightforward. Natural fit for structured workflows (document processing, data pipelines, approval chains). Easy to visualize and reason about. Built-in support for parallel execution and join points.

Weaknesses: Rigid for conversational flows where the path depends on agent reasoning. Graph changes require redefining the workflow, not just adding an agent. Higher framework overhead.
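A minimal sketch of the paradigm using Python's stdlib `graphlib` as a stand-in runtime (this is not ADK or Agent Framework code; the node names and shared-state convention are assumptions):

```python
from graphlib import TopologicalSorter

# Graph-based sketch: nodes are steps that read/write a shared state dict,
# edges are declared as dependencies, and a runtime decides execution order.
def extract(state):
    state["text"] = state["document"].upper()

def summarize(state):
    state["summary"] = state["text"][:40]

def approve(state):
    state["approved"] = "INVOICE" in state["text"]

nodes = {"extract": extract, "summarize": summarize, "approve": approve}
# node -> set of predecessor nodes; both downstream steps depend on extract
deps = {"summarize": {"extract"}, "approve": {"extract"}}

def run_graph(nodes, deps, state):
    # A real engine would add retries, checkpointing, and parallel execution
    # of independent nodes; static_order() just gives a valid sequence.
    for name in TopologicalSorter(deps).static_order():
        nodes[name](state)
    return state

state = run_graph(nodes, deps, {"document": "invoice #42: net 30"})
```

The rigidity weakness is visible here too: routing around `approve` based on what the model said mid-run means changing `deps`, not just writing an `if`.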

Paradigm 3: Orchestrator-as-code (Anthropic)

Anthropic’s Agent SDK takes a minimal approach: agents are Python functions, orchestration is Python code. No graph definition language. No event bus. You write the coordination logic in the same language you write everything else.

Mental model: Regular software engineering. Agents are functions. Orchestration is function composition. State is whatever you put in variables.

Strengths: Maximum flexibility. No framework-imposed constraints on coordination patterns. Easy to test (it is just code). Minimal learning curve for engineers who already write Python.

Weaknesses: No built-in visualization of agent workflows. State management is your responsibility. No framework support for common patterns (retries, parallelism, checkpointing) — you implement them yourself or pull in libraries.
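A sketch of what orchestrator-as-code means in practice, with a stubbed `call_model` standing in for a real LLM API call; none of these names are Anthropic SDK identifiers:

```python
# Code-first sketch: agents are plain functions, orchestration is plain
# control flow, and state lives in ordinary variables.
def call_model(prompt: str) -> str:
    return f"model output for: {prompt}"  # a real version would call an API

def researcher(topic: str) -> str:
    return call_model(f"research {topic}")

def writer(notes: str) -> str:
    return call_model(f"write an article from {notes}")

def orchestrate(topic: str) -> dict:
    # Sequencing, branching, and retries are just Python; nothing is
    # framework-managed, which is both the flexibility and the cost.
    state = {"topic": topic}
    state["notes"] = researcher(topic)
    state["draft"] = writer(state["notes"])
    return state

result = orchestrate("agent SDK paradigms")
```

Because `orchestrate` is an ordinary function, it can be unit-tested by monkeypatching `call_model` — the "easy to test" strength — but anything like checkpointing `state` between steps is on you.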

The comparison matrix

| Dimension | OpenAI (Event-driven) | Google/Microsoft (Graph) | Anthropic (Code) |
| --- | --- | --- | --- |
| Agent definition | Event handlers with tool specs | Graph nodes with typed inputs/outputs | Python functions |
| Coordination | Emergent via event stream | Declarative graph edges | Imperative Python code |
| State management | Framework-managed conversation state | Graph-managed execution state | Developer-managed variables |
| Tool integration | Native + MCP | Native + MCP | MCP-first |
| Debugging | Event trace analysis | Graph execution visualization | Standard debugger (pdb, breakpoints) |
| Best fit | Conversational, reactive workflows | Structured pipelines, approval chains | Custom logic, research, prototyping |
| Lock-in surface | Event schemas, handler patterns | Graph definitions, node interfaces | Low: it is just Python code |
| Enterprise features | API-level auth, usage tracking | Azure/GCP integration, IAM, logging | Minimal: bring your own infrastructure |

The lock-in surface is the row that matters for long-term decisions. OpenAI and Google/Microsoft lock you into their respective coordination models. Anthropic’s code-first approach has the smallest lock-in surface because the orchestration logic is portable Python — but it also means you build more infrastructure yourself.

What the 50-tool problem reveals

Google’s Scion multi-agent testbed (announced April 2026) exposed a practical scaling issue that affects all three paradigms differently: tool loading overhead.

When an agent has access to 50+ tools, the tool descriptions alone consume approximately 55,000 tokens — roughly 1,100 tokens per tool description. This is not a theoretical concern: production agents in enterprise settings routinely need access to CRM, email, calendar, database, search, file management, and domain-specific tools.

The three paradigms handle this differently:

Event-driven (OpenAI): Each agent carries its full tool set. In a multi-agent system where specialized agents handle different tool domains, the total token overhead is manageable because each agent only loads its tools. But handoffs between agents must carry the relevant tool context.

Graph-based (Google/Microsoft): Tool loading is per-node. The graph runtime can optimize by only loading tools needed for each node’s execution step. This is the most efficient approach for structured workflows where tool needs are predictable.

Code-first (Anthropic): Tool loading is explicit in code. You decide when to load which tools. Maximum control but no automatic optimization — if you load all 50 tools for every agent call, you pay the token cost.

```mermaid
graph TD
    subgraph "50-Tool Problem"
        A[50 tools = ~55K tokens per agent]
    end

    subgraph "Event-driven solution"
        B[Specialize agents by tool domain]
        B --> C[Agent 1: CRM tools - 5K tokens]
        B --> D[Agent 2: Email tools - 4K tokens]
        B --> E[Agent 3: DB tools - 6K tokens]
    end

    subgraph "Graph-based solution"
        F[Load tools per graph node]
        F --> G[Node 1: loads only tools it needs]
        F --> H[Node 2: loads only tools it needs]
    end

    subgraph "Code-first solution"
        I[Explicit tool loading in Python]
        I --> J[Developer decides per-call tool set]
    end
```

The interoperability question

Can agents built with different SDKs work together? The short answer: at the protocol level (A2A), yes — eventually. At the orchestration level, no.

A2A (Agent-to-Agent protocol) defines a standard for agents to discover each other, negotiate capabilities, and exchange messages. An OpenAI event-driven agent can technically communicate with a Google graph-based agent via A2A. But the communication is limited to message passing — the orchestration logic within each SDK remains isolated.

MCP (Model Context Protocol) provides better interoperability for tools. A tool server exposed via MCP works with any SDK that supports MCP. This means your tool integrations are more portable than your orchestration logic — a meaningful distinction when evaluating lock-in risk.

The practical implication: build your tool integrations on MCP (portable), and accept that your orchestration logic will be SDK-specific (locked in). If you choose to switch SDKs later, your tools migrate easily. Your coordination logic does not.

The decision framework

Choose event-driven (OpenAI) if:

  • Your primary use case is conversational agents (chatbots, assistants, customer service)
  • Interaction paths are unpredictable and depend on agent reasoning
  • You want low framework overhead and the flexibility to add agents organically
  • You are comfortable debugging through event traces rather than execution graphs

Choose graph-based (Google/Microsoft) if:

  • Your primary use case is structured workflows (document processing, approval chains, data pipelines)
  • Interaction paths are mostly predictable and can be defined declaratively
  • You need enterprise features (IAM, audit logging, GCP/Azure integration)
  • You want visual debugging and explicit control flow

Choose code-first (Anthropic) if:

  • Your orchestration needs are custom and do not fit standard patterns
  • You want maximum flexibility with minimal framework constraints
  • Your team prefers standard Python debugging over framework-specific tooling
  • You are building research prototypes or rapidly iterating on agent architectures
  • You accept the tradeoff of building more infrastructure yourself

No clear winner exists. The right choice depends on your orchestration needs, team skills, and infrastructure preferences. The wrong choice is picking one without understanding the lock-in surface.

Key takeaways

  • Four major AI labs shipped agent orchestration SDKs within 90 days (Q1 2026). They are not interoperable and bet on different paradigms
  • The three paradigms are: event-driven (OpenAI), graph-based (Google/Microsoft), and orchestrator-as-code (Anthropic)
  • The lock-in surface varies: graph definitions and event schemas lock you in; code-first has the smallest lock-in but requires more infrastructure
  • A2A provides protocol-level bridges for inter-agent communication but does not make the SDKs themselves interoperable
  • MCP standardizes tool integration across all SDKs — build tools on MCP for portability
  • The 50-tool problem (55K tokens for full tool sets) affects each paradigm differently; graph-based orchestration handles it most efficiently through per-node tool loading
  • Pick the paradigm that matches your primary use case, not the SDK with the best marketing

FAQ

Which multi-agent SDK should I pick in 2026? Match the paradigm to your use case. Event-driven (OpenAI) for conversational flows. Graph-based (Google/Microsoft) for structured workflows. Code-first (Anthropic) for custom logic and research. The paradigm determines how you think about coordination — switching later means rearchitecting.

Are the SDKs interoperable? No. A2A bridges agent-to-agent communication at the protocol level, and MCP standardizes tool integration. But the orchestration logic (how agents are defined, composed, coordinated) is SDK-specific and not portable between frameworks.

What is the biggest lock-in risk? State management and handoff patterns. Each SDK manages conversation state, tool results, and inter-agent communication differently. Migrating this requires rearchitecting interactions. Tool definitions (via MCP) are the most portable component.

Can I use LangGraph or CrewAI instead? Yes. LangGraph follows the graph-based paradigm. CrewAI follows a role-based variant of event-driven orchestration. Both are viable alternatives to the first-party SDKs, with the advantage of not being locked to a single model provider. The paradigm choice still applies.


Want to work together?

I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.

Get in touch