Agent Frameworks Landscape


“To Framework or Not to Framework? Navigating the Agent Ecosystem.”

TL;DR

The agent framework landscape spans four abstraction levels: raw code, utilities (LangChain Core, LlamaIndex Core), graph orchestrators (LangGraph, LlamaIndex Workflows), and multi-agent platforms (AutoGen, CrewAI). LangGraph is the strongest fit for production use cases thanks to its stateful graph model and human-in-the-loop capabilities, while AutoGen excels at code generation and CrewAI at creative role-playing tasks. The key insight: start with raw Python to understand the pain points, then adopt a framework only when you find yourself reinventing state persistence and graph execution for the third time.

Three test instruments of increasing complexity on a laboratory bench, from multimeter to automated test rack

1. Why Is Agentic AI a Wild West?

In 2023, building an agent was a simple affair: you wrote a while True loop in Python, appended strings to a list called messages, and called the OpenAI API. It was raw, messy, and understandable.

Fast forward to today, and we are drowning in frameworks. LangChain, LangGraph, AutoGen, CrewAI, LlamaIndex, Semantic Kernel, MetaGPT… the ecosystem has exploded. Every week, a new library promises to be the “Rails for Agents.”

For a developer, this Paralysis of Choice is dangerous. Choosing the wrong framework can lock you into a rigid architecture, force you to learn obscure abstractions that wrap simple API calls, and leave you with what amounts to “Technical Debt as a Service.” Conversely, refusing to use frameworks can leave you re-implementing basic utilities (like PDF parsing or retry logic) for weeks.

In this comprehensive landscape analysis, we will map the Agent Framework Ecosystem. We won’t just list features; we will analyze the Philosophy behind each framework, their abstraction costs, their “Opinionatedness,” and ultimately, when you should (or shouldn’t) use them.

2. What Are the Abstraction Layers in Agent Frameworks?

To really understand the landscape, we must visualize the layers of abstraction. Not all “frameworks” are solving the same problem. Some are utilities; others are full-blown operating systems.

2.1 What Is Level 0: Raw Code?

Tech: Python requests, openai SDK (pip install openai).
Philosophy: “I want to see the prompt. I want to control the bytes.”
Pros:
Infinite Flexibility: You are limited only by Python and the API.
Zero Overhead: No latency from wrapper libraries.
Debuggability: When it breaks, you know exactly where. There is no “magic” happening behind the scenes.
Cons:
Boilerplate: You reinvent the wheel constantly. You have to write your own retry logic (with backoff), your own JSON parser, your own token counter.
Maintenance: As APIs change (e.g., OpenAI moving from Function Calling to Tools), you have to refactor everything manually.
Verdict: Best for production engineers building high-performance, strictly defined tools where reliability is paramount.
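To make the “Boilerplate” cost concrete, here is the kind of retry helper you end up writing yourself at Level 0. It is a minimal sketch, assuming a generic zero-argument callable that may raise a transient error; `TransientError` and `call_with_retry` are hypothetical names, not part of the openai SDK.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (rate limits, timeouts)."""


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying TransientError with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")  # the loop always returns or raises


# Usage: a flaky call that fails twice before succeeding.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("rate limited")
    return "ok"


print(call_with_retry(flaky, sleep=lambda s: None))  # ok
```

Injecting `sleep` keeps the helper testable without real waiting; a framework would hide this choice from you, which is exactly the trade-off Level 0 makes.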

2.2 What Is Level 1: The Utilities Layer?

Tech: LangChain Core, LlamaIndex (Core).
Philosophy: “Give me tools for the boring stuff, but let me write the logic.”
Capabilities:
Loaders: “Read this PDF/Notion page/Slack channel.”
Splitters: “Chunk this text into 500-token logical blocks.”
Vector Connectors: “Talk to Pinecone/Chroma.”
Pros: Saves massive time on commodity tasks (ETL).
Cons: The “Leaky Abstraction” problem. Sometimes the specific way LangChain chunks text isn’t what you want, and overriding it is harder than writing it yourself.
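When a library's splitter doesn't do what you want, the replacement can be small. A minimal sketch of fixed-size chunking with overlap, assuming whitespace-separated words as a stand-in for real tokens (a production version would count tokens with the model's tokenizer); `chunk_words` is a hypothetical helper.

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size words, overlapping by `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
print(chunk_words(text, chunk_size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks; changing that policy here is a one-line edit rather than a subclass override.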

2.3 What Is Level 2: The Graph Orchestrators?

Tech: LangGraph, LlamaIndex Workflows.
Philosophy: “Agents are State Machines. Let me define the nodes (functions) and edges (logic).”
Mechanism: These frameworks force you to define a Directed Acyclic Graph (DAG) or, more commonly for agents, a Cyclic Graph (a graph with loops).
Pros:
Structure: Enforces discipline. You can’t just have spaghetti code; you must define state transitions.
Persistence: They often come with “Checkpointers” that save the state of the graph to a database after every step. This allows for “Time Travel” debugging.
Cons: High cognitive load. You have to think in graphs.
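To see what a checkpointer buys you, here is a framework-free sketch of a cyclic graph runner. It assumes nodes are plain functions that return a partial state update and a router function picks the next node; `run_graph`, the node names, and the in-memory checkpoint list are hypothetical and are not LangGraph's API.

```python
import copy
from typing import Callable

State = dict
END = "__end__"


def run_graph(
    nodes: dict[str, Callable[[State], State]],
    router: Callable[[str, State], str],
    entry: str,
    state: State,
    checkpoints: list[tuple[str, State]],
    max_steps: int = 20,
) -> State:
    """Run nodes until the router returns END, snapshotting state after every step."""
    current = entry
    for _ in range(max_steps):
        update = nodes[current](state)
        state = {**state, **update}  # merge the node's partial update
        # Checkpoint a deep copy so later steps cannot mutate saved history.
        checkpoints.append((current, copy.deepcopy(state)))
        current = router(current, state)
        if current == END:
            return state
    raise RuntimeError("max_steps exceeded (possible infinite loop)")


# Usage: a draft/check loop that retries until the check passes.
nodes = {
    "draft": lambda s: {"attempts": s["attempts"] + 1},
    "check": lambda s: {"ok": s["attempts"] >= 2},
}


def router(node: str, s: State) -> str:
    if node == "draft":
        return "check"
    return END if s["ok"] else "draft"  # the cycle: failed checks loop back


history: list[tuple[str, State]] = []
final = run_graph(nodes, router, "draft", {"attempts": 0, "ok": False}, history)
print(final)                           # {'attempts': 2, 'ok': True}
print([name for name, _ in history])   # ['draft', 'check', 'draft', 'check']
print(history[1][1])                   # {'attempts': 1, 'ok': False}
```

The last line is the essence of “Time Travel” debugging: any snapshot can be inspected, or fed back in as the starting state, to replay from that point.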

2.4 What Is Level 3: Multi-Agent Platforms?

Tech: AutoGen (Microsoft), CrewAI.
Philosophy: “Agents are people. Let them talk to each other.”
Mechanism: You define “Personas” and “Conversation Policies.” The framework handles the message passing.
Pros:
Emergent Behavior: You can get complex results with very little code. “Here is a Coder Agent and a Reviewer Agent. Goal: Fix this bug.”
Cons:
Non-Determinism: It’s hard to control exactly what happens. Agents might chat endlessly.
Cost: “Chatter” consumes tokens.
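The message passing a Level 3 platform handles for you can be approximated in a few lines. This is a sketch, assuming each agent is a plain function from the transcript to a reply (a real agent would call an LLM with its persona as the system prompt) and that a reply containing the hypothetical sentinel `DONE` ends the conversation. The `max_turns` cap is what bounds the token cost of chatter.

```python
from typing import Callable

Message = tuple[str, str]  # (speaker, text)
Agent = Callable[[list[Message]], str]


def converse(agents: dict[str, Agent], task: str, max_turns: int = 6) -> list[Message]:
    """Round-robin message passing between agents, stopping on DONE or max_turns."""
    transcript: list[Message] = [("user", task)]
    names = list(agents)
    for turn in range(max_turns):
        speaker = names[turn % len(names)]
        reply = agents[speaker](transcript)
        transcript.append((speaker, reply))
        if "DONE" in reply:
            break  # explicit termination signal
    return transcript


# Stub personas standing in for LLM-backed Coder and Reviewer agents.
def coder(transcript: list[Message]) -> str:
    return "def add(a, b): return a + b"


def reviewer(transcript: list[Message]) -> str:
    return "Looks correct. DONE"


for speaker, text in converse({"coder": coder, "reviewer": reviewer}, "Fix the add bug"):
    print(f"{speaker}: {text}")
```

Without the sentinel check or the turn cap, nothing in this loop would stop two agents from talking forever, which is the non-determinism risk described above.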

3. Who Are the Big Players in Agent Frameworks?

Let’s dissect the specific frameworks dominating the market in 2025.

3.1 How Does LangGraph Work?

LangChain realized that its original “Chain” abstraction (Sequence A -> Sequence B) was too rigid for agents, which loop and branch. They pivoted to LangGraph.

Core Concept: Stateful Graph.
There is a shared State object (a Python dictionary/TypedDict).
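Nodes are Python functions that take the current State and return a partial update, which LangGraph applies to the shared State.
Edges define where to go next (e.g., add_conditional_edges("agent", should_continue) routes based on the return value of should_continue).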
The “Human-in-the-Loop” Feature: Because LangGraph saves the state after every node execution (using a Checkpointer), you can pause execution.
Scenario: Agent reaches “Execute Code” node. Graph pauses. Human Admin gets a ping. Human approves. Graph resumes.
This is critical for enterprise safety.
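Code Snippet (Conceptual):

```python
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph
from langgraph.graph.message import add_messages


# Define State: add_messages appends new messages instead of overwriting the list
class State(TypedDict):
    messages: Annotated[list, add_messages]


# call_llm, run_tool, and should_continue are your own functions:
# nodes take the State and return a partial update such as {"messages": [...]};
# should_continue returns the next node name ("tool") or END (from langgraph.graph).

# Define Graph
workflow = StateGraph(State)
workflow.add_node("agent", call_llm)
workflow.add_node("tool", run_tool)

# Define Logic
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", should_continue)
workflow.add_edge("tool", "agent")

app = workflow.compile()
```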
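The pause-and-resume scenario works because a saved checkpoint is all you need to continue later, possibly in a different process. A framework-free sketch, assuming a JSON file as the checkpoint store and a hypothetical gated step named `execute_code`; the function names are illustrative, not LangGraph's interrupt API.

```python
import json
import tempfile
from pathlib import Path

APPROVAL_REQUIRED = {"execute_code"}  # hypothetical set of steps a human must approve


def run_until_approval(steps: list[str], state: dict, store: Path) -> dict:
    """Run steps in order; before a gated step, save state to disk and pause."""
    while state["next"] < len(steps):
        step = steps[state["next"]]
        if step in APPROVAL_REQUIRED and not state.get("approved"):
            store.write_text(json.dumps(state))  # checkpoint, then pause
            state["status"] = "awaiting_approval"
            return state
        state["log"].append(f"ran {step}")  # stand-in for the real work
        state["approved"] = False  # an approval covers only one gated step
        state["next"] += 1
    state["status"] = "done"
    return state


def approve_and_resume(steps: list[str], store: Path) -> dict:
    """Human approved: reload the checkpoint and continue where execution stopped."""
    state = json.loads(store.read_text())
    state["approved"] = True
    return run_until_approval(steps, state, store)


steps = ["plan", "write_code", "execute_code", "report"]
store = Path(tempfile.mkdtemp()) / "checkpoint.json"
state = run_until_approval(steps, {"next": 0, "log": []}, store)
print(state["status"], state["log"])  # awaiting_approval ['ran plan', 'ran write_code']
state = approve_and_resume(steps, store)  # e.g. after the admin clicks "approve"
print(state["status"], state["log"])
# done ['ran plan', 'ran write_code', 'ran execute_code', 'ran report']
```

Because the pause is just “write state, return,” the approval can arrive hours later from a web request; nothing has to stay running in memory.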

3.2 How Does AutoGen Work?

Developed by Microsoft Research, AutoGen takes a different approach. It treats everything as an “Agent” that can send and receive messages.

Core Concept: UserProxy and Assistant.
AssistantAgent: The LLM. It suggests plans and code.
UserProxyAgent: A proxy for the human (or a system execution environment). It can execute code locally or in Docker.
The Magic: AutoGen excels at Code Generation.
Step 1: Assistant writes Python code to plot a chart.
Step 2: UserProxy detects the code block, executes it (automatically!), and returns the result (or the error trace) to the Assistant.
Step 3: Assistant fixes the error.
Use Case: Data Science. “Here is a CSV. Analyze it.” AutoGen will write pandas code, run it, fix errors, and generate the final plot, often with no human intervention.
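The detect-execute-return step is the heart of this loop. Below is a stdlib-only sketch of what a UserProxy-style executor does, assuming the assistant wraps code in a Markdown python fence; `run_code_blocks` is hypothetical and is not AutoGen's API. It runs code in a local subprocess with a timeout, which is not a sandbox; that is why running generated code inside Docker is the safer setup.

```python
import re
import subprocess
import sys

FENCE = "`" * 3  # built at runtime so this example can sit inside a Markdown code fence
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_code_blocks(message: str, timeout: float = 10.0) -> str:
    """Execute each python block in an assistant message; return its output or error."""
    blocks = CODE_BLOCK.findall(message)
    if not blocks:
        return "No code found."
    results = []
    for code in blocks:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            results.append(f"Timed out after {timeout}s.")
            continue
        # On failure, stderr holds the traceback the assistant needs to fix its code.
        results.append(proc.stdout if proc.returncode == 0 else proc.stderr)
    return "\n".join(r.strip() for r in results)


reply = f"Sure, here is the code:\n{FENCE}python\nprint(sum(range(5)))\n{FENCE}"
print(run_code_blocks(reply))  # 10
```

In a full loop, the returned string (output or traceback) becomes the next message to the assistant, which is how Step 3 gets the error it needs to fix.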

3.3 How Does CrewAI Work?

CrewAI was originally built on top of LangChain (recent versions are a standalone framework) and simplifies the API into a “Team” metaphor.

Core Concept: Processes.
Agents: Defined with Role, Goal, Backstory. (e.g., “You are a veteran journalist.”)
Tasks: Specific units of work assigned to agents.
Process: How they work together.
Sequential: A -> B -> C.
Hierarchical: A Manager Agent assigns tasks to A and B, reviews work, and delegates.
Why people love it: It is incredibly readable. The code looks like an org chart.
Critique: It can be slow. The “Manager” LLM adds latency and cost as it orchestrates everything.
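The “org chart” readability comes from treating agents and tasks as data. A sketch of the Sequential process, assuming a hypothetical `llm` callable (stubbed here) and plain dataclasses; the names mirror CrewAI's concepts but are not its API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    role: str
    goal: str
    backstory: str


@dataclass
class Task:
    description: str
    agent: Agent


def run_sequential(tasks: list[Task], llm: Callable[[str], str]) -> str:
    """Run tasks A -> B -> C, feeding each task the previous task's output."""
    context = ""
    for task in tasks:
        prompt = (
            f"Role: {task.agent.role}. {task.agent.backstory}\n"
            f"Your goal: {task.agent.goal}\n"
            f"Task: {task.description}\n"
            f"Previous output: {context or '(none)'}"
        )
        context = llm(prompt)
    return context


writer = Agent("writer", "Draft a post", "You are a veteran journalist.")
editor = Agent("editor", "Tighten the draft", "You cut every unneeded word.")
tasks = [Task("Write about agent frameworks", writer), Task("Edit the draft", editor)]


def fake_llm(prompt: str) -> str:
    # Stub: report which task ran, so the hand-off is visible without an API call.
    return prompt.splitlines()[2].removeprefix("Task: ") + " -> done"


print(run_sequential(tasks, fake_llm))  # Edit the draft -> done
```

A Hierarchical process would replace the fixed `for` loop with a manager LLM call that decides which agent acts next and whether to accept its output; that extra call per step is where the added latency and cost come from.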

3.4 How Does LlamaIndex Work for Agents?

LlamaIndex started as “GPT Index,” a tool to connect LLMs to your data. It has evolved into a full agent framework.

Core Concept: RAG-First Agents.
While other frameworks focus on generic tools, LlamaIndex focuses on Query Engines.
Workflow: Their new event-driven workflow engine allows you to build agents that are triggered by data events (e.g., “New file added to folder”).
Context Management: LlamaIndex offers some of the most extensive tooling for chunking optimization and for packing retrieved context into the model's token window.
Use Case: If your agent’s primary job is reading 500 PDF contracts and answering questions about them, LlamaIndex is the superior choice.
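“Event-driven” means each step declares which event type it consumes and emits the next event, and the engine dispatches on type. A stdlib sketch of that dispatch model, assuming hypothetical event classes, a toy `Workflow` class, and keyword matching in place of a real query engine. LlamaIndex's actual Workflows use `@step`-decorated async methods with `StartEvent` and `StopEvent`, so treat this only as an illustration of the idea.

```python
from dataclasses import dataclass


@dataclass
class StartEvent:
    query: str


@dataclass
class RetrievedEvent:
    query: str
    chunks: list[str]


@dataclass
class StopEvent:
    result: str


class Workflow:
    """Dispatch each event to the handler registered for its type until a StopEvent."""

    def __init__(self) -> None:
        self.handlers = {}

    def on(self, event_type):
        def register(fn):
            self.handlers[event_type] = fn
            return fn
        return register

    def run(self, event, max_steps: int = 50) -> str:
        for _ in range(max_steps):
            if isinstance(event, StopEvent):
                return event.result
            event = self.handlers[type(event)](event)
        raise RuntimeError("workflow did not reach StopEvent")


DOCS = ["Clause 4: termination requires 30 days notice.", "Clause 7: governing law is Delaware."]
wf = Workflow()


@wf.on(StartEvent)
def retrieve(ev: StartEvent) -> RetrievedEvent:
    # Naive keyword retrieval stands in for a real vector query engine.
    hits = [d for d in DOCS if any(w in d.lower() for w in ev.query.lower().split())]
    return RetrievedEvent(ev.query, hits)


@wf.on(RetrievedEvent)
def answer(ev: RetrievedEvent) -> StopEvent:
    return StopEvent(" ".join(ev.chunks) or "No relevant clause found.")


print(wf.run(StartEvent("termination notice")))
# Clause 4: termination requires 30 days notice.
```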

4. What Are the Key Architectural Patterns: Graph vs. Conversation?

When choosing a framework, you are choosing an architecture.

4.1 How Does the Graph Pattern Work?

Structure: Explicit State Machine.
Control: High. You explicitly define every transition. “If A succeeds, go to B. If A fails, go to C.”
Reliability: High. It behaves predictably.
Development Speed: Slower. You have to define the graph structure.
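“If A succeeds, go to B. If A fails, go to C” can be written literally as a transition table, which is what makes the graph pattern predictable and easy to unit test. A sketch with hypothetical node names and outcomes:

```python
# Every legal transition is listed explicitly: (current node, outcome) -> next node.
TRANSITIONS = {
    ("A", "success"): "B",
    ("A", "failure"): "C",
    ("B", "success"): "END",
    ("C", "success"): "A",  # recovery path loops back to retry A
}


def next_node(node: str, outcome: str) -> str:
    """Look up the next node; an undefined transition fails loudly instead of improvising."""
    try:
        return TRANSITIONS[(node, outcome)]
    except KeyError:
        raise ValueError(f"no transition from {node!r} on {outcome!r}") from None


def walk(start: str, outcomes: list[str]) -> list[str]:
    """Replay a sequence of outcomes and return the path of visited nodes."""
    path = [start]
    for outcome in outcomes:
        path.append(next_node(path[-1], outcome))
    return path


print(walk("A", ["failure", "success", "success", "success"]))
# ['A', 'C', 'A', 'B', 'END']
```

Since the control flow is data, every path can be asserted in a test, something that is hard to do when an LLM decides the flow.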

4.2 How Does the Conversational Pattern Work?

Structure: Free-form Chat.
Control: Low. The LLM decides who speaks next based on the conversation history.
Reliability: Lower. Agents might get stuck in “Politeness loops” (“Thank you!” “No, thank you!”) or fail to hand off tasks correctly.
Development Speed: Fast. Just define agents and say “Chat.”
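Because the conversational pattern has no transition table, stalls have to be detected after the fact. A sketch of one heuristic guard against politeness loops, assuming that a run of short, content-free messages signals a stall; the phrase list and window size are arbitrary illustrations, not values from any framework.

```python
import re

# Hypothetical filler phrases; tune these for your own agents.
FILLER = re.compile(
    r"^(thanks|thank you|you're welcome|no, thank you|great|glad to help)\W*$",
    re.IGNORECASE,
)


def is_politeness_loop(messages: list[str], window: int = 3) -> bool:
    """True if the last `window` messages are all pleasantries with no new content."""
    if len(messages) < window:
        return False
    return all(FILLER.match(m.strip()) for m in messages[-window:])


chat = ["Here is the fix.", "Thank you!", "No, thank you!", "Thanks!"]
print(is_politeness_loop(chat))  # True
```

When the check fires, the orchestrator can end the chat or inject a “summarize and hand off” instruction instead of paying for more rounds.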

5. Decision Matrix: Which one should you choose?

Here is a guide for the perplexed engineer in 2025.

Scenario	Recommendation	Why?
Simple “Chat with PDF”	LlamaIndex	Best data connectors and chunking logic. RAG is their bread and butter.
Production Enterprise SaaS	LangGraph	You need strict state management, “Human-in-the-loop” approval, and unit testing. You can’t afford non-determinism.
Experimental Data Analysis	AutoGen	Best code execution sandbox. It writes and runs code better than anything else.
Creative Content / Marketing	CrewAI	Role-playing abstraction is perfect for creative tasks where “style” matters and multiple agents (Writer, Editor) improve quality.
High-Performance Micro-Agent	Raw Python / OpenAI SDK	Don’t pay the latency tax of a framework. If all you need is a “Router,” just write the `if` statements.
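For the “Router” case in the last row, plain conditionals are often enough. A sketch, assuming hypothetical handler names and keyword rules:

```python
def route(query: str) -> str:
    """Pick a handler name with ordinary if statements: no framework, no extra latency."""
    q = query.lower()
    if any(word in q for word in ("refund", "invoice", "charge")):
        return "billing"
    if any(word in q for word in ("error", "crash", "bug")):
        return "support"
    if q.endswith("?"):
        return "faq"
    return "general"


print(route("Why was I charged twice?"))  # billing
```

The rules run in microseconds and are trivially testable; an LLM call is only worth adding for the queries that fall through to `general`.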

6. Should You Avoid Agent Frameworks Entirely?

Many senior AI engineers advocate for avoiding frameworks entirely, especially in the beginning. This is often called the “No-Framework” approach, in the spirit of Hamel Husain’s well-known critique of frameworks that hide the prompt from you.

The Argument: Agent frameworks add layers of “Prompt Magic”. They often inject hidden system prompts (“You are a helpful agent…”) that you can’t see or change easily. This interferes with your ability to prompt engineer specifically for your use case.
The Debugging Nightmare: When a LangChain agent fails, the stack trace goes through 15 layers of abstraction. Debugging raw_python.py is trivial; debugging AgentExecutor.run() is hell.
The Strategy:
1. Start with raw Python. Build your chat_loop. Handle your tool_call.
2. Write your own Tool class (it’s 10 lines of code).
3. Only adopt a framework (like LangGraph) when you find yourself reinventing state persistence and graph execution for the 3rd time.
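Step 2 really is small. A sketch of a homemade `Tool` that derives a function schema from a Python signature, assuming every parameter is annotated as `str`, `int`, `float`, or `bool`; the class and the `get_weather` example are hypothetical, and the emitted dict only mirrors the general shape of OpenAI's `tools` parameter.

```python
import inspect
import json
from typing import Any, Callable

# Map Python annotations to JSON Schema types (deliberately minimal).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


class Tool:
    def __init__(self, fn: Callable[..., Any]) -> None:
        self.fn = fn
        self.name = fn.__name__
        params = inspect.signature(fn).parameters
        self.schema = {
            "type": "function",
            "function": {
                "name": self.name,
                "description": inspect.getdoc(fn) or "",
                "parameters": {
                    "type": "object",
                    "properties": {p: {"type": JSON_TYPES[params[p].annotation]} for p in params},
                    # Parameters without defaults are required.
                    "required": [p for p in params if params[p].default is inspect.Parameter.empty],
                },
            },
        }

    def __call__(self, arguments_json: str) -> str:
        """Run the tool with the model's JSON-encoded arguments; return text for the model."""
        return str(self.fn(**json.loads(arguments_json)))


def get_weather(city: str, days: int = 1) -> str:
    """Return a (fake) forecast for a city."""
    return f"{city}: sunny for {days} day(s)"


weather = Tool(get_weather)
TOOLS = {t.name: t for t in [weather]}  # inside chat_loop: TOOLS[name](arguments_json)
print(weather.schema["function"]["parameters"]["required"])  # ['city']
print(weather('{"city": "Paris", "days": 2}'))  # Paris: sunny for 2 day(s)
```

Every prompt-facing string here (name, description, schema) is visible and editable, which is the whole point of the No-Framework strategy.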

FAQ

Q: What are the main AI agent frameworks in 2025? A: The major AI agent frameworks are LangGraph (stateful graph orchestration by LangChain), AutoGen (conversational multi-agent by Microsoft), CrewAI (role-based team metaphor), and LlamaIndex (RAG-first agents). Each targets a different abstraction level and use case.

Q: Should I use a framework or build my AI agent from scratch? A: Start with raw Python to understand the mechanics, then adopt a framework when you find yourself reinventing state persistence and graph execution repeatedly. Frameworks can add hidden prompts and abstraction layers that hinder debugging.

Q: What is LangGraph and how does it work? A: LangGraph is a stateful graph framework by LangChain where agents are modeled as state machines. You define nodes (Python functions), edges (transitions), and a shared State object. It supports human-in-the-loop approval via checkpointers that save state after every node.

Q: When should I use AutoGen vs LangGraph vs CrewAI? A: Use LangGraph for production enterprise apps needing strict control and human-in-the-loop. Use AutoGen for data science and code generation tasks. Use CrewAI for creative content where role-playing and team metaphors improve output quality.

Q: What is the graph pattern vs conversational pattern in agent architecture? A: The graph pattern (LangGraph) uses explicit state machines with defined transitions for high control and reliability. The conversational pattern (AutoGen) uses free-form chat where the LLM decides flow, offering faster development but lower predictability.

7. Key Takeaways

The framework landscape is consolidating around two poles: Graphs (LangGraph) for engineers who want control, state, and reliability, and Swarms (AutoGen/CrewAI) for researchers who want capability, emergence, and conversation.
There is no “Best Framework.” There is only the right level of abstraction for your problem.
Building a Banking Bot? Use LangGraph (Control).
Building a Stock Research Bot? Use AutoGen (Code Execution).
Building a Story Writer? Use CrewAI (Creativity).
Start with raw Python to understand the pain points before committing to any framework.

With the landscape understood, the next step is to Build Your First Agent from scratch, using raw Python to see exactly what these frameworks hide from you.

Originally published at: arunbaby.com/ai-agents/0006-agent-frameworks-landscape
