Hybrid agentic workflows: when to use an LLM node vs a code node
“If you can write a unit test for it, it should not be an LLM call.”
TL;DR
Every agentic workflow mixes reasoning (flexible, slow, expensive) with execution (rigid, fast, cheap). Most teams default to LLM nodes for everything — paying 3-5x in latency and cost for tasks that have deterministic solutions. HyEvo’s research shows evolved hybrid graphs achieve up to 19x cost reduction on code generation tasks. This post derives a decision framework: code when the transformation is unambiguous, LLM when the input space is too large, hybrid supervisor when you need both. For the static workflow patterns this builds on, see agent workflow patterns.

Why do teams over-use LLM nodes?
Because LLM nodes are easy to write and hard to get wrong at demo time.
Need to extract a date from text? An LLM call handles it in one prompt. Need to validate JSON? An LLM can check it. Need to route a request to the right handler? Ask the LLM. Each of these works reliably in development with a few dozen test cases.
The problem surfaces at production scale. That date extraction runs 50,000 times per day. The JSON validation sits in a hot path called on every API request. The router handles every incoming message. At those volumes, the gap between an LLM node and a code node becomes painful:
| Operation | Code node | LLM node | Ratio |
|---|---|---|---|
| JSON validation | <1ms, ~$0 | 200-500ms, ~$0.01 | up to 500x latency |
| Date extraction | <1ms (regex) | 300-800ms, ~$0.02 | up to 800x latency |
| Format conversion | <1ms | 200-400ms, ~$0.01 | up to 400x latency |
| Intent classification | N/A (too many intents) | 200-500ms, ~$0.02 | LLM needed |
| Free-text summarization | N/A | 500-2000ms, ~$0.05 | LLM needed |
The bottom two rows are where LLM nodes earn their cost. The top three are where code nodes should replace them. The pattern: if the transformation has a finite, enumerable input-output mapping, code wins. If the input space is open-ended or the task requires language understanding, the LLM is necessary.
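To make the top rows concrete, here is the date-extraction step as a code node. This is a minimal sketch that assumes ISO-formatted dates; real inputs would need more patterns, but every pattern stays unit-testable:

```python
import re
from datetime import date

# Deterministic date extractor for ISO-formatted dates. The single pattern
# is an illustrative assumption, not a universal parser.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def extract_date(text: str) -> date | None:
    """Return the first ISO date found in `text`, or None."""
    m = ISO_DATE.search(text)
    if m is None:
        return None
    try:
        return date(*map(int, m.groups()))
    except ValueError:  # "2024-13-45" matches the pattern but is not a date
        return None

assert extract_date("Invoice issued 2024-03-15, net 30") == date(2024, 3, 15)
assert extract_date("no date here") is None
```

It runs in well under a millisecond, costs nothing per call, and its behavior is fully pinned down by its tests.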
How do you decide which node type a step needs?
Three questions, in order.
1. Can I write a unit test that covers 95%+ of inputs?
If yes → code node. JSON schema validation, regex-based extraction, format conversion, arithmetic, API response parsing, error code routing — these are all testable with deterministic assertions. An LLM adds nothing here except cost and latency.
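As a sketch of what "testable with deterministic assertions" means in practice, here is a JSON-validation code node using the `jsonschema` package; the schema itself is a placeholder for your real contract:

```python
from jsonschema import ValidationError, validate

# Illustrative schema -- substitute your own contract.
ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "amount": {"type": "number", "minimum": 0},
    },
    "required": ["order_id", "amount"],
}

def validate_order(payload: dict) -> bool:
    """Deterministic check: the payload either matches the schema or it doesn't."""
    try:
        validate(instance=payload, schema=ORDER_SCHEMA)
        return True
    except ValidationError:
        return False

# The unit test *is* the spec -- no prompt engineering required.
assert validate_order({"order_id": "A-1", "amount": 9.99})
assert not validate_order({"order_id": "A-1", "amount": -5})
```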
2. Is the input space too large to enumerate?
If yes → LLM node. Free-text classification where users can say the same thing a thousand different ways. Summarization of arbitrary documents. Deciding which of 50 tools to call based on a natural language request. These tasks have combinatorial input spaces that rules cannot cover.
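For contrast, here is a sketch of an LLM classification node using the OpenAI SDK. The model name and label set are assumptions, and the final guard matters because the model can reply off-list:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

INTENTS = ["billing", "technical", "account", "feedback", "other"]

def classify_intent(message: str) -> str:
    """Open-ended input, closed label set: a job for an LLM node."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is an assumption
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "Classify the user message into one of: "
                           f"{', '.join(INTENTS)}. Reply with the label only.",
            },
            {"role": "user", "content": message},
        ],
    )
    label = resp.choices[0].message.content.strip().lower()
    return label if label in INTENTS else "other"  # guard against off-list replies
```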
3. Does the step need both reasoning and execution?
If yes → hybrid pattern. The LLM decides what to do; code does it. An LLM classifies the user’s intent, then a code node routes to the correct handler. An LLM plans which database queries to run, then a code node executes them and validates results.
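In code, the hybrid split looks roughly like this: the LLM picks a label, deterministic code acts on it. `classify_intent` is the LLM node sketched above; the handlers and `generate_custom_response` are hypothetical stand-ins for real logic:

```python
def handle_billing(message: str) -> str:
    return "billing: fetch invoice, format response"

def handle_technical(message: str) -> str:
    return "technical: search knowledge base, retrieve docs"

HANDLERS = {"billing": handle_billing, "technical": handle_technical}

def route(message: str) -> str:
    intent = classify_intent(message)    # LLM node: flexible decision
    handler = HANDLERS.get(intent)       # code node: deterministic dispatch
    if handler is None:
        return generate_custom_response(message)  # hypothetical LLM generation node
    return handler(message)              # code node: fast, cheap execution
```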
```mermaid
graph TD
  A[New workflow step] --> B{Can I write<br/>unit tests for 95%<br/>of inputs?}
  B -->|Yes| C[Code node]
  B -->|No| D{Does it need<br/>language understanding<br/>or generation?}
  D -->|Yes| E[LLM node]
  D -->|No| F[Probably code node<br/>with edge-case fallback]
  E --> G{Does execution<br/>need to be<br/>deterministic?}
  G -->|Yes| H[Hybrid: LLM decides,<br/>code executes]
  G -->|No| E
```
What does the hybrid supervisor pattern look like?
The most common production pattern concentrates LLM calls at decision points and code at execution points.
```mermaid
graph LR
  A[User input] --> B[LLM Router<br/>Classify intent]
  B -->|billing| C[Code: fetch invoice<br/>format response]
  B -->|technical| D[Code: search KB<br/>retrieve docs]
  B -->|complex| E[LLM: generate<br/>custom response]
  C --> F[Code: validate output<br/>apply template]
  D --> F
  E --> F
  F --> G[Response]
```
The LLM node (Router) handles the one step that requires flexibility: classifying ambiguous user intent. Everything downstream is code: database queries, template formatting, output validation. Every request still pays for one short router call, but if 70% of requests route to billing or technical (the code paths), only the remaining 30% incur the heavier LLM generation step. Cost drops accordingly.
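A back-of-envelope model makes that proportionality concrete. The per-call prices below are illustrative assumptions, not provider quotes:

```python
# Cost model for the supervisor pattern above (all prices are assumptions).
requests_per_day = 50_000
router_cost = 0.002       # short classification call, paid on every request
generation_cost = 0.02    # heavier generation call, paid only on the "complex" path
complex_fraction = 0.30

pure_llm = requests_per_day * generation_cost  # baseline: every request generated
hybrid = requests_per_day * (router_cost + complex_fraction * generation_cost)

print(f"pure LLM: ${pure_llm:,.0f}/day, hybrid: ${hybrid:,.0f}/day")
# pure LLM: $1,000/day, hybrid: $400/day -- 2.5x cheaper at these prices
```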
A real document processing pipeline I have seen in production illustrates the cost difference:
Before (pure LLM): Extract fields → Validate schema → Normalize dates → Classify document type → Generate summary. Five LLM calls per document. At 10,000 documents per day: ~$1,500/month in API costs, P95 latency 8 seconds.
After (hybrid): Classify document type (LLM) → Extract fields (code, regex + rules) → Validate schema (code, JSON schema) → Normalize dates (code, dateutil) → Generate summary (LLM). Two LLM calls per document. Cost: ~$600/month. P95 latency: 3 seconds. Same accuracy on the extraction and validation steps — they were deterministic to begin with.
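The hybrid pipeline has roughly this shape. This is a sketch: `classify_document`, `extract_fields`, `validate_fields`, and `summarize` are hypothetical helpers standing in for the real nodes:

```python
from dateutil import parser as dateparse

def process_document(text: str) -> dict:
    doc_type = classify_document(text)       # LLM call 1: open-ended classification
    fields = extract_fields(text, doc_type)  # code: regex + per-type rules
    validate_fields(fields, doc_type)        # code: JSON schema, raises on mismatch
    for key in ("issued_date", "due_date"):  # code: deterministic normalization
        if key in fields:
            fields[key] = dateparse.parse(fields[key]).date().isoformat()
    fields["summary"] = summarize(text)      # LLM call 2: language generation
    return fields
```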
What did HyEvo discover about optimal hybrid graphs?
HyEvo (arXiv 2603.19639) automates this design decision. Rather than a human deciding which nodes should be LLM vs code, HyEvo uses evolutionary search to discover optimal graphs.
The system represents workflows as directed acyclic graphs with typed nodes. LLM nodes have a backbone model, instructions, and temperature. Code nodes have synthesized source code with typed I/O signatures. Evolution generates new graph topologies, evaluates them on task performance and efficiency, and selects for the best designs.
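In code, that representation might look roughly like this. The field names follow the paper's description, but the structure is my sketch, not HyEvo's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class LLMNode:
    name: str
    model: str          # backbone model
    instructions: str   # prompt / role description
    temperature: float

@dataclass
class CodeNode:
    name: str
    source: str         # synthesized source code
    input_type: str     # typed I/O signature
    output_type: str

@dataclass
class WorkflowGraph:
    nodes: dict[str, LLMNode | CodeNode] = field(default_factory=dict)
    edges: list[tuple[str, str]] = field(default_factory=list)  # DAG: (src, dst)
```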
The reflect-then-generate mechanism is where the intelligence lives. A meta-agent examines why a workflow failed — was the LLM node hallucinating on a task that code could handle deterministically? Was the code node too rigid for inputs it had never seen? — and proposes structural changes. Replace this LLM node with code. Add a fallback LLM node for edge cases this code node cannot handle. Merge these two sequential LLM nodes into one.
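The structural edits the reflect step proposes can be pictured as mutations over the graph sketch above. This is a loose illustration of two of the operators named in the text, not the paper's implementation:

```python
def replace_with_code(graph: WorkflowGraph, node_name: str, source: str) -> None:
    """Swap an LLM node for a synthesized code node in the same graph position."""
    graph.nodes[node_name] = CodeNode(
        name=node_name,
        source=source,
        input_type="str",   # placeholder signature
        output_type="str",
    )

def add_llm_fallback(graph: WorkflowGraph, code_node: str, fallback: LLMNode) -> None:
    """Attach an LLM node downstream of a code node to catch edge-case inputs."""
    graph.nodes[fallback.name] = fallback
    graph.edges.append((code_node, fallback.name))
```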
The results: up to 19x cost reduction and up to 16x latency reduction on code generation tasks compared to AFlow (the prior state of the art in automated workflow optimization), with comparable or better accuracy across GSM8K (93.36%), MATH (53.91%), and HumanEval (93.89%).
The 19x number deserves scrutiny. It likely comes from AFlow using more LLM calls than necessary — the same over-reliance on LLM nodes that human-designed workflows exhibit. HyEvo finds the code-node replacements that humans miss because they optimize for development speed, not runtime cost.
For more on how these evolved architectures relate to the broader self-evolution trend, see self-evolving agent architectures.
What are the risks of aggressive code-node replacement?
Two failure modes to watch for.
Brittleness at the edges. A code node that handles 95% of inputs fails silently on the other 5%. An LLM node handles the edges gracefully, even if slowly. The mitigation: add an LLM fallback. When the code node encounters an input it cannot process (parsing failure, confidence below threshold, unrecognized format), route it to an LLM node. You then pay LLM cost and latency on roughly 5% of requests instead of 100%.
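A minimal sketch of that fallback shape; the invoice format and the `llm_extract_total` LLM node are hypothetical:

```python
import re

def extract_total(invoice_text: str) -> float:
    # ~95% case: deterministic, fast, free
    m = re.search(r"total[:\s]*\$?([\d,]+\.\d{2})", invoice_text, re.IGNORECASE)
    if m:
        return float(m.group(1).replace(",", ""))
    # ~5% edge case: unrecognized format, fall back to the flexible path
    return llm_extract_total(invoice_text)  # hypothetical LLM fallback node
```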
Maintenance burden. Code nodes require traditional software maintenance — bug fixes, input format changes, dependency updates. LLM nodes adapt to format changes automatically (within limits). If your input schema changes monthly, the maintenance cost of code nodes may exceed the inference cost savings. Stable interfaces favor code nodes. Volatile interfaces favor LLM nodes.
Key takeaways
- If you can unit-test it, use code. JSON validation, format conversion, date extraction, API parsing — none of these need an LLM.
- If the input space is open-ended, use an LLM. Free-text classification, summarization, tool selection — these require language understanding.
- The hybrid supervisor pattern concentrates cost. LLM for decisions, code for execution. Most production workflows should use LLM nodes for 20-40% of steps, code for the rest.
- HyEvo proves 19x savings are achievable. Evolved hybrid graphs replace unnecessary LLM nodes with code, cutting cost and latency dramatically.
- Add LLM fallbacks to code nodes. Handle the 95% case with code, the 5% edge case with an LLM. Best of both worlds.
Further reading
- Self-evolving agent architectures — how HyEvo and SAGE automatically optimize workflow design
- Agent workflow patterns — the foundational patterns for static workflows
- Cost management for agents — FinOps for LLM-based agent systems
Want to work together?
I take on projects, advisory roles, and fractional CTO engagements in AI/ML. I also help businesses go AI-native with agentic workflows and agent orchestration.
Get in touch