Agentic AI · 20 min read

When You Don't Need an AI Agent Framework: A Contrarian Guide for Technical Leaders

A CTO built 900 production agents at a law office using only chat completions with structured outputs. No framework. This guide explains when the framework-industrial complex creates unnecessary complexity, and when it genuinely helps.

Analytical Insider · AI Infrastructure & Cost Strategy

Published March 19, 2026

The case that made this question worth asking

A thread on Hacker News in late 2025 described a CTO at a law office who had built approximately 900 production AI agents over 18 months. The agents handled intake classification, document routing, contract review triage, research summarization, and status update generation. The system processed thousands of documents daily with production-grade reliability.

The technology stack: OpenAI's chat completions API, structured outputs via JSON Schema, and Python. No LangChain. No LangGraph. No CrewAI. No AutoGen. Just function calls, well-crafted prompts, and output parsers.

The comment generated hundreds of responses, mostly from developers who had invested weeks learning framework abstractions and were confronting the possibility that those abstractions were not necessary for their use cases.

The CTO's observation was not that frameworks are bad. It was that frameworks solve specific problems, and most of the agents he needed did not have those problems. When the agent takes a document, extracts structured data from it, and routes the document to the right queue, there is no state management problem, no multi-agent coordination problem, and no complex workflow orchestration problem. There is a well-defined input-output transformation problem. Raw API calls with structured outputs handle it cleanly.

This guide explores when that insight applies and when it does not.


What the framework-industrial complex sells

The term "framework-industrial complex" is deliberately provocative, but the phenomenon it describes is real.

The parties selling AI agent frameworks share an interest in frameworks being necessary. Framework creators need adoption to justify engineering investment. Cloud providers want framework adoption to increase LLM API consumption. Consulting firms charge more for complex architectures than simple ones. Conference talks about sophisticated multi-agent orchestration are more impressive than talks about function calls.

None of this means frameworks are bad. Most of the popular frameworks solve real problems well. It means that the information environment around framework selection is systematically biased toward complexity. The voices asking "do you actually need this?" are underrepresented relative to the voices selling the complexity.

The practical consequence: many organizations deploy LangChain, CrewAI, or LangGraph for agents that would have been simpler, faster to build, and easier to maintain as direct API calls. They discover this 6 months into production when debugging the framework's abstractions takes longer than the original development did.


The three things frameworks actually solve

Frameworks are not unnecessary. They solve genuine problems. The question is whether you have those problems.

Problem 1: Stateful workflows with complex branching

A single-turn LLM call is stateless. You send a prompt, you receive a response, execution ends. For agents that need to maintain state across many steps, with conditional branching based on intermediate results, retry logic, and checkpointing for recovery, building this infrastructure from scratch is non-trivial.

LangGraph's directed graph model is genuinely useful here. Defining an agent as a graph with nodes (processing steps) and edges (transitions based on state) produces code that is maintainable and debuggable for workflows with 10+ steps, multiple branches, and cycles. The alternative, a custom state machine built on raw API calls, is not inherently worse, but it requires building infrastructure that LangGraph provides.

When this applies to you: Your agent has more than 5 to 7 sequential steps with branching logic. Your workflow needs cycle support (the agent loops until a condition is met). You need checkpointing so a failed run can resume rather than restart.

When it does not apply: Your agent follows a fixed sequence of steps with no branching. Your steps are short enough that restarting on failure is acceptable. Your state fits in a simple dictionary that you can pass between function calls.
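When state really does fit in a dictionary, even the loop-until-condition case can be a few lines of plain Python. A minimal sketch, with call_llm as a hypothetical stand-in for a real provider call:

```python
from typing import Callable

def call_llm(prompt: str) -> dict:
    # Hypothetical stand-in for a provider call; returns a draft plus a
    # self-assessed quality score. In production this is one API request.
    return {"draft": f"response to: {prompt}", "score": 0.9}

def refine_until_good(task: str, llm: Callable[[str], dict],
                      threshold: float = 0.8, max_rounds: int = 5) -> dict:
    # All state lives in one plain dictionary passed between iterations.
    # No graph, no checkpointer, no framework.
    state = {"task": task, "draft": None, "score": 0.0, "rounds": 0}
    while state["score"] < threshold and state["rounds"] < max_rounds:
        result = llm(f"Task: {state['task']}\nPrevious draft: {state['draft']}")
        state.update(draft=result["draft"], score=result["score"])
        state["rounds"] += 1
    return state

final = refine_until_good("summarize the contract", call_llm)
```

If a workflow outgrows this shape, that is the signal to revisit the framework question, not a reason to start there.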

Problem 2: Multi-agent coordination at scale

When multiple specialized agents need to work together, the coordination problem is real: how does Agent A delegate a subtask to Agent B, receive the result, and continue? How does an orchestrator agent manage a pool of worker agents running in parallel? How do you aggregate results from multiple concurrent agents into a coherent output?

Building this coordination infrastructure from scratch is a significant engineering project. CrewAI provides role-based coordination primitives that handle delegation, result aggregation, and inter-agent communication. LangGraph handles more complex coordination patterns where the orchestration logic itself needs to be stateful.

When this applies to you: You need 3 or more agents with distinct roles working on the same task in coordination. Your agents need to delegate subtasks to each other dynamically based on task content. You need parallel agent execution with result aggregation.

When it does not apply: Your "multi-agent" system is really a pipeline where one agent's output is the next agent's input. Each step is independent and sequential. A series of function calls passing data between them handles this without framework overhead.

Problem 3: Observability across complex chains

When a multi-step agent produces unexpected output, you need to know which step produced the problem, what the input to that step was, and what the LLM actually said. Without trace logging across all LLM calls, debugging requires reading raw API logs that may not exist or may not be structured for this purpose.

LangChain's integration with LangSmith, and most frameworks' integrations with Langfuse, provide automated trace logging that captures every LLM call with full prompt and completion text, tool calls with arguments and results, step timing, and token counts. This is genuinely valuable for complex agents and would require meaningful custom engineering to replicate cleanly.

When this applies to you: Your agent has 5+ LLM calls in a workflow. Production debugging without traces would require reconstructing execution flow from logs. Your team will need to analyze failure modes at scale.

When it does not apply: Your agent makes 1 to 3 LLM calls. You can add structured logging to each call directly. The agent's logic is simple enough that output failures have obvious causes.
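For the one-to-three-call case, direct structured logging can be as small as a decorator. A sketch using only the standard library, with classify as a hypothetical placeholder for a step that would wrap a real LLM call:

```python
import functools
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent")

def traced(step_name: str):
    """Log input, output, and timing for a single LLM-call step."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            # One JSON line per call keeps the logs machine-parseable.
            logger.info(json.dumps({
                "step": step_name,
                "duration_ms": round((time.perf_counter() - start) * 1000, 1),
                "input_preview": str(args)[:200],
                "output_preview": str(result)[:200],
            }))
            return result
        return wrapper
    return decorator

@traced("classify")
def classify(text: str) -> str:
    # Placeholder for a real chat-completions call.
    return "urgent" if "deadline" in text else "routine"
```

At three calls this gives you most of what a trace viewer would show; past five or so, the manual approach stops scaling.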


The minimum viable agent pattern

For agents that do not have the three problems above, here is the pattern that handles most production use cases without framework overhead.

The four components

Input handler: Validates and normalizes the incoming data. Raises an error early if required fields are missing or malformed. Returns a clean, typed input struct.

System prompt: Defines the agent's role, capabilities, constraints, and output format. This is the most important investment in agent quality; a well-crafted system prompt is worth more than any framework abstraction.

Tool definitions: JSON Schema descriptions of the functions the agent can call. The LLM uses these to decide which tools to call and with what arguments. Define only the tools the agent needs. Every unnecessary tool definition increases the risk of the LLM calling the wrong one.

Output parser: Extracts structured data from the LLM's response. With structured output support (OpenAI, Anthropic, and most major providers support constrained JSON generation), the parser is often just a JSON deserializer into a typed output struct.
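For reference, a single tool definition in OpenAI's function-calling format looks like the following; the queue and priority values are hypothetical examples for the law-office scenario, and the "parameters" field is a standard JSON Schema object:

```python
# Narrow enums keep the model from inventing destinations or priorities.
route_document_tool = {
    "type": "function",
    "function": {
        "name": "route_document",
        "description": "Send a classified document to the team queue that handles it.",
        "parameters": {
            "type": "object",
            "properties": {
                "queue": {
                    "type": "string",
                    "enum": ["intake", "contracts", "litigation", "research"],
                    "description": "Destination team queue.",
                },
                "priority": {
                    "type": "string",
                    "enum": ["low", "normal", "urgent"],
                },
            },
            "required": ["queue", "priority"],
        },
    },
}
```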

The code pattern

import openai
import json
from pydantic import BaseModel
from typing import Optional

class AgentOutput(BaseModel):
    classification: str
    confidence: float
    routing_destination: str
    summary: Optional[str] = None

def document_classifier_agent(document_text: str, document_type: str) -> AgentOutput:
    client = openai.OpenAI()
    
    # JSON mode guarantees syntactically valid JSON but does not enforce a
    # schema, so the system prompt must name every required field.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a legal document classifier. "
                    "Classify documents by urgency and route them to the correct team. "
                    "Respond with a JSON object containing exactly these keys: "
                    "classification (string), confidence (number from 0 to 1), "
                    "routing_destination (string), summary (string or null)."
                )
            },
            {
                "role": "user",
                "content": f"Document type: {document_type}\n\nContent:\n{document_text}"
            }
        ],
        response_format={"type": "json_object"}
    )
    
    # Pydantic validates types and required fields; a ValidationError here
    # means the model's output drifted from the schema.
    output_data = json.loads(response.choices[0].message.content)
    return AgentOutput(**output_data)

This agent has no framework dependencies. It uses the provider's API directly. It produces strongly typed, validated output. It deploys as a function in any environment that can make HTTP calls. Any developer who knows Python and the OpenAI API can debug it.

For 70 to 80 percent of production agent use cases, this pattern or a small extension of it is the right approach.
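One common small extension is retrying when the model's output fails validation. A hedged sketch, with call_model as a hypothetical stand-in for one chat-completions request that returns the raw assistant text:

```python
import json

REQUIRED_FIELDS = {"classification", "confidence", "routing_destination"}

def parse_with_retry(call_model, user_prompt: str, max_attempts: int = 3) -> dict:
    # Re-prompt the model when its JSON output is malformed or incomplete.
    prompt = user_prompt
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
            missing = REQUIRED_FIELDS - data.keys()
            if not missing:
                return data
            error = f"missing fields: {sorted(missing)}"
        except json.JSONDecodeError as exc:
            error = f"invalid JSON: {exc}"
        # Feed the validation error back so the model can self-correct.
        prompt = (f"{user_prompt}\n\nYour previous reply was rejected "
                  f"({error}). Reply with valid JSON only.")
    raise ValueError(f"no valid output after {max_attempts} attempts")
```

Swapping the field-set check for Pydantic validation, as in the pattern above, is a one-line change.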

Extending to a simple pipeline

When you need multiple steps, extend the pattern with explicit data passing rather than a framework's state management:

def research_and_draft_agent(company_name: str, contact_role: str) -> DraftOutput:
    # Step 1: Research
    research = company_research_agent(company_name)
    
    # Step 2: Draft
    draft = email_draft_agent(
        company_data=research,
        contact_role=contact_role
    )
    
    # Step 3: Review
    final = tone_review_agent(
        draft=draft,
        guidelines=TONE_GUIDELINES
    )
    
    return final

This three-step pipeline has explicit data flow, is debuggable at each step with standard print statements or structured logging, and requires no framework knowledge to understand. If step 2 fails, you can call email_draft_agent in isolation with the step 1 output to reproduce and fix the problem.

Compare this to the equivalent LangGraph implementation: defining a state schema, defining nodes and edges, configuring a StateGraph, managing state updates through the graph's built-in mechanisms. The LangGraph version is more powerful for complex workflows. For this three-step linear pipeline, it is engineering overhead without benefit.


The decision tree

Work through this before adding framework complexity:

Does your agent maintain complex state across more than 5 to 7 steps, with branching logic based on intermediate results?

  • Yes: LangGraph is worth the investment.
  • No: Continue.

Does your agent involve 3 or more specialized agents with dynamic delegation between them?

  • Yes: CrewAI or LangGraph, depending on how complex the coordination is.
  • No: Continue.

Does your agent have so many LLM calls that debugging without automated trace logging would be a serious operational burden?

  • Yes: Add Langfuse or LangSmith observability, with or without a framework.
  • No: Continue.

Does your agent require production-grade recovery and checkpointing for long-running tasks?

  • Yes: LangGraph's persistence layer is the right tool.
  • No: Continue.

If you reached this point: You probably do not need a framework. Build with direct API calls. Add framework complexity only when you hit one of the four problems above.


The framework churn risk

The Microsoft AutoGen situation is a concrete illustration of framework risk that belongs in every evaluation.

AutoGen was widely recommended for multi-agent deployments throughout 2024. Microsoft Research built it, maintained it actively, and published research papers about it. Production deployments were built on it at organizations that did careful technical due diligence.

In late 2025, Microsoft moved AutoGen to maintenance mode and redirected new development to the unified Microsoft Agent Framework. The migration path exists but requires engineering investment, retesting, and careful handling of behavior differences between AutoGen and its replacement.

This is not a criticism of Microsoft's decision, which is arguably the right technical choice. It is an illustration of the framework churn risk: an AI agent framework that is best practice today may be legacy or discontinued within 18 months. The ecosystem is moving faster than traditional software categories.

The implications for architecture decisions:

Frameworks built by companies with revenue incentive to maintain them carry lower churn risk than community projects. LangGraph (LangChain Inc.), CrewAI (CrewAI Inc.), and Mastra (YC-backed) have commercial organizations motivated to keep their frameworks current. AutoGen (Microsoft Research) was a research project that was merged into a commercial product line, which is a different risk profile.

The cost of framework migration is proportional to how deeply the framework is embedded. Agents that use a framework for orchestration primitives while keeping business logic in plain functions migrate more easily than agents where framework abstractions are woven throughout the codebase.
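One way to keep that isolation concrete is to hold business logic in plain functions and hide orchestration behind a single seam. A sketch, with hypothetical extract_parties and summarize steps standing in for real agent logic:

```python
from typing import Protocol

# Business logic stays in plain functions with no framework imports.
def extract_parties(document: str) -> list[str]:
    return [line.split(":", 1)[1].strip()
            for line in document.splitlines() if line.startswith("party:")]

def summarize(parties: list[str]) -> str:
    return f"{len(parties)} parties: {', '.join(parties)}"

class Orchestrator(Protocol):
    # The only seam that knows about a framework; swapping frameworks
    # means rewriting one adapter, not the business logic.
    def run(self, document: str) -> str: ...

class PlainPipeline:
    # Direct-call implementation; a framework-backed class could satisfy
    # the same Protocol without touching the functions above.
    def run(self, document: str) -> str:
        return summarize(extract_parties(document))
```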

Direct API calls never require migration. OpenAI's chat completions API, Anthropic's messages API, and Google's Gemini API have maintained backward compatibility across major version changes. Code written against these APIs from 2023 still runs. No framework dependency means no framework migration risk.

This is not an argument to never use frameworks. It is an argument to use them deliberately, to understand what problem you are solving with them, and to architect so that framework-specific code is isolated and replaceable.


Where simplicity has genuine limits

Intellectual honesty requires acknowledging where the "you don't need a framework" argument breaks down.

At significant scale, manual observability becomes insufficient. An agent making 3 LLM calls is debuggable with print statements. An agent making 15 LLM calls across multiple parallel branches produces enough call data that manual log analysis is impractical. At this complexity level, the observability tooling that frameworks provide is worth the overhead.

Multi-agent coordination is genuinely hard to implement correctly. Race conditions, result aggregation from parallel agents, and graceful handling of partial failures in agent networks require infrastructure that is non-trivial to build. CrewAI and LangGraph solve these problems. Solving them from scratch is a significant engineering project that few teams should undertake without a compelling reason to avoid existing solutions.

Prompt management at scale benefits from framework tooling. When you have dozens of agents, each with their own system prompts, managing prompt versions, testing prompt changes across agents, and rolling back problematic prompt updates is a real operational challenge. Frameworks provide some tooling for this. Direct API calls provide none.
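That said, a minimal version-and-rollback store is buildable in a few lines if you stay framework-free. A sketch, not a substitute for real prompt-management tooling:

```python
import hashlib

class PromptRegistry:
    # Minimal version-and-rollback store for system prompts, keyed by agent.
    def __init__(self):
        self._versions: dict[str, list[str]] = {}

    def register(self, agent: str, prompt: str) -> str:
        # Returns a short content hash usable as a version tag in logs.
        self._versions.setdefault(agent, []).append(prompt)
        return hashlib.sha256(prompt.encode()).hexdigest()[:8]

    def current(self, agent: str) -> str:
        return self._versions[agent][-1]

    def rollback(self, agent: str) -> str:
        # Drop the latest version and fall back to the previous one.
        self._versions[agent].pop()
        return self._versions[agent][-1]
```

Logging the version hash alongside every agent run is what makes "which prompt produced this output?" answerable later.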

The resumability problem is real for long-running agents. An autonomous research agent that runs for 40 minutes and fails at step 35 needs to be able to resume from step 35, not restart from step 1. LangGraph's persistence and checkpointing are designed for exactly this. Replicating it with raw API calls requires building a checkpointing system that ends up looking a lot like a simplified LangGraph.

The minimum viable agent pattern is the right starting point for most organizations and the right permanent architecture for many use cases. It is not the right architecture for every use case. The framework decision should be driven by the specific problems you have, not by the ecosystem's enthusiasm for complexity.


The honest summary

Use direct API calls plus structured outputs for agents that:

  • Follow linear workflows with fewer than 7 steps
  • Do not require parallel execution or multi-agent coordination
  • Have clear success criteria for each step
  • Do not need checkpointing or recovery from mid-workflow failures

Add a framework when you have:

  • Complex stateful workflows with branching and cycles (LangGraph)
  • Multi-agent coordination with delegation (CrewAI or LangGraph)
  • Production observability requirements across many LLM calls (any framework with LangSmith or Langfuse)
  • Long-running tasks that need checkpointing (LangGraph)

The law office CTO who built 900 agents without a framework was not doing something wrong. He was doing something right: matching tool complexity to problem complexity. The 900 agents did not have stateful multi-step branching problems or multi-agent coordination problems. They had well-defined transformation problems. Direct API calls were the correct tool.

The same discipline applied to your context means asking: what specific problem would a framework solve for this agent? If you cannot name it concretely, you probably do not have it.


Not sure if your use case needs a framework or not?

This is one of the most common architecture questions we get from engineering teams early in their AI agent work. If you want to talk through your specific workflow, team skills, and complexity level and get a direct answer, book a call.

Book a 30-minute call


For organizations that have determined frameworks are the right tool for their complexity level, the framework comparison guide covers which framework fits which use case with production usage data rather than feature tables. For the cost implications of architecture choices, the hidden costs guide covers the five cost drivers that affect both framework-based and direct API architectures.


Tags: do I need an AI agent framework, AI agents without framework, when not to use AI agent frameworks, minimum viable agent, AI agent framework alternatives, LLM API direct calls, AI agent simplicity
