Agentic AI · 26 min read

The 15 AI Agent Frameworks That Actually Matter in 2026 (With GitHub Data)

GitHub stars are a misleading guide to AI agent framework selection. LangGraph has 24,800 stars and 34.5 million monthly downloads. OpenClaw has 210,000 stars and is not a developer framework. Here is the CTO decision matrix based on real production usage data.


Analytical Insider

AI Infrastructure & Cost Strategy

Published March 13, 2026

The metric everyone uses for framework selection is wrong

GitHub stars are a proxy for developer awareness and community enthusiasm. They are not a proxy for production readiness, enterprise adoption, or fitness for your specific use case.

The framework with the most GitHub stars in the AI agent space is OpenClaw, with 210,000+ stars. OpenClaw is not a developer framework. It is a personal AI assistant product.

The framework leading enterprise production adoption is LangGraph, with 24,800 stars and 34.5 million monthly PyPI downloads. The gap between those two numbers tells you everything about how misleading star counts are for framework evaluation.

This guide covers 15 frameworks with data on both GitHub presence and actual production usage, framed around the decision criteria that matter for technical leaders: total cost of ownership, vendor lock-in risk, production readiness, team skill requirements, and ecosystem fit.


The 2025 to 2026 consolidation: what changed and why it matters

The framework ecosystem entered 2024 fragmented and exited 2025 consolidated. Understanding what happened explains why several once-prominent frameworks should now be avoided for new projects.

Microsoft merged AutoGen and Semantic Kernel. Both were widely used independently. AutoGen focused on multi-agent conversation with code execution capabilities. Semantic Kernel provided a plugin architecture for enterprise LLM integration. Microsoft is unifying them into the Microsoft Agent Framework (GA targeted Q1 2026). AutoGen is now in maintenance mode. Organizations building on AutoGen should plan migration timelines.

LangChain officially repositioned around LangGraph. Rather than continuing to evolve LangChain's agent abstractions, the LangChain team directed all new agent development to LangGraph, which treats agent workflows as directed graphs with explicit state management. LangChain remains the right tool for RAG and simple LLM applications. LangGraph is the agent layer.

Early autonomous agent projects faded. BabyAGI, the project that sparked the agentic AI wave in 2023, is largely unmaintained. AgentGPT saw a burst of adoption but limited production usage. SuperAGI pivoted toward enterprise offerings with limited open-source traction. These projects served an important purpose in demonstrating what autonomous agents could do. They are not appropriate foundations for production systems in 2026.

OpenClaw exploded out of nowhere. Going from 9,000 to 210,000+ GitHub stars in approximately 60 days is unprecedented in the open-source ecosystem. The growth reflects a real demand: non-developers who want powerful AI assistance without building anything. Peter Steinberger (founder of PSPDFKit, the PDF SDK company) built it for personal use and released it. The messaging app integrations and zero-configuration Ollama support resonated immediately.


The full landscape: 15 frameworks with production data

Tier 1: Dominant frameworks by actual production usage

LangGraph

  • GitHub stars: 24,800
  • Monthly PyPI downloads: 34.5 million
  • Best for: Stateful enterprise workflows, complex multi-step agents, cyclic reasoning
  • Enterprise readiness: High
  • Key production users: Elastic, Replit, Uber, Klarna

LangGraph represents agent workflows as directed graphs. Nodes are processing steps. Edges are transitions between steps. Cycles are supported, meaning an agent can loop until a condition is met. This graph model produces more maintainable code for complex workflows than callback-based alternatives and gives developers precise control over state management, branching logic, and parallel execution.
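The graph model can be illustrated with a minimal stdlib-only sketch. This is not LangGraph's actual API (which uses `StateGraph` and compiled graphs); the function and node names below are hypothetical, chosen only to show nodes, conditional edges, a cycle, and shared state:

```python
# Conceptual sketch (stdlib only, not LangGraph's API): an agent workflow as a
# directed graph whose review node loops back to draft until a condition is met.

def draft(state):
    state["attempts"] += 1
    state["answer"] = f"draft v{state['attempts']}"
    return state

def review(state):
    # Approve once the agent has iterated enough; a real reviewer node
    # would call an LLM or run validation here.
    state["approved"] = state["attempts"] >= 3
    return state

NODES = {"draft": draft, "review": review}
# Edges can be conditional: review loops back to draft until approved.
EDGES = {
    "draft": lambda s: "review",
    "review": lambda s: "END" if s["approved"] else "draft",
}

def run(entry, state):
    node = entry
    while node != "END":
        state = NODES[node](state)   # execute the node, mutating shared state
        node = EDGES[node](state)    # pick the next node from the edge function
    return state

final = run("draft", {"attempts": 0, "approved": False})
print(final["answer"], final["attempts"])  # draft v3 3
```

The explicit state dictionary and edge functions are the point: every transition is inspectable, which is what makes complex workflows debuggable.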

The Klarna deployment is the most-cited production proof point: a LangGraph-based support agent handled 2.3 million customer conversations in its first month, equivalent to 700 full-time agents, contributing to approximately $40 million in annual cost savings. This deployment scaled from zero to production volume without the agent architecture requiring fundamental redesign, which is the practical test of production readiness.

Learning curve is high relative to CrewAI. Team members need to understand graph concepts, state schemas, and checkpoint management. The investment pays off for complex workflows. For simpler multi-agent coordination, CrewAI requires less engineering.

CrewAI

  • GitHub stars: 44,300
  • Monthly PyPI downloads: 5.2 million
  • Best for: Role-based multi-agent workflows, rapid deployment
  • Enterprise readiness: High
  • Key production users: Wide adoption across SMB and mid-market

CrewAI abstracts multi-agent orchestration into a role model. You define agents with roles, goals, and backstories. You define tasks. You define which agent handles which task. CrewAI handles coordination. This abstraction reduces development time for typical role-based workflows by 60 to 70 percent versus LangGraph.
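The role/task split can be sketched in plain Python. This is not CrewAI's API (which has its own `Agent`, `Task`, and `Crew` classes); the names and sequential coordinator below are illustrative assumptions:

```python
# Illustrative sketch of the role-based model (stdlib only, not CrewAI's API):
# agents declare a role, tasks name the responsible role, a coordinator routes.

from dataclasses import dataclass

@dataclass
class Agent:
    role: str
    goal: str

    def perform(self, task: str) -> str:
        # A real agent would prompt an LLM here; we just record the hand-off.
        return f"[{self.role}] completed: {task}"

agents = {
    "researcher": Agent("researcher", "gather sources"),
    "writer": Agent("writer", "draft the report"),
}

tasks = [
    ("researcher", "find three market studies"),
    ("writer", "summarize findings in one page"),
]

# Sequential coordination: each task is routed to the agent owning that role.
results = [agents[role].perform(desc) for role, desc in tasks]
for line in results:
    print(line)
```

In a real framework the coordination layer also handles context passing, retries, and delegation between roles, which is exactly the logic that gets abstracted away.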

The trade-off is control. When multi-agent interactions produce unexpected results, debugging in CrewAI is harder than in LangGraph because the coordination logic is abstracted away. For well-defined workflows with predictable task structures, CrewAI is the fastest path from idea to production. For workflows that require fine-grained control over execution flow, LangGraph is more appropriate.

At 5.2 million monthly downloads, CrewAI is the second most widely adopted framework in active production use after LangGraph.

OpenAI Agents SDK

  • GitHub stars: 19,000
  • Monthly PyPI downloads: 10.3 million
  • Best for: Production agents in the OpenAI ecosystem
  • Enterprise readiness: High
  • Note: Replaces the deprecated Assistants API

OpenAI released its Agents SDK to replace the Assistants API, which was deprecated in early 2026. The SDK provides a clean Python interface for building agents that use OpenAI models, with built-in support for tool calling, handoffs between agents, and guardrails. At 10.3 million monthly downloads, it has established itself quickly.

The constraint is obvious: it is designed for OpenAI models. Organizations wanting to use Claude, Gemini, or open-source models need to add compatibility layers. For teams that have standardized on OpenAI and want the simplest possible path to production agents, this SDK is the right choice.
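The core mechanic every agent SDK wraps is a tool-dispatch loop. The sketch below shows that generic shape with stdlib Python only; the tool names and the scripted call list are assumptions, standing in for what the model's responses would supply in a real run:

```python
# Generic shape of a tool-calling loop (stdlib sketch, not the SDK's classes):
# the model picks a tool and arguments, the runtime executes it, results accumulate.

def get_weather(city: str) -> str:
    return f"18C and clear in {city}"

def add(a: int, b: int) -> int:
    return a + b

TOOLS = {"get_weather": get_weather, "add": add}

def run_agent(tool_calls):
    """Execute a scripted sequence of (tool_name, kwargs) calls.
    A real agent loop would extract these calls from model responses."""
    transcript = []
    for name, kwargs in tool_calls:
        result = TOOLS[name](**kwargs)   # dispatch to the registered tool
        transcript.append((name, result))
    return transcript

log = run_agent([
    ("get_weather", {"city": "Berlin"}),
    ("add", {"a": 2, "b": 3}),
])
print(log[-1])  # ('add', 5)
```

What the SDK adds on top of this loop is schema generation for tools, multi-agent handoffs, and guardrails, which is why it is worth adopting over hand-rolling the loop once an agent has more than a few tools.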

Tier 2: Enterprise-ready frameworks for specific niches

Microsoft Agent Framework (AutoGen + Semantic Kernel)

  • AutoGen GitHub stars: 54,600 (maintenance mode)
  • Monthly PyPI downloads: 856K
  • Best for: Azure environments, Microsoft 365 integration, enterprise governance
  • Enterprise readiness: Very high
  • Deployment status: GA targeted Q1 2026

For Azure-centric organizations, the Microsoft Agent Framework is the natural choice. The unification of AutoGen and Semantic Kernel provides a cohesive SDK for building agents that integrate natively with Azure AI Foundry, Microsoft 365, Copilot Studio, and the 1,400+ connectors available through Power Platform.

Microsoft charges no additional fee for agent creation or execution beyond model token costs, which compares favorably to some commercial agent platforms. The governance tooling is the most mature of any framework, which matters for enterprise compliance requirements.

AutoGen's existing capabilities for code execution agents remain available. Semantic Kernel's plugin architecture for enterprise integrations is preserved. For existing AutoGen users, migration to the unified framework is the recommended path.

Google ADK (Agent Development Kit)

  • GitHub stars: 17,800
  • Monthly PyPI downloads: 3.3 million
  • Best for: Google Cloud environments, Vertex AI integration, A2A protocol adoption
  • Enterprise readiness: High

Google ADK integrates with Vertex AI Agent Engine and serves as Google's answer to framework fragmentation. It champions the A2A (Agent-to-Agent) protocol, which Google introduced and released as an open standard for agent interoperability. If multi-agent coordination across organizational boundaries is a requirement, A2A support becomes a selection criterion, and ADK has the most mature implementation.

GCP's GPU pricing advantage ($3.00/hr for H100 versus Azure's $6.98) makes Google Cloud attractive for inference-heavy agent workloads. ADK leverages this through native Vertex AI integration.
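A back-of-envelope calculation makes the scale of that gap concrete. The hourly rates are the article's cited figures; real pricing varies by region, commitment level, and SKU:

```python
# Monthly cost per always-on H100, using the hourly rates cited above.
gcp_hr, azure_hr = 3.00, 6.98      # USD per H100-hour (article's figures)
hours_per_month = 730               # average hours in a month

gcp_month = gcp_hr * hours_per_month
azure_month = azure_hr * hours_per_month
savings = azure_month - gcp_month

print(f"GCP:   ${gcp_month:,.0f}/mo per GPU")
print(f"Azure: ${azure_month:,.0f}/mo per GPU")
print(f"Delta: ${savings:,.0f}/mo per GPU ({savings / azure_month:.0%} lower)")
```

Across a fleet of dedicated inference GPUs, a per-unit delta of this size compounds into a budget-line difference, which is why the infrastructure question belongs in the framework decision.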

Mastra

  • GitHub stars: 22,000
  • Monthly npm downloads: 1.77 million
  • Best for: TypeScript-first development teams, Node.js stacks
  • Enterprise readiness: High (YC W25, used by Replit and SoftBank)

Mastra is the correct choice for engineering teams that live in TypeScript. LangGraph and CrewAI are Python-native. Running them in TypeScript environments requires either Python microservices, which add operational complexity, or compatibility bridges that add abstraction layers. Mastra was built TypeScript-first from the ground up.

YC backing and production usage at Replit and SoftBank establish credibility beyond startup enthusiasm. The framework is actively developed, well-documented, and growing rapidly. For any team building in Next.js, Node.js, or a TypeScript monorepo, Mastra removes the framework mismatch problem.

LlamaIndex

  • GitHub stars: 46,100
  • Monthly downloads: High (primarily PyPI)
  • Best for: Data-intensive agents, RAG pipelines, structured data retrieval
  • Enterprise readiness: High

LlamaIndex is the strongest framework for agents that spend most of their time retrieving and synthesizing information from structured data sources. Where LangChain excels at connecting LLMs to tools and services, LlamaIndex excels at connecting LLMs to data. The framework has deep support for SQL databases, APIs, PDFs, and enterprise data sources.

For data analytics agents, research agents that synthesize large document corpora, and any workflow where data retrieval quality is the primary performance driver, LlamaIndex competes strongly with LangGraph.

PydanticAI

  • GitHub stars: ~12,000
  • Monthly downloads: Growing
  • Best for: Teams prioritizing type safety and structured outputs
  • Enterprise readiness: Growing

PydanticAI applies Pydantic's type validation system to AI agent development, enforcing structured inputs and outputs throughout agent workflows. For teams where type safety is a priority, or where agent outputs need to conform to strict schemas for downstream processing, PydanticAI's approach reduces a class of runtime errors that untyped frameworks encounter in production.
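The principle can be sketched with stdlib dataclasses rather than Pydantic itself; PydanticAI's real models are richer, and the schema below (`SupportTicket`, its fields, the category set) is a hypothetical example:

```python
# The idea PydanticAI enforces, sketched with stdlib dataclasses: agent output
# must parse into a declared schema, or the run fails loudly instead of
# propagating malformed data downstream.

import json
from dataclasses import dataclass

@dataclass
class SupportTicket:
    category: str
    priority: int
    summary: str

    def __post_init__(self):
        if self.category not in {"billing", "bug", "feature"}:
            raise ValueError(f"unknown category: {self.category}")
        if not 1 <= self.priority <= 5:
            raise ValueError(f"priority out of range: {self.priority}")

def parse_agent_output(raw: str) -> SupportTicket:
    """Validate a model's JSON reply against the schema before anything uses it."""
    return SupportTicket(**json.loads(raw))

ok = parse_agent_output('{"category": "bug", "priority": 2, "summary": "login fails"}')
print(ok.category, ok.priority)  # bug 2

try:
    parse_agent_output('{"category": "spam", "priority": 9, "summary": "??"}')
except ValueError as e:
    print("rejected:", e)
```

Failing at the parse boundary is the whole benefit: downstream systems only ever see validated objects, never raw model text.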

Tier 3: High-star frameworks with different use cases

OpenClaw

  • GitHub stars: 210,000+
  • Best for: Personal AI assistants, non-developer deployment
  • Enterprise readiness: Growing (not a traditional dev framework)
  • Creator: Peter Steinberger (PSPDFKit)

OpenClaw is in a category of its own. It is the fastest-growing open-source project in GitHub history over a 60-day window and the most starred AI agent project. It is also not a framework for building AI applications. It is a product for deploying AI assistants.

The distinction matters for decision-making. If you are a developer choosing a framework for building an AI-powered product or service, OpenClaw is not the right comparison. If you or your team want a powerful personal AI assistant that works through WhatsApp, Slack, Telegram, or iMessage without writing code, OpenClaw is the best option available.

What makes it relevant to technical leaders: OpenClaw supports Ollama for local inference, meaning a deployment on a Mac Mini or cheap VPS costs nearly nothing to operate. It connects to 50+ services. It can perform real computer actions: file management, web browsing, code execution, calendar and email management. For an executive assistant use case, it is more capable than any SaaS tool at a fraction of the cost.

AutoGPT

  • GitHub stars: 182,600
  • Best for: Autonomous goal-directed research and task completion
  • Enterprise readiness: Evolving

AutoGPT was the project that made autonomous agents mainstream in 2023. The core idea: give an LLM a high-level goal and let it decompose, plan, and execute sub-tasks recursively. The reality in production: autonomous goal-directed agents fail unpredictably on complex tasks in ways that are hard to debug or control.

AutoGPT has evolved toward a more structured platform model, but it remains better suited for research and exploration than for business-critical production workflows. The GitHub star count reflects historical significance more than current adoption.

Dify

  • GitHub stars: 130,000
  • Best for: Non-technical teams building LLM apps, low-code AI deployment
  • Enterprise readiness: High

Dify is the leading low-code platform for AI application development. Its visual builder supports RAG pipelines, agent workflows, chatbots, and text generation applications without requiring Python or TypeScript knowledge. The self-hosted version is free and runs in Docker. The cloud version starts at $59/month.

For organizations where the primary bottleneck is engineering bandwidth, Dify enables marketing, operations, and product teams to build and deploy AI workflows without developer involvement. The trade-off is flexibility: complex workflows that require custom logic need to go outside Dify's visual builder.

Agno (formerly Phidata)

  • GitHub stars: 36,400
  • Best for: High-performance agent runtime, memory-efficient deployments
  • Enterprise readiness: High

Agno rebranded from Phidata in late 2025 and repositioned around performance. Agno's own benchmarks report agent instantiation in approximately 2 microseconds with 3.75KB of memory per agent, which matters for high-throughput deployments where instantiation overhead accumulates. For architectures spinning up many short-lived agents, Agno's performance characteristics are a genuine differentiator.
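The measurement itself is easy to reproduce for any agent class. The micro-benchmark below is a generic stdlib sketch of the technique, not Agno's published methodology, and the `SlimAgent` class is a made-up stand-in:

```python
# Why instantiation overhead matters at scale: a generic micro-benchmark
# (stdlib only; demonstrates the measurement, not Agno's published numbers).

import timeit

class SlimAgent:
    __slots__ = ("name", "tools")      # __slots__ trims per-instance memory
    def __init__(self, name):
        self.name = name
        self.tools = []

runs = 100_000
per_call = timeit.timeit(lambda: SlimAgent("a"), number=runs) / runs
print(f"~{per_call * 1e6:.2f} microseconds per instantiation")

# At 10,000 short-lived agents per second, even 50 microseconds of overhead
# each is half a second of pure instantiation cost per wall-clock second.
```

Running the same harness against each candidate framework's agent class is a cheap way to verify whether instantiation cost is material for your workload before taking any vendor benchmark at face value.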

MetaGPT

  • GitHub stars: 62,000
  • Best for: Multi-agent research and simulation, experimental workflows
  • Enterprise readiness: Experimental

MetaGPT simulates software engineering teams, with specialized agents representing roles like product manager, engineer, and QA analyst collaborating on development tasks. It produces impressive research demonstrations and is a useful platform for studying multi-agent dynamics. It is not a framework for deploying production business workflows.

LangChain

  • GitHub stars: 123,000
  • Best for: RAG, document Q&A, tool-augmented LLM apps
  • Enterprise readiness: High
  • Note: Agent workloads now directed to LangGraph

LangChain remains the most widely used foundation for LLM application development. The ecosystem of integrations, documentation, and community knowledge is unmatched. For RAG pipelines, document Q&A, and simple tool-augmented applications, LangChain is still the natural starting point.

For stateful agent workflows, LangGraph is now the recommended path within the same ecosystem.


The CTO decision matrix

Stop evaluating frameworks by feature lists. Evaluate them by the four questions that determine whether a framework will work for your specific situation.

Question 1: What is your existing cloud and tech stack?

This is the most important question. Frameworks can be swapped, but the switching cost is real: migrating after a production deployment is a multi-month engineering project.

  • Azure-centric organization: Microsoft Agent Framework. It integrates natively with your existing Microsoft licenses, security controls, and compliance infrastructure.
  • AWS organization: LangGraph with Bedrock, or OpenAI Agents SDK if you are using OpenAI models exclusively.
  • GCP organization: LangGraph or Google ADK, depending on how heavily invested you are in the Google ecosystem.
  • TypeScript-first engineering team: Mastra. Do not fight language mismatch.
  • Non-technical team: Dify for self-build, or vendor platforms for standard use cases.
  • Personal AI assistant deployment: OpenClaw.

Question 2: How complex is your workflow?

  • Simple tool-calling (under 3 tools, no state management): You may not need a framework at all. See the contrarian guide for when raw API calls outperform framework abstractions.
  • Single-agent with multiple tools: LangChain or OpenAI Agents SDK.
  • Multi-agent with defined roles: CrewAI for speed, LangGraph for control.
  • Complex stateful workflows with cycles and branching: LangGraph.
  • Code-heavy agents that write and execute code: Microsoft Agent Framework (former AutoGen capabilities).

Question 3: What are your team's skill requirements?

Framework | Required skills | Learning curve | Time to first agent
Dify | None (visual builder) | Low | Hours
CrewAI | Python, basic LLM concepts | Medium | Days
OpenAI Agents SDK | Python, OpenAI API | Low-Medium | Days
Mastra | TypeScript, Node.js | Medium | Days
LangChain | Python, LLM concepts | Medium | Days-Weeks
LangGraph | Python, graph concepts, state schemas | High | Weeks
Microsoft Agent Framework | C# or Python, Azure | Medium-High | Days-Weeks
Google ADK | Python, GCP | Medium | Days

Question 4: What does vendor lock-in look like?

Every framework has a lock-in profile. Being explicit about it before you commit avoids unpleasant surprises 18 months later.

  • LangGraph: Locks in to the Python ecosystem and LangChain's architecture. Relatively portable across LLM providers. LangSmith for observability is proprietary but optional.
  • Microsoft Agent Framework: Locks in to Azure and Microsoft ecosystem. Portability off Azure is expensive.
  • OpenAI Agents SDK: Locks in to OpenAI models. Switching providers requires rewriting integration layers.
  • CrewAI: Locks in to Python and CrewAI's role-based abstraction. More portable across LLM providers than OpenAI SDK.
  • Mastra: Locks in to TypeScript ecosystem. Relatively portable otherwise.
  • Google ADK: Locks in to GCP and Vertex AI.
  • Dify: Lock-in is low for self-hosted. Cloud version creates dependency on Dify's platform.

What the GitHub star count actually tells you

Organized into categories to clarify what each high-star project actually represents:

Personal AI assistant products (not developer frameworks):

  • OpenClaw: 210,000 stars

Early-generation autonomous agent experiments (historically important, not production-ready for most use cases):

  • AutoGPT: 182,600 stars
  • BabyAGI: Archived
  • SuperAGI: Pivoted

Low-code platforms for non-technical builders:

  • Dify: 130,000 stars

Foundation framework (RAG and LLM apps):

  • LangChain: 123,000 stars

Research-oriented multi-agent simulation:

  • MetaGPT: 62,000 stars

Enterprise-grade production frameworks by actual downloads:

  • LangGraph: 34.5M monthly downloads
  • OpenAI Agents SDK: 10.3M monthly downloads
  • CrewAI: 5.2M monthly downloads
  • Google ADK: 3.3M monthly downloads
  • Mastra: 1.77M monthly downloads

The frameworks doing the most actual work in production are in the second list. The frameworks with the most stars are in the first list. They are largely different projects.


The ecosystem consolidation risk

One dimension of framework selection that almost no comparison covers: ecosystem risk. What happens if the framework you choose loses momentum?

AutoGen is the active cautionary example. Microsoft placed it in maintenance mode in late 2025. Organizations that built production systems on AutoGen must now plan migration timelines to the Microsoft Agent Framework. The migration is not technically difficult, but it consumes engineering cycles and requires retesting.

Signs of framework health to evaluate before committing:

  • Active release cadence: How frequently are new versions shipped? Is the changelog substantive?
  • Commercial backing: Is there a company with revenue incentive to maintain the project, or is it purely community-driven?
  • Enterprise adoption: Are companies with real engineering standards using it in production?
  • Community engagement: Active GitHub issues and discussions indicate a living project.

By these criteria, LangGraph (LangChain Inc. backing, active releases, Klarna and Uber in production), CrewAI (CrewAI Inc. backing, strong adoption growth), and Mastra (YC backing, Replit and SoftBank users) are the healthiest independent frameworks. Microsoft Agent Framework and Google ADK carry the implicit maintenance commitment of their respective tech giants.


The recommendation by organization type

You are a 10 to 50 person company building your first AI agents: Start with CrewAI if your team knows Python. Start with Mastra if your team knows TypeScript. Both frameworks get you to production faster than LangGraph and handle most business use cases well.

You are a 50 to 500 person company with an Azure or Microsoft environment: Microsoft Agent Framework. The governance, compliance, and integration story with your existing Microsoft licenses is worth more than any technical framework advantage.

You are a 500+ person enterprise with complex stateful workflows: LangGraph with LangSmith observability. The investment in the learning curve pays off at scale. The Klarna case study is the relevant proof point.

You are an engineering team building a TypeScript product: Mastra. Do not fight the language mismatch.

You are on Google Cloud or want to adopt A2A interoperability standards: Google ADK. The open standards angle matters as agent ecosystems grow.

You want a personal AI assistant without engineering work: OpenClaw with Ollama on a Mac Mini or VPS.

You want to enable non-technical teams to build AI workflows: Dify, self-hosted or cloud.


The framework decision sets the foundation for everything that follows: observability strategy, cost management, team skills investment, and vendor relationships. Getting it right at the start is worth the time this decision takes. Getting it wrong means a painful migration 12 to 18 months from now when the friction of the wrong choice becomes undeniable.


Not sure which framework fits your use case?

We work with CTOs and engineering leads to evaluate framework options against their specific stack, team skills, and workflow complexity. If you have a framework decision in front of you and want a second opinion from someone who has worked through these trade-offs in production, book a call.

Book a 30-minute call


For the cost implications of whichever framework you choose, the hidden costs guide covers the five cost drivers that cause most budgets to run 3x over estimates. For the cloud infrastructure layer underneath your framework, the CTO cloud infrastructure playbook covers the AWS, Azure, and GCP decision with real pricing at three scale levels.


Tags

AI agent framework comparison 2026, best AI agent frameworks, LangGraph vs CrewAI, OpenClaw framework, AI agent framework for enterprise, Microsoft Agent Framework, AI agent framework selection guide
