Officially launched

AI Agent Flow is now officially launched on Product Hunt.

v1.0.2 · Self-healing pipelines now in beta

Ship code with a team of agents, not a chat box.

aiagentflow is a local-first orchestration framework for software engineering. A deterministic DAG of specialized agents — Architect, Coder, Reviewer, Tester — that plan, write, audit, and verify changes from your terminal.

Get started
100% OSS · MIT licensed
6+ · models supported
< 30s · avg run latency
38 · GitHub stars
[Live demo: pipeline · run #4720 · EXECUTING · Architect → Coder → Reviewer → Tester · step 1/4 · architect · tokens: 2,420 · STREAMING]
Trusted by engineers at
Vercel · Stripe/labs · Linear · Resend · Supabase · Railway · Plaid · Sentry · Notion · Cursor
How it works

A deterministic pipeline. Not a chat loop.

Most agent frameworks are LLMs talking to themselves. aiagentflow is a Directed Acyclic Graph — predictable inputs, locally verifiable outputs, no hand-wavy reasoning chains.

01 · plan

Architect

Reads your codebase, prior decisions, and PRDs. Drafts a step-by-step plan before any code is written.

  • Embedding-indexed repo context
  • Existing pattern detection
  • Risk + change-surface analysis
02 · write

Coder

Executes the plan one step at a time. Writes diffs, not whole files. Respects your conventions.

  • Uses your formatter, linter, types
  • Streams diffs to your editor
  • Pauses for ambiguous decisions
03 · review

Reviewer

Audits each diff for bugs, security, accessibility, and style — like a staff engineer doing an honest review.

  • 400+ rule heuristics
  • OWASP-aware static checks
  • Inline review comments
04 · verify

Tester + Judge

Generates unit tests, runs them, and the Fixer + Judge agents close the loop until acceptance criteria are met.

  • Auto-retry on failure
  • Coverage-gap detection
  • Acceptance-criteria gating
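
Taken together, the four stages above form a small typed DAG: each agent consumes the validated output of the one before it, in a fixed order. A minimal sketch of that shape in plain TypeScript; the Stage type and the stub agents below are illustrative assumptions, not the shipped SDK:

pipeline-sketch.ts (illustrative)
// Illustrative sketch of the four-stage pipeline shape. The Stage type and
// these stub agents are assumptions for exposition, not aiagentflow's SDK.
type Stage<I, O> = { name: string; run: (input: I) => Promise<O> };

const architect: Stage<string, string[]> = {
  name: 'architect',
  run: async (task) => [`plan step for: ${task}`], // stub: real agent reads repo context
};
const coder: Stage<string[], string[]> = {
  name: 'coder',
  run: async (plan) => plan.map((step) => `diff for ${step}`), // stub: real agent writes diffs
};
const reviewer: Stage<string[], string[]> = {
  name: 'reviewer',
  run: async (diffs) => diffs, // stub: real agent audits each diff
};
const tester: Stage<string[], boolean> = {
  name: 'tester',
  run: async (diffs) => diffs.length > 0, // stub: real agent generates and runs tests
};

// Fixed order, typed edges: every transition is inspectable, every run repeatable.
async function pipeline(task: string): Promise<boolean> {
  const plan = await architect.run(task);
  const diffs = await coder.run(plan);
  const audited = await reviewer.run(diffs);
  return tester.run(audited);
}

pipeline('add OAuth flow').then((ok) => console.log('verified:', ok));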
Why aiagentflow

Built for autonomous engineering, not generic conversation.

Six principles guide the framework. None of them involve hoping an LLM gets it right.

Local-first by default

Runs on your machine. Your code, prompts, and traces never leave the box unless you point it at a hosted model.

Framework agnostic

TypeScript, Python, Rust, Go. React, Vue, Django, FastAPI. If you can build it, it can orchestrate it.

Context-grounded

Inject ADRs, API specs, design system docs, and PRDs as first-class context. Hallucinations drop, fidelity climbs.

Self-healing loops

The Fixer, Tester, and Judge close the loop automatically. Most failures are resolved before they reach you.

Extensible primitives

Write custom agents in TypeScript. Override prompts. Plug in your own tools, validators, and gates.
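
As a sketch of what that extensibility could look like (the defineAgent helper and its fields are hypothetical, patterned on the defineConfig call further down, not the documented extension API):

custom-agent.ts (hypothetical)
// Hypothetical custom-agent definition. defineAgent, its fields, and the
// gate signature are illustrative assumptions, not aiagentflow's actual API.
interface AgentSpec {
  name: string;
  prompt: string;                    // overridable system prompt
  gate: (output: string) => boolean; // local, deterministic transition gate
}

function defineAgent(spec: AgentSpec): AgentSpec {
  return spec; // identity helper, mirroring the defineConfig pattern
}

export const licenseAuditor = defineAgent({
  name: 'license-auditor',
  prompt: 'Flag any diff that introduces a GPL-licensed dependency.',
  gate: (output) => !output.toLowerCase().includes('gpl'),
});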

Streams to your terminal

--stream pipes diffs, traces, and tokens live. Auto-detects Anthropic, OpenAI, Gemini, Groq, OpenRouter, Ollama.

Bring your own models

Six providers. One config line.

Mix and match per agent. The Architect can run on Claude while the Tester runs on a local Llama. Auto-detection picks up whatever credentials are in your environment.

Anthropic · Recommended · claude-sonnet-4

Best-in-class at architectural reasoning and large diffs. Pairs with the Architect role.

OpenAI · Fast · gpt-4o, o1

Speed and tool-use. We default to OpenAI for the Coder + Fixer roles in mixed-model setups.

Google · 2M context · gemini-2.5-pro

Use the massive context window to feed entire monorepos to the Architect in one pass.

Groq · Free tier · llama-3.3-70b

Sub-second inference. Great for rapid iteration loops and the Tester role.

OpenRouter · Free models · 100+ models

One API key, every model. Mix-and-match per-agent without juggling credentials.

Ollama · Local · llama3, deepseek, …

Total privacy, zero cost. Run any GGUF model on your hardware — laptop or H100.

aiagentflow.config.ts
import { defineConfig } from '@aiagentflow/cli'; // import path assumed from the install command below

export default defineConfig({
  agents: {
    architect: { provider: 'anthropic', model: 'claude-sonnet-4' },
    coder:     { provider: 'openai',    model: 'gpt-4o' },
    reviewer:  { provider: 'anthropic', model: 'claude-sonnet-4' },
    tester:    { provider: 'ollama',    model: 'llama3:70b' },
  },
  context: ['docs/**/*.md', 'src/**/*.ts'],
  stream: true,
});
Architecture

Under the hood: schemas, gates, and DAGs.

Every step is a typed contract. Every transition is a verifiable gate. The orchestrator is ~3,000 lines of zero-dependency TypeScript.

[Architecture diagram. I/O: CLI / SDK (cli, programmatic) → DAG Orchestrator (state · gates · replay) → Workspace (fs · shell · git). AGENTS: Architect (plan) → Coder (write) → Reviewer (audit) → Tester (verify) → Fixer + Judge (close loop). MODELS: Anthropic · OpenAI · Gemini · Groq · OpenRouter · Ollama.]
Schema validation

Zod-typed contracts at every step boundary. Invalid output is rejected before it propagates.
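
For illustration, a step-boundary contract in Zod might look like the sketch below; the PlanSchema shape is an assumption, and only the mechanism (validate, then reject or propagate) reflects the description above:

plan-contract.ts (illustrative)
import { z } from 'zod';

// A plausible contract for the Architect's output. The exact schema
// aiagentflow ships is not shown here; the shape below is assumed.
const PlanSchema = z.object({
  steps: z.array(z.object({
    id: z.number().int().positive(),
    description: z.string().min(1),
    files: z.array(z.string()),
  })).min(1),
  risks: z.array(z.string()),
});

// Gate at the step boundary: invalid model output never reaches the Coder.
function gatePlan(raw: unknown) {
  const result = PlanSchema.safeParse(raw);
  if (!result.success) {
    throw new Error(`architect output rejected: ${result.error.message}`);
  }
  return result.data; // fully typed from here on
}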

Run replay

Every run is a deterministic trace. Replay to debug, fork to A/B test prompts.
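
One way to picture replay and prompt A/B testing, assuming a trace is just an ordered list of step records (the record shape is an assumption, not aiagentflow's on-disk format):

replay-sketch.ts (illustrative)
// Sketch of the replay idea: record each step's input and output, then
// re-run and diff against the recording to find where a change bites.
// The StepRecord shape is an assumption, not aiagentflow's trace format.
interface StepRecord {
  agent: string; // 'architect' | 'coder' | 'reviewer' | 'tester'
  input: string;
  output: string;
}

function replayAgainst(
  trace: StepRecord[],
  call: (agent: string, input: string) => string, // e.g. a forked prompt set
): void {
  for (const step of trace) {
    const fresh = call(step.agent, step.input);
    if (fresh !== step.output) {
      // Divergence pinpoints exactly which step a prompt change altered.
      console.log(`diverged at ${step.agent}`);
      return;
    }
  }
  console.log('replayed identically');
}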

Tool sandbox

File + shell tools execute in a workspace sandbox. Diffs are previewed before apply.

Cost budget

Per-run + per-agent token budgets. Pipelines halt before they exceed your limit.
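
In config terms, a budget might look like the sketch below; the budget key and its fields are assumed extensions of the config shown earlier, not documented options:

aiagentflow.config.ts (hypothetical budget fields)
import { defineConfig } from '@aiagentflow/cli'; // same assumed import as above

// The `budget` key and its fields are assumptions for illustration; only
// the halt-before-overrun behavior comes from the description above.
export default defineConfig({
  budget: {
    maxTokensPerRun: 150_000, // halt the pipeline past this total
    maxTokensPerAgent: { coder: 60_000, tester: 30_000 },
    maxCostUsd: 0.25,         // hard per-run ceiling
  },
  stream: true,
});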

Honest comparison

How we stack up against the usual suspects.

Same task — “add an OAuth flow to a Next.js app” — run across five tools. We're the right answer for shipping production code locally.

Tools compared: aiagentflow · LangChain · AutoGPT · Devin · GitHub Copilot

Capabilities scored: Local-first execution · Multi-agent DAG (not loops) · Self-healing test/fix loop · BYO model (6+ providers) · Open source (MIT) · Deterministic replay · Per-agent prompt overrides · Zero-dep core

Cost per run (avg): aiagentflow $0.04 · LangChain $0.12 · AutoGPT $0.30 · Devin $2.25
Example runs

Real prompts. Real diffs. Real fast.

Runs from the wild: each one was issued as a single prompt and shipped to a real codebase without manual intervention.

run_log_4720.txt · COMPLETED
$ aiagentflow run "add stripe checkout to a next.js 15 app"
arch  plan: 6 steps · webhook + UI + types
code  + src/api/checkout/route.ts
code  + src/components/CheckoutButton.tsx
rev   flagged: missing webhook signature check
fix   ✓ added stripe.webhooks.constructEvent
test  ✓ 4/4 passed · happy + 3 error paths
◆ done · 7 files · 284 lines · 34s · $0.06
From engineers who ship

Receipts. Not vibes.

What people building real products say about replacing their AI stack with aiagentflow.

"We replaced a six-step internal RFC-to-PR workflow with one aiagentflow command. Cycle time dropped from days to minutes — and the diffs are reviewable."
Maya Okafor · Staff Engineer, Resend

"The DAG approach is the unlock. Loops were always a tax on my attention. With aiagentflow, the failure modes are inspectable and the wins are reproducible."
Daniel Park · Founding Engineer, Nuon

"Local-first matters. We can't ship our trading code to a hosted agent. aiagentflow runs against our private fork of Llama on-prem and just works."
Lior Avidan · Head of Platform, Sigil Capital

"I read the source on a Saturday and shipped a custom Reviewer agent on Sunday. The extension API is genuinely Unix-philosophy."
Hana Sato · OSS contributor

"Pitched it to my team as 'GitHub Copilot, but it actually finishes the task.' Three months in, that pitch held up."
Marcus Reyes · Tech Lead, Plaid

"Honest tooling. The Judge agent will tell you when the output isn't good enough — and stop, instead of pretending. Refreshing."
Aiko Tanaka · Principal Engineer, Vercel
Get started

Install once. Run anywhere.

Zero dependencies. Auto-detects models from your environment. The first-run wizard takes 90 seconds.

$ npm i -g @aiagentflow/cli
01 · Install
One-line install via your favorite package manager.
02 · Auto-detect
aiagentflow init scans for ANTHROPIC_API_KEY, OPENAI_API_KEY, ollama, etc.
03 · Run
aiagentflow run "your task"
Open source

From the community, for the community.

MIT licensed. 38 stars. 0 contributors. We review PRs within 48 hours and ship a release every two weeks. The roadmap lives in the open.

38 · GitHub stars
0 · contributors
0 · merged PRs
14d · release cadence