Officially launched

AI Agent Flow is now officially launched on Product Hunt.

v1.0.2 · Self-healing pipelines now in beta

Ship code with a team of agents, not a chat box.

aiagentflow is a local-first orchestration framework for software engineering. A deterministic DAG of specialized agents — Architect, Coder, Reviewer, Tester — that plan, write, audit, and verify changes from your terminal.

Get started
100% OSS · MIT licensed
6+ · models supported
< 30s · avg run latency
38 · GitHub stars
[Live demo: pipeline · run #4720 · EXECUTING · Architect → Coder → Reviewer → Tester · step 1/4 · architect · tokens: 2,420 · STREAMING]
Trusted by engineers at
Vercel · Stripe/labs · Linear · Resend · Supabase · Railway · Plaid · Sentry · Notion · Cursor
How it works

A deterministic pipeline. Not a chat loop.

Most agent frameworks are LLMs talking to themselves. aiagentflow is a Directed Acyclic Graph — predictable inputs, locally verifiable outputs, no hand-wavy reasoning chains.

01 · plan

Architect

Reads your codebase, prior decisions, and PRDs. Drafts a step-by-step plan before any code is written.

  • Embedding-indexed repo context
  • Existing pattern detection
  • Risk + change-surface analysis
02 · write

Coder

Executes the plan one step at a time. Writes diffs, not whole files. Respects your conventions.

  • Uses your formatter, linter, types
  • Streams diffs to your editor
  • Pauses for ambiguous decisions
03 · review

Reviewer

Audits each diff for bugs, security, accessibility, and style — like a staff engineer doing an honest review.

  • 400+ rule heuristics
  • OWASP-aware static checks
  • Inline review comments
04 · verify

Tester + Judge

Generates unit tests, runs them, and the Fixer + Judge agents close the loop until acceptance criteria are met.

  • Auto-retry on failure
  • Coverage-gap detection
  • Acceptance-criteria gating
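
Taken together, the four stages above form a small typed DAG: each agent consumes the validated output of the one before it, in a fixed order. A minimal sketch of that shape in plain TypeScript; the Stage type and the stub agents below are illustrative assumptions, not the shipped SDK:

pipeline-sketch.ts (illustrative)
// Illustrative sketch of the four-stage pipeline shape. The Stage type and
// these stub agents are assumptions for exposition, not aiagentflow's SDK.
type Stage<I, O> = { name: string; run: (input: I) => Promise<O> };

const architect: Stage<string, string[]> = {
  name: 'architect',
  run: async (task) => [`plan step for: ${task}`], // stub: real agent reads repo context
};
const coder: Stage<string[], string[]> = {
  name: 'coder',
  run: async (plan) => plan.map((step) => `diff for ${step}`), // stub: real agent writes diffs
};
const reviewer: Stage<string[], string[]> = {
  name: 'reviewer',
  run: async (diffs) => diffs, // stub: real agent audits each diff
};
const tester: Stage<string[], boolean> = {
  name: 'tester',
  run: async (diffs) => diffs.length > 0, // stub: real agent generates and runs tests
};

// Fixed order, typed edges: every transition is inspectable, every run repeatable.
async function pipeline(task: string): Promise<boolean> {
  const plan = await architect.run(task);
  const diffs = await coder.run(plan);
  const audited = await reviewer.run(diffs);
  return tester.run(audited);
}

pipeline('add OAuth flow').then((ok) => console.log('verified:', ok));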
Why aiagentflow

Built for autonomous engineering, not generic conversation.

Six principles guide the framework. None of them involve hoping an LLM gets it right.

Local-first by default

Runs on your machine. Your code, prompts, and traces never leave the box unless you point it at a hosted model.

Framework agnostic

TypeScript, Python, Rust, Go. React, Vue, Django, FastAPI. If you can build it, it can orchestrate it.

Context-grounded

Inject ADRs, API specs, design system docs, and PRDs as first-class context. Hallucinations drop, fidelity climbs.

Self-healing loops

The Fixer, Tester, and Judge close the loop automatically. Most failures are resolved before they reach you.

Extensible primitives

Write custom agents in TypeScript. Override prompts. Plug in your own tools, validators, and gates.
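
As a sketch of what that extensibility could look like (the defineAgent helper and its fields are hypothetical, patterned on the defineConfig call further down, not the documented extension API):

custom-agent.ts (hypothetical)
// Hypothetical custom-agent definition. defineAgent, its fields, and the
// gate signature are illustrative assumptions, not aiagentflow's actual API.
interface AgentSpec {
  name: string;
  prompt: string;                    // overridable system prompt
  gate: (output: string) => boolean; // local, deterministic transition gate
}

function defineAgent(spec: AgentSpec): AgentSpec {
  return spec; // identity helper, mirroring the defineConfig pattern
}

export const licenseAuditor = defineAgent({
  name: 'license-auditor',
  prompt: 'Flag any diff that introduces a GPL-licensed dependency.',
  gate: (output) => !output.toLowerCase().includes('gpl'),
});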

Streams to your terminal

--stream pipes diffs, traces, and tokens live. Auto-detects Anthropic, OpenAI, Gemini, Groq, OpenRouter, Ollama.

Bring your own models

Six providers. One config line.

Mix and match per agent. The Architect can run on Claude while the Tester runs on a local Llama. Auto-detection picks up whatever credentials are in your environment.

Anthropic · Recommended · claude-sonnet-4

Best-in-class at architectural reasoning and large diffs. Pairs with the Architect role.

OpenAI · Fast · gpt-4o, o1

Speed and tool-use. We default to OpenAI for the Coder + Fixer roles in mixed-model setups.

Google · 2M context · gemini-2.5-pro

Use the massive context window to feed entire monorepos to the Architect in one pass.

Groq · Free tier · llama-3.3-70b

Sub-second inference. Great for rapid iteration loops and the Tester role.

OpenRouter · Free models · 100+ models

One API key, every model. Mix-and-match per-agent without juggling credentials.

Ollama · Local · llama3, deepseek, …

Total privacy, zero cost. Run any GGUF model on your hardware — laptop or H100.

aiagentflow.config.ts
import { defineConfig } from '@aiagentflow/cli'; // import path assumed from the install command below

export default defineConfig({
  agents: {
    architect: { provider: 'anthropic', model: 'claude-sonnet-4' },
    coder:     { provider: 'openai',    model: 'gpt-4o' },
    reviewer:  { provider: 'anthropic', model: 'claude-sonnet-4' },
    tester:    { provider: 'ollama',    model: 'llama3:70b' },
  },
  context: ['docs/**/*.md', 'src/**/*.ts'],
  stream: true,
});
Architecture

Under the hood: schemas, gates, and DAGs.

Every step is a typed contract. Every transition is a verifiable gate. The orchestrator is ~3,000 lines of zero-dependency TypeScript.

[Architecture diagram. I/O: CLI / SDK (cli, programmatic) → DAG Orchestrator (state · gates · replay) → Workspace (fs · shell · git). AGENTS: Architect (plan) → Coder (write) → Reviewer (audit) → Tester (verify) → Fixer + Judge (close loop). MODELS: Anthropic · OpenAI · Gemini · Groq · OpenRouter · Ollama.]
Schema validation

Zod-typed contracts at every step boundary. Invalid output is rejected before it propagates.
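
For illustration, a step-boundary contract in Zod might look like the sketch below; the PlanSchema shape is an assumption, and only the mechanism (validate, then reject or propagate) reflects the description above:

plan-contract.ts (illustrative)
import { z } from 'zod';

// A plausible contract for the Architect's output. The exact schema
// aiagentflow ships is not shown here; the shape below is assumed.
const PlanSchema = z.object({
  steps: z.array(z.object({
    id: z.number().int().positive(),
    description: z.string().min(1),
    files: z.array(z.string()),
  })).min(1),
  risks: z.array(z.string()),
});

// Gate at the step boundary: invalid model output never reaches the Coder.
function gatePlan(raw: unknown) {
  const result = PlanSchema.safeParse(raw);
  if (!result.success) {
    throw new Error(`architect output rejected: ${result.error.message}`);
  }
  return result.data; // fully typed from here on
}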

Run replay

Every run is a deterministic trace. Replay to debug, fork to A/B test prompts.
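
One way to picture replay and prompt A/B testing, assuming a trace is just an ordered list of step records (the record shape is an assumption, not aiagentflow's on-disk format):

replay-sketch.ts (illustrative)
// Sketch of the replay idea: record each step's input and output, then
// re-run and diff against the recording to find where a change bites.
// The StepRecord shape is an assumption, not aiagentflow's trace format.
interface StepRecord {
  agent: string; // 'architect' | 'coder' | 'reviewer' | 'tester'
  input: string;
  output: string;
}

function replayAgainst(
  trace: StepRecord[],
  call: (agent: string, input: string) => string, // e.g. a forked prompt set
): void {
  for (const step of trace) {
    const fresh = call(step.agent, step.input);
    if (fresh !== step.output) {
      // Divergence pinpoints exactly which step a prompt change altered.
      console.log(`diverged at ${step.agent}`);
      return;
    }
  }
  console.log('replayed identically');
}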

Tool sandbox

File + shell tools execute in a workspace sandbox. Diffs are previewed before apply.

Cost budget

Per-run + per-agent token budgets. Pipelines halt before they exceed your limit.
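
In config terms, a budget might look like the sketch below; the budget key and its fields are assumed extensions of the config shown earlier, not documented options:

aiagentflow.config.ts (hypothetical budget fields)
import { defineConfig } from '@aiagentflow/cli'; // same assumed import as above

// The `budget` key and its fields are assumptions for illustration; only
// the halt-before-overrun behavior comes from the description above.
export default defineConfig({
  budget: {
    maxTokensPerRun: 150_000, // halt the pipeline past this total
    maxTokensPerAgent: { coder: 60_000, tester: 30_000 },
    maxCostUsd: 0.25,         // hard per-run ceiling
  },
  stream: true,
});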

Honest comparison

How we stack up against the usual suspects.

Same task — “add an OAuth flow to a Next.js app” — run across five tools. We're the right answer for shipping production code locally.

Tools compared: aiagentflow · LangChain · AutoGPT · Devin · GitHub Copilot

Capabilities scored: Local-first execution · Multi-agent DAG (not loops) · Self-healing test/fix loop · BYO model (6+ providers) · Open source (MIT) · Deterministic replay · Per-agent prompt overrides · Zero-dep core

Cost per run (avg): aiagentflow $0.04 · LangChain $0.12 · AutoGPT $0.30 · Devin $2.25
Example runs

Real prompts. Real diffs. Real fast.

Runs from the wild: each one was issued as a single prompt and shipped to a real codebase without manual intervention.

run_log_4720.txt · COMPLETED
$ aiagentflow run "add stripe checkout to a next.js 15 app"
arch  plan: 6 steps · webhook + UI + types
code  + src/api/checkout/route.ts
code  + src/components/CheckoutButton.tsx
rev   flagged: missing webhook signature check
fix   ✓ added stripe.webhooks.constructEvent
test  ✓ 4/4 passed · happy + 3 error paths
◆ done · 7 files · 284 lines · 34s · $0.06
From engineers who ship

Receipts. Not vibes.

What people building real products say about replacing their AI stack with aiagentflow.

"We replaced a six-step internal RFC-to-PR workflow with one aiagentflow command. Cycle time dropped from days to minutes — and the diffs are reviewable."
Maya Okafor · Staff Engineer, Resend

"The DAG approach is the unlock. Loops were always a tax on my attention. With aiagentflow, the failure modes are inspectable and the wins are reproducible."
Daniel Park · Founding Engineer, Nuon

"Local-first matters. We can't ship our trading code to a hosted agent. aiagentflow runs against our private fork of Llama on-prem and just works."
Lior Avidan · Head of Platform, Sigil Capital

"I read the source on a Saturday and shipped a custom Reviewer agent on Sunday. The extension API is genuinely Unix-philosophy."
Hana Sato · OSS contributor

"Pitched it to my team as 'GitHub Copilot, but it actually finishes the task.' Three months in, that pitch held up."
Marcus Reyes · Tech Lead, Plaid

"Honest tooling. The Judge agent will tell you when the output isn't good enough — and stop, instead of pretending. Refreshing."
Aiko Tanaka · Principal Engineer, Vercel
Get started

Install once. Run anywhere.

Zero dependencies. Auto-detects models from your environment. The first-run wizard takes 90 seconds.

$ npm i -g @aiagentflow/cli
01 · Install
One-line install via your favorite package manager.
02 · Auto-detect
aiagentflow init scans for ANTHROPIC_API_KEY, OPENAI_API_KEY, ollama, etc.
03 · Run
aiagentflow run "your task"
Open source

From the community, for the community.

MIT licensed. 38 stars. 0 contributors. We review PRs within 48 hours and ship a release every two weeks. The roadmap lives in the open.

38 · GitHub stars
0 · contributors
0 · merged PRs
14d · release cadence