A deterministic pipeline. Not a chat loop.
Most agent frameworks are LLMs talking to themselves. aiagentflow is a Directed Acyclic Graph — predictable inputs, locally verifiable outputs, no hand-wavy reasoning chains.
Built for autonomous engineering, not generic conversation.
Six principles guide the framework. None of them involve hoping an LLM gets it right.
Local-first by default
Runs on your machine. Your code, prompts, and traces never leave the box unless you point it at a hosted model.
Framework agnostic
TypeScript, Python, Rust, Go. React, Vue, Django, FastAPI. If you can build it, it can orchestrate it.
Context-grounded
Inject ADRs, API specs, design system docs, and PRDs as first-class context. Hallucinations drop, fidelity climbs.
Self-healing loops
The Fixer, Tester, and Judge close the loop automatically. Most failures are resolved before they reach you.
Extensible primitives
Write custom agents in TypeScript. Override prompts. Plug in your own tools, validators, and gates.
Streams to your terminal
--stream pipes diffs, traces, and tokens live. Auto-detects Anthropic, OpenAI, Gemini, Groq, OpenRouter, Ollama.
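The "extensible primitives" principle above can be sketched in a few lines. This is an illustrative stand-in, not the real aiagentflow API — the `Agent`, `AgentContext`, and `AgentResult` names are assumptions defined locally so the sketch is self-contained:

```typescript
// Illustrative only: minimal local stand-ins for what a custom-agent
// interface could look like. The real aiagentflow types may differ.
interface AgentContext {
  files: Record<string, string>; // workspace snapshot: path -> source
  prompt: string;                // the task description
}

interface AgentResult {
  ok: boolean;
  output: string;
}

interface Agent {
  name: string;
  run(ctx: AgentContext): Promise<AgentResult>;
}

// A toy Reviewer agent that fails the gate if any file still contains a TODO.
const reviewer: Agent = {
  name: "reviewer",
  async run(ctx) {
    const offenders = Object.entries(ctx.files)
      .filter(([, src]) => src.includes("TODO"))
      .map(([path]) => path);
    return offenders.length === 0
      ? { ok: true, output: "No stray TODOs." }
      : { ok: false, output: `TODOs left in: ${offenders.join(", ")}` };
  },
};
```

The point is the shape: an agent is just a named async function over a typed context, which is why custom validators and gates plug in the same way.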
Six providers. One config line.
Mix and match per agent. The Architect can run on Claude while the Tester runs on a local Llama. Auto-detection picks up whatever credentials are in your environment.
Anthropic
Recommended. Best-in-class at architectural reasoning and large diffs. Pairs with the Architect role.
OpenAI
Fast. Speed and tool use. We default to OpenAI for the Coder + Fixer roles in mixed-model setups.
Gemini
Use the massive context window to feed entire monorepos to the Architect in one pass.
Groq
Free tier. Sub-second inference. Great for rapid iteration loops and the Tester role.
OpenRouter
Free models. One API key, every model. Mix-and-match per-agent without juggling credentials.
Ollama
Local. Total privacy, zero cost. Run any GGUF model on your hardware — laptop or H100.
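Per-agent mixing might look something like the sketch below. The field names and model strings are hypothetical, chosen only to show the idea of routing each role to a different provider — they are not the documented aiagentflow config schema:

```typescript
// Hypothetical config shape: field names and model identifiers here are
// illustrative, not the real aiagentflow schema.
type ProviderId =
  | "anthropic" | "openai" | "gemini" | "groq" | "openrouter" | "ollama";

interface AgentModelConfig {
  provider: ProviderId;
  model: string;
}

// One entry per role: Claude for architecture, OpenAI for code,
// a local Llama (via Ollama) for tests — nothing leaves the box for testing.
const agents: Record<string, AgentModelConfig> = {
  architect: { provider: "anthropic", model: "claude-sonnet" },
  coder:     { provider: "openai",    model: "gpt-4o" },
  tester:    { provider: "ollama",    model: "llama3" },
};
```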
Under the hood: schemas, gates, and DAGs.
Every step is a typed contract. Every transition is a verifiable gate. The orchestrator is ~3,000 lines of zero-dependency TypeScript.
Zod-typed contracts at every step boundary. Invalid output is rejected before it propagates.
Every run is a deterministic trace. Replay to debug, fork to A/B test prompts.
File + shell tools execute in a workspace sandbox. Diffs are previewed before apply.
Per-run + per-agent token budgets. Pipelines halt before they exceed your limit.
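The framework validates with Zod; this dependency-free sketch shows the same gate idea — a step's output is checked against a typed contract, and anything invalid is rejected before it propagates downstream:

```typescript
// Dependency-free sketch of a step-boundary gate. aiagentflow uses Zod
// schemas for this; here a plain type guard plays the same role.
interface CoderOutput {
  path: string; // file the diff applies to
  diff: string; // unified diff body
}

function isCoderOutput(x: unknown): x is CoderOutput {
  return (
    typeof x === "object" && x !== null &&
    typeof (x as { path?: unknown }).path === "string" &&
    typeof (x as { diff?: unknown }).diff === "string"
  );
}

// The gate: valid output passes through unchanged; anything else halts the
// pipeline instead of feeding garbage to the next agent.
function gate(raw: unknown): CoderOutput {
  if (!isCoderOutput(raw)) {
    throw new Error("contract violation: Coder output rejected at gate");
  }
  return raw;
}
```

Because every transition runs through a gate like this, a malformed LLM response stops the DAG at the boundary where it occurred, which is what makes failures inspectable.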
How we stack up against the usual suspects.
Same task — “add an OAuth flow to a Next.js app” — run across five tools. We're the right answer for shipping production code locally.
Real prompts. Real diffs. Real fast.
Three runs from the wild. Each one was issued as a single prompt and shipped to a real codebase without manual intervention.
Receipts. Not vibes.
What people building real products say about replacing their AI stack with aiagentflow.
We replaced a six-step internal RFC-to-PR workflow with one aiagentflow command. Cycle time dropped from days to minutes — and the diffs are reviewable.
The DAG approach is the unlock. Loops were always a tax on my attention. With aiagentflow, the failure modes are inspectable and the wins are reproducible.
Local-first matters. We can't ship our trading code to a hosted agent. aiagentflow runs against our private fork of Llama on-prem and just works.
I read the source on a Saturday and shipped a custom Reviewer agent on Sunday. The extension API is genuinely Unix-philosophy.
Pitched it to my team as 'GitHub Copilot, but it actually finishes the task.' Three months in, that pitch held up.
Honest tooling. The Judge agent will tell you when the output isn't good enough — and stop, instead of pretending. Refreshing.
Install once. Run anywhere.
Zero dependencies. Auto-detects models from your environment. The first-run wizard takes 90 seconds.