At Southbridge, we build complex agents for data work that need to operate over long task horizons - hundreds of hours and thousands of tool calls - with high reliability.
To do this, we needed agents that could be repaired, maintained and improved. We wanted to stop rebuilding things from scratch, and find abstractions that allowed us to stack complexity while keeping things simple enough for us to reason about. We found that as problems grow more complex, the ultimate bottleneck is the human in the loop being able to understand and reason about the behavior of an agent.
We also found it hard to ship agentic capability across team members and companies. “It works but you’ll need me” just wasn’t acceptable. We needed a way to declaratively define long, complex flows that worked regardless of who ran them or where they were run.
Hankweave started as an answer to both of these problems and grew into a lot more. Today hankweave runs 100% of the reliable AI work at Southbridge, and we're proud to open-source it as a research snapshot.
Hankweave executes AI programs built as sequences of codons: agentic blocks that encapsulate prompts, workspaces, and behavior checks. Codons are executed inside harnesses we know and love, like the Claude Agent SDK and Codex. Over time, we've begun to see coding agents as REPLs for testing functionality before freezing it into reusable codons. When something breaks, we repair the codon, and the fix travels everywhere it's used.
     sentinels (citations, lazy-detection)
       ┊               ┊               ┊
┌──────┴─────┐  ┌──────┴─────┐  ┌──────┴─────┐  ┌────────────┐  ┌────────────┐
│ 1. Gather  │─→│ 2. Research│─→│ 3. Verify  │─→│ 4. Consoli-│─→│ 5. Run     │
│    info    │  │   + data   │  │(fresh eyes)│  │date + reorg│  │ simulations│
└────────────┘  └──────┬─────┘  └──────┬─────┘  └────────────┘  └──────┬─────┘
                       │               │                               │
                       └──── loop ─────┘                               │
┌──────────────────────────────────────────────────────────────────────┘
│
┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐
│ 6. Write   │─→│ 7. Prepare │─→│ 8. Write   │─→│ 9. Verify  │
│ validations│  │ + graphics │  │   LaTeX    │  │   report   │
└────────────┘  └────────────┘  └────────────┘  └────────────┘

Hankweave adds the one context engineering feature we felt was missing everywhere else: being able to remove things. Hankweave allows us to firewall codons, make context handoffs explicit, and be surgically precise with what each agent sees. Controlled forgetting turns out to be just as important as recall for keeping a long run coherent.
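One way to picture a context firewall is a codon that starts from a clean slate and only sees what is explicitly handed to it. The TypeScript below is a minimal sketch under stated assumptions, not hankweave's implementation: `Message`, `CodonSpec`, `handoffFiles`, and `buildContext` are hypothetical names, and `"continue"` is an assumed counterpart to the `"fresh"` continuation mode that appears in the config in this post.

```typescript
// Hypothetical sketch: how a "fresh" codon could drop prior context.
// None of these names are hankweave's real API.
type Message = { role: "user" | "assistant"; text: string };

interface CodonSpec {
  prompt: string;
  // "fresh" appears in the post's config; "continue" is an assumed counterpart.
  continuationMode: "fresh" | "continue";
  // Assumed field: files explicitly handed to this codon.
  handoffFiles?: string[];
}

function buildContext(previous: Message[], codon: CodonSpec): Message[] {
  const handoff = (codon.handoffFiles ?? []).map(
    (path): Message => ({ role: "user", text: `See file: ${path}` }),
  );
  const prompt: Message = { role: "user", text: codon.prompt };
  if (codon.continuationMode === "fresh") {
    // Firewall: nothing from earlier codons gets in except the explicit handoff.
    return [...handoff, prompt];
  }
  return [...previous, ...handoff, prompt];
}

const history: Message[] = [
  { role: "user", text: "Gather sources" },
  { role: "assistant", text: "Collected 40 pages of notes" },
];

const verifyContext = buildContext(history, {
  prompt: "Verify the findings with fresh eyes",
  continuationMode: "fresh",
  handoffFiles: ["workspace/findings.md"],
});

console.log(verifyContext.length); // 2: the handoff note plus the prompt
```

The point of the sketch is that forgetting is the default in `"fresh"` mode, and anything that survives has to be named in the handoff.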
Hankweave evolved around a simple premise: the problems you run into when running agents at scale should have well-designed pathways to solve them. A lot of hankweave’s features feel strange until the moment you need them - and then they’re exactly the tool for the job. Structured event journals make it easy to trace issues to the right codons; sentinels make it easy to measure and catch known problems like laziness, convention violations, and drift.
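To make the sentinel idea concrete, here is a minimal sketch of a debounced observer over an event journal. It is an illustration only: the `JournalEvent` shape, its `codonId` and `atMs` fields, and both functions are assumptions, and the citation check is a toy regex. Only the `tool.result` trigger and the `debounce` execution mode are taken from the config in this post.

```typescript
// Hypothetical sketch of a debounced sentinel; not hankweave's real API.
interface JournalEvent {
  type: string; // e.g. "tool.result", as in the post's trigger config
  codonId: string; // assumed field: which codon emitted the event
  atMs: number; // assumed field: timestamp in milliseconds
  text: string;
}

// Group matching events into batches separated by at least `quietMs` of silence,
// so the check runs once per burst instead of once per event.
function debounceBatches(
  events: JournalEvent[],
  trigger: string,
  quietMs: number,
): JournalEvent[][] {
  const batches: JournalEvent[][] = [];
  let current: JournalEvent[] = [];
  for (const event of events.filter((e) => e.type === trigger)) {
    const last = current[current.length - 1];
    if (last && event.atMs - last.atMs >= quietMs) {
      batches.push(current);
      current = [];
    }
    current.push(event);
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Toy check: report the codons in a batch that contains no citation like [1].
function missingCitations(batch: JournalEvent[]): string[] {
  const cited = batch.some((e) => /\[\d+\]/.test(e.text));
  return cited ? [] : Array.from(new Set(batch.map((e) => e.codonId)));
}

const journal: JournalEvent[] = [
  { type: "tool.result", codonId: "research", atMs: 0, text: "Revenue grew 12% [1]" },
  { type: "tool.result", codonId: "research", atMs: 200, text: "Margins were flat" },
  { type: "assistant.message", codonId: "research", atMs: 300, text: "thinking" },
  { type: "tool.result", codonId: "verify", atMs: 5000, text: "Looks fine to me" },
];

// Two batches: the first contains a citation, the second flags "verify".
const findings = debounceBatches(journal, "tool.result", 1000).map(missingCitations);
console.log(findings);
```

Because each finding carries a codon identifier, a flagged batch points straight at the codon that needs repair.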
Before a single token is cast, the runtime catches as many problems as it can - API keys, model availability, file paths, rig configs, sentinel schemas - all validated upfront. The goal is to never waste a long run on something that could have been caught in the first second.
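A preflight pass of this kind can be sketched in a few lines. The version below is a simplified illustration, not hankweave's validator: `PreflightInput` and `preflight` are hypothetical names, and it checks only two of the categories listed above (environment variables and file paths). The `NOTION_API_KEY` and `prompts/gather.md` values are borrowed from the example config in this post.

```typescript
// Hypothetical preflight sketch; not hankweave's real validator.
import { existsSync } from "node:fs";

interface PreflightInput {
  requiredEnv: string[]; // mirrors "requirements": { "env": [...] } in the post's config
  promptFiles: string[]; // files that must exist before the run starts
}

// Collect every problem instead of stopping at the first one,
// so a single pass reports everything that needs fixing.
function preflight(
  input: PreflightInput,
  env: Record<string, string | undefined> = process.env,
  fileExists: (path: string) => boolean = existsSync,
): string[] {
  const problems: string[] = [];
  for (const name of input.requiredEnv) {
    if (!env[name]) problems.push(`missing env var: ${name}`);
  }
  for (const path of input.promptFiles) {
    if (!fileExists(path)) problems.push(`missing file: ${path}`);
  }
  return problems;
}

const problems = preflight(
  { requiredEnv: ["NOTION_API_KEY"], promptFiles: ["prompts/gather.md"] },
  {}, // pretend environment with no keys set
  () => false, // pretend filesystem where nothing exists
);
console.log(problems);
```

Passing the environment and the file check as parameters keeps the sketch testable without touching the real machine; a run would only start when the returned list is empty.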
Under the hood, hankweave manages checkpointing, rollbacks, config and auth resolution, harness shims, structured event logs, LLM proxies, workspace isolation, context boundaries, preflight validation, codon sequencing, loop expansion, archive manifests, and a lot more.
{
  "globalSystemPromptFile": "system.md",

  // codon 1: gather data from source
  { "model": "claude-4-sonnet",
    "promptFile": "prompts/gather.md",
    "continuationMode": "fresh",
    "rigSetup": [
      { "copy": { "from": "templates/" } },
      { "command": "git pull origin main" },
      { "command": "npm install" }
    ],
    "checkpointedFiles": ["workspace/**"],
    // sentinel: watches for missing citations
    "sentinels": [{
      "sentinelConfig": "citations.json",
      "trigger": { "on": "tool.result" },
      "execution": "debounce"
    }] },

  // loop: research and verify, repeat until confident
  { "terminateOn": { "iterationLimit": 3 },
    "codons": [ /* research, verify, consolidate */ ],
    // archive drafts between iterations for comparison
    "archiveOnSuccess": ["workspace/drafts/**"] },

  // codon 3: write report with opus
  { "model": "opus",
    "appendSystemPromptFile": "prompts/write-system.md",
    // easy consuela loop: keep improving until context fills
    "exhaustWithPrompt": "Review and continue improving..." },

  "overrides": { "model": "opus" },
  "requirements": { "env": ["NOTION_API_KEY"] }
}

Hankweave operates on a single agentic thread, with observer agents to catch mistakes and take notes. Agents inside hankweave operate with file and shell tools and not much else. Hankweave is designed primarily for headless, hermetically sealed flows: think CNC instead of a hand-chisel.
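Because everything runs on one thread, a bounded loop like the one in the config above can be pictured as a flat sequence of codons. The sketch below shows that unrolling under a strong assumption: the loop always runs to its `iterationLimit`, with no early exit. The `Codon`, `Loop`, and `expand` names and the `#n` id suffix are hypothetical, not hankweave's actual loop expansion.

```typescript
// Hypothetical sketch of loop expansion; not hankweave's real implementation.
type Codon = { id: string };
type Loop = { codons: Codon[]; terminateOn: { iterationLimit: number } };
type Step = Codon | Loop;

function isLoop(step: Step): step is Loop {
  return "codons" in step;
}

// Unroll each loop into a flat list, tagging codon ids with their iteration number.
function expand(steps: Step[]): Codon[] {
  const flat: Codon[] = [];
  for (const step of steps) {
    if (!isLoop(step)) {
      flat.push(step);
      continue;
    }
    for (let i = 1; i <= step.terminateOn.iterationLimit; i++) {
      for (const codon of step.codons) {
        flat.push({ id: `${codon.id}#${i}` });
      }
    }
  }
  return flat;
}

const plan = expand([
  { id: "gather" },
  { codons: [{ id: "research" }, { id: "verify" }], terminateOn: { iterationLimit: 3 } },
  { id: "write" },
]);

console.log(plan.map((c) => c.id).join(" -> "));
// gather -> research#1 -> verify#1 -> research#2 -> verify#2 -> research#3 -> verify#3 -> write
```

A flat plan like this is easy to checkpoint and to trace in an event journal, since every step has a unique, ordered identifier.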
We built hankweave because we couldn’t buy it. Every feature - or wart - was added to solve a problem we actually hit.
We're excited for you to get started:
Or just run it: bunx hankweave
Read more about agentic engineering at Southbridge
- Antibrittle Agents - the theory behind hankweave
- CCEPL-driven development - the Claude Code workflow for building hanks
- Systems of Lasting Value - how to get out of this software crisis