hankweave

A runtime for repairable agents

At Southbridge, we build complex agents for data work that need to operate over long task horizons - hundreds of hours and thousands of tool calls - with high reliability.

To do this, we needed agents that could be repaired, maintained and improved. We wanted to stop rebuilding things from scratch, and find abstractions that allowed us to stack complexity while keeping things simple enough for us to reason about. We found that as problems grow more complex, the ultimate bottleneck becomes whether the human in the loop can understand and reason about the behavior of an agent.

We also found it hard to ship agentic capability across team members and companies. “It works but you’ll need me” just wasn’t acceptable. We needed a way to declaratively define long, complex flows that worked regardless of who ran them or where they were run.

Hankweave started as an answer to both of these problems and grew into a lot more. Today hankweave runs 100% of the reliable AI work at Southbridge, and we're proud to open-source it as a research snapshot.


Hankweave executes AI programs built as sequences of codons: agentic blocks that encapsulate prompts, workspaces, and behavior checks. Codons are executed inside harnesses we know and love, like the Claude Agent SDK and Codex. Over time, we've begun to see coding agents as REPLs to test functionality, before freezing them into codons to be reused. When something breaks, we repair the codon, and the fix travels everywhere it's used.

     sentinels (citations, lazy-detection)
       ┊               ┊               ┊
┌──────┴─────┐  ┌──────┴─────┐  ┌──────┴─────┐  ┌────────────┐  ┌────────────┐
│  1. Gather │─→│ 2. Research│─→│  3. Verify │─→│ 4. Consoli-│─→│   5. Run   │
│    info    │  │   + data   │  │(fresh eyes)│  │date + reorg│  │ simulations│
└────────────┘  └──────┬─────┘  └──────┬─────┘  └────────────┘  └──────┬─────┘
                       │               │                               │
                       └──── loop ─────┘                               │
┌──────────────────────────────────────────────────────────────────────┘
│
┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐
│  6. Write  │─→│ 7. Prepare │─→│  8. Write  │─→│  9. Verify │
│ validations│  │ + graphics │  │   LaTeX    │  │   report   │
└────────────┘  └────────────┘  └────────────┘  └────────────┘
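
The flow in the diagram can be read as a plan of codons containing one bounded loop. Below is a minimal TypeScript sketch of how such a plan might be flattened into an execution order. The `Codon`, `Loop`, and `expandPlan` names are hypothetical and are not hankweave's actual API, and the sketch assumes the loop always runs to its iteration limit.

```typescript
// Illustrative types only; hankweave's real schema may differ.
type Codon = { kind: "codon"; id: string };
type Loop = { kind: "loop"; iterationLimit: number; codons: Codon[] };
type Step = Codon | Loop;

// Flatten a plan into an ordered list of codon runs, unrolling each loop
// up to its iteration limit and tagging each run with its iteration number.
function expandPlan(plan: Step[]): string[] {
  const order: string[] = [];
  for (const step of plan) {
    if (step.kind === "codon") {
      order.push(step.id);
    } else {
      for (let i = 1; i <= step.iterationLimit; i++) {
        for (const codon of step.codons) {
          order.push(`${codon.id}#${i}`);
        }
      }
    }
  }
  return order;
}

const examplePlan: Step[] = [
  { kind: "codon", id: "gather" },
  {
    kind: "loop",
    iterationLimit: 2,
    codons: [
      { kind: "codon", id: "research" },
      { kind: "codon", id: "verify" },
    ],
  },
  { kind: "codon", id: "consolidate" },
];

const executionOrder = expandPlan(examplePlan);
console.log(executionOrder.join(" -> "));
// gather -> research#1 -> verify#1 -> research#2 -> verify#2 -> consolidate
```

Unrolling the plan before anything runs means the full sequence can be inspected ahead of time, which fits the goal of keeping long flows easy to reason about.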

Hankweave adds the one context engineering feature we felt was missing everywhere else: being able to remove things. Hankweave allows us to firewall codons, make context handoffs explicit, and be surgically precise with what each agent sees. Controlled forgetting turns out to be just as important as recall for keeping a long run coherent.
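
To make the idea of a context firewall concrete, here is a small TypeScript sketch of what starting a codon with a clean context could look like. Everything in it is an assumption for illustration: the `Message` shape, the `buildContext` helper, and the `"continue"` mode are hypothetical, and only the `"fresh"` value appears in the example hank on this page.

```typescript
// Hypothetical shapes for illustration; not hankweave's actual API.
type Message = { role: "user" | "assistant"; text: string };
type ContinuationMode = "fresh" | "continue";

// Build the context a codon starts with. In "fresh" mode the previous
// conversation is dropped entirely; only explicitly listed handoff notes
// cross the boundary.
function buildContext(
  mode: ContinuationMode,
  previous: Message[],
  handoffs: string[],
  prompt: string,
): Message[] {
  const carried: Message[] = mode === "fresh" ? [] : previous;
  const bridge = handoffs.map(
    (note): Message => ({ role: "user", text: `Handoff: ${note}` }),
  );
  return [...carried, ...bridge, { role: "user", text: prompt }];
}

const gatherHistory: Message[] = [
  { role: "user", text: "Gather sources" },
  { role: "assistant", text: "Found 12 sources, 3 look unreliable" },
];

const verifyContext = buildContext(
  "fresh",
  gatherHistory,
  ["sources.md lists the 9 vetted sources"],
  "Verify the claims",
);
console.log(verifyContext.length); // 2: one handoff note plus the new prompt
```

The point of the sketch is that the verifier never sees the earlier agent's reasoning, only the handoff that was written down on purpose.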

Hankweave evolved around a simple premise: the problems you run into when running agents at scale should have well-designed pathways to solve them. A lot of hankweave’s features feel strange until the moment you need them - and then they’re exactly the tool for the job. Structured event journals make it easy to trace issues to the right codons, and sentinels make it easy to measure and catch known problems like laziness, convention violations, and drift.
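
As a rough illustration of how a journal and a sentinel fit together, the TypeScript sketch below scans a JSONL journal and attributes each finding to the codon that produced it. The event fields, the `assistant.message` event type, and the regex-based laziness check are all assumptions made for this example; real sentinels are configured separately and are likely more capable than a regex.

```typescript
// Hypothetical event and sentinel shapes; the real journal schema may differ.
type JournalEvent = { codon: string; type: string; text: string };
type Finding = { codon: string; sentinel: string; detail: string };

// In this sketch a sentinel is a named check that fires on one event type.
type Sentinel = {
  name: string;
  on: string;
  check: (event: JournalEvent) => string | null;
};

// Parse JSONL (one JSON object per line) and run every sentinel over the
// events it subscribes to, attributing each finding to its codon.
function scanJournal(jsonl: string, sentinels: Sentinel[]): Finding[] {
  const findings: Finding[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const event = JSON.parse(line) as JournalEvent;
    for (const sentinel of sentinels) {
      if (sentinel.on !== event.type) continue;
      const detail = sentinel.check(event);
      if (detail !== null) {
        findings.push({ codon: event.codon, sentinel: sentinel.name, detail });
      }
    }
  }
  return findings;
}

// A simple stand-in for laziness detection: flag placeholder text.
const lazinessSentinel: Sentinel = {
  name: "lazy-detection",
  on: "tool.result",
  check: (e) =>
    /TODO|rest omitted/i.test(e.text) ? "placeholder left in output" : null,
};

const sampleJournal = [
  JSON.stringify({ codon: "gather", type: "tool.result", text: "wrote sources.md" }),
  JSON.stringify({ codon: "research", type: "tool.result", text: "// TODO: fill in later" }),
  JSON.stringify({ codon: "research", type: "assistant.message", text: "TODO list updated" }),
].join("\n");

const journalFindings = scanJournal(sampleJournal, [lazinessSentinel]);
console.log(journalFindings);
```

Here only the second event produces a finding: the first has no placeholder text, and the third is not a `tool.result` event, so the sentinel ignores it.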

Before a single token is cast, the runtime catches as many problems as it can - API keys, model availability, file paths, rig configs, sentinel schemas - all validated upfront. The goal is to never waste a long run on something that could have been caught in the first second.
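
A minimal TypeScript sketch of this kind of upfront validation follows. It covers only two of the checks named above (environment variables and file paths), and the `Requirements`, `Environment`, and `preflight` names are hypothetical rather than hankweave's API. The environment is passed in as plain data so the example stays self-contained.

```typescript
// Hypothetical inputs; the real preflight covers far more than this.
type Requirements = { env: string[]; promptFiles: string[] };
type Environment = {
  vars: Record<string, string | undefined>;
  files: string[];
};

// Collect every problem instead of stopping at the first, so a single
// preflight pass reports everything that would break the run.
function preflight(req: Requirements, env: Environment): string[] {
  const problems: string[] = [];
  for (const name of req.env) {
    const value = env.vars[name];
    if (value === undefined || value === "") {
      problems.push(`missing env var: ${name}`);
    }
  }
  for (const path of req.promptFiles) {
    if (env.files.indexOf(path) === -1) {
      problems.push(`missing prompt file: ${path}`);
    }
  }
  return problems;
}

const preflightProblems = preflight(
  {
    env: ["NOTION_API_KEY"],
    promptFiles: ["prompts/gather.md", "prompts/write-system.md"],
  },
  { vars: {}, files: ["prompts/gather.md"] },
);
console.log(preflightProblems);
// [ "missing env var: NOTION_API_KEY", "missing prompt file: prompts/write-system.md" ]
```

Returning the full list rather than throwing on the first failure means one pass surfaces every fixable issue before any tokens are spent.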

Under the hood, hankweave manages checkpointing, rollbacks, config and auth resolution, harness shims, structured event logs, LLM proxies, workspace isolation, context boundaries, preflight validation, codon sequencing, loop expansion, archive manifests, and a lot more.

Inputs to a run: data, hank, config

data: symlinked, read-only, content-hashed for resume
hank: 3 codons · 1 loop · sentinels
Features shown in the example hank: globalSystemPrompt, promptFiles, rigSetup, sentinels, checkpoints, loops, exhaustWithPrompt, archiveOnSuccess, overrides

{
  "globalSystemPromptFile": "system.md",

  // codon 1: gather data from source
  { "model": "claude-4-sonnet",
    "promptFile": "prompts/gather.md",
    "continuationMode": "fresh",
    "rigSetup": [
      { "copy": { "from": "templates/" } },
      { "command": "git pull origin main" },
      { "command": "npm install" }
    ],
    "checkpointedFiles": ["workspace/**"],
    // sentinel: watches for missing citations
    "sentinels": [{
      "sentinelConfig": "citations.json",
      "trigger": { "on": "tool.result" },
      "execution": "debounce"
    }] },

  // loop: research and verify, repeat until confident
  { "terminateOn": { "iterationLimit": 3 },
    "codons": [ /* research, verify, consolidate */ ],
    // archive drafts between iterations for comparison
    "archiveOnSuccess": ["workspace/drafts/**"] },

  // codon 3: write report with opus
  { "model": "opus",
    "appendSystemPromptFile": "prompts/write-system.md",
    // easy consuela loop: keep improving until context fills
    "exhaustWithPrompt": "Review and continue improving..." },

  "overrides": { "model": "opus" },
  "requirements": { "env": ["NOTION_API_KEY"] }
}

runtime config: API keys, model settings, env variables, CI auth resolution

CONSUMERS
  built-in CLI, data pipelines, CI systems, custom UIs
  packets via websocket

HANKWEAVE RUNTIME
  preflight: config resolution · model checks · API keys · paths · rigs · sentinels
  execution planner: sequences codons, expands loops, manages continuations
  state manager: tracks run progress, validates transitions, detects crashes
  checkpoint system: snapshots per codon, shadow git repo, tracks file changes
  rollback engine: restores to any point, cleans up rigs, rewinds state

  CODON RUNNER
    rig setup → prompt build → spawn harness → parse events → sentinels → checkpoint → next codon
    context exhaustion recovery · auto-extension loops

  sentinel engine: watches behavior, triggers on patterns, corrects in real time
  prompt builder: merges system + codon, resolves templates, strips comments
  event journal: logs every tool call, structured JSONL, streams to consumers
  LLM provider registry: resolves models, checks availability, tracks capabilities
  LLM proxy: intercepts API calls · debugging & monitoring

HARNESSES
  Claude Code SDK, Gemini CLI, Codex, + your own

AGENTROOT
  agentRoot/
    read_only_data_source/   ← symlinked
    workspace/               agent scratch space
    context-bridges/         handoffs between codons
    .hankweave/              state, checkpoints, logs
    rigArchive/              previous outputs

Hankweave operates on a single agentic thread, with observer agents to catch mistakes and take notes. Agents inside hankweave operate with file and shell tools and not much else. Hankweave is designed primarily for headless, hermetically sealed flows: think CNC instead of a hand-chisel.


We built hankweave because we couldn’t buy it. Every feature - or wart - was added to solve a problem we actually hit.

We're excited for you to get started. To just run it: bunx hankweave

Read more about agentic engineering at Southbridge