Hankweave | Southbridge.AI

At Southbridge, we build complex agents for data work that need to operate over long task horizons - hundreds of hours and thousands of tool calls - with high reliability.

To do this, we needed agents that could be repaired, maintained and improved. We wanted to stop rebuilding things from scratch, and find abstractions that allowed us to stack complexity while keeping things simple enough for us to reason about. We found that as problems grow more complex, the ultimate bottleneck is the ability of the human in the loop to understand and reason about an agent's behavior.

We also found it hard to ship agentic capability across team members and companies. “It works but you’ll need me” just wasn’t acceptable. We needed a way to declaratively define long, complex flows that worked regardless of who ran them or where they were run.

Hankweave started as an answer to both of these problems and grew into a lot more. Today hankweave runs 100% of the reliable AI work at Southbridge, and we're proud to open-source it as a research snapshot.

Hankweave executes AI programs built as sequences of codons: agentic blocks that encapsulate prompts, workspaces, and behavior checks. Codons are executed inside harnesses we know and love, like the Claude Agent SDK and Codex. Over time, we've begun to see coding agents as REPLs to test functionality, before freezing them into codons to be reused. When something breaks, we repair the codon, and the fix travels everywhere it's used.

     sentinels (citations, lazy-detection)
       ┊               ┊               ┊
┌──────┴─────┐  ┌──────┴─────┐  ┌──────┴─────┐  ┌────────────┐  ┌────────────┐
│  1. Gather │─→│ 2. Research│─→│  3. Verify │─→│ 4. Consoli-│─→│   5. Run   │
│    info    │  │   + data   │  │(fresh eyes)│  │date + reorg│  │ simulations│
└────────────┘  └──────┬─────┘  └──────┬─────┘  └────────────┘  └──────┬─────┘
                       │               │                               │
                       └──── loop ─────┘                               │
┌──────────────────────────────────────────────────────────────────────┘
│
┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐
│  6. Write  │─→│ 7. Prepare │─→│  8. Write  │─→│  9. Verify │
│ validations│  │ + graphics │  │   LaTeX    │  │   report   │
└────────────┘  └────────────┘  └────────────┘  └────────────┘

Hankweave adds the one context engineering feature we felt was missing everywhere else: being able to remove things. Hankweave allows us to firewall codons, make context handoffs explicit, and be surgically precise with what each agent sees. Controlled forgetting turns out to be just as important as recall for keeping a long run coherent.
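To make controlled forgetting concrete, here is a minimal TypeScript sketch of a firewalled handoff. It is an illustration under stated assumptions, not hankweave's implementation: the `Message`, `Handoff`, and `Mode` types and the `nextContext` function are hypothetical names. Only the "fresh" continuation mode appears in the config example on this page; "carry-over" is a made-up label for the opposite behavior.

```typescript
// Hypothetical types for illustration; not hankweave's actual API.
type Message = { role: "system" | "user" | "assistant"; text: string };

// An explicit handoff: the only thing a firewalled codon inherits.
type Handoff = { fromCodon: string; notes: string };

// "fresh" appears in the config example; "carry-over" is an assumed label.
type Mode = "fresh" | "carry-over";

function nextContext(
  previous: Message[],
  mode: Mode,
  systemPrompt: string,
  handoffs: Handoff[],
): Message[] {
  const system: Message = { role: "system", text: systemPrompt };
  if (mode === "carry-over") {
    // Keep the prior transcript (minus its old system prompt); nothing is forgotten.
    return [system, ...previous.filter((m) => m.role !== "system")];
  }
  // Firewalled: drop the transcript and pass along only the explicit handoffs.
  const bridged = handoffs.map(
    (h): Message => ({
      role: "user",
      text: `Handoff from ${h.fromCodon}: ${h.notes}`,
    }),
  );
  return [system, ...bridged];
}
```

In this sketch a "fresh" codon never sees the dead ends of the previous one; it sees only what the earlier codon chose to hand over.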

Hankweave evolved around a simple premise: the problems you run into when running agents at scale should have well-designed pathways to solve them. A lot of hankweave’s features feel strange until the moment you need them - and then they’re exactly the tool for the job. Structured event journals make it easy to trace issues to the right codons, and sentinels make it easy to measure and catch known problems like laziness, convention violations and drift.
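As a rough picture of how a journal and a sentinel fit together, the sketch below scans a JSONL journal and reports which codon tripped a laziness check. Everything here is assumed for illustration: the `JournalEvent` shape, the `assistant.message` event type, the regex, and the `scanJournal` function are hypothetical (only `tool.result` appears as a trigger on this page). It also runs after the fact, whereas hankweave's sentinels watch a run as it happens.

```typescript
// Hypothetical event shape; hankweave's real journal schema may differ.
type JournalEvent = {
  codon: string;
  type: "tool.result" | "assistant.message";
  text: string;
};

// A sentinel here is just a named trigger plus a detector function.
type Sentinel = {
  name: string;
  on: JournalEvent["type"];
  detect: (event: JournalEvent) => boolean;
};

type Finding = { sentinel: string; codon: string; line: number };

// Flags a few phrases that often signal a model skipping work.
const lazinessSentinel: Sentinel = {
  name: "lazy-detection",
  on: "assistant.message",
  detect: (e) =>
    /\b(TODO|left as an exercise|rest of the (code|file))\b/i.test(e.text),
};

// Parse a JSONL journal and run each sentinel on the event types it subscribes to.
// `line` counts non-blank journal lines, starting at 1.
function scanJournal(jsonl: string, sentinels: Sentinel[]): Finding[] {
  const findings: Finding[] = [];
  const lines = jsonl.split("\n").filter((l) => l.trim() !== "");
  lines.forEach((raw, index) => {
    const event = JSON.parse(raw) as JournalEvent;
    for (const s of sentinels) {
      if (s.on === event.type && s.detect(event)) {
        findings.push({ sentinel: s.name, codon: event.codon, line: index + 1 });
      }
    }
  });
  return findings;
}
```

Because every event carries its codon, a finding points straight at the block that needs repair rather than at the run as a whole.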

Before a single token is cast, the runtime catches as many problems as it can - API keys, model availability, file paths, rig configs, sentinel schemas - all validated upfront. The goal is to never waste a long run on something that could have been caught in the first second.
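The key property of this kind of preflight is that it collects every problem up front instead of failing on the first. A minimal sketch, assuming hypothetical `PreflightInput` and `Environment` shapes and a hypothetical `preflight` function (the environment is passed in as plain data so the example stays self-contained; hankweave's real checks cover more than this):

```typescript
// Hypothetical shapes; only the idea of required env variables and prompt
// files echoes the config example on this page.
type PreflightInput = {
  requiredEnv: string[];
  promptFiles: string[];
  models: string[];
};

// The outside world, injected so the check itself is a pure function.
type Environment = {
  env: Record<string, string | undefined>;
  fileExists: (path: string) => boolean;
  availableModels: Set<string>;
};

// Collect every problem instead of stopping at the first one.
function preflight(input: PreflightInput, world: Environment): string[] {
  const problems: string[] = [];
  for (const name of input.requiredEnv) {
    if (!world.env[name]) problems.push(`missing env variable: ${name}`);
  }
  for (const path of input.promptFiles) {
    if (!world.fileExists(path)) problems.push(`missing file: ${path}`);
  }
  for (const model of input.models) {
    if (!world.availableModels.has(model)) {
      problems.push(`unavailable model: ${model}`);
    }
  }
  return problems;
}
```

An empty result means the run may start; anything else is reported in one pass, so a long run is never spent discovering configuration errors one at a time.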

Under the hood, hankweave manages checkpointing, rollbacks, config and auth resolution, harness shims, structured event logs, LLM proxies, workspace isolation, context boundaries, preflight validation, codon sequencing, loop expansion, archive manifests, and a lot more.

data · hank · config

▼

data: symlinked, read-only · content-hashed for resume

hank: 3 codons · 1 loop · sentinels

globalSystemPrompt · promptFiles · rigSetup · sentinels · checkpoints · loops · exhaustWithPrompt · archiveOnSuccess · overrides

{
  "globalSystemPromptFile": "system.md",

  // codon 1: gather data from source
  { "model": "claude-4-sonnet",
    "promptFile": "prompts/gather.md",
    "continuationMode": "fresh",
    "rigSetup": [
      { "copy": { "from": "templates/" } },
      { "command": "git pull origin main" },
      { "command": "npm install" }
    ],
    "checkpointedFiles": ["workspace/**"],
    // sentinel: watches for missing citations
    "sentinels": [{
      "sentinelConfig": "citations.json",
      "trigger": { "on": "tool.result" },
      "execution": "debounce"
    }] },

  // loop: research and verify, repeat until confident
  { "terminateOn": { "iterationLimit": 3 },
    "codons": [ /* research, verify, consolidate */ ],
    // archive drafts between iterations for comparison
    "archiveOnSuccess": ["workspace/drafts/**"] },

  // codon 3: write report with opus
  { "model": "opus",
    "appendSystemPromptFile": "prompts/write-system.md",
    // easy consuela loop: keep improving until context fills
    "exhaustWithPrompt": "Review and continue improving..." },

  "overrides": { "model": "opus" },
  "requirements": { "env": ["NOTION_API_KEY"] }
}
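One way to read the loop entry in the example above is as something a planner unrolls into an ordered list of codon runs. The sketch below assumes that reading. The `terminateOn.iterationLimit` and `codons` field names come from the example; the `id` field, the `Step` types, and `expandPlan` are hypothetical, and the result is only an upper bound, since a real loop may stop before its limit.

```typescript
// Hypothetical plan types; `id` is an assumed field used only for labeling.
type Codon = { id: string };
type Loop = { terminateOn: { iterationLimit: number }; codons: Codon[] };
type Step = Codon | Loop;

// Type guard: a step with a `codons` array is treated as a loop.
function isLoop(step: Step): step is Loop {
  return "codons" in step;
}

// Flatten a plan into an ordered list of codon runs, unrolling each loop
// up to its iteration limit. Looped runs are labeled `id#iteration`.
function expandPlan(steps: Step[]): string[] {
  const runs: string[] = [];
  for (const step of steps) {
    if (!isLoop(step)) {
      runs.push(step.id);
      continue;
    }
    for (let i = 1; i <= step.terminateOn.iterationLimit; i++) {
      for (const codon of step.codons) runs.push(`${codon.id}#${i}`);
    }
  }
  return runs;
}
```

Seeing the plan as a flat list is what makes it possible to checkpoint, resume, or roll back at any single codon run, including one in the middle of a loop.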

runtime config: API keys · model settings · env variables · CI auth resolution

CONSUMERS

built-in CLI · data pipelines · CI systems · custom UIs

↑ packets via websocket ↑

HANKWEAVE RUNTIME

preflight: config resolution · model checks · API keys · paths · rigs · sentinels

orchestration

  execution planner: sequences codons · expands loops · manages continuations

  state manager: tracks run progress · validates transitions · detects crashes

  checkpoint system: snapshots per codon · shadow git repo · tracks file changes

  rollback engine: restores to any point · cleans up rigs · rewinds state

▼

CODON RUNNER

rig setup → prompt build → spawn harness → parse events → sentinels → checkpoint → next codon

context exhaustion recovery · auto-extension loops

▼

monitoring & infrastructure

  sentinel engine: watches behavior · triggers on patterns · corrects in real time

  prompt builder: merges system + codon · resolves templates · strips comments

  event journal: logs every tool call · structured JSONL · streams to consumers

  LLM provider registry: resolves models · checks availability · tracks capabilities

▼

LLM proxy · intercepts API calls · debugging & monitoring

▼

HARNESSES

Claude Code SDK · Gemini CLI · Codex · + your own

▼

AGENTROOT

agentRoot/
  read_only_data_source/   ← symlinked
  workspace/               agent scratch space
  context-bridges/         handoffs between codons
  .hankweave/              state, checkpoints, logs
  rigArchive/              previous outputs
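The checkpoint system and rollback engine in the diagram above can be pictured with a small in-memory model. This is a sketch under stated assumptions only: the page describes the real mechanism as a shadow git repo, and the `CheckpointStore` class, its methods, and the `Files` map are hypothetical stand-ins.

```typescript
// Hypothetical in-memory model of per-codon snapshots; the real system is
// described on this page as a shadow git repo.
type Files = Record<string, string>;
type Checkpoint = { codon: string; files: Files };

class CheckpointStore {
  private history: Checkpoint[] = [];

  // Snapshot a copy of the workspace after a codon finishes.
  snapshot(codon: string, files: Files): void {
    this.history.push({ codon, files: { ...files } });
  }

  // Restore the workspace as it was after the latest run of `codon`,
  // discarding every checkpoint taken after it.
  rollbackTo(codon: string): Files {
    for (let i = this.history.length - 1; i >= 0; i--) {
      const checkpoint = this.history[i];
      if (checkpoint && checkpoint.codon === codon) {
        this.history = this.history.slice(0, i + 1);
        return { ...checkpoint.files };
      }
    }
    throw new Error(`no checkpoint for codon: ${codon}`);
  }
}
```

Rolling back to a codon also forgets everything that came after it, which is what lets a repaired codon be rerun from a known-good state.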

Hankweave operates on a single agentic thread, with observer agents to catch mistakes and take notes. Agents inside hankweave operate with file and shell tools and not much else. Hankweave is designed primarily for headless, hermetically sealed flows: think CNC instead of a hand-chisel.

We built hankweave because we couldn’t buy it. Every feature - or wart - was added to solve a problem we actually hit.

We're excited for you to get started:

Read the docs: Concepts, guides, and the full reference
See a real hank: Annotated examples from our production work
Explore the code: Source, architecture, and everything under the hood

Or just run it: bunx hankweave

Read more about agentic engineering at Southbridge

Antibrittle Agents - the theory behind hankweave
CCEPL-driven development - the Claude Code workflow for building hanks
Systems of Lasting Value - how to get out of this software crisis
