Learn Harness Engineering: Make AI Coding Agents Actually Reliable


By the end you will have either a minimal agent harness in your own repo (four files, one audit pass) or a clear path through the full Learn Harness Engineering curriculum. The quick path takes about 15 minutes; the full course with projects takes a few weekends part-time.

TLDR:

WalkingLabs teaches harness engineering: building the repo environment that makes Codex and Claude Code reliable, rather than relying on smarter prompts alone.
Five subsystems: instructions, state, verification, scope, session lifecycle.
Copy templates from the Resource Library (AGENTS.md, feature_list.json, claude-progress.md, init.sh).
Audit any repo with audit-harness.sh (no Node required).
Six projects build the same Electron knowledge-base app while you add harness layers; capstone includes benchmark and ablation scripts.

Prerequisites: A real repo you use with Claude Code, Cursor, or Codex. For hands-on projects: Git, Node.js, and an agent CLI. No paid course account; everything is free on GitHub and the docs site (15 languages).

Step 1: Read the core idea (10 minutes)

Open Lecture 01 and Lecture 02.

Success: you can state the core idea in one sentence. The model writes code; the harness governs when, where, and how it works, and when “done” is allowed. The course cites Anthropic’s long-running agent experiments and OpenAI’s Codex harness writeups as its sources. With the same model, the gap between a weak harness and a strong one is the difference between cleanup duty and review duty.

Step 2: Drop the minimal harness into your project

Browse the English Resource Library. Download or copy these into your project root:

your-repo/
├── AGENTS.md              # operating manual for the agent
├── feature_list.json      # scope: features and done/not-done
├── claude-progress.md     # session log (rename if you use Codex)
└── init.sh                # install + verify before work starts

Adapt names to your stack (CLAUDE.md if you prefer Claude Code conventions). Fill feature_list.json with real features, not placeholders. Point AGENTS.md at your test command and definition of done.

Success: your next agent session starts by reading these files instead of guessing repo rules from scratch.
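If you want to see the shape of the four files before pulling the real templates, the following is a minimal sketch that creates placeholders without overwriting anything that already exists. The function names (scaffold_harness, write_if_missing), the JSON fields, and all file contents are assumptions made for illustration; the templates in the Resource Library may look quite different and should take precedence.

```shell
# Sketch: create placeholder harness files without overwriting existing ones.
# All file contents here are hypothetical stand-ins, not the course templates.

# Write stdin to a path only if that path does not exist yet.
write_if_missing() {
  local path="$1"
  if [ -e "$path" ]; then
    cat > /dev/null            # drain the heredoc, keep the existing file
    echo "kept    $path"
  else
    cat > "$path"
    echo "created $path"
  fi
}

scaffold_harness() {
  local repo="$1"
  mkdir -p "$repo" || return 1

  write_if_missing "$repo/AGENTS.md" <<'EOF'
# AGENTS.md

## Verification
Run ./init.sh before starting work and before claiming a feature is done.

## Definition of done
A feature is done only when ./init.sh passes and feature_list.json is updated.
EOF

  write_if_missing "$repo/feature_list.json" <<'EOF'
{
  "features": [
    { "id": "F001", "description": "REPLACE with a real feature", "done": false }
  ]
}
EOF

  write_if_missing "$repo/claude-progress.md" <<'EOF'
# Progress log

No sessions recorded yet.
EOF

  # The placeholder init.sh fails on purpose until real commands are filled in.
  write_if_missing "$repo/init.sh" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
# REPLACE the next two lines with your install and test commands.
echo "init.sh: no verification configured yet" >&2
exit 1
EOF
  chmod +x "$repo/init.sh"     # ensure init.sh is executable either way
}
```

Call it as scaffold_harness /path/to/your/repo, then replace every placeholder. The generated init.sh exits non-zero by design, so an agent cannot pass verification until you wire in a real test command.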

Step 3: Audit what you have

curl -fsSL https://raw.githubusercontent.com/walkinglabs/learn-harness-engineering/main/tools/audit-harness.sh | bash -s -- /path/to/your/repo

Or clone the course repo and run locally:

git clone https://github.com/walkinglabs/learn-harness-engineering.git
bash learn-harness-engineering/tools/audit-harness.sh /path/to/your/repo

Success: the script exits 0 when critical harness items pass. On failure, it lists missing files or weak spots (no progress log, no verification hook, etc.).
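Because the audit signals success through its exit status, it can act as a gate in a local script or CI job. The sketch below assumes the clone location from the commands above; run_audit and the AUDIT_SCRIPT variable are hypothetical names, and the only behaviour relied on is that exit 0 means the critical items passed.

```shell
# Sketch: run the audit and turn its exit status into a pass/fail gate.
# AUDIT_SCRIPT assumes the clone location used above; override it if yours differs.
AUDIT_SCRIPT="${AUDIT_SCRIPT:-learn-harness-engineering/tools/audit-harness.sh}"

run_audit() {
  local repo="$1"
  if bash "$AUDIT_SCRIPT" "$repo"; then
    echo "harness audit passed: $repo"
  else
    local status=$?            # exit status of the audit script
    echo "harness audit failed (exit $status): $repo" >&2
    return "$status"
  fi
}
```

A call such as run_audit /path/to/your/repo || exit 1 stops the surrounding script whenever the audit reports a failure.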

Pro tip: The repo also ships a skills/harness-creator/ skill that scaffolds a production-grade harness in one agent session. Use it if blank templates feel too abstract.

Step 4: Run Project 01 (optional but worth it)

Clone the repo if you have not already. Open projects/project-01/ in the docs: Baseline vs Minimal Harness.

Work inside projects/project-01/starter/ with your agent. Run the same task twice: prompt-only first, then rules-first with the harness files. Compare how often the agent declares victory before tests pass.

Success: you have a side-by-side note on what changed when the repo carried state and scope, not just a chat prompt.
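One way to keep that side-by-side note honest is to record whether the tests actually pass at the moment the agent says it is done. This is a rough sketch, not part of the course materials: record_run and the comparison-notes.md file name are made up for illustration, and you would substitute the starter's real test command.

```shell
# Sketch: log whether the test command passes when the agent declares "done".
# NOTES_FILE is a hypothetical file name; change it to suit your notes.
NOTES_FILE="${NOTES_FILE:-comparison-notes.md}"

record_run() {
  local label="$1"; shift      # remaining arguments: the test command to run
  local result="fail"
  if "$@" > /dev/null 2>&1; then
    result="pass"
  fi
  # Append one line per run so both attempts sit next to each other.
  printf -- '- %s: tests %s when agent declared done (%s)\n' \
    "$label" "$result" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$NOTES_FILE"
  echo "$result"
}
```

For example, run record_run prompt-only followed by your test command after the first attempt, and record_run rules-first followed by the same command after the second.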

Step 5: Follow the full path if you want the system

The course order is fixed: 12 lectures and 6 projects, each project building on the last. All six projects evolve one Electron personal knowledge-base app (import docs, index, citation Q&A). The Project 06 capstone adds benchmark scripts, a cleanup scanner, and harness ablation (remove one subsystem at a time and measure what breaks).
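The ablation idea can be tried in miniature on the four-file harness. The sketch below is not the capstone's script: it assumes each harness file stands in for one subsystem, and the ablate function is a hypothetical helper that hides one file at a time, runs whatever command you pass, and restores the file afterwards.

```shell
# Sketch: ablate one harness file at a time, run a check, then restore the file.
# Mapping files to subsystems is an assumption made for this illustration.
ablate() {
  local repo="$1"; shift       # remaining arguments: the command to measure
  local f status
  for f in AGENTS.md feature_list.json claude-progress.md init.sh; do
    [ -e "$repo/$f" ] || continue
    mv "$repo/$f" "$repo/$f.ablated"
    status=0
    # Run the command from inside the repo and capture its exit status.
    ( cd "$repo" && "$@" ) > /dev/null 2>&1 || status=$?
    mv "$repo/$f.ablated" "$repo/$f"
    echo "without $f: exit $status"
  done
}
```

In practice the measured command would be an agent run followed by your test suite, which is slow, so treat the output as a log to review rather than a quick check.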

Docs: walkinglabs.github.io/learn-harness-engineering/en/
Repo: github.com/walkinglabs/learn-harness-engineering

Optional offline: npm run pdf:build inside the cloned repo writes PDF coursebooks to artifacts/pdfs/.

Cleanup

No cloud resources to tear down. If you cloned only for templates, delete the clone when done:

rm -rf learn-harness-engineering

Keep the four harness files in your actual project.

If it breaks:

Agent still skips tests? Add an explicit “cannot mark done until X passes” line to AGENTS.md and wire init.sh to run your test suite.
Agent loses context between sessions? You are probably missing claude-progress.md updates or session handoff notes (Lectures 05–06).
Audit script fails on a Hugo or docs-only repo? That is expected for some checks; fix what applies and ignore template fields that assume a Node app.
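For the first two fixes, here is a hedged sketch of what the body of init.sh might contain. TEST_CMD, PROGRESS_FILE, verify, and log_handoff are hypothetical names, npm test is only a guess at a default, and the handoff note format is invented; adapt all of them to your stack.

```shell
# Sketch of helpers for init.sh: block "done" on failing tests, log handoffs.
# TEST_CMD is a placeholder default; set it to your project's real test command.
TEST_CMD="${TEST_CMD:-npm test}"
PROGRESS_FILE="${PROGRESS_FILE:-claude-progress.md}"

verify() {
  # A non-zero status from the test command means "done" is not allowed yet.
  if ! sh -c "$TEST_CMD"; then
    echo "verification failed: '$TEST_CMD' did not pass; do not mark anything done" >&2
    return 1
  fi
  echo "verification passed"
}

log_handoff() {
  # Append a dated note so the next session starts with context.
  printf '\n## %s\n%s\n' "$(date -u +%Y-%m-%d)" "$1" >> "$PROGRESS_FILE"
}
```

End init.sh with verify || exit 1, and have the agent call log_handoff with a one-line summary before each session closes.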