Learn Harness Engineering: Make AI Coding Agents Actually Reliable
By the end you will either have a minimal agent harness in your own repo (four files, one audit pass) or a clear path through the full Learn Harness Engineering curriculum. Quick path: 15 minutes. Full course with projects: a few weekends part-time.
TLDR:
- WalkingLabs teaches harness engineering: the repo environment that makes Codex and Claude Code reliable, not smarter prompts alone.
- Five subsystems: instructions, state, verification, scope, session lifecycle.
- Copy templates from the Resource Library (AGENTS.md, feature_list.json, claude-progress.md, init.sh).
- Audit any repo with audit-harness.sh (no Node required).
- Six projects build the same Electron knowledge-base app while you add harness layers; capstone includes benchmark and ablation scripts.
Prerequisites: A real repo you use with Claude Code, Cursor, or Codex. For hands-on projects: Git, Node.js, and an agent CLI. No paid course account; everything is free on GitHub and the docs site (15 languages).
Step 1: Read the core idea (10 minutes)
Open Lecture 01 and Lecture 02.
Success looks like this: you can explain the idea in one sentence. The model writes code; the harness governs when, where, and how it works, and when “done” is allowed. Anthropic’s long-running agent experiments and OpenAI’s Codex harness writeups are the cited sources. With the same model, a weak harness versus a strong one is the difference between cleanup duty and review duty.
Step 2: Drop the minimal harness into your project
Browse the English Resource Library. Download or copy these into your project root:
your-repo/
├── AGENTS.md # operating manual for the agent
├── feature_list.json # scope: features and done/not-done
├── claude-progress.md # session log (rename if you use Codex)
└── init.sh # install + verify before work starts
Adapt names to your stack (CLAUDE.md if you prefer Claude Code conventions). Fill feature_list.json with real features, not placeholders. Point AGENTS.md at your test command and definition of done.
Success: your next agent session starts by reading these files instead of guessing repo rules from scratch.
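If the blank init.sh feels abstract, here is a minimal sketch of the "install + verify before work starts" idea. It is not the course template: the function names run_step and init_harness are made up for illustration, and the npm commands in the example line are an assumption about a Node project.

```shell
#!/usr/bin/env bash
# init.sh sketch: install, then verify, before the agent starts work.
# run_step and init_harness are hypothetical names, not the course template.

run_step() {
  # Run one named step via bash -c; report and fail loudly if it breaks.
  local name="$1" cmd="$2"
  echo "[init] ${name}: ${cmd}"
  if ! bash -c "$cmd"; then
    echo "[init] ${name} FAILED" >&2
    return 1
  fi
}

init_harness() {
  # $1 = install command, $2 = verification (test) command.
  local install_cmd="$1" test_cmd="$2"
  run_step install "$install_cmd" || return 1
  run_step verify "$test_cmd" || return 1
  echo "[init] ready: install and verification passed"
}

# Example call for a Node project (adjust to your stack):
# init_harness "npm install" "npm test"
```

Uncomment the last line with your own commands. The point of the design is that a failing test suite stops the session before any new work begins, so the agent never starts from a broken baseline.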
Step 3: Audit what you have
curl -fsSL https://raw.githubusercontent.com/walkinglabs/learn-harness-engineering/main/tools/audit-harness.sh | bash -s -- /path/to/your/repo
Or clone the course repo and run locally:
git clone https://github.com/walkinglabs/learn-harness-engineering.git
bash learn-harness-engineering/tools/audit-harness.sh /path/to/your/repo
Success: the script exits 0 when critical harness items pass. Failures list missing files or weak spots (no progress log, no verification hook, etc.).
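Because the audit reports its result through the exit status, you can gate other steps on it. The wrapper below is a sketch under one assumption taken from the text above: exit 0 means the critical items passed. The helper name audit_gate is hypothetical and not part of the course repo.

```shell
#!/usr/bin/env bash
# audit_gate is a hypothetical helper; it relies only on the audit script's exit status.

audit_gate() {
  # $1 = path to audit-harness.sh, $2 = repo to audit.
  local script="$1" repo="$2" rc
  if [ ! -f "$script" ]; then
    echo "audit script not found: $script" >&2
    return 2
  fi
  if bash "$script" "$repo"; then
    echo "harness audit passed for $repo"
  else
    rc=$?
    echo "harness audit failed for $repo (exit $rc); see the findings above" >&2
    return 1
  fi
}

# Example (paths as in the commands above):
# audit_gate learn-harness-engineering/tools/audit-harness.sh /path/to/your/repo
```

A pre-push hook or CI step could call audit_gate and stop on a non-zero return, which keeps the harness from quietly eroding as the repo changes.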
Pro tip: The repo also ships a skills/harness-creator/ skill that scaffolds a production-grade harness in one agent session. Use it if blank templates feel too abstract.
Step 4: Run Project 01 (optional but worth it)
Clone the repo if you have not already. Open projects/project-01/ in the docs: Baseline vs Minimal Harness.
Work inside projects/project-01/starter/ with your agent. Run the same task twice: prompt-only first, then rules-first with the harness files. Compare how often the agent declares victory before tests pass.
Success: you have a side-by-side note on what changed when the repo carried state and scope, not just a chat prompt.
Step 5: Follow the full path if you want the system
The course order is fixed: 12 lectures, 6 projects, each project building on the last. All six projects evolve one Electron personal knowledge-base app (import docs, index, citation Q&A). The Project 06 capstone adds benchmark scripts, a cleanup scanner, and harness ablation (remove one subsystem at a time and measure what breaks).
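To make the ablation idea concrete, here is a rough sketch. It is not the capstone's script: it treats each of the four harness files as a stand-in for a subsystem (an assumption), hides one at a time, runs whatever measurement command you pass in, and restores the file afterwards. The name ablate_harness is hypothetical.

```shell
#!/usr/bin/env bash
# Ablation sketch (not the course's capstone script): hide one harness file
# at a time, run a measurement command, record its exit status, then restore.

ablate_harness() {
  # $1 = repo root, remaining args = measurement command (e.g. your benchmark).
  local repo="$1"; shift
  local f rc
  for f in AGENTS.md feature_list.json claude-progress.md init.sh; do
    [ -e "$repo/$f" ] || continue          # skip files this repo does not have
    mv "$repo/$f" "$repo/$f.ablated"       # remove one piece of the harness
    rc=0
    ( cd "$repo" && "$@" ) || rc=$?        # measure with that piece missing
    mv "$repo/$f.ablated" "$repo/$f"       # restore before the next round
    echo "without $f: exit $rc"
  done
}

# Example: ablate_harness /path/to/your/repo bash init.sh
```

If the measurement command is interrupted, the hidden file stays renamed with the .ablated suffix, so run this on a clean checkout you can reset.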
Docs: walkinglabs.github.io/learn-harness-engineering/en/
Repo: github.com/walkinglabs/learn-harness-engineering
Optional offline: npm run pdf:build inside the cloned repo writes PDF coursebooks to artifacts/pdfs/.
Cleanup
No cloud resources to tear down. If you cloned only for templates, delete the clone when done:
rm -rf learn-harness-engineering
Keep the four harness files in your actual project.
If it breaks:
- Agent still skips tests? Add an explicit “cannot mark done until X passes” line in AGENTS.md and wire init.sh to run your test suite.
- Agent loses context between sessions? You are missing claude-progress.md updates or session handoff notes (Lecture 05–06).
- Audit script fails on a Hugo or docs-only repo? That is expected for some checks; fix what applies, ignore template fields that assume a Node app.
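For the lost-context case, one low-effort habit is to append a short handoff entry at the end of every session. The helper below is a sketch: log_handoff is a hypothetical name, and the entry layout is an assumption, not the format of the course's claude-progress.md template.

```shell
#!/usr/bin/env bash
# log_handoff is a hypothetical helper; the entry layout is an assumption,
# not the format of the course's claude-progress.md template.

log_handoff() {
  # $1 = progress file, $2 = what was done, $3 = what the next session should do.
  local file="$1" done_note="$2" next_note="$3"
  {
    printf '\n## Session %s\n' "$(date -u +%Y-%m-%dT%H:%MZ)"
    printf -- '- Done: %s\n' "$done_note"
    printf -- '- Next: %s\n' "$next_note"
  } >> "$file"
}

# Example:
# log_handoff claude-progress.md "import pipeline passes tests" "start indexing feature"
```

You can ask the agent to run this itself as its last action, so the next session opens the file and finds a dated "Next" line instead of starting cold.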
Related TMFNK Content
- Karpathy’s AI Coding Guidelines: Install the CLAUDE.md That Developers Use. Behavioral guardrails for agents; harness files add state, scope, and verification on top.
- Hugo Agent Readiness Playbook: From Score 8 to 83. Same problem domain for this site: make a repo legible and safe for agents.
- Superpowers: An Agentic Skills Framework for Coding Agents. Disciplined agent workflows (TDD, debugging); harness engineering is the repo infrastructure those workflows assume.
Crepi il lupo! 🐺