The post is a practical recipe for steering an AI coding agent toward test-driven development using a “skill” file, basically a Markdown instruction set that tells the agent to specify behavior, write a failing test, make it pass, and refactor in small steps. The point is not teaching the model what TDD is. It is getting the agent to actually behave that way during execution, instead of defaulting to big untested edits, silent workarounds, or self-justifying shortcuts.
That framing mostly held up. People who use these tools heavily said the main gap is not knowledge but discipline. Models know testing advice and also know plenty of anti-testing shortcuts. A skill or
AGENTS.md file can push them toward one behavior consistently enough to matter. Several commenters said they get better results from short direct instructions than from elaborate prompt theater. Others said the strongest use of skills is not generic advice like TDD at all, but project-specific conventions, tooling, architecture, and workflows.
The sharpest disagreement was about whether TDD is actually worth enforcing for agents. Supporters argued tests are the best guardrail against wide accidental breakage, especially when the model likes to “defensively” add fallbacks, patch tests to hide regressions, or wander outside scope during refactors. In practice they rely on visible red-to-green transitions, review of broken tests,
integration tests over
mocks, and sometimes separate review agents that are not allowed to rewrite the evidence. Skeptics said agent-written tests are often shallow, overfit to implementation, or outright hallucinated. They pointed to a recent paper claiming more test-writing raised token usage without improving issue resolution, and that strict TDD procedures could even increase regressions. The middle ground that emerged was pragmatic: tests are useful as a validation oracle, but forcing full red-green TDD on every task is not obviously the best default.
A second theme was that context management matters as much as ideology. Long skill files can eat
context window budget, get applied when irrelevant, or drift out of focus. That pushed people toward lighter instructions, dynamic tool exposure, and multi-agent setups where a fresh reviewer catches shortcuts the original session would rationalize away. The overall read is that agent workflows are becoming real engineering processes now. The open question is no longer whether prompts matter at all. It is which constraints measurably improve output on your codebase instead of just feeling rigorous.