Spec-Driven Development with AI · 2026 Playbook

Six months ago, "vibe coding" was the meme that captured the moment — type a sentence, watch Claude or Cursor produce a working feature, ship it, repeat. It's fun. It's genuinely productive for prototypes. And it falls apart the second you put it inside a real codebase with real reviewers, real tests, and real users. We've spent the first half of 2026 watching teams collide with that wall — including a few we were brought in to clean up after. The pattern that actually works at scale has a name now: spec-driven development.

The shift in one line

Stop prompting for code. Start prompting for a spec, then let the agent generate the code, the tests AND the verification — and review the spec, not every line.

Why vibe coding stalls in real codebases

On a greenfield project of 5 files, you can hold the whole thing in your head and eyeball the diff. On a codebase of 500 files with 4 developers, three problems compound fast:

Reviewers can't tell what the AI was trying to do, only what it did — so review either becomes rubber-stamping or rewriting from scratch
Tests get written as an afterthought (or by the same prompt that wrote the code, which is its own circular trap)
Architectural drift sets in within weeks — every feature is locally plausible, globally inconsistent

The fix isn't "use a smarter model." The fix is changing what the human writes and what the agent produces.

The spec-driven loop, in four steps

1. Write a short spec: what changes, why, and what observable behaviour proves it works. 10–40 lines of markdown, not a 20-page PRD
2. Agent drafts the plan: files to touch, tests to add, migration steps. Human approves or edits the plan, not the code
3. Agent implements and runs verification: generates the code, runs the test suite, captures the output. If verification fails, the agent iterates inside its own loop until it passes
4. Human reviews the spec-to-diff delta: does the diff actually do what the spec promised? Anything outside the spec is a red flag
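The four steps amount to a small control loop with two human gates. Below is a minimal sketch, not any particular tool's API: `Plan`, `Attempt`, `run_spec_loop` and the four callables are hypothetical names that would wrap your agent and your test runner, and the `max_attempts` retry budget is our own assumption (the loop above simply says "until it passes").

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class Plan:
    files_to_touch: list[str]
    tests_to_add: list[str]
    migration_steps: list[str] = field(default_factory=list)


@dataclass
class Attempt:
    diff: str
    passed: bool
    output: str


def run_spec_loop(
    spec: str,
    draft_plan: Callable[[str], Plan],           # step 2: agent proposes a plan
    approve_plan: Callable[[Plan], bool],        # step 2: human gate on the plan
    implement: Callable[[str, Plan, str], str],  # step 3: agent writes a diff
    verify: Callable[[str], Tuple[bool, str]],   # step 3: run tests, capture output
    max_attempts: int = 5,                       # assumption: a retry budget
) -> Optional[Attempt]:
    plan = draft_plan(spec)
    if not approve_plan(plan):
        return None  # plan rejected: no code is written at all
    diff, feedback = "", ""
    for _ in range(max_attempts):
        # Each retry sees the previous verification output, like an agent
        # reading a failing test log.
        diff = implement(spec, plan, feedback)
        passed, output = verify(diff)
        if passed:
            return Attempt(diff, True, output)
        feedback = output
    # Budget exhausted: hand the last failing attempt to a human instead of looping forever.
    return Attempt(diff, False, feedback)
```

Step 4 happens on whatever comes back: a human reads the returned diff against the spec, whether or not verification passed.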

Notice what the human is doing: writing intent, approving plans, reviewing against intent. Notice what the human is not doing: typing implementation, debugging compile errors, hand-writing tests for trivial logic. That's where the productivity actually comes from in 2026 — not from typing faster, but from moving up a level of abstraction.

What a real spec looks like

Example spec (from a recent CruxBit engagement)

"Add per-organisation rate limiting to the public API. 1000 requests / minute / org, sliding window, Redis-backed. On limit hit: return 429 with Retry-After header. Must not affect requests on the authed dashboard endpoints. Verification: integration test that fires 1001 requests and asserts the 1001st returns 429; existing dashboard tests still pass."

That's the whole thing. Five sentences. The agent now has everything it needs to draft a plan, ask clarifying questions (which it usually does), implement, and verify. The reviewer has a fixed bar to measure the diff against: "does this implement the spec, and only the spec?"
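To make step 3 concrete for the rate-limiting spec above, here is a minimal sketch of the limiter plus its verification test. Assumptions, loudly: it swaps the spec's Redis backend for an in-process dictionary so it runs without a server, and `SlidingWindowLimiter`, `handle`, and the `/dashboard` path prefix are hypothetical names. It uses the sliding-window-log variant: keep each accepted request's timestamp and count those inside the trailing 60 seconds.

```python
from __future__ import annotations

import math
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Sliding-window log per org. In-memory stand-in for the spec's Redis store."""

    def __init__(self, limit: int = 1000, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._log: dict[str, deque[float]] = defaultdict(deque)

    def check(self, org_id: str, now: float) -> tuple[bool, int]:
        """Return (allowed, retry_after_seconds); only allowed requests are recorded."""
        log = self._log[org_id]
        # Evict timestamps that have slid out of the trailing window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)
            return True, 0
        # The oldest request in the window frees a slot at log[0] + window.
        return False, max(1, math.ceil(log[0] + self.window - now))


def handle(
    limiter: SlidingWindowLimiter, org_id: str, path: str, now: float
) -> tuple[int, dict[str, str]]:
    """Hypothetical request hook: returns (status, headers)."""
    if path.startswith("/dashboard"):  # assumption: authed dashboard routes share this prefix
        return 200, {}
    allowed, retry_after = limiter.check(org_id, now)
    if allowed:
        return 200, {}
    return 429, {"Retry-After": str(retry_after)}


def test_1001st_request_returns_429() -> None:
    """The spec's verification clause, written as a test."""
    limiter = SlidingWindowLimiter()
    statuses = [handle(limiter, "org-a", "/api/items", now=0.0)[0] for _ in range(1001)]
    assert statuses[:1000] == [200] * 1000
    assert statuses[1000] == 429
    # Dashboard endpoints must not be affected.
    assert handle(limiter, "org-a", "/dashboard/home", now=0.0)[0] == 200
```

A Redis-backed version would typically keep the same per-org log in a sorted set (`ZREMRANGEBYSCORE` to evict, `ZCARD` to count, `ZADD` to record), run atomically in a Lua script so two concurrent requests cannot both take the last slot. Rejected requests are deliberately not recorded here, so a client hammering at the limit does not push its own Retry-After further out.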

Tooling that makes this practical

You don't need a new product to do spec-driven development — but a few things help enormously:

An agent that can actually run the code: Claude Code, Cursor Agent, Devin, or your own MCP-wired setup. Verification-in-the-loop is the entire point
A specs/ folder in the repo — every non-trivial change lands with the spec it was built from, in markdown, alongside the code. Future archaeology gets much easier
A pre-merge checklist tied to the spec — "does the diff match the spec?" as an explicit reviewer step, ideally with the spec inline in the PR description
Evals where it matters — for AI-built AI features especially, but also for any agent-heavy refactor: a small eval suite catches the "works locally, broken under real input" class of regression
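A small eval suite does not need a framework. A minimal sketch, with hypothetical names throughout (`EvalCase`, `run_evals`, and the deliberately naive `extract_org_id` standing in for agent-written code): each case pairs a realistic input with a predicate on the output, and the suite gates on a pass rate rather than on a single happy-path assertion.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class EvalCase:
    name: str
    input: Any
    check: Callable[[Any], bool]  # predicate on the output, not exact-match only


def run_evals(fn: Callable[[Any], Any], cases: list[EvalCase],
              min_pass_rate: float = 1.0) -> tuple[bool, float, list[str]]:
    """Run fn on every case; an exception counts as a failure, not a crash."""
    if not cases:
        raise ValueError("need at least one eval case")
    failures = []
    for case in cases:
        try:
            ok = bool(case.check(fn(case.input)))
        except Exception:
            ok = False
        if not ok:
            failures.append(case.name)
    pass_rate = 1 - len(failures) / len(cases)
    return pass_rate >= min_pass_rate, pass_rate, failures


def extract_org_id(header: str) -> str:
    """Hypothetical agent-written helper: correct on the happy path only."""
    return header.split(":")[1]


CASES = [
    EvalCase("happy path", "org:42", lambda out: out == "42"),
    EvalCase("surrounding whitespace", " org:42 ", lambda out: out == "42"),
    EvalCase("missing prefix", "42", lambda out: out is None),
]
```

`run_evals(extract_org_id, CASES)` fails the gate with a pass rate of about 0.33: the happy-path case an agent would likely test passes, while the whitespace and missing-prefix inputs expose the "works locally, broken under real input" regression.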

What changes for reviewers

The honest part of this transition is that review gets weirder before it gets better. You stop reading every line and start reading the spec, scanning the diff for surprises, and trusting the test output. That feels reckless the first few times. The teams who push through it ship faster within a sprint or two. The teams who don't end up either rubber-stamping (which is worse than the old way) or re-typing the AI's output (which is just slow vibe coding).

Heuristic that works

If the diff contains anything you can't map back to a sentence in the spec, ask the agent to either remove it or amend the spec to justify it. Out-of-scope creep is the single biggest source of AI-generated tech debt.

Failure modes we see often

1. Specs that are too vague: "improve the checkout flow" is a wish, not a spec. The agent will hallucinate scope and the reviewer has no fixed bar
2. Specs without a verification clause: without "how do we know it worked," the agent has no termination condition and the human has no acceptance criterion
3. Skipping the plan step: letting the agent jump from spec to diff hides the architectural choices. The plan is where the cheap iteration happens
4. Letting the agent write the spec AND the code: circular grading. The spec is the human's job; only the human knows what the product is supposed to do
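Failure modes 1 and 2 are mechanical enough to catch before a spec ever reaches an agent. A minimal sketch of a spec linter, under two assumptions: the team marks the acceptance criterion with a `Verification:` label (as the example spec does), and a short word list (`VAGUE_WORDS`, hypothetical and deliberately incomplete) is a usable proxy for "wish, not spec". Modes 3 and 4 are process problems no linter can see.

```python
from __future__ import annotations

import re

# Assumption: a house style in which every spec carries a "Verification:" line.
VERIFICATION = re.compile(r"\bverification\s*:", re.IGNORECASE)
# Hypothetical list of wish-words that usually signal a missing observable outcome.
VAGUE_WORDS = {"improve", "enhance", "optimise", "optimize", "better", "cleaner", "modernise"}


def lint_spec(spec: str) -> list[str]:
    """Flag failure modes 1 and 2 before any agent time is spent on the spec."""
    problems = []
    words = set(re.findall(r"[a-z]+", spec.lower()))
    vague = sorted(words & VAGUE_WORDS)
    if vague:
        problems.append(f"vague wording {vague}: state the observable behaviour instead")
    if not VERIFICATION.search(spec):
        problems.append("no verification clause: add how we will know it worked")
    return problems
```

The rate-limiting spec passes cleanly; "improve the checkout flow" trips both checks. A word list will produce false positives, so treat its output as a prompt for the spec author, not a merge blocker.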

Where this lands by end of 2026

Our prediction, watching the tooling roadmaps: by Q4, "spec" becomes a first-class artifact in PR workflows the same way "description" is today. GitHub, Linear, Cursor and Claude Code are all converging on this — spec-in, code-out, with the spec preserved alongside the merged change. Teams that adopt the loop early will be reviewing 3x the PRs in the same time, with better defect rates, because the review surface area is smaller and more semantic.

TL;DR

Vibe coding is great for demos, brittle in production codebases
Spec-driven development = short markdown spec → plan → AI implements + verifies → human reviews diff vs spec
Humans write intent and approve plans. Agents write code, tests and verification
Keep specs in-repo alongside the code they shipped — future readers will thank you
Out-of-scope diff content is the smell that tells you the loop is broken
The productivity win is moving up an abstraction level, not typing faster

We've been rolling out spec-driven workflows for client teams since the start of the year. If you're trying to figure out how to get your engineers from vibe-coding-in-IDE to shipping-AI-code-at-team-scale, drop us a paragraph about your stack — we'll send back a candid take on the smallest change that gets you the biggest lift.



Spec-driven development: how teams actually ship AI-written code in 2026