Role-Specialized Agentic Coding: Up to 90% Cost Reduction via an SDM/SDE Loop

The Waste in Single-Model Agentic Coding

The standard approach to agentic code generation is simple: give one capable model a task and let it plan, implement, and self-correct. This works, but it is economically backwards. You pay frontier-model prices for every token the model generates, including the roughly 80% of output that is repetitive scaffolding, boilerplate, and verbose self-correction.

Consider GPT-5.3 Codex generating a feature. The model might output 2,000 tokens of code, but before that it outputs 1,500 tokens of reasoning, and after spotting an error it outputs 800 more tokens of self-correction. All 4,300 tokens are billed at the expensive output rate, yet only the 2,000 tokens of code (about 47%) are the deliverable, and across tasks the ratio of valuable output to total output is often below 30%.
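The arithmetic for this example, using the illustrative token counts above, is a one-line ratio; it is shown only to make explicit which tokens are billed at the output rate.

```python
# Illustrative output-token breakdown for one feature (figures from the example above).
output_tokens = {
    "reasoning": 1_500,
    "code": 2_000,
    "self_correction": 800,
}

total = sum(output_tokens.values())          # every one of these is billed at the output rate
code_share = output_tokens["code"] / total   # fraction of billed output that is the deliverable

print(f"total output tokens: {total}")       # total output tokens: 4300
print(f"code share: {code_share:.1%}")       # code share: 46.5%
```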

This is the core insight behind role specialization: not all tokens are created equal. Planning tokens require frontier-model intelligence. Implementation tokens do not — they are largely mechanical translation of a specification into code. Review tokens require frontier-model judgement again, but review output is naturally short.

Key insight: In a single-model loop, >60% of output tokens are generated at the expensive frontier-model rate but do not require frontier-level capability. Role specialization routes each type of cognitive work to the appropriately priced model.

The SDM/SDE Architecture

We designed a two-agent code generation loop modelled on the classic engineering management structure:

┌─────────────────────────────────────────────────────────┐
│ SDM (GPT-5.3 Codex)                                     │
│ Senior Development Manager / Planner                    │
│ Role: Plan architecture → Review code → Give feedback   │
│ Input: Expensive (full context)                         │
│ Output: Minimal (plan doc + short review comments)      │
│ Cost: ~$X per call (mainly input tokens)                │
└────────────────────┬────────────────────────────────────┘
                     │ Plan document
                     │ Review feedback
                     ▼
┌─────────────────────────────────────────────────────────┐
│ SDE (DeepSeek 4 Flash)                                  │
│ Software Development Engineer / Implementer             │
│ Role: Read plan → Generate code → Write review document │
│ Input: Cheap (plan + spec)                              │
│ Output: Heavy (full implementation)                     │
│ Cost: ~$0.1X per call (cheap for heavy output)          │
└─────────────────────────────────────────────────────────┘
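The two boxes in the diagram can be captured as static role configuration. This is a minimal sketch under stated assumptions: the `AgentRole` schema is hypothetical, the model strings are placeholders rather than verified API model identifiers, and the SDE's 4,000-token cap is an illustrative value. The SDM cap is sized for the ~300-token plan and ~200-token reviews described in the loop, and `writes_code=False` encodes the rule that the SDM never writes code directly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentRole:
    """Hypothetical static configuration for one agent in the SDM/SDE loop."""
    name: str
    model: str                         # placeholder identifier, not a verified API model name
    responsibilities: tuple[str, ...]
    max_output_tokens: int             # hard cap on visible output per call
    writes_code: bool

SDM = AgentRole(
    name="SDM",
    model="gpt-5.3-codex",
    responsibilities=("plan architecture", "review code", "give feedback"),
    max_output_tokens=400,             # room for a ~300-token plan or a ~200-token review
    writes_code=False,                 # the SDM never writes code directly
)

SDE = AgentRole(
    name="SDE",
    model="deepseek-4-flash",
    responsibilities=("read plan", "generate code", "write review document"),
    max_output_tokens=4_000,           # illustrative; sized for a few files per call
    writes_code=True,
)
```

Keeping the output cap in configuration, rather than only in the prompt, gives the orchestrator a place to enforce the "minimal output" property of the expensive role.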

The Loop

Step	Agent	Action	Output
1	SDM	Analyse requirements, create architecture plan	Plan document (structured, ~300 tokens)
2	SDE	Read plan, generate code, write to review doc	Code files + review document (~2000 tokens)
3	SDM	Read review document, perform code review	Feedback document (~200 tokens)
4	SDE	Improve code based on feedback	Updated code + review document
5–6	SDM + SDE	Repeat steps 3–4 until the SDM approves or 3 rounds are reached	Final approved code
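The loop above can be sketched as a small driver function. This is a sketch under stated assumptions, not our production orchestrator: `sdm_plan`, `sde_implement`, and `sdm_review` are hypothetical callables that would wrap your actual model API calls, and `Review` is a minimal stand-in for the SDM's feedback document.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Review:
    approved: bool
    feedback: str = ""   # short review bullets from the SDM

@dataclass
class LoopResult:
    review_doc: str      # latest review document produced by the SDE
    rounds: int          # review rounds used
    approved: bool       # False means the task is flagged for human review

def run_sdm_sde_loop(
    requirements: str,
    sdm_plan: Callable[[str], str],
    sde_implement: Callable[[str, str, Optional[str]], str],
    sdm_review: Callable[[str, str], Review],
    max_rounds: int = 3,
) -> LoopResult:
    plan = sdm_plan(requirements)                  # step 1: SDM writes the plan
    review_doc = sde_implement(plan, "", None)     # step 2: SDE implements (no prior doc, no feedback)
    for round_no in range(1, max_rounds + 1):
        review = sdm_review(plan, review_doc)      # step 3: SDM reviews the review document
        if review.approved:
            return LoopResult(review_doc, round_no, True)
        if round_no < max_rounds:                  # step 4: SDE revises, then back to step 3
            review_doc = sde_implement(plan, review_doc, review.feedback)
    return LoopResult(review_doc, max_rounds, False)
```

Passing the model calls in as plain callables keeps the driver independent of any provider SDK, which also makes it easy to exercise the loop with stubs before wiring in real models.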

How the Cost Saving Works

The economics hinge on one property of LLM pricing: input tokens are much cheaper than output tokens, and the gap widens at the frontier. Under the pricing assumed in the cost model below, GPT-5.3 Codex output costs 60x its input ($150 vs $2.50 per 1M tokens), while DeepSeek 4 Flash is 10x cheaper than GPT-5.3 Codex on input and 150x cheaper on output.

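The SDM model (expensive) is instructed to produce extremely concise output. Its job is to think, plan, and critique, and most of that work consists of reading requirements, codebase context, and review documents, all billed at the cheaper input rate. Its visible output is limited to a structured plan and short review bullets; it never writes code directly. If the SDM is a reasoning model, its hidden reasoning tokens are still billed as output, so they count against this budget as well.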

The SDE model (cheap) does the heavy lifting. It reads the plan, generates all the implementation code, writes it to the review document, and iterates based on feedback. Its output tokens are billed at the cheap rate: $1 per 1M under the pricing assumed below, versus $150 per 1M for the frontier model.

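The asymmetry: the expensive model produces a ~300-token plan plus ~200 tokens of feedback per review round, which the cost model below budgets at 1,200 output tokens per task. The cheap model produces roughly ten times as much (12,000 output tokens per task in the cost model) across the implementation and 2–3 rounds of improvement. Because the expensive model's output shrinks tenfold compared with the single-model approach, its spend falls to roughly 10–20% of the single-model cost (about $0.28 vs $1.86 per task), even though it remains the larger share of the two-agent bill.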

Cost Model: Single-Model vs Two-Agent

Consider a real feature: "Add a paginated user listing endpoint with search and filter." This touches 3–5 files (controller, service, repository, tests) and produces ~300 lines of code. Here's the cost breakdown with realistic token counts.

Pricing assumptions used: GPT-5.3 Codex at $2.50/1M input and $150/1M output (input is inexpensive relative to frontier code model output). DeepSeek 4 Flash at $0.25/1M input and $1/1M output. These reflect observed API pricing for premium code models vs efficient open-weight alternatives.

Metric	Single Model (GPT-5.3 only)	Two-Agent (SDM + SDE)
Expensive model input tokens	25,000 (requirements + codebase context)	40,000 (planning context + 2 review rounds)
Expensive model output tokens	12,000 (reasoning + 300 lines of code)	1,200 (plan + feedback, concise)
Cheap model tokens	—	30,000 input + 12,000 output
Total cost per task	$1.86	$0.30
Cost reduction	—	~84%
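The table's totals can be reproduced directly from the pricing assumptions stated above. A short sketch; the dictionary keys are labels for this calculation only, not API model identifiers.

```python
# Per-1M-token prices assumed in the cost model (USD).
PRICES = {
    "gpt-5.3-codex":    {"input": 2.50, "output": 150.00},
    "deepseek-4-flash": {"input": 0.25, "output": 1.00},
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one model's token usage on a task."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

single = cost("gpt-5.3-codex", 25_000, 12_000)      # single-model baseline

sdm = cost("gpt-5.3-codex", 40_000, 1_200)          # planner/reviewer
sde = cost("deepseek-4-flash", 30_000, 12_000)      # implementer
two_agent = sdm + sde

print(f"single model: ${single:.2f}")                       # single model: $1.86
print(f"two-agent:    ${two_agent:.2f}")                    # two-agent:    $0.30
print(f"reduction:    {1 - two_agent / single:.0%}")        # reduction:    84%
print(f"SDM spend vs single model: {sdm / single:.0%}")     # SDM spend vs single model: 15%
```

Almost all of the saving comes from the expensive model's output falling from 12,000 to 1,200 tokens; the SDE's entire contribution is about two cents per task.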

At 2,000 tasks per month (a team automating feature development, bug fixes, and test generation):

Single-model approach: ~$3,720/month
Two-agent approach: ~$600/month
Annual savings: ~$37,440
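The monthly and annual projection is straightforward arithmetic on the rounded per-task costs from the table:

```python
COST_SINGLE = 1.86      # USD per task, single-model (from the cost table)
COST_TWO_AGENT = 0.30   # USD per task, SDM + SDE (from the cost table)
TASKS_PER_MONTH = 2_000

monthly_single = COST_SINGLE * TASKS_PER_MONTH
monthly_two_agent = COST_TWO_AGENT * TASKS_PER_MONTH
annual_savings = (monthly_single - monthly_two_agent) * 12

print(f"${monthly_single:,.0f} vs ${monthly_two_agent:,.0f} per month; "
      f"${annual_savings:,.0f} saved per year")
# $3,720 vs $600 per month; $37,440 saved per year
```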

These figures align with our production measurements, where we consistently observe 80–90% cost reduction depending on task complexity and the number of review rounds required.

Why Quality Doesn't Degrade

The natural concern is that using a cheaper model for implementation reduces code quality. In practice, the opposite occurs:

The SDM's plan constrains the SDE's output space. A well-structured plan with explicit interfaces, data models, and acceptance criteria leaves little room for the SDE to diverge. The SDE becomes a mechanical translator of specification to code — a task at which smaller models excel.
The review loop catches SDE errors. The SDM, reading the review document, spots deviations from the plan, suboptimal patterns, and missing edge cases. The SDE then fixes them. This is arguably better than a single model self-correcting, because the reviewer and implementer are independent systems with different failure modes.
The cheap model can be pushed harder. Because SDE costs are low, we can afford multiple generation attempts, more thorough test generation, and longer output. The economic constraint that makes single-model systems produce minimal output doesn't apply to the SDE.

Implementation Details

The Plan Document

The SDM maintains a structured plan document with these sections:

Architecture: Component breakdown, data flow, module boundaries
Interfaces: Function signatures, type definitions, API contracts
Data model: Schemas, validation rules, persistence strategy
Acceptance criteria: Exact behaviours the implementation must satisfy
Rejected approaches: What not to do, known pitfalls

The Review Document

The SDE writes each file it creates or modifies into a review document, alongside a summary of what was done and why. This document becomes the SDM's input for review. The SDM never reads raw code files — it reads the structured review document, which provides context and intent alongside the code.
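A minimal sketch of assembling the review document, assuming a simple layout: a summary and rationale first, then one delimited section per changed file. `FileChange`, `build_review_document`, and the `=== path ===` delimiter are hypothetical choices, not a fixed format.

```python
from dataclasses import dataclass

@dataclass
class FileChange:
    path: str       # file the SDE created or modified
    content: str    # full new contents of that file

def build_review_document(summary: str, rationale: str, changes: list[FileChange]) -> str:
    """Intent first, then each changed file in its own delimited section,
    so the SDM reviews the code alongside the reasons for it."""
    parts = [f"Summary: {summary}", f"Rationale: {rationale}", ""]
    for change in changes:
        parts.append(f"=== {change.path} ===")
        parts.append(change.content.rstrip("\n"))
        parts.append("")
    return "\n".join(parts)

doc = build_review_document(
    summary="Added paginated GET /users with search",
    rationale="Follows the Interfaces section of the plan; page size validated in the service layer",
    changes=[FileChange("user_service.py", "def list_users(page, size, q=None):\n    ...\n")],
)
print(doc)
```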

Loop Termination

The loop runs a maximum of 3 rounds. In practice, ~70% of tasks pass review in round 1, ~25% pass after one revision, and the remaining ~5% need a second revision. The third round is a safety net: after it, the SDM either approves the code or flags the task for human review, which happens for about 2% of tasks.

We track the pass rate per round as a health metric. A declining round-1 pass rate signals that the plan document is insufficiently detailed, and the SDM's prompting should be adjusted.

Round	Pass Rate (of tasks reaching this round)	Cumulative % of Tasks
1 (initial implementation)	70%	70%
2 (after first feedback)	83%	95%
3 (after second feedback)	60%	98%
Escalated to human	—	2%
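The per-round pass rate in this table is conditional on a task reaching that round, which is why round 2 shows 83% while the cumulative column rises by only 25 points. A sketch of how both columns can be computed from per-task outcomes; `round_stats` is a hypothetical helper, and the 1,000-task outcome list is synthetic, chosen only to reproduce the table's percentages.

```python
from collections import Counter
from typing import Optional

def round_stats(outcomes: list[Optional[int]], max_rounds: int = 3):
    """outcomes[i] is the round in which task i was approved, or None if escalated.
    Returns ([(round, pass rate among tasks reaching it, cumulative share approved)],
    escalated share)."""
    counts = Counter(outcomes)
    total = len(outcomes)
    remaining = total      # tasks still in the loop at the start of each round
    approved = 0
    rows = []
    for rnd in range(1, max_rounds + 1):
        passed = counts[rnd]
        rate = passed / remaining if remaining else 0.0
        approved += passed
        rows.append((rnd, rate, approved / total))
        remaining -= passed
    return rows, counts[None] / total

# Synthetic outcomes for 1,000 tasks, chosen only to reproduce the table above.
outcomes = [1] * 700 + [2] * 250 + [3] * 30 + [None] * 20
rows, escalated = round_stats(outcomes)
for rnd, rate, cumulative in rows:
    print(f"round {rnd}: pass rate {rate:.0%}, cumulative {cumulative:.0%}")
print(f"escalated: {escalated:.0%}")
```

Tracking the round-1 rate from real logs this way gives the health metric described above without any extra instrumentation beyond recording the approving round per task.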

Beyond Cost: Additional Benefits

The role-specialized architecture produces advantages beyond the direct cost savings:

Audit trail. The plan document, review document, and feedback are all persisted. Every decision is traceable. A human can review the entire conversation at any point.
Independent iteration. The SDM and SDE can operate asynchronously. The SDM can plan the next task while the SDE implements the current one. With parallel task queues, throughput nearly doubles.
Model independence. The SDM and SDE are decoupled by the document interface. Either model can be swapped independently. We have run experiments with Claude Opus 4.5 as SDM and Qwen 3 Coder as SDE, with similar cost profiles.
Incremental improvement. Because the SDE's output is always reviewed by a stronger model, quality tends to rise over the course of a session: the SDE picks up recurring feedback patterns in context and applies them to later work.
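The asynchronous hand-off described under Independent iteration can be sketched with an `asyncio.Queue` between a planner and an implementer. This is a simplified sketch: the `asyncio.sleep` calls stand in for model API calls, the review step is omitted, and the queue bound of 2 is an arbitrary choice that lets the SDM plan ahead without running far in front of the SDE.

```python
import asyncio

async def sdm_planner(tasks: list[str], plans: asyncio.Queue) -> None:
    """SDM side: plan each task in turn without waiting for the SDE to finish the previous one."""
    for task in tasks:
        await asyncio.sleep(0.01)                    # stand-in for the SDM planning call
        await plans.put((task, f"plan for {task}"))
    await plans.put(None)                            # sentinel: no more tasks

async def sde_implementer(plans: asyncio.Queue, done: list[str]) -> None:
    """SDE side: implement each plan as soon as it is available."""
    while (item := await plans.get()) is not None:
        task, _plan = item
        await asyncio.sleep(0.01)                    # stand-in for the SDE implementation call
        done.append(task)

async def run_pipeline(tasks: list[str]) -> list[str]:
    plans: asyncio.Queue = asyncio.Queue(maxsize=2)  # bounded hand-off between the two agents
    done: list[str] = []
    await asyncio.gather(sdm_planner(tasks, plans), sde_implementer(plans, done))
    return done

print(asyncio.run(run_pipeline(["task-1", "task-2", "task-3"])))  # ['task-1', 'task-2', 'task-3']
```

In this toy timing model, with planning and implementation taking equal time, n tasks finish in roughly n + 1 steps instead of 2n, which is where a near-doubling of throughput comes from when the two stages are balanced.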

Conclusion

The prevailing assumption in agentic coding is that you need the most capable model for every part of the workflow. This is economically wrong. By splitting the cognitive labour — planning and reviewing to the frontier model (minimal output), implementation to a cheap model (heavy output) — we achieve the same or better quality at 80–90% lower cost.

The SDM/SDE pattern is not a compromise. It is a principled application of comparative advantage to AI agents. Each model does what it does best. The expensive model thinks. The cheap model builds. The loop connects them.

This architecture is now the default for all code generation tasks at Northorp. We recommend it to any team generating more than 5,000 lines of agent-written code per month — the savings pay for the integration effort within the first week.
