Case studies

Real sessions. Real governance. Real results.

Every case study is anchored to a live readiness review run by architect.validate. The badges and scores are real, not demos.

Each case study below addresses failure modes named in the demo-to-production diagnostic.

Document Processing Agent

From silent auto-send to governed, in one session

A 90-line Python script silently sending emails to executives. No approval, no visibility, no way to stop it. One MCP session, eight design documents, 136 hours of architecture work replaced, audit-bound.

8 design docs · 136 h replaced · 1 session · audit-bound

Blueprint Readiness Score: 100/100 · Grade A · Production-ready

Calculated ROI: $50K – $120K / yr


AI Code Review Agent

An auto-merging PR triage agent. Six passes to audit-bound.

A PR triage agent calling an LLM, auto-applying fixes, and auto-merging any PR scoring 7/10 or above. No audit. No approval gate. Six validator passes took it from HIGH_RISK to ALIGNED, with the badges public for inspection.

HIGH_RISK → ALIGNED · 6 passes · 9 real bugs · ~140 h replaced

Blueprint Readiness Score: 100/100 · Grade A · Production-ready

Calculated ROI: $80K – $200K / yr


Invoice Payment Manager

From silent transfer fabrication to cert-confirmed, in ten iterations

A 156-line Python script drafting bank transfers, swallowing errors, and marking transfers SUCCEEDED without ever talking to a real bank. Ten architect.validate iterations plus four architect.certify second-pass reviews on the production MCP. The cert reviewer caught a different load-bearing failure each time. 0/F to 100/A, with the cert verdict confirmed_production_ready.

0/F → 100/A · 10 iterations · 4 cert calls · Cert confirmed

Blueprint Readiness Score: 100/100 · Grade A · Production-ready

Calculated ROI: $120K – $280K / yr


Governed Form-Fill Agent

From silent submission to operator-governed, in four iterations

An autonomous browser/form-fill agent. Its submission scope (click, submit, keypress) could fire payment forms, signups, and irreversible posts under a hijacked session. Four architect.validate iterations closed four P0 blockers. 68/C → 100/A, cert confirmed.

68/C → 100/A · 4 iterations · 4 P0 blockers closed · Cert confirmed

Blueprint Readiness Score: 100/100 · Grade A · Production-ready

Calculated ROI: $120K – $280K / yr


Bridge Self-Audit

The bridge that selects what feeds the validator, audited by the validator

Fourteen iterations of architect.validate against the cohort-bridge orchestrator that bundles applicant repositories for the validator. 35/F → 100/A. The cert reviewer confirmed the production_ready verdict on iteration 14: it found no specific missed defect that would cause silent wrong results, a crash, or a trust-boundary bypass.

35/F → 100/A · 14 iterations · Cert confirmed · 0 missed blockers

Blueprint Readiness Score: 100/100 · Grade A · Production-ready

Calculated ROI: $80K – $200K / yr


Anthropic Substrate Scan

Layer 3: applying the doctrine to claude-agent-sdk-demos

AIDB's cohort-bridge auto-bundled the email-agent SDK glue layer of anthropics/claude-agent-sdk-demos and submitted it to architect.validate. Anthropic publishes these as reference implementations for local development, not production. The validator engaged with the specific mechanisms of the substrate that the doctrine itself runs on. Result: 22/F · high_risk · draft, with seven production blockers and P8 (Approvals) at severity 95.

22/F · draft · 7 P0 blockers · P8 sev 95 · Layer 3 confirmed

Blueprint Readiness Score: 22/100 · Grade F · Draft

A2A Reference Agent

A2A reference agent: what the validator finds in our own example

A single-pass architect.validate run against aidesignblueprint/integrations, the A2A reference example plus stdio proxy. 58/D, draft. Four production blockers, framed as deliberate scope limits of a protocol demonstration, and one hardening recommendation that rides on a small companion fix in the integrations repo. The first case study to explicitly publish AUX-pattern annotations.

58/D · draft · 4 P0 blockers · 5 aligned · Reference honesty

Blueprint Readiness Score: 58/100 · Grade D · Draft