Case study · Recursive integrity · Cert confirmed

The bridge that selects what feeds the validator — audited by the validator

Fourteen iterations of architect.validate against the orchestration code that bundles applicant repositories for review. The validator caught real gaps in its own delivery pipeline at every pass. The cert reviewer confirmed the production_ready verdict on iter14. Each readiness-review URL below is public — every score, every verdict, every recommendation is replayable.

Key Facts

Validator iterations: 14 prod-MCP runs
Score trajectory: 35/F → 100/A
Cert outcome: confirmed_production_ready
Missed defects: 0 (cert second-pass)

Live cert receipt

Iter14 production_ready, cert second-pass confirmed

The receipt is real, not a demo. The badge below links to the iter14 readiness review with the full per-principle verdict, audit chain, and the cert reviewer's persisted summary. Every prior iteration is linked in the trajectory table.

Before, Iter1 baseline

7ef168b8-b9c0-40f0-8dc4-b25e36ae3a09

After, Iter14 cert-confirmed

3ac16b20-88b8-4448-a4f6-5aa738b2919b

View full readiness review →

The claim, stated narrowly

AI Design Blueprint's `architect.validate` tool grades agentic code against the 10-principle doctrine. The cohort-bridge orchestrator is the code that converts an applicant's public GitHub repository into validate-compatible input. That orchestrator was submitted to the validator it serves, fourteen times in one evening — first-pass validate plus second-pass cert review on the final iteration.

Layer 1: validator scores its own source (cert-confirmed self-audit, 13 prior rounds). Layer 2: validator scores a governed agent (canary baseline 100/A clean, 18/F injected). Layer 3: validator scores the bridge that selects what to feed it — this case study. Three layers, each auditing the layer below, each with public receipts.

Cert-confirmed receipt

Iter14 second-pass cert summary, verbatim from the persisted review:

CERT CONFIRMED
The first-pass production_ready verdict is confirmed: the code shows durable job states, explicit terminal/blocked/aborted handling, retry paths, and no specific missed defect that would currently cause a real user silent wrong results, crash, or trust-boundary bypass.
architect.certify · second-pass adversarial review · 3ac16b20…

Trajectory across fourteen iterations

Click any iteration to open its public readiness review. The validator's verdicts, per-principle reasoning, and severity scores are rendered live from the persisted run.

Iter	Change	Score	Tier	Run
Iter1	Initial bridge submission	35 / F	DRAFT	7ef168b8-b9c0-40f0-8dc4-b25e36ae3a09
Iter2	P5 docstring fix + audit-chain metadata	30 / F	DRAFT	86b2c59d-0f04-4b30-a891-796c4276fead
Iter3	CohortValidationJob state machine + P7 boundary	67 / C	EMERGING	e476247c-d9c0-44f4-a25f-dbbdb7eb7b15
Iter4	P6 enum + P8 rollback + P10 retry_count	74 / C	EMERGING	659a695a-2bf6-4b28-a52f-b6098c20ee0b
Iter5	Typed file envelope + onboarding state machine	74 / C	EMERGING	3f3bb587-5ad1-4417-b624-23441b39831e
Iter6	Selector priority inversion + Firebase wrap	74 / C	EMERGING	14a3456f-44db-448a-a47b-5e9637944ea6
Iter7	OPENAI_API_KEY guard before validation_queued	74 / C	EMERGING	8364019d-268c-4b46-aacb-e929fadd0c36
Iter8	Orchestrator-level app onboarding mirror	98 / A	PRODUCTION_READY	270e7ca6-5af3-44dd-b55f-8b78e265ae34
Iter9	Full package + db/schemas in scope	98 / A	PRODUCTION_READY	4128f700-ff4e-41e0-af12-3e56f5b54a9a
Iter10	All import stubs for cert coverage	98 / A	PRODUCTION_READY	093809b5-b30a-4b23-b02c-30a308ee7dea
Iter11	Base import restored after cert finding	74 / C	EMERGING	9caf9385-cd3b-4936-b455-87a916577e3a
Iter12	Durable steering CLI + recoverable approve()	74 / C	EMERGING	4760459e-5af0-4927-834c-9fac4c5c3bd2
Iter13	Step-aware recovery + terminal-status guard	74 / C	EMERGING	0e49f888-1b71-4cb2-bb86-52327681b997
Iter14	Post-validate abort + preserve failed-step state	100 / A	CERT CONFIRMED	3ac16b20-88b8-4448-a4f6-5aa738b2919b

What the trajectory reveals

Score-as-snapshot misses the iteration story. The bridge hit production_ready three times (iter8, iter9, iter10) before iter11 regressed to emerging on identical code — a single LLM-variance event that promoted P10 from hardening_recommended to production_blocker. The fix path then required four more iterations: durable steering primitives (iter12), step-aware recovery + terminal-status guards (iter13), and the post-validate abort re-check that closed P10 definitively (iter14).

The cert reviewer's iter10 downgrade caught a real bug — but in the submission discipline, not the underlying code. The excerpt sent to validate was missing `from app.db.session import Base`. The actual repo file always had it. The cert reviewer's verdict prose said so explicitly. Iter11 restored the import; the regression that immediately followed was the variance event, not a code change.

The doctrine the validator applies is the same doctrine that runs inside Claude Code via MCP. Every recommendation in every iteration is mechanism-specific (named identifiers, named files, named race conditions). None are template-generic. The iter14 cert summary names which terminal states and retry paths confirm the verdict.

Verbatim from iter2 — the P7 finding the iter1 review hadn't reached

DRAFT
The audit envelope is a substantial improvement: `audit_object` records `schema_version`, `captured_at`, source `repo_url`/`commit_sha`, selected file hashes, skipped reads, `bundle_sha256`, validator latency, and shape-only log signals. However, the bridge still constructs the validator payload by concatenating raw untrusted repo content in `_build_implementation_context()` via `chunks.append(f"# === FILE: {path} ===\n{content}\n")` and then sends it as `ValidationRequest(code=bundle, ...)`. This file does not establish or record an inert untrusted-input boundary, so repo comments or filenames containing instruction-like text could steer the validator unless a hidden downstream service compensates.
architect.validate · iter2 verdict on P7 Inspectability · production_blocker · 86b2c59d…

Verbatim from iter14 — P10 steering, finally aligned

CERT CONFIRMED
Steering primitives are now present for both validation and onboarding recovery. `request_abort()` persists `abort_requested=True`; `_execute_with_job()` checks for abort before validation and, in the Iter14 change, refreshes the job and re-checks `job.abort_requested` immediately after `asyncio.run(validate_code_against_principles(...))` returns and before creating `UserValidationRun` or calling `mark_completed()`, discarding validator output on abort. `retry_failed_validation_job()` requires the latest job to be terminal and `retry_eligible`, while `retry_onboarding_handoff()` preserves the failed state (`firebase_user_failed`, `sign_in_link_failed`, or `approval_email_failed`) and clears only `onboarding_failure_reason`, allowing `approve()` to re-enter step-aware recovery.
architect.validate · iter14 verdict on P10 Steering · aligned (sev 60 → 0) · 3ac16b20…

What the bridge is now

An end-to-end orchestration surface that converts an arbitrary public GitHub repository into a validator-compatible bundle, with durable workflow state, typed file-envelope boundaries, step-aware operator recovery, post-validate abort honoring, and a cert-confirmed production_ready badge that any reader can verify by re-fetching the run_id. The first 10 cohort applicants this week will be onboarded through this code.

Three case studies, three facets

This is the self-validation facet of the recursive-integrity triad. The other two facets are public: the substrate-validation scan on Anthropic's own agent-SDK demo code, and the reference-honesty scan on our own A2A protocol example. Same doctrine, three code surfaces, three different verdicts.

Anthropic substrate scan →A2A reference agent →

Receipts

Every claim above is replayable from the public run URLs. The 10-principle doctrine fingerprint stayed constant across all fourteen iterations (validator confirmed `same_doctrine` on every baseline comparison). No mid-stream rule changes; the same rubric scored every run.

Re-validation result

After Iter14: architect.certify confirmed production_ready

The iter14 implementation was validated and then certified in the same prod-MCP session. Cert outcome: confirmed_production_ready. The badge is live and the readiness review is publicly inspectable.

Iter1 (before)

Draft · 35/F

Initial submission · no closures

Iter14 (after)

Production-ready · Cert confirmed

100/A · 0 missed defects

Time to fix

Fourteen iterations

From 35/F draft to confirmed_production_ready

View the live readiness review →

Calculated ROI

Same metrics, same calculator powering every case study

Derived deterministically from this case study's profile (14 iterations, code-modifying blast radius, autonomous workflow, under compliance) via /lib/case-study-roi.ts. Numbers directly comparable to the other case studies.

Senior-architect time replaced

~274 hours @ $150/hour ≈ ~$41K per agent

Production ROI per agent / year

$80K – $200K (incident prevention + audit prep + rework)

Time to identify the governance gaps

2-4 weeks of senior-architect review WITHOUT Blueprint, ~70 min / 14 validator passes WITH Blueprint

Incidents prevented (range)

5-15 per year of unintended production source-code changes (each ~4-40 hours of incident-response / rollback)

Compliance audit prep

~80-120 hours / year replaced with one audit query

Related, Pro / Teams

Run this as a Blueprint Readiness Score

The Architect Agent is the same review pattern shown in this case study, applied to your code. Call architect.validate to get a Blueprint Readiness Score (0–100, A–F) per repository, and a regression diff between runs so the next review focuses on what changed.

Explore the Architect Agent See pricing

Sample score card

82/ 100

Production-ready

▲ 7

acme/customer-agent

Run your own validation

Paste your agent code or describe your workflow. The validator returns principle-by-principle findings, a readiness score, and a shareable review URL in seconds. Reach 80+/A and cert mints a public badge that matches the one above.

Open the validator Read the 10 principles

Other case studies