Skip to main contentSkip to footer
Case study · Recursive integrity · Cert confirmed

The bridge that selects what feeds the validator — audited by the validator

Fourteen iterations of architect.validate against the orchestration code that bundles applicant repositories for review. The validator caught real gaps in its own delivery pipeline at every pass. The cert reviewer confirmed the production_ready verdict on iter14. Each readiness-review URL below is public — every score, every verdict, every recommendation is replayable.

Key Facts

Validator iterations
14 prod-MCP runs
Score trajectory
35/F → 100/A
Cert outcome
confirmed_production_ready
Missed defects
0 (cert second-pass)

Live cert receipt

Iter14 production_ready, cert second-pass confirmed

The receipt is real, not a demo. The badge below links to the iter14 readiness review with the full per-principle verdict, audit chain, and the cert reviewer's persisted summary. Every prior iteration is linked in the trajectory table.

Blueprint Readiness Score card, iter14 cert-confirmed

After, Iter14 cert-confirmed

The claim, stated narrowly

AI Design Blueprint's `architect.validate` tool grades agentic code against the 10-principle doctrine. The cohort-bridge orchestrator is the code that converts an applicant's public GitHub repository into validate-compatible input. That orchestrator was submitted to the validator it serves, fourteen times in one evening — first-pass validate plus second-pass cert review on the final iteration.

Layer 1: validator scores its own source (cert-confirmed self-audit, 13 prior rounds). Layer 2: validator scores a governed agent (canary baseline 100/A clean, 18/F injected). Layer 3: validator scores the bridge that selects what to feed it — this case study. Three layers, each auditing the layer below, each with public receipts.

Cert-confirmed receipt

Iter14 second-pass cert summary, verbatim from the persisted review:

CERT CONFIRMED

The first-pass production_ready verdict is confirmed: the code shows durable job states, explicit terminal/blocked/aborted handling, retry paths, and no specific missed defect that would currently cause a real user silent wrong results, crash, or trust-boundary bypass.

architect.certify · second-pass adversarial review · 3ac16b20

Trajectory across fourteen iterations

Click any iteration to open its public readiness review. The validator's verdicts, per-principle reasoning, and severity scores are rendered live from the persisted run.

IterChangeScoreTierRun
Iter1Initial bridge submission35 / FDRAFT7ef168b8-b9c0-40f0-8dc4-b25e36ae3a09
Iter2P5 docstring fix + audit-chain metadata30 / FDRAFT86b2c59d-0f04-4b30-a891-796c4276fead
Iter3CohortValidationJob state machine + P7 boundary67 / CEMERGINGe476247c-d9c0-44f4-a25f-dbbdb7eb7b15
Iter4P6 enum + P8 rollback + P10 retry_count74 / CEMERGING659a695a-2bf6-4b28-a52f-b6098c20ee0b
Iter5Typed file envelope + onboarding state machine74 / CEMERGING3f3bb587-5ad1-4417-b624-23441b39831e
Iter6Selector priority inversion + Firebase wrap74 / CEMERGING14a3456f-44db-448a-a47b-5e9637944ea6
Iter7OPENAI_API_KEY guard before validation_queued74 / CEMERGING8364019d-268c-4b46-aacb-e929fadd0c36
Iter8Orchestrator-level app onboarding mirror98 / APRODUCTION_READY270e7ca6-5af3-44dd-b55f-8b78e265ae34
Iter9Full package + db/schemas in scope98 / APRODUCTION_READY4128f700-ff4e-41e0-af12-3e56f5b54a9a
Iter10All import stubs for cert coverage98 / APRODUCTION_READY093809b5-b30a-4b23-b02c-30a308ee7dea
Iter11Base import restored after cert finding74 / CEMERGING9caf9385-cd3b-4936-b455-87a916577e3a
Iter12Durable steering CLI + recoverable approve()74 / CEMERGING4760459e-5af0-4927-834c-9fac4c5c3bd2
Iter13Step-aware recovery + terminal-status guard74 / CEMERGING0e49f888-1b71-4cb2-bb86-52327681b997
Iter14Post-validate abort + preserve failed-step state100 / ACERT CONFIRMED3ac16b20-88b8-4448-a4f6-5aa738b2919b

What the trajectory reveals

Score-as-snapshot misses the iteration story. The bridge hit production_ready three times (iter8, iter9, iter10) before iter11 regressed to emerging on identical code — a single LLM-variance event that promoted P10 from hardening_recommended to production_blocker. The fix path then required four more iterations: durable steering primitives (iter12), step-aware recovery + terminal-status guards (iter13), and the post-validate abort re-check that closed P10 definitively (iter14).

The cert reviewer's iter10 downgrade caught a real bug — but in the submission discipline, not the underlying code. The excerpt sent to validate was missing `from app.db.session import Base`. The actual repo file always had it. The cert reviewer's verdict prose said so explicitly. Iter11 restored the import; the regression that immediately followed was the variance event, not a code change.

The doctrine the validator applies is the same doctrine that runs inside Claude Code via MCP. Every recommendation in every iteration is mechanism-specific (named identifiers, named files, named race conditions). None are template-generic. The iter14 cert summary names which terminal states and retry paths confirm the verdict.

Verbatim from iter2 — the P7 finding the iter1 review hadn't reached

DRAFT

The audit envelope is a substantial improvement: `audit_object` records `schema_version`, `captured_at`, source `repo_url`/`commit_sha`, selected file hashes, skipped reads, `bundle_sha256`, validator latency, and shape-only log signals. However, the bridge still constructs the validator payload by concatenating raw untrusted repo content in `_build_implementation_context()` via `chunks.append(f"# === FILE: {path} ===\n{content}\n")` and then sends it as `ValidationRequest(code=bundle, ...)`. This file does not establish or record an inert untrusted-input boundary, so repo comments or filenames containing instruction-like text could steer the validator unless a hidden downstream service compensates.

architect.validate · iter2 verdict on P7 Inspectability · production_blocker · 86b2c59d

Verbatim from iter14 — P10 steering, finally aligned

CERT CONFIRMED

Steering primitives are now present for both validation and onboarding recovery. `request_abort()` persists `abort_requested=True`; `_execute_with_job()` checks for abort before validation and, in the Iter14 change, refreshes the job and re-checks `job.abort_requested` immediately after `asyncio.run(validate_code_against_principles(...))` returns and before creating `UserValidationRun` or calling `mark_completed()`, discarding validator output on abort. `retry_failed_validation_job()` requires the latest job to be terminal and `retry_eligible`, while `retry_onboarding_handoff()` preserves the failed state (`firebase_user_failed`, `sign_in_link_failed`, or `approval_email_failed`) and clears only `onboarding_failure_reason`, allowing `approve()` to re-enter step-aware recovery.

architect.validate · iter14 verdict on P10 Steering · aligned (sev 60 → 0) · 3ac16b20

What the bridge is now

An end-to-end orchestration surface that converts an arbitrary public GitHub repository into a validator-compatible bundle, with durable workflow state, typed file-envelope boundaries, step-aware operator recovery, post-validate abort honoring, and a cert-confirmed production_ready badge that any reader can verify by re-fetching the run_id. The first 10 cohort applicants this week will be onboarded through this code.

Three case studies, three facets

This is the self-validation facet of the recursive-integrity triad. The other two facets are public: the substrate-validation scan on Anthropic's own agent-SDK demo code, and the reference-honesty scan on our own A2A protocol example. Same doctrine, three code surfaces, three different verdicts.

Receipts

Every claim above is replayable from the public run URLs. The 10-principle doctrine fingerprint stayed constant across all fourteen iterations (validator confirmed `same_doctrine` on every baseline comparison). No mid-stream rule changes; the same rubric scored every run.

Re-validation result

After Iter14: architect.certify confirmed production_ready

The iter14 implementation was validated and then certified in the same prod-MCP session. Cert outcome: confirmed_production_ready. The badge is live and the readiness review is publicly inspectable.

Iter1 (before)

Draft · 35/F

Initial submission · no closures

Iter14 (after)

Production-ready · Cert confirmed

100/A · 0 missed defects

Time to fix

Fourteen iterations

From 35/F draft to confirmed_production_ready

View the live readiness review →

Calculated ROI

Same metrics, same calculator powering every case study

Derived deterministically from this case study's profile (14 iterations, code-modifying blast radius, autonomous workflow, under compliance) via /lib/case-study-roi.ts. Numbers directly comparable to the other case studies.

Senior-architect time replaced

~274 hours @ $150/hour ≈ ~$41K per agent

Production ROI per agent / year

$80K – $200K (incident prevention + audit prep + rework)

Time to identify the governance gaps

2-4 weeks of senior-architect review WITHOUT Blueprint, ~70 min / 14 validator passes WITH Blueprint

Incidents prevented (range)

5-15 per year of unintended production source-code changes (each ~4-40 hours of incident-response / rollback)

Compliance audit prep

~80-120 hours / year replaced with one audit query

Related, Pro / Teams

Run this as a Blueprint Readiness Score

The Architect Agent is the same review pattern shown in this case study, applied to your code. Call architect.validate to get a Blueprint Readiness Score (0–100, A–F) per repository, and a regression diff between runs so the next review focuses on what changed.

Sample score card

B
82/ 100

Production-ready

▲ 7

acme/customer-agent

Run your own validation

Paste your agent code or describe your workflow. The validator returns principle-by-principle findings, a readiness score, and a shareable review URL in seconds. Reach 80+/A and cert mints a public badge that matches the one above.

Other case studies