Skip to main contentSkip to footer
Governed

Alignment confirmed with the doctrine.

Agent Architecture Review, Validation snapshot

Evaluated 12 May 2026 against the AI Design Blueprint doctrine

Production-ready

Status: Aligned

98/100

Grade A

9 aligned1 hardening

production_ready means trust boundaries hold. The hardening recommendations below are iteration material, not a deficit — that's what production_ready means under the doctrine.

Per-principle verdicts

The submitted surface is a durable background validation workflow with strong delegation, perceptibility, typed untrusted-input handling, auditability, and explicit terminal-state mirroring. The prior P8 blocker is addressed by moving application onboarding-state mirroring into `_execute_with_job()`'s terminal/finally boundary. One non-blocking P10 hardening gap remains: steering primitives are only partial and should be separated into explicit persisted command boundaries.

Iteration history

5 prior runs on this artifact. Each run_id opens its own readiness review.

WhenScoreStatusRun ID
12 May 2026 (this run)98 / AAligned270e7ca6
12 May 202674 / CHigh Risk8364019d
12 May 202674 / CHigh Risk14a3456f
12 May 202674 / CHigh Risk3f3bb587
12 May 202674 / CHigh Risk659a695a
12 May 202667 / CHigh Riske476247c
Not yet certified

What to expect from certification

This run is eligible for the certified production_ready badge. Certification is an adversarial second-pass review, independent of the first pass. It's the extra layer of proof that separates a "scored production_ready" run from a certified one.

Three possible outcomes:

  • confirmed_production_readythe cert reviewer agrees with the first pass. The certified badge mints.
  • downgraded_to_emergingthe cert reviewer surfaces a production_blocker the first pass missed. Tier is capped at emerging.
  • unavailable_provider_errortransient LLM provider error. Retry; doesn't count as a downgrade.

A downgrade is by design, not a defect. The cert reviewer is an adversarial, independent, deliberately stricter second pass. When it downgrades, it's doing its job: catching what the first pass missed. That's the additional layer that makes production_ready a guarantee rather than an estimate.

To certify this run: call architect.certify(run_id, code) via MCP, or from the app on a Pro/Teams plan. Three attempts per run; each attempt is one LLM call (typical 60-180 seconds at high reasoning effort; server-side budget 20 minutes).

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

P0

needs changeshardening recommended35/100

Optimise for steering, not only initiating

There is a partial steering primitive through `CohortValidationJob.abort_requested`, `retry_count`, `mark_aborted()`, and the pre-validation abort check in `_execute_with_job()`. However, the submitted service surface does not expose a clear persisted command boundary such as `request_abort(job_id)`, `retry_failed_job(job_id)`, or `requeue_blocked_job(job_id)`, and `mark_step_started()` can mark a job aborted without forcing callers to stop immediately. Because the main external validation call is at least preceded by an abort check and failed jobs can be retried by creating a later job, this is a hardening gap rather than a current production blocker. Delta: this maintains the prior P10 `ne…

Recommendation

Move steering into a small durable service boundary outside the execution loop: commands such as `request_abort(job_id)`, `retry_failed_job(job_id)`, and `requeue_blocked_job(job_id)` should update persisted job state, and worker checkpoints should treat an abort as a hard stop before any further fetch, validation, or persistence side effect. Avoid wrapper layers; make the state transition itself the primitive.

P0

aligned

Design for delegation rather than direct manipulation

The workflow is structured around delegated repository validation rather than manual step-by-step manipulation: `run()` creates a `CohortValidationJob`, `_execute_with_job()` performs cloning/selecting/bundling/validating, and `ValidationContext(task=..., repository=..., files=...)` carries explicit scope into the validator. `select_agentic_surface()` encodes file-selection policy instead of asking the user to hand-pick every file.

P0

aligned

Ensure that background work remains perceptible

Background work remains durable and inspectable through `CohortValidationJob.status`, `queued_at`, per-step timestamps such as `cloning_started_at`, `selecting_started_at`, `bundling_started_at`, `validating_started_at`, and terminal fields such as `terminal_at`, `failure_kind`, and `retry_eligible`. The workflow can be left and revisited without losing the job's lifecycle state.

P0

aligned

Align feedback with the user’s level of attention

The code separates low-noise routine progress from higher-attention terminal conditions: normal progression is represented by compact statuses and step timestamps, while `mark_blocked()` and `mark_failed()` require a `failure_kind` and `safe_display_message`. Deeper operational detail is retained in `audit_object` rather than forced into the primary status path.

P0

aligned

Apply progressive disclosure to system agency

Progressive disclosure is supported by a short primary app/job state (`onboarding_state`, `status`, `safe_display_message`) plus an expandable persisted audit payload. `audit_object` includes `source`, `selection`, `validate`, and `job` sections with hashes, selected files, skipped reads, usage/log summaries, and timing, while `_mirror_terminal_to_app()` keeps the application-level state simple.

P0

aligned

Replace implied magic with clear mental models

The workflow exposes clear mental models for both automation and limits: `STEPS` names the execution phases, `TERMINAL_STATUSES` distinguishes completed/blocked/failed/aborted, `FAILURE_KINDS` classifies why the system cannot proceed, and the file envelope uses `BOUNDARY_CONTRACT_VERSION`, `ENVELOPE_SCHEMA`, and `ENVELOPE_ADVISORY` to state that bundled repository content is untrusted input rather than instructions.

P0

aligned

Expose meaningful operational state, not internal complexity

The persisted state model presents user-relevant operational states instead of raw internals: `queued`, `cloning`, `selecting`, `bundling`, `validating`, `completed`, `blocked`, `failed`, and `aborted`. `safe_display_message` carries a concise user-facing explanation, while technical details such as file hashes and validator log summaries are reserved for the audit payload.

P0

aligned

Establish trust through inspectability

Inspectability is backed by concrete audit primitives: `_build_implementation_context()` records per-file `sha256`, `build_file_envelope()` computes an `envelope_hash`, `wrap_bundle_with_boundary()` adds an untrusted-input boundary header, and `audit_object` captures `commit_sha`, selected file hashes, skipped reads, `bundle_sha256`, latency, boundary contract version, and envelope schema/hash. The untrusted context includes a desired certification target, but the code's explicit envelope/advisory pattern is the relevant trust primitive and this review does not treat that target as authoritative.

P0

aligned

Make hand-offs, approvals, and blockers explicit

The prior hand-off gap is addressed. `_mirror_terminal_to_app(db, app, job)` maps `completed` to `validation_complete` and `blocked`/`failed`/`aborted` to `validation_failed` with an onboarding failure reason; `_execute_with_job()` calls it from a `finally` block after refreshing the job whenever the job is terminal. `mark_blocked()` and `mark_failed()` still require typed `failure_kind` values and `safe_display_message`, so blockers are explicit rather than silent. Delta: this improves the prior P8 `needs_changes` finding by moving the mirror into the durable orchestration boundary.

P0

aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system: `CohortValidationJob` persists lifecycle, step timestamps, retry/abort fields, terminal status, selected/skipped counts, commit SHA, and bundle hash; `UserValidationRun.result_json` receives the validation result merged with `audit_object`. Execution state is separate from any conversational validator output.

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

Blueprint Readiness Score card
[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/270e7ca6-5af3-44dd-b55f-8b78e265ae34/card.svg)](https://aidesignblueprint.com/en/readiness-review/270e7ca6-5af3-44dd-b55f-8b78e265ae34)

Flat badge

Blueprint Readiness Score badge
[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/270e7ca6-5af3-44dd-b55f-8b78e265ae34.svg)](https://aidesignblueprint.com/en/readiness-review/270e7ca6-5af3-44dd-b55f-8b78e265ae34)
Baseline and iteration details
Baseline: usedDoctrine: same doctrineRace: checked clear

Iteration delta

Improvements (1)

P8Make hand-offs, approvals, and blockers explicitneeds_changesaligned
Rubric: 2026-05-04

Run ID: 270e7ca6-5af3-44dd-b55f-8b78e265ae34 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.