Assessment complete; awaiting evidence revision.
Evaluated 23 May 2026 against the AI Design Blueprint doctrine
Production-ready
Status: Aligned
98/100
Grade A
Under the doctrine, production_ready means trust boundaries hold. The hardening recommendations below are iteration material, not a deficit.
Blueprint Readiness measures doctrine alignment, not runtime correctness. A production-ready verdict means the architecture embodies the 10 principles; it does not run your tests or type checks. Layer it on top of your test suite, not in place of it.
The submitted code is an autonomous background validation workflow, and the SEP-1686 task adoption materially closes the durable lifecycle gap: `task_id`, `task_state`, history recovery, projected UI status, dedupe signalling, and cancellation are now represented as durable product primitives rather than chat-scrollback state. The main remaining gap is inspectability of how the validation result was produced: the snippets show lifecycle and state persistence, but not a compact provenance or event ledger for the model, tool, and policy decisions behind `_execute_validate_body()`.
What to expect from certification
This run is eligible for the certified production_ready badge. Certification is an adversarial second-pass review, independent of the first pass. It's the extra layer of proof that separates a "scored production_ready" run from a certified one.
Three possible outcomes:
A downgrade is by design, not a defect. The certification reviewer is an adversarial, independent, deliberately stricter second pass. When it downgrades, it is doing its job: catching what the first pass missed. That additional layer is what makes production_ready a guarantee rather than an estimate.
To certify this run, call architect.certify(run_id, code) via MCP, or use the app on a Pro/Teams plan. Each run allows three attempts; each attempt is one LLM call (typically 60 to 180 seconds at high reasoning effort; server-side budget 20 minutes).
Per-principle findings
10 principles evaluated. Verdict, severity, evidence and recommendation for each.
P7
Needs changes · Hardening recommended · 35/100 · Establish trust through inspectability
The submission improves lifecycle inspectability with durable `UserValidationRun` rows, `task_id`, `task_state`, `structuredContent=task_result_dict`, and `me.validation_history(run_id)`. However, the snippets do not show a provenance/event ledger for how `_execute_validate_body()` produced the assessment: there is no visible persisted trace of model/policy version, source bundle identity beyond `code_fingerprint`, tool calls, decision checkpoints, or result-generation events. The current primitive makes the task lifecycle inspectable, but not yet the production path of the validation judgement itself.
Recommendation
Persist a compact run provenance ledger keyed by `run_id` outside the execution loop, recording the input/code fingerprint, policy/model version, relevant tool calls, decision checkpoints, and final result hash; link that ledger from `validation_history` while keeping the dashboard projection separate from raw internals.
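One way to realise this recommendation is a hash-chained, append-only ledger. This is a minimal sketch, assuming an in-memory list stands in for a table keyed by `run_id`; `ProvenanceLedger`, `record()`, and `verify()` are hypothetical names, and the hash chaining is an added design choice, not something the reviewed code shows.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ProvenanceLedger:
    """Hypothetical append-only provenance ledger for one validation run."""
    run_id: str
    entries: list = field(default_factory=list)

    def _digest(self, payload: dict, prev: str) -> str:
        # Canonical JSON so the same event always produces the same hash.
        body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()

    def record(self, kind: str, **data) -> str:
        """Append an event such as 'input', 'policy', 'tool_call', 'checkpoint' or 'result'."""
        prev = self.entries[-1]["hash"] if self.entries else ""
        payload = {"run_id": self.run_id, "seq": len(self.entries), "kind": kind, "data": data}
        entry = {**payload, "hash": self._digest(payload, prev)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute the chain; editing any earlier entry breaks verification."""
        prev = ""
        for entry in self.entries:
            payload = {k: v for k, v in entry.items() if k != "hash"}
            if self._digest(payload, prev) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

A run would record its code fingerprint, policy/model version, tool calls, checkpoints, and final result hash in order; `validation_history` could then link to the ledger while the dashboard keeps reading only the projection.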
P1
Aligned · Design for delegation rather than direct manipulation
Delegation is represented explicitly: the handler accepts `params.task: {ttl}`, mints a `run_id`/`task_id` at start, returns `CreateTaskResult` immediately, and performs validation in background through `experimental.run_task`. The REST `cancel_task_on_validation_run` endpoint and `can_cancel` projection give the user a termination control instead of forcing them to wait on a blocking call.
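The start-then-return pattern described above can be illustrated with plain `asyncio` rather than the MCP SDK's `experimental.run_task`. This is a sketch under that substitution; `BackgroundRunner`, `start()`, and `cancel()` are hypothetical names, and the sleep is a placeholder for real validation work.

```python
import asyncio
import uuid


class BackgroundRunner:
    """Hypothetical sketch: mint a task handle, return it at once, do the work later."""

    def __init__(self) -> None:
        self.status: dict[str, str] = {}
        self._tasks: dict[str, asyncio.Task] = {}

    async def _work(self, task_id: str, payload: str) -> None:
        await asyncio.sleep(0.01)  # placeholder for the real validation body
        self.status[task_id] = "completed"

    def start(self, payload: str) -> dict[str, str]:
        # Must be called from inside a running event loop.
        task_id = str(uuid.uuid4())
        self.status[task_id] = "working"
        self._tasks[task_id] = asyncio.create_task(self._work(task_id, payload))
        # The caller gets the handle immediately and never blocks on the result.
        return {"task_id": task_id, "status": "working"}

    def cancel(self, task_id: str) -> bool:
        task = self._tasks.get(task_id)
        if task is None or task.done():
            return False  # unknown or already finished
        self.status[task_id] = "cancelled"
        task.cancel()
        return True
```

Recording `cancelled` before calling `task.cancel()` keeps the status correct even if the task is cancelled before it ever starts running.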
P2
Aligned · Ensure that background work remains perceptible
The previous ephemeral lifecycle is replaced with durable, perceptible primitives: `UserValidationRun.task_id` is a top-level recovery handle, `task_state` stores lifecycle JSON, `project_task_state()` projects it into dashboard-facing status, and `me.validation_history(run_id)` is described as the recovery path. The UI also renders `task_status` across history, projects, dashboard recent activity, and detail surfaces.
P3
Aligned · Align feedback with the user’s level of attention
Feedback is proportionate: the primary UI consumes the compact `TaskStateProjection` with `task_status`, `status_message`, `terminal_reason`, and `can_cancel`, while routine background work is shown through badges/pills rather than full protocol internals. Terminal and exceptional states such as `failed`, `cancelled`, `task_state_parse_error`, and no-result cards escalate only when user attention is needed.
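The escalation rule above can be expressed as a small routing function. This is a hypothetical sketch: `Surface` and `feedback_surface()` are illustrative names, and treating `task_state_parse_error` as a status value (rather than, say, a `terminal_reason`) is an assumption.

```python
from enum import Enum


class Surface(Enum):
    BADGE = "badge"  # low attention: a pill in history, projects, recent activity
    CARD = "card"    # high attention: an escalated card the user should read


# Assumed vocabularies, taken from the states named in the review.
_TERMINAL = frozenset({"completed", "failed", "cancelled"})
_ALWAYS_ESCALATE = frozenset({"failed", "cancelled", "task_state_parse_error"})


def feedback_surface(task_status: str, has_result: bool) -> Surface:
    """Choose how prominently to show a run (hypothetical helper)."""
    if task_status in _ALWAYS_ESCALATE:
        return Surface.CARD
    if task_status in _TERMINAL and not has_result:
        return Surface.CARD  # a finished run with no assessment needs explaining
    return Surface.BADGE  # routine progress stays quiet
```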
P4
Aligned · Apply progressive disclosure to system agency
The code applies progressive disclosure by separating the raw task protocol state from the dashboard surface. `project_task_state()` strips `schema_version`, `idempotency_key`, `progress_token`, and `ttl_ms`; normal views show user-level lifecycle state, while deeper recovery/inspection happens through `me.validation_history(run_id)` and task result retrieval.
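The field-stripping step can be sketched as a filter over the stored JSON. The body of `project_task_state()` is not shown in the review, so this assumes `task_state` is a flat JSON object; `strip_protocol_fields()` is a hypothetical helper, not the reviewed function.

```python
import json
from typing import Any

# Keys the review says project_task_state() removes before the dashboard sees the state.
_PROTOCOL_ONLY_FIELDS = frozenset({"schema_version", "idempotency_key", "progress_token", "ttl_ms"})


def strip_protocol_fields(raw_task_state: str) -> dict[str, Any]:
    """Parse stored task_state JSON and keep only user-level keys (hypothetical helper)."""
    state = json.loads(raw_task_state)
    return {key: value for key, value in state.items() if key not in _PROTOCOL_ONLY_FIELDS}
```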
P5
Aligned · Replace implied magic with clear mental models
The workflow replaces implied magic with named states and documented limits: `_PROJECTION_STATUS_MAP` maps protocol states like `working` and `input_required` into `active` and `awaiting_action`; cancelled/failed-without-assessment runs get explicit no-result handling; and the asymmetry that only `architect.validate` is task-augmented while `validate_consensus` and `certify` remain sync-only is disclosed on rendered surfaces.
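A status map in the spirit of `_PROJECTION_STATUS_MAP` might look like the following. Only the `working` and `input_required` entries come from the review; the terminal pass-through entries, `STATUS_MAP`, and `project_status()` are assumptions for illustration.

```python
# First two entries come from the review; the terminal pass-throughs are assumptions.
STATUS_MAP = {
    "working": "active",
    "input_required": "awaiting_action",
    "completed": "completed",
    "failed": "failed",
    "cancelled": "cancelled",
}


def project_status(protocol_status: str) -> str:
    """Translate a protocol status into the user-facing vocabulary."""
    try:
        return STATUS_MAP[protocol_status]
    except KeyError:
        # Fail loudly rather than showing the user a raw protocol string.
        raise ValueError(f"unmapped task status: {protocol_status!r}") from None
```

Raising on an unmapped status makes a newly introduced protocol state show up in tests instead of leaking into the UI; a generic "unknown" fallback is the softer alternative.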
P6
Aligned · Expose meaningful operational state, not internal complexity
The user-facing state is meaningful rather than internal. `TaskStateProjection` exposes `task_status`, `status_message`, timestamps, `terminal_reason`, and `can_cancel`, while intentionally hiding protocol fields such as `idempotency_key`, `progress_token`, and `ttl_ms`. Frontend reads consume the projection only, avoiding client-side parsing of raw `task_state`.
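A projection type along these lines could carry only the user-level fields. Field names follow the review except the timestamps, whose names (`created_at`, `updated_at`) are assumed; which projected statuses count as cancellable is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumption: these projected statuses are the non-terminal ones a user can still stop.
_CANCELLABLE = frozenset({"active", "awaiting_action"})


@dataclass(frozen=True)
class TaskStateProjection:
    """Sketch of the user-facing projection; timestamp field names are assumed."""
    task_status: str
    status_message: Optional[str]
    created_at: datetime
    updated_at: datetime
    terminal_reason: Optional[str]
    can_cancel: bool

    @classmethod
    def build(cls, task_status: str, status_message: Optional[str],
              created_at: datetime, updated_at: datetime,
              terminal_reason: Optional[str] = None) -> "TaskStateProjection":
        # can_cancel is derived from task_status, so the two cannot disagree.
        return cls(task_status, status_message, created_at, updated_at,
                   terminal_reason, task_status in _CANCELLABLE)
```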
P8
Aligned · Make hand-offs, approvals, and blockers explicit
Hand-offs and blockers are explicit. Idempotency dedupe returns an existing `CreateTaskResult` and annotates `_meta.dedupe_hit` plus `_meta.dedupe_link` rather than silently swallowing duplicate work. The task worker raises on structured error envelopes so the SDK can mark the task failed instead of completed, and cancellation returns clear 400/404/409 cases in `cancel_task_on_validation_run`.
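The visible-dedupe behaviour can be sketched with an in-memory map from idempotency key to the original result. `DedupingTaskCreator` is hypothetical, and the `dedupe_link` path format is invented for illustration; the review names the `_meta` keys but not their exact values.

```python
import uuid
from typing import Any


class DedupingTaskCreator:
    """Hypothetical in-memory sketch of idempotent task creation with visible dedupe metadata."""

    def __init__(self) -> None:
        self._by_key: dict[str, dict[str, Any]] = {}

    def create(self, idempotency_key: str) -> dict[str, Any]:
        existing = self._by_key.get(idempotency_key)
        if existing is not None:
            # Surface the duplicate instead of silently swallowing it.
            return {**existing, "_meta": {"dedupe_hit": True,
                                          "dedupe_link": f"/runs/{existing['task_id']}"}}
        result = {"task_id": str(uuid.uuid4()), "status": "working"}
        self._by_key[idempotency_key] = result
        return {**result, "_meta": {"dedupe_hit": False}}
```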
P9
Aligned · Represent delegated work as a system, not merely as a conversation
Delegated validation work is represented as a system: `PgValidationTaskStore` backs tasks with database rows, `task_id` is unique-indexed, `task_state` persists lifecycle, tasks are queryable/listable/cancellable through task methods, and dashboard surfaces render lifecycle status independently of conversation text. This is a durable task model rather than an unstructured transcript.
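The database-backed task model can be approximated with SQLite standing in for Postgres. This is a sketch under that assumption; the `validation_run` table, index name, and helper functions are hypothetical and only mirror the shape the review describes (unique `task_id`, persisted `task_state` JSON).

```python
import json
import sqlite3
from typing import Any, Optional


def open_store() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the Postgres-backed store (an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE validation_run ("
        " run_id TEXT PRIMARY KEY,"
        " task_id TEXT NOT NULL,"
        " task_state TEXT NOT NULL)"  # lifecycle JSON
    )
    # The unique index is what makes task_id a dependable recovery handle.
    conn.execute("CREATE UNIQUE INDEX ix_validation_run_task_id ON validation_run (task_id)")
    return conn


def insert_run(conn: sqlite3.Connection, run_id: str, task_id: str, state: dict[str, Any]) -> None:
    conn.execute(
        "INSERT INTO validation_run (run_id, task_id, task_state) VALUES (?, ?, ?)",
        (run_id, task_id, json.dumps(state)),
    )


def get_state(conn: sqlite3.Connection, task_id: str) -> Optional[dict[str, Any]]:
    row = conn.execute(
        "SELECT task_state FROM validation_run WHERE task_id = ?", (task_id,)
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

A second insert reusing an existing `task_id` raises `sqlite3.IntegrityError`, so two runs can never share one recovery handle.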
P10
Aligned · Optimise for steering, not only initiating
The code supports mid-run steering through cancellation. `cancel_task_on_validation_run()` exposes a user-facing cancel endpoint; `PgValidationTaskStore.update_task(..., status='cancelled')` writes terminal state and sets `row.cancel_requested = True`; terminal-to-nonterminal transitions are rejected with `is_terminal()`; and the UI uses `can_cancel` to show the control only while it is actionable. The documentation also states plainly that in-flight provider calls may not be abortable, avoiding a false mental model of magical cancellation.
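The transition guard can be sketched as follows. The review names `is_terminal()` but not its state set, so the terminal vocabulary here is assumed; `RunRow`, `update_status()`, and `InvalidTransition` are hypothetical stand-ins for the store's row and `update_task` logic.

```python
from dataclasses import dataclass

# Assumed terminal vocabulary; the review names is_terminal() but not its state set.
TERMINAL_STATES = frozenset({"completed", "failed", "cancelled"})


def is_terminal(status: str) -> bool:
    return status in TERMINAL_STATES


class InvalidTransition(Exception):
    """Raised when a finished run would be reopened."""


@dataclass
class RunRow:
    status: str = "working"
    cancel_requested: bool = False


def update_status(row: RunRow, new_status: str) -> None:
    # Reject terminal -> non-terminal moves, as the review describes.
    if is_terminal(row.status) and not is_terminal(new_status):
        raise InvalidTransition(f"cannot move {row.status!r} back to {new_status!r}")
    if new_status == "cancelled":
        # Record the request for the worker; an in-flight provider call may still finish.
        row.cancel_requested = True
    row.status = new_status
```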
Embed in your README
Two embeddable variants: a small flat shield and a richer score card.
Score card (recommended)
[](https://aidesignblueprint.com/en/readiness-review/2876f354-e141-41f3-9582-b413567b0f77)
Flat badge
[](https://aidesignblueprint.com/en/readiness-review/2876f354-e141-41f3-9582-b413567b0f77)
Run ID: 2876f354-e141-41f3-9582-b413567b0f77 · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.