Assessment complete; awaiting evidence review.
Assessed on 23 May 2026 against the AI Design Blueprint doctrine
Production ready
Status: Aligned
98/100
Grade A
production_ready means that the trust boundaries hold. The hardening recommendations below are material for the next iteration, not a defect: that is what production_ready means under the doctrine.
Blueprint Readiness measures alignment with the doctrine, not runtime correctness. A production-ready verdict means that the architecture embodies the 10 principles; it does not run your tests or check your types. Use it alongside your test suite, not in place of it.
The submitted code is an autonomous background validation workflow and the SEP-1686 task adoption materially closes the durable lifecycle gap: `task_id`, `task_state`, history recovery, projected UI status, dedupe signalling, and cancellation are now represented as durable product primitives rather than chat-scrollback state. The main remaining gap is inspectability of how the validation result was produced: the snippets show lifecycle/state persistence, but not a compact provenance or event ledger for model/tool/policy decisions behind `_execute_validate_body()`.
What to expect from certification
This run is eligible for production_ready certification. Certification is a second, adversarial review, independent of the first. It is the extra level of evidence that separates a "production_ready" assessment from a certified one.
Three possible outcomes:
A downgrade is by design, not a defect. The cert agent is an independent adversarial reviewer, deliberately stricter than the first pass. When it downgrades, it is doing its job: finding what the first pass missed. It is the extra layer that makes production_ready a guarantee rather than an estimate.
To certify this run: call architect.certify(run_id, code) via MCP, or from the app as a Pro/Teams team. Three attempts are available per run; each attempt is a separate LLM call (typically 60-180 seconds at high reasoning effort, with a server-side budget of 20 minutes).
Findings by principle
10 principles assessed. Verdict, severity, evidence, and recommendation for each.
P7
Needs changes · Hardening recommended · 35/100 · Establish trust through inspectability
The submission improves lifecycle inspectability with durable `UserValidationRun` rows, `task_id`, `task_state`, `structuredContent=task_result_dict`, and `me.validation_history(run_id)`. However, the snippets do not show a provenance/event ledger for how `_execute_validate_body()` produced the assessment: there is no visible persisted trace of model/policy version, source bundle identity beyond `code_fingerprint`, tool calls, decision checkpoints, or result-generation events. The current primitive makes the task lifecycle inspectable, but not yet the production path of the validation judgement itself.
Recommendation
Persist a compact run provenance ledger keyed by `run_id` outside the execution loop, recording the input/code fingerprint, policy/model version, relevant tool calls, decision checkpoints, and final result hash; link that ledger from `validation_history` while keeping the dashboard projection separate from raw internals.
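A minimal sketch of what such a ledger could look like, assuming an in-memory, append-only structure. The class name `ProvenanceLedger`, its field names, and the event kinds are hypothetical and do not come from the submitted code; persistence to a database row keyed by `run_id` is left out.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ProvenanceLedger:
    """Hypothetical append-only provenance record for one validation run."""

    run_id: str
    code_fingerprint: str
    policy_version: str
    model_version: str
    events: list = field(default_factory=list)
    result_hash: Optional[str] = None

    def record(self, kind: str, detail: dict) -> None:
        # Append a timestamped event; earlier events are never rewritten.
        self.events.append({
            "seq": len(self.events),
            "kind": kind,  # e.g. "tool_call" or "checkpoint" (assumed names)
            "detail": detail,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def seal(self, result: dict) -> str:
        # Hash a canonical JSON form so key order does not change the hash.
        canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
        self.result_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        return self.result_hash
```

In this sketch, `validation_history` would return a link to the ledger rather than its raw events, so the dashboard projection stays separate from internals.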
P1
Aligned · Design for delegation rather than direct manipulation
Delegation is represented explicitly: the handler accepts `params.task: {ttl}`, mints a `run_id`/`task_id` at start, returns `CreateTaskResult` immediately, and performs validation in background through `experimental.run_task`. The REST `cancel_task_on_validation_run` endpoint and `can_cancel` projection give the user a termination control instead of forcing them to wait on a blocking call.
P2
Aligned · Ensure that background work remains perceptible
The previous ephemeral lifecycle is replaced with durable, perceptible primitives: `UserValidationRun.task_id` is a top-level recovery handle, `task_state` stores lifecycle JSON, `project_task_state()` projects it into dashboard-facing status, and `me.validation_history(run_id)` is described as the recovery path. The UI also renders `task_status` across history, projects, dashboard recent activity, and detail surfaces.
P3
Aligned · Align feedback with the user’s level of attention
Feedback is proportionate: the primary UI consumes the compact `TaskStateProjection` with `task_status`, `status_message`, `terminal_reason`, and `can_cancel`, while routine background work is shown through badges/pills rather than full protocol internals. Terminal and exceptional states such as `failed`, `cancelled`, `task_state_parse_error`, and no-result cards escalate only when user attention is needed.
P4
Aligned · Apply progressive disclosure to system agency
The code applies progressive disclosure by separating the raw task protocol state from the dashboard surface. `project_task_state()` strips `schema_version`, `idempotency_key`, `progress_token`, and `ttl_ms`; normal views show user-level lifecycle state, while deeper recovery/inspection happens through `me.validation_history(run_id)` and task result retrieval.
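The stripping step described above can be sketched as a simple key filter. This is an illustration under assumptions, not the submitted implementation: the function name `project_task_state_sketch` and the dict-based state shape are hypothetical; only the four hidden field names come from the finding.

```python
# Protocol-internal fields that the finding says are stripped from the projection.
_HIDDEN_FIELDS = {"schema_version", "idempotency_key", "progress_token", "ttl_ms"}


def project_task_state_sketch(raw_state: dict) -> dict:
    """Return a copy of the raw task state without protocol-internal fields."""
    return {key: value for key, value in raw_state.items() if key not in _HIDDEN_FIELDS}
```

Because the function returns a new dict, the raw `task_state` stays intact for deeper inspection through the recovery path.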
P5
Aligned · Replace implied magic with clear mental models
The workflow replaces implied magic with named states and documented limits: `_PROJECTION_STATUS_MAP` maps protocol states like `working` and `input_required` into `active` and `awaiting_action`; cancelled/failed-without-assessment runs get explicit no-result handling; and the asymmetry that only `architect.validate` is task-augmented while `validate_consensus` and `certify` remain sync-only is disclosed on rendered surfaces.
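A hedged sketch of the state mapping: only the two pairs named in the finding are included, and the pass-through for unmapped states is an assumption made for this example, not documented behaviour of `_PROJECTION_STATUS_MAP`.

```python
# Only the two mappings named in the finding; the real map may hold more entries.
_PROJECTION_STATUS_MAP_SKETCH = {
    "working": "active",
    "input_required": "awaiting_action",
}


def project_status(protocol_status: str) -> str:
    """Translate a protocol state into a user-facing status.

    Assumption: states without an explicit mapping pass through unchanged.
    """
    return _PROJECTION_STATUS_MAP_SKETCH.get(protocol_status, protocol_status)
```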
P6
Aligned · Expose meaningful operational state, not internal complexity
The user-facing state is meaningful rather than internal. `TaskStateProjection` exposes `task_status`, `status_message`, timestamps, `terminal_reason`, and `can_cancel`, while intentionally hiding protocol fields such as `idempotency_key`, `progress_token`, and `ttl_ms`. Frontend reads consume the projection only, avoiding client-side parsing of raw `task_state`.
P8
Aligned · Make hand-offs, approvals, and blockers explicit
Hand-offs and blockers are explicit. Idempotency dedupe returns an existing `CreateTaskResult` and annotates `_meta.dedupe_hit` plus `_meta.dedupe_link` rather than silently swallowing duplicate work. The task worker raises on structured error envelopes so the SDK can mark the task failed instead of completed, and cancellation returns clear 400/404/409 cases in `cancel_task_on_validation_run`.
P9
Aligned · Represent delegated work as a system, not merely as a conversation
Delegated validation work is represented as a system: `PgValidationTaskStore` backs tasks with database rows, `task_id` is unique-indexed, `task_state` persists lifecycle, tasks are queryable/listable/cancellable through task methods, and dashboard surfaces render lifecycle status independently of conversation text. This is a durable task model rather than an unstructured transcript.
P10
Aligned · Optimise for steering, not only initiating
The code supports mid-run steering through cancellation. `cancel_task_on_validation_run()` exposes a user-facing cancel endpoint; `PgValidationTaskStore.update_task(..., status='cancelled')` writes terminal state and sets `row.cancel_requested = True`; terminal-to-nonterminal transitions are rejected with `is_terminal()`, and the UI uses `can_cancel` to show the control only while actionable. The documentation also honestly frames that in-flight provider calls may not be abortable, avoiding a false mental model of magical cancellation.
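The terminal-state guard can be sketched as follows. This is a simplified stand-in, not `PgValidationTaskStore` itself: the dict-based row, the `InvalidTransition` exception, and the exact set of terminal states are assumptions for the example.

```python
# Assumed terminal states; the finding names completed, failed and cancelled outcomes.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


class InvalidTransition(Exception):
    """Raised when a terminal task is asked to become non-terminal again."""


def is_terminal(status: str) -> bool:
    return status in TERMINAL_STATES


def update_task(row: dict, status: str) -> dict:
    # Reject terminal-to-nonterminal transitions, e.g. cancelled -> working.
    if is_terminal(row["status"]) and not is_terminal(status):
        raise InvalidTransition(f"{row['status']} -> {status}")
    row["status"] = status
    if status == "cancelled":
        # Flag the request so an in-flight worker can stop at its next checkpoint.
        row["cancel_requested"] = True
    return row


def can_cancel(row: dict) -> bool:
    # The cancel control is only actionable while the task is still running.
    return not is_terminal(row["status"])
```

Setting `cancel_requested` records intent only; as the finding notes, an in-flight provider call may still run to completion.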
Add to your README
Two embeddable variants: a small one and a richer card.
Score card (recommended)
[](https://aidesignblueprint.com/en/readiness-review/2876f354-e141-41f3-9582-b413567b0f77)
Flat badge
[](https://aidesignblueprint.com/en/readiness-review/2876f354-e141-41f3-9582-b413567b0f77)
Run ID: 2876f354-e141-41f3-9582-b413567b0f77 · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.