Governed

Alignment with the doctrine confirmed.

Agent Architecture Review, Validation Snapshot

Evaluated on 10 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: Aligned

74/100

Grade C

10 aligned

Verdicts by principle

The submission is an autonomous invoice-payment workflow with explicit run/task lifecycle, durable audit and alert surfaces, signed approval gates, idempotent bank submission, reconciliation for uncertain external side effects, and steering controls. The iteration materially addresses the prior high-risk gaps around approval authority, submitted-unknown recovery, and mid-run interruption.

Iteration history

3 previous runs on this artifact. Each run_id opens its own readiness review.

When	Score	Status	Run ID
10 May 2026 (this run)	74 / C	Aligned	7e9bc0f6…
10 May 2026	60 / C	High Risk	15aa9649…
10 May 2026	30 / F	High Risk	b8d61c00…
10 May 2026	0 / F	High Risk	ac64d7d9…

Cert downgraded to emerging · Attempt 1/3

Downgrade: `execute_approved_payments()` can mark a run `COMPLETED` while payment tasks are still `AWAITING_APPROVAL`, leaving invoices unpaid and later unexecutable without any blocker alert.

The downgrade is intentional. The cert agent is a second, independent, adversarial reviewer that is deliberately stricter than the first pass. It found a production_blocker that the first pass had missed. Resolve the finding below, re-validate to obtain a new run_id, and recertify for the badge.

Blockers missed by the first pass

Execution can complete a run while approvals are still pending

needs changes · production blocker · 88/100

In `execute_approved_payments()`, the loop silently skips every task whose status is not `TaskStatus.APPROVED` via `if task.status != TaskStatus.APPROVED: continue`. After that, if there were no failures, cancellation, or pause, the final status logic unconditionally sets `run.status = RunStatus.COMPLETED`. A normal path is: `draft_tasks()` creates eligible tasks as `TaskStatus.AWAITING_APPROVAL` and sets the run to `RunStatus.AWAITING_APPROVAL`; calling `execute_approved_payments()` before all tasks are approved skips those pending tasks and records the run as completed. `approve()` also does not reject approvals on a completed run, so a task can later become `APPROVED`, but `execute_approv…

Recommendation

Before setting a run to `COMPLETED`, require that no actionable tasks remain in `AWAITING_APPROVAL` or `APPROVED`; if approvals are still pending, keep the run in `AWAITING_APPROVAL` or a clear blocked/partially-executed state and surface the required approval action. Also prevent `approve()` from approving tasks on terminal runs unless the run is explicitly reopened.
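The recommendation above can be sketched as a pure status-resolution step plus an approval guard. This is a minimal illustration, not the submission's code: the enums are trimmed-down stand-ins, `resolve_final_run_status` and `ensure_run_accepts_approval` are hypothetical helper names, and treating only `COMPLETED` and `CANCELLED` as terminal is an assumption.

```python
from enum import Enum


class TaskStatus(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    SUCCEEDED = "succeeded"


class RunStatus(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    PARTIALLY_COMPLETED = "partially_completed"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


# Tasks in these states still need an operator or executor action.
ACTIONABLE_TASK_STATUSES = {TaskStatus.AWAITING_APPROVAL, TaskStatus.APPROVED}
# Assumption: these are the run states that should refuse new approvals.
TERMINAL_RUN_STATUSES = {RunStatus.COMPLETED, RunStatus.CANCELLED}


def resolve_final_run_status(task_statuses):
    """Pick the run status after an execution pass with no failures."""
    statuses = list(task_statuses)
    if any(s in ACTIONABLE_TASK_STATUSES for s in statuses):
        # Work remains: never report COMPLETED while tasks are actionable.
        if any(s is TaskStatus.SUCCEEDED for s in statuses):
            return RunStatus.PARTIALLY_COMPLETED
        return RunStatus.AWAITING_APPROVAL
    return RunStatus.COMPLETED


def ensure_run_accepts_approval(run_status):
    """Reject approvals on terminal runs unless the run was reopened first."""
    if run_status in TERMINAL_RUN_STATUSES:
        raise ValueError(f"run is {run_status.value}; reopen it before approving")
```

With a guard like this, calling the executor before every task is approved would leave the run in `AWAITING_APPROVAL` (or a partially executed state) instead of `COMPLETED`, and a blocker alert could be raised at the same point.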

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

Design for delegation rather than direct manipulation

aligned

`PaymentWorkflow.create_run()` accepts an operator-owned `Policy`, `draft_tasks()` classifies invoices against explicit constraints, and `execute_approved_payments()` only submits tasks that moved through `AWAITING_APPROVAL` to `APPROVED`. The workflow exposes delegation controls through `pause_run()`, `resume_run()`, `cancel_run()`, `retry_failed_task()`, and `resume_after_failure()` rather than requiring manual execution of every invoice step.

Ensure that background work remains perceptible

aligned

Background continuity is represented through durable JSONL primitives: `RunLedger` records `run.created`, task status events, `run.executing`, and terminal run events; `DurableOperatorInbox` persists intervention alerts; `inspect_run()` exposes current status, task counts, critical alerts, and `needs_reconciliation`. The prior P2 gap is improved by `TaskStatus.SUBMITTED_UNKNOWN`, `replay_run_from_ledger()`, and `reconcile_submitted_task()`, which make crash-after-submit states visible instead of silently losing continuity.

Align feedback with the user’s level of attention

aligned

The code separates routine status from attention-demanding conditions: normal progress is summarized by `inspect_run()['summary']`, while material intervention paths post alerts via `DurableOperatorInbox.post()` for forged approvals, expired envelopes, insufficient roles, policy violations, bank submission failures, and `SUBMITTED_UNKNOWN` reconciliation. Severity is calibrated with values such as `critical` for `auth_missing` and reconciliation blockers, and `warning` for non-fatal bank submission failures.

Apply progressive disclosure to system agency

aligned

`inspect_run()` provides a layered view: top-level run status, owner, policy hash, pause/cancel flags, audit-chain health, critical alerts, and aggregate task counts are available first, while per-task details such as invoice snapshot hash, approver identity, transfer id, failure class, and idempotency key are available in the nested `tasks` list. Deeper forensic detail remains in `RunLedger.read_run_events()` and `verify_chain()` rather than being forced into the default summary.

Replace implied magic with clear mental models

aligned

The system’s mental model is explicit: `Policy` names amount caps, allow/block lists, due window, and required approver role; `classify_invoice()` returns concrete policy decisions such as `vendor_blocked` or `outside_due_window`; and `ApprovalEnvelope` binds approval to `run_id`, `task_id`, `policy_hash`, and `invoice_snapshot_hash`. The prior P5 issue is improved because `approve()` no longer trusts raw approver strings; it verifies an HMAC envelope before recording approval authority.
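To make the "concrete policy decisions" idea tangible, here is a small sketch of policy classification. It is an assumption-laden illustration: the field names on `Policy` and `Invoice`, the check order, and the `vendor_not_allowed`, `amount_over_cap`, and `eligible` labels are hypothetical; only `vendor_blocked` and `outside_due_window` are named in the review.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Policy:
    max_amount_cents: int
    blocked_vendors: frozenset = frozenset()
    allowed_vendors: frozenset = frozenset()  # empty means "no allow-list"
    due_window_days: int = 30


@dataclass(frozen=True)
class Invoice:
    vendor: str
    amount_cents: int
    due_date: date


def classify_invoice(invoice, policy, today):
    """Return a named decision so operators can see why an invoice was held."""
    if invoice.vendor in policy.blocked_vendors:
        return "vendor_blocked"
    if policy.allowed_vendors and invoice.vendor not in policy.allowed_vendors:
        return "vendor_not_allowed"
    if invoice.amount_cents > policy.max_amount_cents:
        return "amount_over_cap"
    window_end = today + timedelta(days=policy.due_window_days)
    if not (today <= invoice.due_date <= window_end):
        return "outside_due_window"
    return "eligible"
```

Returning a named reason rather than a boolean is what lets the decision be logged and shown to the operator without further interpretation.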

Expose meaningful operational state, not internal complexity

aligned

`RunStatus` and `TaskStatus` use user-relevant operational states such as `AWAITING_APPROVAL`, `PAUSED`, `EXECUTING`, `PARTIALLY_COMPLETED`, `FAILED`, `SUBMITTED_UNKNOWN`, `SUCCEEDED`, and `CANCELLED`. `inspect_run()` translates these into actionable summaries, including counts by task status, submitted amount, critical alerts, and reconciliation count, rather than exposing only low-level ledger mechanics.

Establish trust through inspectability

aligned

The inspectability primitive is load-bearing: `RunLedger.append()` creates an append-only hash chain using `prev_event_hash` and `event_hash`, while `verify_chain()` detects sequence or hash divergence. Task events persist invoice snapshots, `invoice_snapshot_hash`, policy hash, idempotency key, approval signature, bank responses, and failure classes, allowing reviewers to reconstruct how a payment decision and submission occurred.
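The hash-chain idea described above can be sketched in a few lines. This is an in-memory illustration under assumptions: the real `RunLedger` is described as JSONL-backed, and the `seq` field, the all-zero genesis hash, and the canonical JSON encoding are choices made here for the example; only `prev_event_hash` and `event_hash` come from the review.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first event's predecessor


def _digest(seq, prev_event_hash, payload):
    """Hash a canonical JSON encoding of the event body."""
    body = json.dumps(
        {"seq": seq, "prev_event_hash": prev_event_hash, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(chain, payload):
    """Append an event that commits to the hash of the previous event."""
    prev_hash = chain[-1]["event_hash"] if chain else GENESIS_HASH
    seq = len(chain)
    event = {
        "seq": seq,
        "prev_event_hash": prev_hash,
        "payload": payload,
        "event_hash": _digest(seq, prev_hash, payload),
    }
    chain.append(event)
    return event


def verify_chain(chain):
    """Return False on any sequence gap, broken link, or altered payload."""
    prev_hash = GENESIS_HASH
    for index, event in enumerate(chain):
        if event["seq"] != index or event["prev_event_hash"] != prev_hash:
            return False
        expected = _digest(event["seq"], event["prev_event_hash"], event["payload"])
        if event["event_hash"] != expected:
            return False
        prev_hash = event["event_hash"]
    return True
```

Because each event commits to its predecessor, editing or removing an earlier record invalidates every later hash, which is what makes the audit trail tamper-evident rather than merely append-only by convention.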

Make hand-offs, approvals, and blockers explicit

aligned

Approval and blocker boundaries are explicit. `approve()` verifies `ApprovalEnvelope` with `hmac.compare_digest()` before trusting role or identity, checks expiry, policy hash, invoice snapshot hash, and required role, and records approved tasks only after those gates pass. Execution cannot submit unless a task is `APPROVED`; bank auth failures, policy violations, invalid signatures, insufficient roles, and reconciliation blockers are surfaced through ledger events and/or durable inbox alerts. The prior high-risk P8 approval-forgery gap is addressed by the signed envelope primitive.
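A signed approval gate of the kind described above might look roughly like the following. It is a sketch under assumptions: the `approver_id`, `role`, `expires_at`, and `signature` fields, the JSON canonicalization, the returned reason strings, and the helper names are hypothetical; the review only confirms the bound fields (`run_id`, `task_id`, `policy_hash`, `invoice_snapshot_hash`), the HMAC, and the use of `hmac.compare_digest()`.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ApprovalEnvelope:
    run_id: str
    task_id: str
    policy_hash: str
    invoice_snapshot_hash: str
    approver_id: str
    role: str
    expires_at: float  # assumed: epoch seconds
    signature: str = ""


def _canonical_bytes(envelope):
    """Serialize every field except the signature in a stable order."""
    fields = asdict(envelope)
    fields.pop("signature")
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_envelope(envelope, key):
    return hmac.new(key, _canonical_bytes(envelope), hashlib.sha256).hexdigest()


def signed_copy(envelope, key):
    return replace(envelope, signature=sign_envelope(envelope, key))


def check_approval(envelope, key, *, now, policy_hash, invoice_snapshot_hash, required_role):
    """Return None when the approval is acceptable, else a rejection reason."""
    expected = sign_envelope(envelope, key).encode("utf-8")
    # Verify the signature first: nothing in the envelope is trusted before this.
    if not hmac.compare_digest(expected, envelope.signature.encode("utf-8")):
        return "invalid_signature"
    if now >= envelope.expires_at:
        return "envelope_expired"
    if envelope.policy_hash != policy_hash:
        return "policy_hash_mismatch"
    if envelope.invoice_snapshot_hash != invoice_snapshot_hash:
        return "invoice_snapshot_mismatch"
    if envelope.role != required_role:
        return "insufficient_role"
    return None
```

Checking the signature before reading the role or identity matters: a forged envelope claiming a privileged role fails at the first gate, so the claimed role is never evaluated.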

Represent delegated work as a system, not merely as a conversation

aligned

Delegated work is represented as a structured system: `PaymentRun` owns run-level state and policy hash, `PaymentTask` owns per-invoice lifecycle and idempotency key, `RunLedger` records timeline/history, and `DurableOperatorInbox` records intervention-required alerts. Conversation or prompt text is not used as the execution state; the workflow is governed by typed records, enums, policy checks, and ledger events.

Optimise for steering, not only initiating

aligned

The workflow supports steering after initiation through `pause_run()`, `resume_run()`, `cancel_run()`, `retry_failed_task()`, `resume_after_failure()`, and `reconcile_submitted_task()`. `execute_approved_payments()` re-reads `RunLedger.latest_steering_intent()` before each external `transfer_funds()` call, so pause/cancel intents can interrupt a running executor before the next irreversible submit. The prior high-risk P10 gap is improved by durable steering events, same-key retry, and explicit `SUBMITTED_UNKNOWN` reconciliation.
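The "re-read steering intent before each submit" pattern described above can be illustrated with a short loop. This is a simplified sketch: `run_execution_pass` is a hypothetical name, the callables stand in for `RunLedger.latest_steering_intent()` and the bank's `transfer_funds()`, and the `"pause"`/`"cancel"` string values are assumptions.

```python
def run_execution_pass(task_ids, latest_steering_intent, transfer_funds):
    """Submit tasks one by one, checking for pause/cancel before each transfer.

    Returns (submitted_task_ids, interrupting_intent_or_None).
    """
    submitted = []
    for task_id in task_ids:
        # Re-read the durable intent on every iteration, not once up front,
        # so an operator can stop the run before the next irreversible call.
        intent = latest_steering_intent()
        if intent in ("pause", "cancel"):
            return submitted, intent
        transfer_funds(task_id)
        submitted.append(task_id)
    return submitted, None
```

The check sits immediately before the external call, so the window in which a steering request can be missed is limited to a transfer that is already in flight.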

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/7e9bc0f6-8fd3-4586-851f-d26b720ac767/card.svg)](https://aidesignblueprint.com/en/readiness-review/7e9bc0f6-8fd3-4586-851f-d26b720ac767)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/7e9bc0f6-8fd3-4586-851f-d26b720ac767.svg)](https://aidesignblueprint.com/en/readiness-review/7e9bc0f6-8fd3-4586-851f-d26b720ac767)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Improvements (4)

P2 · Ensure that background work remains perceptible · needs_changes → aligned

P5 · Replace implied magic with clear mental models · needs_changes → aligned

P8 · Make hand-offs, approvals, and blockers explicit · high_risk → aligned

P10 · Optimise for steering, not only initiating · high_risk → aligned

Rubric: 2026-05-04

Run your own AI Design Blueprint validation

Run ID: 7e9bc0f6-8fd3-4586-851f-d26b720ac767 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.