Governed

Alignment with the doctrine confirmed.

Agent Architecture Review, Validation Snapshot

Evaluated on 10 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: Aligned

74/100

Grade C

10 aligned

Verdicts by principle

The submitted code is an autonomous payment workflow with load-bearing trust primitives: HMAC-bound approval envelopes, explicit run/task state, durable JSONL audit and inbox records, idempotent bank submission, submitted-unknown reconciliation, and ledger-backed steering before external submits. The cited false-COMPLETED path is closed by the unfinished-task terminal check in `execute_approved_payments()` and terminal-run rejection in `approve()`; no production-blocking design gaps were found.

Iteration history

4 previous runs on this artifact. Each run_id opens its own readiness review.

When	Score	Status	Run ID
10 May 2026 (this run)	74 / C	Aligned	b4799966…
10 May 2026	74 / C	Aligned	7e9bc0f6…
10 May 2026	60 / C	High Risk	15aa9649…
10 May 2026	30 / F	High Risk	b8d61c00…
10 May 2026	0 / F	High Risk	ac64d7d9…

Cert downgraded to emerging

Attempt 1/3

Missed production blocker: `execute_approved_payments()` only considers failures from the current execution pass (`any_failed`) and can mark a run `COMPLETED` while earlier tasks remain `FAILED`.

The downgrade is intentional. The cert agent is a second, independent adversarial reviewer, deliberately stricter than the first pass. It found a production_blocker that the first pass had missed. Resolve the finding below, re-validate to obtain a new run_id, and recertify for the badge.

Blockers missed by the first pass

Run can be marked COMPLETED while previously failed payment tasks remain unpaid

needs changes · production blocker · 9/100

In `execute_approved_payments()`, `any_failed` is reset to `False` at the start of each call and is only set when an `APPROVED` task fails during that same loop. Existing `TaskStatus.FAILED` tasks are skipped by `if task.status != TaskStatus.APPROVED: continue` and are not folded back into terminal status calculation. If one task fails while another task is still `AWAITING_APPROVAL`, the unfinished-task branch sets the run back to `RunStatus.AWAITING_APPROVAL`. After the remaining task is approved and executed successfully in a later call, `unfinished_total == 0` and `any_failed == False`, so the final `else:` marks the run `RunStatus.COMPLETED` even though the earlier task is still `FAILED`…

Recommendation

Before terminal transition, compute persistent failure state from all tasks, e.g. `failed_total = sum(1 for t in tasks if t.status == TaskStatus.FAILED)`, and use that instead of only the per-pass `any_failed`. If `failed_total > 0` and any task succeeded, set `PARTIALLY_COMPLETED`; if all actionable tasks failed, set `FAILED`; never set `COMPLETED` while any task remains `FAILED`, `SUBMITTED`, `SUBMITTED_UNKNOWN`, `APPROVED`, or `AWAITING_APPROVAL`.
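The recommendation above can be sketched as a pure function that derives the run status from every task's persisted status rather than from a per-pass flag. This is a minimal sketch, not the reviewed code: the `PAID` success state, the `UNFINISHED` set, and the `derive_run_status()` helper are hypothetical names introduced for illustration, and returning `AWAITING_APPROVAL` for any unfinished task mirrors the unfinished-task branch described in the finding.

```python
from enum import Enum
from typing import Iterable


class TaskStatus(Enum):
    AWAITING_APPROVAL = "AWAITING_APPROVAL"
    APPROVED = "APPROVED"
    SUBMITTED = "SUBMITTED"
    SUBMITTED_UNKNOWN = "SUBMITTED_UNKNOWN"
    PAID = "PAID"  # hypothetical success state, not named in the review
    FAILED = "FAILED"


class RunStatus(Enum):
    AWAITING_APPROVAL = "AWAITING_APPROVAL"
    PARTIALLY_COMPLETED = "PARTIALLY_COMPLETED"
    FAILED = "FAILED"
    COMPLETED = "COMPLETED"


# Statuses that must block any terminal transition (hypothetical grouping).
UNFINISHED = {
    TaskStatus.AWAITING_APPROVAL,
    TaskStatus.APPROVED,
    TaskStatus.SUBMITTED,
    TaskStatus.SUBMITTED_UNKNOWN,
}


def derive_run_status(statuses: Iterable[TaskStatus]) -> RunStatus:
    """Derive run status from all tasks, including failures from earlier passes."""
    statuses = list(statuses)
    unfinished_total = sum(1 for s in statuses if s in UNFINISHED)
    failed_total = sum(1 for s in statuses if s is TaskStatus.FAILED)
    paid_total = sum(1 for s in statuses if s is TaskStatus.PAID)

    if unfinished_total > 0:
        # Work is still pending, so the run cannot be terminal yet.
        return RunStatus.AWAITING_APPROVAL
    if failed_total > 0:
        # A persisted failure always rules out COMPLETED.
        if paid_total > 0:
            return RunStatus.PARTIALLY_COMPLETED
        return RunStatus.FAILED
    return RunStatus.COMPLETED
```

With this shape, the scenario in the finding (one task `FAILED` in an earlier pass, the other paid later) yields `PARTIALLY_COMPLETED` instead of `COMPLETED`, because the earlier failure is counted on every call.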

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

Design for delegation rather than direct manipulation

aligned

The workflow is structured around delegated intent and constraints: `Policy` captures amount caps, allow/blocklists, due window, run total, and required approver role; `create_run()`, `draft_tasks()`, and `execute_approved_payments()` let the system classify and execute eligible invoices under those constraints. Operator controls are explicit through `pause_run()`, `cancel_run()`, `resume_run()`, `retry_failed_task()`, and `reconcile_submitted_task()`.

Ensure that background work remains perceptible

aligned

Background work remains perceptible through persistent `RunLedger` events, `DurableOperatorInbox` alerts, explicit `RunStatus`/`TaskStatus` values, and `inspect_run()` summaries. Continuity is preserved by `replay_run_from_ledger()`, including promotion of unreconciled `SUBMITTED` tasks to `SUBMITTED_UNKNOWN` with a critical inbox alert.
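The replay behaviour described here could look roughly like the following. This is only an illustrative sketch: the event shape (`task_id`, `status`), the alert dictionary, and the `replay_task_statuses()` helper are assumptions, and the reviewed `replay_run_from_ledger()` may work quite differently.

```python
from typing import Dict, List, Tuple


def replay_task_statuses(events: List[dict]) -> Tuple[Dict[str, str], List[dict]]:
    """Rebuild the latest status per task, then promote unreconciled submits."""
    statuses: Dict[str, str] = {}
    for event in events:
        # Later events overwrite earlier ones, so the last status wins.
        statuses[event["task_id"]] = event["status"]

    alerts: List[dict] = []
    for task_id, status in statuses.items():
        if status == "SUBMITTED":
            # The bank outcome is unknown after a restart: surface it, do not guess.
            statuses[task_id] = "SUBMITTED_UNKNOWN"
            alerts.append({
                "severity": "critical",
                "task_id": task_id,
                "message": "Submitted task needs reconciliation",
            })
    return statuses, alerts
```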

Align feedback with the user’s level of attention

aligned

Feedback is calibrated by channel: routine lifecycle changes are written as audit events, while intervention-required situations use `DurableOperatorInbox.post()` with warning/critical severity, including forged/expired approval envelopes, insufficient role, bank auth failure, reconciliation failure, policy submit blocks, and unfinished tasks after execution. `inspect_run()` surfaces `critical_alerts` without forcing the full ledger into the primary status view.

Apply progressive disclosure to system agency

aligned

`inspect_run()` provides progressive disclosure: a primary run view with `status`, cancellation/pause fields, `critical_alerts`, `audit_chain_intact`, and a compact `summary` by task status, while deeper task-level fields and the full ledger are available through task entries, `RunLedger.read_run_events()`, and `verify_chain()`. This separates summary status from diagnostic inspection.
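As a rough illustration of the primary view, the sketch below builds a summary with the four fields named above. The `summarize_run()` helper and its argument shapes are hypothetical; only the field names come from the review.

```python
from collections import Counter
from typing import Dict, List


def summarize_run(run_status: str, task_statuses: List[str],
                  alerts: List[dict], audit_chain_intact: bool) -> Dict[str, object]:
    """Return a compact primary view; task details and the ledger stay elsewhere."""
    return {
        "status": run_status,
        # Only critical alerts are promoted into the primary view.
        "critical_alerts": [a for a in alerts if a["severity"] == "critical"],
        "audit_chain_intact": audit_chain_intact,
        # Count of tasks per status, e.g. {"FAILED": 1, "APPROVED": 2}.
        "summary": dict(Counter(task_statuses)),
    }
```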

Replace implied magic with clear mental models

aligned

The code replaces implied automation with explicit mental models: `classify_invoice()` returns concrete policy-decision strings, `ApprovalEnvelope` binds approval to `run_id`, `task_id`, `policy_hash`, `invoice_snapshot_hash`, approver identity/role, and expiry, and `cron_draft_daily_run()` drafts only rather than auto-approving or auto-paying. Users can distinguish drafting, approval, execution, reconciliation, retry, and cancellation states.

Expose meaningful operational state, not internal complexity

aligned

Operational state is expressed in user-relevant enums such as `AWAITING_APPROVAL`, `PAUSED`, `EXECUTING`, `PARTIALLY_COMPLETED`, `FAILED`, `SUBMITTED_UNKNOWN`, and `CANCELLED`. The iteration-5 terminal check in `execute_approved_payments()` prevents `COMPLETED` when tasks remain `AWAITING_APPROVAL`, `APPROVED`, or `SUBMITTED_UNKNOWN`, preserving a meaningful green status.

Establish trust through inspectability

aligned

Inspectability is supported by a hash-chained `AuditEvent` ledger with `sequence_no`, `prev_event_hash`, `event_hash`, actor, payload, and timestamps. Events record invoice snapshots, policy hashes, idempotency keys, approval signatures, task transitions, bank responses, and replay promotions; `verify_chain()` exposes tamper/divergence detection and `replay_run_from_ledger()` reconstructs run state from the audit trail.
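A hash-chained ledger of this kind can be sketched in a few lines. The example below is a simplified stand-in, assuming SHA-256 over canonical JSON and an all-zero genesis hash; it omits timestamps, and the `append_event()` and `compute_event_hash()` helpers are hypothetical. The `verify_chain()` here illustrates the technique and is not the reviewed implementation.

```python
import hashlib
import json
from typing import List

GENESIS_HASH = "0" * 64  # assumed placeholder for the first event's predecessor


def compute_event_hash(sequence_no: int, prev_event_hash: str,
                       actor: str, payload: dict) -> str:
    """Hash the event fields in a canonical (sorted-key) JSON encoding."""
    body = json.dumps(
        {"sequence_no": sequence_no, "prev_event_hash": prev_event_hash,
         "actor": actor, "payload": payload},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(chain: List[dict], actor: str, payload: dict) -> dict:
    """Append an event that commits to the hash of the previous event."""
    prev_hash = chain[-1]["event_hash"] if chain else GENESIS_HASH
    sequence_no = len(chain) + 1
    event = {
        "sequence_no": sequence_no,
        "prev_event_hash": prev_hash,
        "actor": actor,
        "payload": payload,
        "event_hash": compute_event_hash(sequence_no, prev_hash, actor, payload),
    }
    chain.append(event)
    return event


def verify_chain(chain: List[dict]) -> bool:
    """Return False on any gap, broken link, or altered event content."""
    prev_hash = GENESIS_HASH
    for expected_seq, event in enumerate(chain, start=1):
        if event["sequence_no"] != expected_seq:
            return False
        if event["prev_event_hash"] != prev_hash:
            return False
        recomputed = compute_event_hash(event["sequence_no"], event["prev_event_hash"],
                                        event["actor"], event["payload"])
        if recomputed != event["event_hash"]:
            return False
        prev_hash = event["event_hash"]
    return True
```

Because each event commits to its predecessor's hash, editing or removing any earlier event invalidates every later link, which is what makes tampering detectable on replay.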

Make hand-offs, approvals, and blockers explicit

aligned

Approval and blocker boundaries are explicit. `approve()` verifies the signed `ApprovalEnvelope`, role, policy hash, invoice snapshot hash, expiry, and task status; it now rejects approvals on `COMPLETED`, `FAILED`, or `CANCELLED` runs with a critical inbox alert and `PolicyViolation`. `execute_approved_payments()` counts unfinished `AWAITING_APPROVAL`, `APPROVED`, and `SUBMITTED_UNKNOWN` tasks before terminal transition and posts a critical `Run not completed: unfinished tasks remain` alert instead of falsely completing the run.
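To make the envelope binding concrete, here is a minimal sketch of HMAC signing and verification using Python's standard `hmac` module. The field layout, the `sign_envelope()` and `verify_envelope()` helpers, and the `expires_at` epoch-seconds convention are assumptions for illustration; the reviewed `ApprovalEnvelope` may encode and check these fields differently.

```python
import hashlib
import hmac
import json


def sign_envelope(fields: dict, secret: bytes) -> str:
    """Sign a canonical JSON encoding of the envelope fields with HMAC-SHA256."""
    message = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_envelope(fields: dict, signature: str, secret: bytes, now: float) -> bool:
    """Accept only an unmodified, unexpired envelope."""
    expected = sign_envelope(fields, secret)
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(expected, signature):
        return False
    return now < fields["expires_at"]


# Hypothetical envelope contents, mirroring the fields the review lists.
example_fields = {
    "run_id": "run-1",
    "task_id": "task-1",
    "policy_hash": "abc",
    "invoice_snapshot_hash": "def",
    "approver_id": "alice",
    "approver_role": "finance_manager",
    "expires_at": 2_000.0,
}
```

Since the signature covers `run_id`, `task_id`, and both hashes, an approval copied onto a different task or a changed invoice snapshot fails verification rather than silently carrying over.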

Represent delegated work as a system, not merely as a conversation

aligned

Delegated work is represented as a structured system rather than a conversation: `PaymentRun` and `PaymentTask` model the run/task graph, `self.tasks[run_id]` tracks subtasks, `RunLedger` captures the timeline, `DurableOperatorInbox` captures interventions, and `inspect_run()` renders status, summaries, task details, alerts, and audit-chain health.

Optimise for steering, not only initiating

aligned

The workflow supports steering after initiation through `pause_run()`, `resume_run()`, `cancel_run()`, `resume_after_failure()`, `retry_failed_task()`, and `reconcile_submitted_task()`. `execute_approved_payments()` checks `RunLedger.latest_steering_intent()` before each external submit, so cancellation or pause requests recorded in the durable ledger can interrupt the run before further bank calls.
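The pre-submit steering check can be pictured as a loop that re-reads the latest intent before every external call. The sketch below is an assumption-laden illustration: `execute_with_steering()`, the `"PAUSE"`/`"CANCEL"` intent strings, and the callable arguments are hypothetical stand-ins for the ledger lookup and the bank submit.

```python
from typing import Callable, List, Optional, Tuple


def execute_with_steering(
    task_ids: List[str],
    latest_steering_intent: Callable[[], Optional[str]],
    submit: Callable[[str], None],
) -> Tuple[List[str], Optional[str]]:
    """Submit tasks in order, stopping as soon as a pause or cancel is recorded."""
    submitted: List[str] = []
    for task_id in task_ids:
        # Re-read the durable intent before each external side effect.
        intent = latest_steering_intent()
        if intent in ("PAUSE", "CANCEL"):
            return submitted, intent
        submit(task_id)
        submitted.append(task_id)
    return submitted, None
```

Checking once per task, rather than once per run, bounds the damage of a late cancellation to at most the single submit already in flight.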

Add to your README

Two embeddable variants: a small one and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/b4799966-efdb-4f2a-8f07-44811ecb7ff1/card.svg)](https://aidesignblueprint.com/en/readiness-review/b4799966-efdb-4f2a-8f07-44811ecb7ff1)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/b4799966-efdb-4f2a-8f07-44811ecb7ff1.svg)](https://aidesignblueprint.com/en/readiness-review/b4799966-efdb-4f2a-8f07-44811ecb7ff1)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear

Rubric: 2026-05-04

Run your own AI Design Blueprint validation

Run ID: b4799966-efdb-4f2a-8f07-44811ecb7ff1 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.