Governed

Alignment with the doctrine confirmed.

Agent Architecture Review, validation snapshot

Evaluated on 10 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: Aligned

74/100

Grade C

10 aligned
Verdicts by principle

The submission is an autonomous invoice-payment workflow with explicit run/task lifecycle, durable audit and alert surfaces, signed approval gates, idempotent bank submission, reconciliation for uncertain external side effects, and steering controls. The iteration materially addresses the prior high-risk gaps around approval authority, submitted-unknown recovery, and mid-run interruption.

Iteration history

3 previous runs on this artifact. Each run_id opens its own readiness review.

| When | Score | Status | Run ID |
|---|---|---|---|
| 10 May 2026 (this run) | 74 / C | Aligned | 7e9bc0f6 |
| 10 May 2026 | 60 / C | High Risk | 15aa9649 |
| 10 May 2026 | 30 / F | High Risk | b8d61c00 |
| 10 May 2026 | 0 / F | High Risk | ac64d7d9 |
Cert downgraded to Emerging. Attempt 1/3

Downgrade: `execute_approved_payments()` can mark a run `COMPLETED` while payment tasks are still `AWAITING_APPROVAL`, leaving invoices unpaid and later unexecutable without any blocker alert.

The downgrade is deliberate. The cert agent is a second, independent, adversarial reviewer, intentionally stricter than the first pass. It found a production_blocker that the first pass missed. Resolve the finding below, re-validate to obtain a new run_id, and recertify for the badge.

Blockers missed by the first pass

P8

Execution can complete a run while approvals are still pending

needs changes · production blocker · 88/100

In `execute_approved_payments()`, the loop silently skips every task whose status is not `TaskStatus.APPROVED` via `if task.status != TaskStatus.APPROVED: continue`. After that, if there were no failures, cancellation, or pause, the final status logic unconditionally sets `run.status = RunStatus.COMPLETED`. A normal path is: `draft_tasks()` creates eligible tasks as `TaskStatus.AWAITING_APPROVAL` and sets the run to `RunStatus.AWAITING_APPROVAL`; calling `execute_approved_payments()` before all tasks are approved skips those pending tasks and records the run as completed. `approve()` also does not reject approvals on a completed run, so a task can later become `APPROVED`, but `execute_approv…

Recommendation

Before setting a run to `COMPLETED`, require that no actionable tasks remain in `AWAITING_APPROVAL` or `APPROVED`; if approvals are still pending, keep the run in `AWAITING_APPROVAL` or a clear blocked/partially-executed state and surface the required approval action. Also prevent `approve()` from approving tasks on terminal runs unless the run is explicitly reopened.
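A minimal sketch of the recommended fix, assuming a simplified in-memory model. Enum member names mirror those in the finding; `Run`, `Task`, `final_status()`, `TerminalRunError`, and the `submit` callback are hypothetical, and failure, pause, and cancellation handling is omitted.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    SUCCEEDED = "succeeded"


class RunStatus(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    PARTIALLY_COMPLETED = "partially_completed"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


TERMINAL_RUN_STATES = {RunStatus.COMPLETED, RunStatus.CANCELLED}


class TerminalRunError(RuntimeError):
    """Hypothetical error: approval attempted on a run that has already closed."""


@dataclass
class Task:
    task_id: str
    status: TaskStatus = TaskStatus.AWAITING_APPROVAL


@dataclass
class Run:
    run_id: str
    status: RunStatus = RunStatus.AWAITING_APPROVAL
    tasks: list = field(default_factory=list)


def final_status(run):
    """Report COMPLETED only when no task still needs an approval or a submission."""
    if any(t.status is TaskStatus.AWAITING_APPROVAL for t in run.tasks):
        return RunStatus.AWAITING_APPROVAL
    if any(t.status is TaskStatus.APPROVED for t in run.tasks):
        return RunStatus.PARTIALLY_COMPLETED  # approved but not yet submitted
    return RunStatus.COMPLETED


def approve(run, task):
    """Refuse approvals on terminal runs unless the run is explicitly reopened."""
    if run.status in TERMINAL_RUN_STATES:
        raise TerminalRunError(f"run {run.run_id} is {run.status.value}; reopen it first")
    task.status = TaskStatus.APPROVED


def execute_approved_payments(run, submit):
    for task in run.tasks:
        if task.status is not TaskStatus.APPROVED:
            continue  # pending tasks are skipped, as in the finding
        submit(task)
        task.status = TaskStatus.SUCCEEDED
    run.status = final_status(run)  # instead of an unconditional COMPLETED
```

With the guard in place, calling `execute_approved_payments()` early leaves the run in `AWAITING_APPROVAL`, and a late approval on a closed run fails loudly instead of creating an unexecutable task.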

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

P1

Design for delegation rather than direct manipulation

aligned

`PaymentWorkflow.create_run()` accepts an operator-owned `Policy`, `draft_tasks()` classifies invoices against explicit constraints, and `execute_approved_payments()` only submits tasks that moved through `AWAITING_APPROVAL` to `APPROVED`. The workflow exposes delegation controls through `pause_run()`, `resume_run()`, `cancel_run()`, `retry_failed_task()`, and `resume_after_failure()` rather than requiring manual execution of every invoice step.
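The delegation boundary described above can be made mechanical with an explicit transition table, so a task can reach a submission outcome only through `APPROVED`. This is a minimal sketch: the state names follow those quoted in this review, but `ALLOWED`, `transition()`, `IllegalTransition`, and the retry edge from `FAILED` back to `APPROVED` are assumptions, not the submission's code.

```python
from enum import Enum


class TaskStatus(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    SUBMITTED_UNKNOWN = "submitted_unknown"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Hypothetical edge set: every submission outcome is reachable only via APPROVED.
ALLOWED = {
    TaskStatus.AWAITING_APPROVAL: {TaskStatus.APPROVED, TaskStatus.CANCELLED},
    TaskStatus.APPROVED: {
        TaskStatus.SUBMITTED_UNKNOWN,
        TaskStatus.SUCCEEDED,
        TaskStatus.FAILED,
        TaskStatus.CANCELLED,
    },
    TaskStatus.SUBMITTED_UNKNOWN: {TaskStatus.SUCCEEDED, TaskStatus.FAILED},
    TaskStatus.FAILED: {TaskStatus.APPROVED, TaskStatus.CANCELLED},  # assumed retry edge
    TaskStatus.SUCCEEDED: set(),
    TaskStatus.CANCELLED: set(),
}


class IllegalTransition(ValueError):
    pass


def transition(current, target):
    """Return the new state, or raise if the edge is not in the table."""
    if target not in ALLOWED[current]:
        raise IllegalTransition(f"{current.name} -> {target.name} is not allowed")
    return target
```

Keeping the edges in data rather than in scattered `if` checks makes the delegation contract reviewable in one place.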

P2

Ensure that background work remains perceptible

aligned

Background continuity is represented through durable JSONL primitives: `RunLedger` records `run.created`, task status events, `run.executing`, and terminal run events; `DurableOperatorInbox` persists intervention alerts; `inspect_run()` exposes current status, task counts, critical alerts, and `needs_reconciliation`. The prior P2 gap is improved by `TaskStatus.SUBMITTED_UNKNOWN`, `replay_run_from_ledger()`, and `reconcile_submitted_task()`, which make crash-after-submit states visible instead of silently losing continuity.
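The crash-after-submit case can be illustrated with a write-ahead pattern: record the intent to submit before calling the bank, then treat any task whose last event is that intent as needing reconciliation. A minimal sketch; the event types `task.submitting` and `task.succeeded` and the helper names are hypothetical, and the real `replay_run_from_ledger()` may work differently.

```python
import json
import os
import tempfile


def append_event(path, event):
    """Append one JSON object per line and fsync so it survives a process crash."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def replay_last_event(path):
    """Rebuild each task's most recent event type by replaying the whole file."""
    last = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            event = json.loads(line)
            if "task_id" in event:
                last[event["task_id"]] = event["type"]
    return last


def needs_reconciliation(path):
    """Tasks with a recorded submit intent but no recorded bank outcome."""
    return sorted(t for t, kind in replay_last_event(path).items() if kind == "task.submitting")


# Demo: the process "crashes" after writing t2's intent but before its outcome.
with tempfile.TemporaryDirectory() as tmp:
    ledger = os.path.join(tmp, "run.jsonl")
    append_event(ledger, {"type": "task.submitting", "task_id": "t1"})
    append_event(ledger, {"type": "task.succeeded", "task_id": "t1"})
    append_event(ledger, {"type": "task.submitting", "task_id": "t2"})
    pending = needs_reconciliation(ledger)
```

Because the intent is durable before the external call, a restart cannot mistake an in-flight payment for one that never started.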

P3

Align feedback with the user’s level of attention

aligned

The code separates routine status from attention-demanding conditions: normal progress is summarized by `inspect_run()['summary']`, while material intervention paths post alerts via `DurableOperatorInbox.post()` for forged approvals, expired envelopes, insufficient roles, policy violations, bank submission failures, and `SUBMITTED_UNKNOWN` reconciliation. Severity is calibrated with values such as `critical` for `auth_missing` and reconciliation blockers, and `warning` for non-fatal bank submission failures.
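A sketch of attention-calibrated routing, where routine progress goes to a log and only material conditions become inbox alerts. The review states that `auth_missing` is `critical` and non-fatal bank submission failures are `warning`; the other kinds, the `info` fallback, and the in-memory `Inbox` class (the real `DurableOperatorInbox` persists its alerts) are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical severity table; unknown kinds fall back to "info".
SEVERITY = {
    "auth_missing": "critical",
    "submitted_unknown": "critical",
    "bank_submission_failed": "warning",
}


@dataclass
class Inbox:
    alerts: list = field(default_factory=list)  # needs an operator
    log: list = field(default_factory=list)  # routine progress, summary only

    def post(self, kind, message):
        severity = SEVERITY.get(kind, "info")
        record = {"kind": kind, "severity": severity, "message": message}
        if severity == "info":
            self.log.append(record)
        else:
            self.alerts.append(record)
        return record

    def critical(self):
        return [a for a in self.alerts if a["severity"] == "critical"]
```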

P4

Apply progressive disclosure to system agency

aligned

`inspect_run()` provides a layered view: top-level run status, owner, policy hash, pause/cancel flags, audit-chain health, critical alerts, and aggregate task counts are available first, while per-task details such as invoice snapshot hash, approver identity, transfer id, failure class, and idempotency key are available in the nested `tasks` list. Deeper forensic detail remains in `RunLedger.read_run_events()` and `verify_chain()` rather than being forced into the default summary.
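The layering can be sketched as a nested dictionary: summary fields at the top, per-task detail one level down, and forensic events left out entirely. Field names are modeled on those listed above but are assumptions, not the actual `inspect_run()` output.

```python
from collections import Counter

TASK_DETAIL_FIELDS = ("task_id", "status", "approver", "transfer_id", "idempotency_key")


def inspect_run(run, tasks):
    """Layered view: summary fields first, per-task detail nested underneath."""
    return {
        # Level 1: what an operator scans first.
        "status": run["status"],
        "owner": run["owner"],
        "policy_hash": run["policy_hash"],
        "paused": run.get("paused", False),
        "critical_alerts": run.get("critical_alerts", 0),
        "task_counts": dict(Counter(t["status"] for t in tasks)),
        # Level 2: per-task detail, nested so it does not crowd the summary.
        "tasks": [{key: t.get(key) for key in TASK_DETAIL_FIELDS} for t in tasks],
        # Level 3 (raw ledger events, chain verification) is deliberately absent;
        # a caller fetches it through a separate forensic call.
    }
```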

P5

Replace implied magic with clear mental models

aligned

The system’s mental model is explicit: `Policy` names amount caps, allow/block lists, due window, and required approver role; `classify_invoice()` returns concrete policy decisions such as `vendor_blocked` or `outside_due_window`; and `ApprovalEnvelope` binds approval to `run_id`, `task_id`, `policy_hash`, and `invoice_snapshot_hash`. The prior P5 issue is improved because `approve()` no longer trusts raw approver strings; it verifies an HMAC envelope before recording approval authority.
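A sketch of a policy check that returns named decisions rather than an opaque yes/no. Only `vendor_blocked` and `outside_due_window` come from the review; the `Policy` and `Invoice` field names, the other result codes, and the treatment of overdue invoices are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Policy:
    max_amount_cents: int
    blocked_vendors: FrozenSet[str] = frozenset()
    allowed_vendors: Optional[FrozenSet[str]] = None  # None: any vendor not blocked
    due_window_days: int = 7
    required_role: str = "ap_approver"


@dataclass(frozen=True)
class Invoice:
    vendor: str
    amount_cents: int
    due: date


def classify_invoice(invoice, policy, today):
    """Return a named decision so operators can see why an invoice was held."""
    if invoice.vendor in policy.blocked_vendors:
        return "vendor_blocked"
    if policy.allowed_vendors is not None and invoice.vendor not in policy.allowed_vendors:
        return "vendor_not_allowed"
    if invoice.amount_cents > policy.max_amount_cents:
        return "amount_over_cap"
    days_until_due = (invoice.due - today).days
    # Assumption: overdue invoices also fall outside the window and need a human.
    if not 0 <= days_until_due <= policy.due_window_days:
        return "outside_due_window"
    return "eligible"
```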

P6

Expose meaningful operational state, not internal complexity

aligned

`RunStatus` and `TaskStatus` use user-relevant operational states such as `AWAITING_APPROVAL`, `PAUSED`, `EXECUTING`, `PARTIALLY_COMPLETED`, `FAILED`, `SUBMITTED_UNKNOWN`, `SUCCEEDED`, and `CANCELLED`. `inspect_run()` translates these into actionable summaries, including counts by task status, submitted amount, critical alerts, and reconciliation count, rather than exposing only low-level ledger mechanics.
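Translating status counts into next actions might look like the following. The `NEXT_ACTION` mapping and `actionable_summary()` are hypothetical; the review confirms only that `inspect_run()` reports counts by task status and a reconciliation count.

```python
from collections import Counter
from enum import Enum


class TaskStatus(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    FAILED = "failed"
    SUBMITTED_UNKNOWN = "submitted_unknown"
    SUCCEEDED = "succeeded"
    CANCELLED = "cancelled"


# Hypothetical mapping from operational state to the operator's next move.
NEXT_ACTION = {
    TaskStatus.AWAITING_APPROVAL: "approve or reject",
    TaskStatus.FAILED: "retry or cancel",
    TaskStatus.SUBMITTED_UNKNOWN: "reconcile with the bank",
}


def actionable_summary(statuses):
    """Turn raw task statuses into one line per state that needs an operator."""
    counts = Counter(statuses)
    return [
        f"{counts[status]} task(s) {status.name}: {action}"
        for status, action in NEXT_ACTION.items()
        if counts[status]
    ]
```

States with no pending action, such as `SUCCEEDED`, contribute nothing, so a healthy run produces an empty list.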

P7

Establish trust through inspectability

aligned

The inspectability primitive is load-bearing: `RunLedger.append()` creates an append-only hash chain using `prev_event_hash` and `event_hash`, while `verify_chain()` detects sequence or hash divergence. Task events persist invoice snapshots, `invoice_snapshot_hash`, policy hash, idempotency key, approval signature, bank responses, and failure classes, allowing reviewers to reconstruct how a payment decision and submission occurred.
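A minimal hash chain using the two field names quoted above (`prev_event_hash`, `event_hash`). The all-zeros genesis value, the canonical JSON encoding via `sort_keys=True`, and the in-memory list standing in for the JSONL file are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the event body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_chained(chain, payload):
    prev = chain[-1]["event_hash"] if chain else GENESIS
    body = {"seq": len(chain), "prev_event_hash": prev, "payload": payload}
    chain.append({**body, "event_hash": _digest(body)})
    return chain[-1]


def verify_chain(chain):
    """False on any sequence gap, broken back-link, or recomputed-hash mismatch."""
    prev = GENESIS
    for i, event in enumerate(chain):
        body = {k: event[k] for k in ("seq", "prev_event_hash", "payload")}
        if event["seq"] != i or event["prev_event_hash"] != prev:
            return False
        if event["event_hash"] != _digest(body):
            return False
        prev = event["event_hash"]
    return True
```

A chain like this detects edits to any earlier event, but an attacker who can rewrite the whole file can also recompute every hash; anchoring the latest `event_hash` somewhere outside the ledger closes that gap.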

P8

Make hand-offs, approvals, and blockers explicit

aligned

Approval and blocker boundaries are explicit. `approve()` verifies `ApprovalEnvelope` with `hmac.compare_digest()` before trusting role or identity, checks expiry, policy hash, invoice snapshot hash, and required role, and records approved tasks only after those gates pass. Execution cannot submit unless a task is `APPROVED`; bank auth failures, policy violations, invalid signatures, insufficient roles, and reconciliation blockers are surfaced through ledger events and/or durable inbox alerts. The prior high-risk P8 approval-forgery gap is addressed by the signed envelope primitive.
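A sketch of a signed approval envelope with the bindings listed above. It uses the standard-library `hmac` module with SHA-256; the field layout, the JSON canonicalization, the reason codes, and the exact-match role check are assumptions rather than the submission's implementation.

```python
import hashlib
import hmac
import json
import time
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ApprovalEnvelope:
    run_id: str
    task_id: str
    policy_hash: str
    invoice_snapshot_hash: str
    approver: str
    role: str
    expires_at: float  # Unix timestamp


def sign(envelope, key):
    """HMAC-SHA256 over a canonical JSON encoding of every envelope field."""
    message = json.dumps(asdict(envelope), sort_keys=True).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def check_approval(envelope, signature, key, *, binding, required_role, now=None):
    """Return None if the approval can be trusted, otherwise a reason code."""
    # Verify the signature before reading any field as authoritative.
    if not hmac.compare_digest(sign(envelope, key).encode(), signature.encode()):
        return "invalid_signature"
    now = time.time() if now is None else now
    if now >= envelope.expires_at:
        return "expired"
    bound = (envelope.run_id, envelope.task_id, envelope.policy_hash, envelope.invoice_snapshot_hash)
    if bound != binding:
        return "binding_mismatch"
    if envelope.role != required_role:
        return "insufficient_role"
    return None
```

Checking the signature first means a forged `role` or `approver` is rejected before it can influence any later check, and `hmac.compare_digest()` avoids leaking timing information during that comparison.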

P9

Represent delegated work as a system, not merely as a conversation

aligned

Delegated work is represented as a structured system: `PaymentRun` owns run-level state and policy hash, `PaymentTask` owns per-invoice lifecycle and idempotency key, `RunLedger` records timeline/history, and `DurableOperatorInbox` records intervention-required alerts. Conversation or prompt text is not used as the execution state; the workflow is governed by typed records, enums, policy checks, and ledger events.
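A sketch of typed records with a deterministic idempotency key, so a retry of the same task presents the same key to the bank. The review confirms that `PaymentTask` owns an idempotency key and that retries reuse it; deriving it from `run_id`, `task_id`, and `invoice_snapshot_hash` with SHA-256, and the `add_task()` helper, are assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    SUCCEEDED = "succeeded"


def derive_idempotency_key(run_id, task_id, invoice_snapshot_hash):
    """Same inputs always give the same key, so a retry reuses the key the bank has seen."""
    raw = f"{run_id}:{task_id}:{invoice_snapshot_hash}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:32]


@dataclass
class PaymentTask:
    run_id: str
    task_id: str
    invoice_snapshot_hash: str
    status: TaskStatus = TaskStatus.AWAITING_APPROVAL
    idempotency_key: str = field(init=False, default="")

    def __post_init__(self):
        self.idempotency_key = derive_idempotency_key(
            self.run_id, self.task_id, self.invoice_snapshot_hash
        )


@dataclass
class PaymentRun:
    run_id: str
    policy_hash: str
    tasks: list = field(default_factory=list)

    def add_task(self, task_id, invoice_snapshot_hash):
        task = PaymentTask(self.run_id, task_id, invoice_snapshot_hash)
        self.tasks.append(task)
        return task
```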

P10

Optimise for steering, not only initiating

aligned

The workflow supports steering after initiation through `pause_run()`, `resume_run()`, `cancel_run()`, `retry_failed_task()`, `resume_after_failure()`, and `reconcile_submitted_task()`. `execute_approved_payments()` re-reads `RunLedger.latest_steering_intent()` before each external `transfer_funds()` call, so pause/cancel intents can interrupt a running executor before the next irreversible submit. The prior high-risk P10 gap is improved by durable steering events, same-key retry, and explicit `SUBMITTED_UNKNOWN` reconciliation.
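The check-before-each-submit loop can be sketched as follows. `latest_steering_intent()` and `transfer_funds()` are the names quoted in the review; the in-memory `Ledger`, the intent strings, and the `execute()` return shape are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """In-memory stand-in for the durable ledger: the latest steering event wins."""
    steering: list = field(default_factory=list)

    def record_steering(self, intent):
        self.steering.append(intent)

    def latest_steering_intent(self):
        return self.steering[-1] if self.steering else None


def execute(task_ids, ledger, transfer_funds):
    """Submit one task at a time, re-reading steering before each irreversible call."""
    submitted = []
    for task_id in task_ids:
        intent = ledger.latest_steering_intent()
        if intent in ("pause", "cancel"):
            return intent, submitted  # stop before the next external submit
        transfer_funds(task_id)
        submitted.append(task_id)
    return "all_submitted", submitted


# Demo: the operator pauses while t1 is being transferred.
ledger = Ledger()


def fake_transfer(task_id):
    if task_id == "t1":
        ledger.record_steering("pause")


outcome, done = execute(["t1", "t2", "t3"], ledger, fake_transfer)
```

The check narrows the window rather than closing it: a pause recorded while a transfer is in flight takes effect before the next transfer, not the current one.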

Add to your README

Two embeddable variants: a compact badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/7e9bc0f6-8fd3-4586-851f-d26b720ac767/card.svg)](https://aidesignblueprint.com/en/readiness-review/7e9bc0f6-8fd3-4586-851f-d26b720ac767)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/7e9bc0f6-8fd3-4586-851f-d26b720ac767.svg)](https://aidesignblueprint.com/en/readiness-review/7e9bc0f6-8fd3-4586-851f-d26b720ac767)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Improvements (4)

P2 Ensure that background work remains perceptible: needs_changes → aligned
P5 Replace implied magic with clear mental models: needs_changes → aligned
P8 Make hand-offs, approvals, and blockers explicit: high_risk → aligned
P10 Optimise for steering, not only initiating: high_risk → aligned
Rubric: 2026-05-04

Run ID: 7e9bc0f6-8fd3-4586-851f-d26b720ac767 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.