Assessment complete; awaiting evidence review.
Assessed on 10 May 2026 against the AI Design Blueprint doctrine
Status: High Risk
40/100
Grade D
The workflow is correctly classified as an autonomous payment workflow and contains strong primitives: run/task state, HMAC approval envelopes, a hash-chained ledger, durable inbox alerts, pause/cancel/retry/reconcile paths, and an explicit mock mode that no longer auto-marks mock submissions as succeeded. However, production trust still fails around the bank handoff and status model: the live bank path raises an uncaught NotImplementedError after the task has already been recorded as submitted, any future non-mock bank response is treated as SUCCEEDED without typed confirmation, and mock simulations are ultimately exposed as the same task status as real success. Those are production blockers for a delegated finance workflow.
Iteration history
5 previous runs on this artifact. Each run_id opens its own readiness review.
Findings by principle
10 principles assessed. Verdict, severity, evidence, and recommendation for each.
P0
Make hand-offs, approvals, and blockers explicit
Approval handoffs are explicit through `ApprovalEnvelope`, HMAC verification, role checks, and `AWAITING_APPROVAL`, and mock responses are no longer auto-promoted. The external bank handoff is still unsafe: `_submit_one()` records `task.submitted` before the bank call has succeeded; `_post_bank_transfer_live()` is deliberately unwired and raises `NotImplementedError` that is not converted into a workflow blocker; and any future non-mock response is treated as `SUCCEEDED` without typed confirmation. This can leave operators with a false or ambiguous payment state at the exact externally visible action boundary.
Recommendation
Architecturally separate bank authority into a real bank-transfer service/client with a typed result envelope. Only record `SUBMITTED` after bank acceptance, only record `SUCCEEDED` after explicit bank-confirmed success, and convert adapter-unavailable or ambiguous responses into `SUBMITTED_UNKNOWN`/`BLOCKED` with reconciliation instructions.
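One way to picture the typed result envelope is sketched below. `BankOutcome`, `BankResult`, and `status_for` are hypothetical names introduced for illustration, and the `TaskStatus` enum here is a reduced stand-in for the one in the reviewed code. This is a sketch of the recommendation under those assumptions, not the artifact's implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BankOutcome(Enum):
    # Hypothetical outcome vocabulary returned by a bank-transfer client.
    ACCEPTED = "accepted"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"
    UNKNOWN = "unknown"


class TaskStatus(Enum):
    # Reduced stand-in for the reviewed TaskStatus enum.
    SUBMITTED = "submitted"
    SUBMITTED_UNKNOWN = "submitted_unknown"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass(frozen=True)
class BankResult:
    # Hypothetical typed envelope; a raw dict never reaches the status logic.
    outcome: BankOutcome
    transfer_id: Optional[str] = None
    detail: str = ""


def status_for(result: BankResult) -> TaskStatus:
    """Map a typed bank result to a task status.

    Anything ambiguous (unknown outcome, or a positive outcome without a
    transfer_id) falls through to SUBMITTED_UNKNOWN, never to SUCCEEDED.
    """
    if result.outcome is BankOutcome.CONFIRMED and result.transfer_id:
        return TaskStatus.SUCCEEDED
    if result.outcome is BankOutcome.ACCEPTED and result.transfer_id:
        return TaskStatus.SUBMITTED
    if result.outcome is BankOutcome.REJECTED:
        return TaskStatus.FAILED
    return TaskStatus.SUBMITTED_UNKNOWN
```

The design choice is that `SUCCEEDED` has exactly one entry path, so a new or malformed bank response degrades to a state that demands reconciliation instead of implying payment.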
P0
Replace implied magic with clear mental models
The explicit `BANK_MODE` primitive and `mock=True` tagging improve the mental model, but two status semantics still imply more certainty than the system has. First, `_submit_one()` marks any non-mock response as `TaskStatus.SUCCEEDED` without checking a typed success/settlement field, response status, or transfer acceptance contract. Second, `confirm_mock_simulation()` promotes a mock task to `TaskStatus.SUCCEEDED`, while `inspect_run()` exposes only `status: "succeeded"`, `transfer_id`, and no `simulated`/`mock` flag. A downstream reviewer can confuse a simulated success or an unvalidated bank response with real payment completion.
Recommendation
Use distinct operational semantics: add `SIMULATED_SUCCEEDED` or a required `simulated: bool` task field exposed in `inspect_run()`, and require a typed bank response such as `TransferConfirmed` before assigning real `SUCCEEDED`. Do not let an arbitrary non-mock dict imply payment success.
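A minimal sketch of how a required `simulated` field could surface in the inspection output follows. `TaskRecord` and `inspect_entry` are hypothetical stand-ins, not the reviewed `PaymentTask` or `inspect_run()`; the string statuses are assumptions chosen to mirror the `status: "succeeded"` value quoted in the finding.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class TaskRecord:
    # Hypothetical stand-in for a payment task; only the fields needed here.
    task_id: str
    status: str                  # e.g. "succeeded", "submitted", "failed"
    transfer_id: Optional[str]
    simulated: bool              # required field: True when the bank call was mocked


def inspect_entry(task: TaskRecord) -> Dict[str, Any]:
    """Build one per-task entry for an inspection payload.

    A mock-confirmed task is reported as "simulated_succeeded" and always
    carries the simulated flag, so it cannot be read as a real payment.
    """
    status = task.status
    if task.simulated and status == "succeeded":
        status = "simulated_succeeded"
    return {
        "task_id": task.task_id,
        "status": status,
        "transfer_id": task.transfer_id,
        "simulated": task.simulated,
    }
```

Because `simulated` has no default value, a task cannot be constructed without stating which mode produced it.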
P0
Ensure that background work remains perceptible
Most background work is perceptible through `RunStatus`, `TaskStatus`, `RunLedger`, `DurableOperatorInbox`, and `inspect_run()`. But the live bank path breaks perceptibility: `_submit_one()` sets `task.status = TaskStatus.SUBMITTED` and appends `event_type="task.submitted"` before calling `transfer_funds()`, while `_post_bank_transfer_live()` raises `NotImplementedError` and `_submit_one()` catches only `BankAPIError`. A live-mode run can therefore crash with the run left `EXECUTING` and the task recorded as `SUBMITTED` without a durable failure alert or blocker. Delta: despite the prior aligned baseline, this current submission exposes a concrete unhandled live-adapter path.
Recommendation
Move the bank handoff behind a typed adapter/service boundary that returns explicit `accepted`, `confirmed`, `rejected`, or `unknown` results; preflight that the live adapter is wired before recording `task.submitted`; and catch all adapter exceptions into a durable `task.submission_blocked` or `task.submitted_unknown` ledger event plus critical inbox alert.
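The ordering described above (preflight, then call, then record) can be sketched as follows. `submit_guarded` and its parameters are hypothetical, and plain lists stand in for `RunLedger` and `DurableOperatorInbox`; the event names `task.submission_blocked` and `task.submitted_unknown` come from the recommendation itself.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


def submit_guarded(
    task_id: str,
    adapter_is_wired: Callable[[], bool],
    transfer: Callable[[str], Any],
    ledger: List[Event],   # stand-in for a durable ledger
    inbox: List[Event],    # stand-in for a durable operator inbox
) -> str:
    """Attempt one submission; every failure path leaves a ledger event and an alert."""
    if not adapter_is_wired():
        # Preflight failed: nothing was sent, so nothing is recorded as submitted.
        ledger.append({"event_type": "task.submission_blocked", "task_id": task_id,
                       "reason": "live adapter not wired"})
        inbox.append({"severity": "critical", "task_id": task_id,
                      "message": "Live bank adapter is not wired; no request was sent."})
        return "blocked"
    try:
        transfer(task_id)
    except Exception as exc:  # deliberately broad: also covers NotImplementedError
        ledger.append({"event_type": "task.submitted_unknown", "task_id": task_id,
                       "error": type(exc).__name__})
        inbox.append({"severity": "critical", "task_id": task_id,
                      "message": "Bank call failed; reconcile before retrying."})
        return "submitted_unknown"
    # Only after the call returns is the submission recorded.
    ledger.append({"event_type": "task.submitted", "task_id": task_id})
    return "submitted"
```

A real implementation would also inspect the adapter's typed result before writing `task.submitted`; this sketch only shows that no exception path can leave the ledger silent.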
P0
Align feedback with the user’s level of attention
The code does escalate many attention-worthy states: `inbox.post()` is used for forged approvals, expired envelopes, insufficient role, bank `auth_missing`, policy failures, unfinished tasks, and mock submissions. However, the highest-risk live-adapter failure is not converted into an operator-facing alert because `NotImplementedError` from `_post_bank_transfer_live()` is uncaught after `task.submitted` is written. The operator is not given a calibrated foreground/background signal that the bank client is unavailable and no real submission occurred.
Recommendation
Handle live-adapter-unavailable and unexpected bank-client exceptions as first-class workflow states with a critical inbox message that says exactly what happened, whether the bank may have seen the request, and the next safe action: wire live client, reconcile, retry, or cancel.
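As an illustration of a message that states what happened, whether the bank may have seen the request, and the next safe action, consider the sketch below. `OperatorAlert` and its fields are hypothetical; the reviewed `inbox.post()` signature is not shown in this review, so nothing here should be read as its real payload.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class OperatorAlert:
    # Hypothetical alert shape; all three facts are mandatory fields.
    severity: str
    what_happened: str
    bank_may_have_seen_request: bool
    next_safe_actions: Tuple[str, ...]

    def render(self) -> str:
        """Render a one-line message an operator can act on without reading logs."""
        exposure = (
            "The bank may have received the request."
            if self.bank_may_have_seen_request
            else "No request reached the bank."
        )
        actions = ", ".join(self.next_safe_actions)
        return (
            f"[{self.severity.upper()}] {self.what_happened} "
            f"{exposure} Next safe actions: {actions}."
        )


alert = OperatorAlert(
    severity="critical",
    what_happened="Live bank adapter is not wired.",
    bank_may_have_seen_request=False,
    next_safe_actions=("wire live client", "cancel"),
)
print(alert.render())
```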
P0
Establish trust through inspectability
There is a strong inspectability base: `Invoice.snapshot_hash()`, `Policy.hash()`, HMAC-bound `ApprovalEnvelope`, append-only `RunLedger`, `event_hash`/`prev_event_hash`, `verify_chain()`, and `inspect_run()` all support traceability. The audit chain is incomplete on important failure paths: `_post_bank_transfer_live()` can raise uncaught `NotImplementedError` after `task.submitted`, leaving no ledgered failure/blocker event; expired approval envelopes post an inbox alert but do not append a rejection event, and insufficient-role rejections also lack a ledger append. For a payment workflow, rejected approvals and bank adapter failures must be hash-chain visible, not only thrown or posted out…
Recommendation
Make every approval rejection and every bank-adapter exception append a typed ledger event before returning/raising. For production, separate the audit ledger from the execution process or anchor the hash-chain tail externally so the ledger is not merely a mutable local JSONL file.
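To make "hash-chain visible" concrete, here is a generic sketch of appending rejection and blocker events to a hash-chained list and verifying it. It reuses the field names `event_hash` and `prev_event_hash` mentioned in the finding, but the functions, the genesis value, and the JSON canonicalisation are assumptions; the reviewed `RunLedger` and its `verify_chain()` may work differently.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed placeholder hash for the first event


def _digest(body: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(chain: List[Dict[str, Any]], event_type: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Append an event whose hash covers its content and the previous event's hash."""
    prev = chain[-1]["event_hash"] if chain else GENESIS
    body = {"event_type": event_type, "payload": payload, "prev_event_hash": prev}
    event = {**body, "event_hash": _digest(body)}
    chain.append(event)
    return event


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and check each link to its predecessor."""
    prev = GENESIS
    for event in chain:
        body = {k: event[k] for k in ("event_type", "payload", "prev_event_hash")}
        if event["prev_event_hash"] != prev or event["event_hash"] != _digest(body):
            return False
        prev = event["event_hash"]
    return True


chain: List[Dict[str, Any]] = []
append_event(chain, "approval.rejected", {"reason": "envelope_expired"})
append_event(chain, "task.submission_blocked", {"task_id": "t1"})
```

Anchoring the final `event_hash` in a system outside the execution process is what turns this from tamper-evident into externally checkable, which is the second half of the recommendation.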
P0
Expose meaningful operational state, not internal complexity
The enum model is generally user-relevant (`AWAITING_APPROVAL`, `APPROVED`, `SUBMITTED_UNKNOWN`, `SUCCEEDED`, `FAILED`, `PAUSED`, `CANCELLED`) and `inspect_run()` summarizes tasks by status. The operational state becomes misleading at the bank boundary: `task.submitted` is recorded before the adapter successfully accepts the submission, and mock-confirmed tasks are shown as ordinary `succeeded` tasks in `inspect_run()`. Those states do not reliably tell the operator whether money moved, a request was merely attempted, or a simulation was acknowledged.
Recommendation
Split submission and completion states into user-meaningful phases such as `SUBMISSION_ATTEMPTING`, `SUBMITTED_ACCEPTED`, `SUBMITTED_UNKNOWN`, `CONFIRMED_PAID`, and `SIMULATED_SUCCEEDED`, and expose `bank_mode`/`simulated` in the primary inspection output.
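The phase split might look like the sketch below. The phase names are taken from the recommendation; `PaymentPhase`, `MONEY_MOVED`, and `summarize` are hypothetical, and the answers in `MONEY_MOVED` are illustrative assumptions about what each phase should tell an operator.

```python
from enum import Enum
from typing import Any, Dict


class PaymentPhase(Enum):
    SUBMISSION_ATTEMPTING = "submission_attempting"
    SUBMITTED_ACCEPTED = "submitted_accepted"
    SUBMITTED_UNKNOWN = "submitted_unknown"
    CONFIRMED_PAID = "confirmed_paid"
    SIMULATED_SUCCEEDED = "simulated_succeeded"


# Assumed operator-facing answer to "did money move?" for each phase.
MONEY_MOVED: Dict[PaymentPhase, str] = {
    PaymentPhase.SUBMISSION_ATTEMPTING: "unknown",  # a crash mid-attempt is ambiguous
    PaymentPhase.SUBMITTED_ACCEPTED: "pending",     # accepted by the bank, not yet confirmed
    PaymentPhase.SUBMITTED_UNKNOWN: "unknown",      # needs reconciliation
    PaymentPhase.CONFIRMED_PAID: "yes",
    PaymentPhase.SIMULATED_SUCCEEDED: "no",         # simulation only
}


def summarize(phase: PaymentPhase, bank_mode: str) -> Dict[str, Any]:
    """Primary-view entry that exposes mode and simulation next to the phase."""
    return {
        "phase": phase.value,
        "bank_mode": bank_mode,
        "simulated": phase is PaymentPhase.SIMULATED_SUCCEEDED,
        "money_moved": MONEY_MOVED[phase],
    }
```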
P0
Design for delegation rather than direct manipulation
The workflow is designed around delegated intent and constraints rather than direct manual payment execution: `Policy` captures amount caps, vendor allow/block lists, due windows, and required approver role; `create_run()` establishes a delegated `run_id`; `draft_tasks()` turns invoices into governed `PaymentTask`s; `approve()` uses a signed `ApprovalEnvelope`; and `execute_approved_payments()` executes only approved work under policy checks. This maintains the prior aligned delegation structure.
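For readers unfamiliar with constraint-based delegation, the kind of policy check described above can be sketched as follows. `PolicySketch`, `InvoiceSketch`, and `policy_violations` are hypothetical, and the interpretation of a "due window" as a maximum number of days until the due date is an assumption; the reviewed `Policy` class may define these constraints differently, and the required approver role is omitted because it applies at approval time.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import FrozenSet, List


@dataclass(frozen=True)
class PolicySketch:
    # Hypothetical constraint set; not the reviewed Policy class.
    max_amount: Decimal
    allowed_vendors: FrozenSet[str]
    blocked_vendors: FrozenSet[str]
    max_days_until_due: int


@dataclass(frozen=True)
class InvoiceSketch:
    vendor: str
    amount: Decimal
    due_date: date


def policy_violations(policy: PolicySketch, invoice: InvoiceSketch, today: date) -> List[str]:
    """Return every constraint the invoice breaks; an empty list means it may be drafted."""
    violations: List[str] = []
    if invoice.amount > policy.max_amount:
        violations.append("amount_over_cap")
    if invoice.vendor in policy.blocked_vendors:
        violations.append("vendor_blocked")
    elif invoice.vendor not in policy.allowed_vendors:
        violations.append("vendor_not_allowed")
    if (invoice.due_date - today).days > policy.max_days_until_due:
        violations.append("outside_due_window")
    return violations
```

Returning all violations at once, instead of failing on the first, gives the operator a complete reason for a blocked task.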
P0
Apply progressive disclosure to system agency
The code uses reasonable progressive disclosure: `inspect_run()` provides a primary operational view with run status, policy hash, audit-chain integrity, critical alerts, summary counts, and per-task status; deeper details remain available through `RunLedger.read_run_events()` and `verify_chain()`. The primary view emphasizes outcome and required attention rather than dumping the raw JSONL ledger by default.
P0
Represent delegated work as a system, not merely as a conversation
Delegated work is represented as a structured system rather than a conversation: `PaymentRun` and `PaymentTask` model lifecycle state, `RunStatus`/`TaskStatus` encode progress, `RunLedger` provides a timeline, `DurableOperatorInbox` captures intervention-required alerts, and `inspect_run()` returns a task/status view with summary counts and reconciliation needs. This maintains the prior aligned system representation.
P0
Optimise for steering, not only initiating
The workflow includes steering primitives beyond initiation: `pause_run()`, `resume_run()`, `cancel_run()`, `resume_after_failure()`, `retry_failed_task()`, `reconcile_submitted_task()`, and `confirm_mock_simulation()` allow operators to interrupt, resume, retry, reconcile, or explicitly acknowledge mock submissions. `execute_approved_payments()` checks `ledger.latest_steering_intent(run_id)` between tasks, so ledgered pause/cancel intent can steer an in-progress run at task boundaries.
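Steering at task boundaries can be illustrated with the loop below. `run_with_steering` is a hypothetical function, and `latest_intent` stands in for a call such as `ledger.latest_steering_intent(run_id)`; the intent values `"pause"` and `"cancel"` are assumed strings, since the review does not show the real return type.

```python
from typing import Callable, List, Optional, Tuple


def run_with_steering(
    task_ids: List[str],
    latest_intent: Callable[[], Optional[str]],
    submit: Callable[[str], None],
) -> Tuple[List[str], str]:
    """Execute tasks in order, re-reading operator intent before each one.

    Returns the tasks that were executed and how the loop ended:
    "completed", "pause", or "cancel".
    """
    done: List[str] = []
    for task_id in task_ids:
        intent = latest_intent()
        if intent in ("pause", "cancel"):
            # Stop at the boundary: the current task is never started.
            return done, intent
        submit(task_id)
        done.append(task_id)
    return done, "completed"
```

The check happens before each task begins, so a pause recorded while one payment is in flight takes effect before the next payment, not in the middle of the current one.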
Iteration delta
Regressions (6)
Run ID: b1195c34-8d8a-495e-a91a-d30ed551ecc3 · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.