Valutazione completata; in attesa di revisione delle prove.
Valutato il 10 maggio 2026 rispetto alla doctrine di AI Design Blueprint
Emerging
Stato: High Risk
60/100
Voto C
This is an autonomous payment workflow with meaningful delegation, policy, status, inbox, replay, and audit primitives. However, the irreversible bank-transfer boundary is still unsafe: approvals are accepted from raw caller-supplied identity/role fields rather than a verifiable approval authority, and recovery/steering around submitted or failed payments can strand runs.
Storico iterazioni
2 run precedenti su questo artefatto. Ogni run_id apre la sua readiness review.
Findings per principio
10 principi valutati. Verdict, severity, evidenza e raccomandazione per ognuno.
P0
Make hand-offs, approvals, and blockers explicit
The code correctly separates drafting from execution: `cron_draft_daily_run()` only drafts, and `execute_approved_payments()` only submits `TaskStatus.APPROVED` tasks. However, the approval gate before an irreversible bank transfer is not trustworthy because `approve()` accepts `approver_user_id`, `approver_role`, and `approver_seen_policy_hash` as ordinary caller-provided strings. The comment says the caller must authenticate before calling, but the workflow itself does not verify a signed token or approval record. Any caller with access to this method can pass `approver_role='cfo'` and create a `task.approved` ledger event. Delta: this improves the prior P8 high-risk finding by removing in…
Raccomandazione
Put approval authority outside the execution object: require `approve()` to consume and verify a signed approval token or persisted approval record from an authenticated approval service, bound to the approver identity, role, `run_id`, `task_id`, `policy_hash`, and `invoice_snapshot_hash`, before setting `TaskStatus.APPROVED`.
P0
Optimise for steering, not only initiating
The code adds steering methods, but several are not yet reliable at the payment boundary. `retry_failed_task()` changes a failed task back to `APPROVED`, yet `execute_approved_payments()` refuses to run when the run is already `FAILED` or `PARTIALLY_COMPLETED`, which is the normal state after execution failures. Pause/cancel checks inside `execute_approved_payments()` read only the in-memory `run.pause_requested`/`run.cancellation_requested` flags, so a separate operator process appending pause/cancel intent to the ledger would not interrupt an already-running executor. There is also no steering path for `TaskStatus.SUBMITTED` after a crash around `transfer_funds()`. Delta: this improves the…
Raccomandazione
Move steering state and execution cursor into a transactional store that the executor re-reads before each external submit; allow resumed execution from `FAILED`, `PARTIALLY_COMPLETED`, and `SUBMITTED_UNKNOWN` states through explicit reconciliation/requeue transitions that preserve the original `idempotency_key`.
P0
Ensure that background work remains perceptible
The code adds perceptible state via `RunStatus`, `TaskStatus`, `inspect_run()`, JSONL `RunLedger`, and `replay_run_from_ledger()`, but it can still strand background work around the irreversible side-effect boundary. `_submit_one()` appends `task.submitted` before calling `transfer_funds()`; if the process dies during or after the bank call but before `task.succeeded`/`task.failed`, replay leaves the task in `TaskStatus.SUBMITTED`, `execute_approved_payments()` skips non-`APPROVED` tasks, and `retry_failed_task()` only handles `FAILED` tasks. The user can return to a durable but unresolved `submitted`/`executing` state with no reconciliation path. Delta: the current code improves the prior d…
Raccomandazione
Move execution state and the bank submission cursor into a transactional durable store, and model `SUBMITTED_UNKNOWN`/reconciliation as a first-class state that can be safely resumed using the existing `idempotency_key` before any further payments are attempted.
P0
Replace implied magic with clear mental models
Most policy fields are now load-bearing: `classify_invoice()` enforces vendor blocklist, due window, amount range, allowlist, and per-invoice cap; `draft_tasks()` rejects a mismatched `policy.hash()`; `execute_approved_payments()` enforces `max_run_total_pence`; and `approve()` checks `approver_seen_policy_hash`. The remaining mental-model failure is that `Policy.required_approver_role` appears to authorize payment approval, but `approve()` merely compares it with caller-supplied `approver_role`; there is no signed approval envelope or identity-service proof that `approver_user_id` actually has that role. Delta: this improves the prior high-risk policy-enforcement finding, but the approval-a…
Raccomandazione
Replace raw `approver_user_id`/`approver_role` parameters with a verified approval envelope issued by the identity/approval service and bound to `run_id`, `task_id`, `policy_hash`, and `invoice_snapshot_hash`; make that envelope the only source of approval authority.
P0
Design for delegation rather than direct manipulation
The workflow is structured around delegated work rather than manual step execution: `Policy` captures operator constraints, `create_run()` binds a `policy_hash`, `draft_tasks()` classifies invoices, and `execute_approved_payments()` submits only approved tasks. Operators have explicit controls through `approve()`, `pause_run()`, `resume_run()`, `cancel_run()`, and `retry_failed_task()`. Delta: this improves the prior P1 finding by making the run-owned policy hash load-bearing in drafting and audit events.
P0
Align feedback with the user’s level of attention
Feedback is proportionate and durable: `DurableOperatorInbox.post()` persists intervention alerts to JSONL, bank and policy failures create warning/critical alerts, and `inspect_run()` surfaces `critical_alerts` plus task-level `failure_class` and `failure_message`. Routine state is summarized through `summary.by_status`, while attention-required failures carry alert IDs. Delta: this addresses the prior P3 recommendation by replacing an in-memory inbox with a durable operator-facing inbox.
P0
Apply progressive disclosure to system agency
`inspect_run()` separates a primary operational summary (`status`, `policy_hash`, `audit_chain_intact`, `critical_alerts`, and `summary.by_status`) from per-task details such as invoice snapshot hashes, approver fields, transfer IDs, and failure messages. The code therefore supports a default overview with deeper inspection when needed. Delta: this maintains the prior aligned P4 result.
P0
Expose meaningful operational state, not internal complexity
The workflow exposes user-relevant states through `RunStatus` values such as `AWAITING_APPROVAL`, `PAUSED`, `EXECUTING`, `PARTIALLY_COMPLETED`, `CANCELLED`, and `FAILED`, plus `TaskStatus` values such as `APPROVED`, `SUBMITTED`, `SUCCEEDED`, `FAILED`, and `SKIPPED`. `inspect_run()` presents these states with counts and actionable task failure fields rather than only low-level log entries. Delta: this maintains the prior aligned P6 result.
P0
Establish trust through inspectability
The audit path includes a real inspectability primitive: every `AuditEvent` carries `sequence_no`, `prev_event_hash`, and `event_hash`; `_hash_event_body()` canonicalizes the event body; `RunLedger.append()` chains per-run events; and `verify_chain()` detects sequence, previous-hash, or body-hash divergence. Events include policy hashes, invoice snapshots, approvals, submissions, bank responses, failures, cancellations, and retries, and `inspect_run()` reports `audit_chain_intact` plus the first divergent event ID. Delta: this addresses the prior P7 recommendation by adding a hash-chained ledger and verification path.
P0
Represent delegated work as a system, not merely as a conversation
Delegated work is represented as a structured system: `PaymentRun` owns the lifecycle, `PaymentTask` models per-invoice work, `RunLedger` records ordered events, `DurableOperatorInbox` records intervention alerts, and `inspect_run()` returns a task list plus aggregate status counts. Execution state is separate from conversational or prompt-like input. Delta: this maintains the prior aligned P9 result.
Aggiungi al tuo README
Due varianti embeddabili: una piccola e una a card più ricca.
Score card (consigliata)
[](https://aidesignblueprint.com/en/readiness-review/15aa9649-84d8-4a75-aec7-fd101a5b0535)
Badge piatto
[](https://aidesignblueprint.com/en/readiness-review/15aa9649-84d8-4a75-aec7-fd101a5b0535)
Delta iterazione
Regressioni (2)
Miglioramenti (5)
Run ID: 15aa9649-84d8-4a75-aec7-fd101a5b0535 · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.