Assessment complete; awaiting evidence review.
Assessed on 10 May 2026 against the AI Design Blueprint doctrine
Status: High Risk
30/100
Grade F
The submission is a substantial improvement over the prior single-shot payment loop: it introduces runs, tasks, explicit policy objects, task states, approval records, cancellation checks, and an audit ledger. However, the production trust boundary still fails for irreversible payments because approval authority is self-declared, the cron path auto-approves, key policy constraints are not enforced, state/alerts remain process-local, and the ledger is not tamper-evident.
Iteration history
1 previous run on this artifact. Each run_id opens its own readiness review.
| When | Score | Status | Run ID |
|---|---|---|---|
| 10 May 2026 (this run) | 30 / F | High Risk | b8d61c00… |
| 10 May 2026 | 0 / F | High Risk | ac64d7d9… |
Findings by principle
10 principles evaluated. Verdict, severity, evidence, and recommendation for each.
P0
Make hand-offs, approvals, and blockers explicit
There is a visible approval state: `_submit_one()` only runs for `TaskStatus.APPROVED`, and `approve()` records `approver_user_id`, `approved_policy_hash`, and `approved_at`. However, the approval boundary is not trustworthy for irreversible bank transfers. `approve()` accepts any caller-supplied `approver_user_id` and only checks a self-reported `approver_seen_policy_hash`; it never authenticates the actor or validates `Policy.required_approver_role`. Worse, `run_daily_batch()` auto-approves every `TaskStatus.AWAITING_APPROVAL` task in-process using the default `approver_user_id='cfo@example.com'`, so the cron path can call `transfer_funds()` without an actual human approval. Delta: the pri…
Recommendation
Separate approval from execution. Remove in-process auto-approval from `run_daily_batch()`, require an authenticated approval service to issue a signed approval token bound to `run_id`, `task_id`, invoice snapshot hash, amount, and `policy_hash`, and have the executor accept only those signed approvals from principals with the required role.
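As an illustration of the recommendation above, the following is a minimal sketch of an approval token bound to the task it authorises. It is not the submission's code: `issue_approval_token`, `verify_approval_token`, `APPROVAL_SIGNING_KEY`, and the claim names `invoice_snapshot_hash`, `amount_pence`, and `approver_role` are hypothetical. It uses a shared-secret HMAC for brevity; a real deployment would probably prefer an asymmetric signature so that the executor can verify tokens but cannot mint them.

```python
import hashlib
import hmac
import json

# Hypothetical secret held by the approval service. In practice it would come
# from a key-management system, never from source code.
APPROVAL_SIGNING_KEY = b"demo-key-not-for-production"


def _canonical(claims: dict) -> bytes:
    """Serialise claims deterministically so signatures are reproducible."""
    return json.dumps(claims, sort_keys=True, separators=(",", ":")).encode()


def issue_approval_token(claims: dict, key: bytes = APPROVAL_SIGNING_KEY) -> dict:
    """Approval-service side: sign the exact claims the approver saw."""
    signature = hmac.new(key, _canonical(claims), hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": signature}


def verify_approval_token(token: dict, expected: dict, required_role: str,
                          key: bytes = APPROVAL_SIGNING_KEY) -> bool:
    """Executor side: accept only an untampered token bound to this task."""
    claims = token["claims"]
    good_sig = hmac.new(key, _canonical(claims), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(good_sig, token["signature"]):
        return False  # claims were altered after signing, or wrong key
    if claims.get("approver_role") != required_role:
        return False  # approver does not hold the role the policy requires
    # Every bound field must match what the executor is about to submit.
    return all(claims.get(field) == value for field, value in expected.items())
```

The executor would build `expected` from its own view of the task (`run_id`, `task_id`, snapshot hash, amount, `policy_hash`) immediately before calling the bank, so a token approved for one invoice cannot be replayed against another.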
P0
Replace implied magic with clear mental models
The `Policy` dataclass gives the user a clear intended mental model, but the implementation violates that model in load-bearing ways. `max_run_total_pence` is documented as a run-level cap but is never checked; `required_approver_role` is stored but never enforced; `draft_tasks()` can classify invoices using a different `policy` from the one captured in `run.policy_hash`; and audit payloads then record `run.policy_hash`, which can make the decision appear authorised by a policy that was not actually used for classification. Delta: the explicit policy object improves the prior implied magic, but unenforced policy fields create a more dangerous false sense of control.
Recommendation
Convert `Policy` from descriptive configuration into enforced authority. Store the exact policy object on the run, classify only against that stored policy, enforce every declared constraint in code, and fail closed if any policy field is not enforceable. Remove or rename fields that are not actually honoured.
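One way to read "enforce every declared constraint" is a check that runs before any task becomes executable and raises rather than logs. The sketch below is an assumption-laden illustration, not the submission's `Policy`: `PolicySketch`, `PolicyViolation`, `enforce_run_cap`, and the `max_invoice_pence` field are hypothetical; only `max_run_total_pence` is named in the finding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicySketch:
    # Hypothetical stand-in for the submission's Policy; only two limits shown.
    max_invoice_pence: int
    max_run_total_pence: int


class PolicyViolation(Exception):
    """Raised when a declared constraint would be breached."""


def enforce_run_cap(policy: PolicySketch, amounts_pence: list[int]) -> int:
    """Return the run total, or raise before any task can become executable."""
    for amount in amounts_pence:
        if amount <= 0:
            raise PolicyViolation(f"non-positive amount: {amount}")
        if amount > policy.max_invoice_pence:
            raise PolicyViolation(f"invoice {amount} exceeds per-invoice limit")
    total = sum(amounts_pence)
    if total > policy.max_run_total_pence:
        raise PolicyViolation(f"run total {total} exceeds run cap")
    return total
```

Raising an exception, rather than returning a flag, is the fail-closed choice here: a caller that forgets to inspect the result cannot proceed to submission by accident.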
P0
Establish trust through inspectability
`RunLedger.append()` writes structured `AuditEvent` rows for run creation, approvals, submissions, successes, failures, cancellations, invoice snapshots, policy decisions, idempotency keys, and bank responses. That is materially more inspectable than the prior loop. But the ledger is only JSONL append-by-convention: there is no hash chain, HMAC/signature, immutable storage, sequence number, or integrity check, despite comments describing it as tamper-evident. It can also log a misleading `policy_hash` if `draft_tasks()` receives a different policy from the run policy. Delta: audit coverage improved, but audit integrity and policy-decision provenance remain insufficient for payment accountabi…
Recommendation
Move the audit ledger outside the execution loop into an append-only durable store with tamper evidence, such as a hash-chained event table or signed ledger. Include previous-event hash, sequence number, actor, exact policy hash used for classification, invoice snapshot hash, approval witness, transfer request hash, and response hash.
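The hash-chained option mentioned above can be sketched in a few lines. This is an in-memory illustration only, with hypothetical names (`append_event`, `verify_chain`, `GENESIS_HASH`) and an assumed event shape; it is not how `RunLedger` currently works.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first event's prev_hash


def _event_hash(seq: int, prev_hash: str, payload: dict) -> str:
    """Hash the sequence number, previous hash, and payload together."""
    body = json.dumps(
        {"seq": seq, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode()).hexdigest()


def append_event(chain: list[dict], payload: dict) -> dict:
    """Append an event that commits to everything written before it."""
    seq = len(chain)
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    event = {
        "seq": seq,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": _event_hash(seq, prev_hash, payload),
    }
    chain.append(event)
    return event


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every link; any edit, reorder, or mid-chain deletion fails."""
    prev_hash = GENESIS_HASH
    for expected_seq, event in enumerate(chain):
        if event["seq"] != expected_seq or event["prev_hash"] != prev_hash:
            return False
        if event["hash"] != _event_hash(event["seq"], event["prev_hash"], event["payload"]):
            return False
        prev_hash = event["hash"]
    return True
```

A bare hash chain detects edits and mid-chain deletions, but it does not stop someone with write access from recomputing the whole chain or truncating its tail. Tamper evidence in the stronger sense needs the latest hash to be signed or anchored somewhere the executor cannot write.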
P0
Ensure that background work remains perceptible
The workflow has visible states via `RunStatus`, `TaskStatus`, and JSONL `AuditEvent`s, and `_submit_one()` records `task.submitted` before calling the bank. However, operational continuity is still process-local: `PaymentWorkflow.runs` and `PaymentWorkflow.tasks` are in-memory dictionaries, `inspect_run()` only reads those projections, and there is no replay path from `RunLedger.read_run_events()` after restart. A crash after `task.submitted` but before `task.succeeded`/`task.failed` would leave the persisted ledger inspectable only manually, not as a resumable workflow. Delta: this is a major improvement over the prior opaque loop, but persistence is not yet a usable lifecycle primitive.
Recommendation
Move run/task projections and execution cursor into a durable store or implement deterministic ledger replay before `inspect_run()` and `execute_approved()`. Treat `submitted_without_response` as an explicit recoverable state and persist cancellation flags outside the process.
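Deterministic replay, as recommended above, could look roughly like the following. The event types `task.submitted`, `task.succeeded`, and `task.failed` are quoted in the finding; `task.approved`, `task.cancelled`, the event dictionary shape, and the function name `replay_task_states` are assumptions made for this sketch.

```python
# Maps ledger event types to the task status they imply.
# "task.approved" and "task.cancelled" are assumed names.
TRANSITIONS = {
    "task.approved": "approved",
    "task.submitted": "submitted",
    "task.succeeded": "succeeded",
    "task.failed": "failed",
    "task.cancelled": "cancelled",
}


def replay_task_states(events: list[dict]) -> dict[str, str]:
    """Rebuild the per-task projection from ledger events, in ledger order."""
    states: dict[str, str] = {}
    for event in events:
        status = TRANSITIONS.get(event["type"])
        if status is not None:
            states[event["task_id"]] = status
    # After a restart, a task still "submitted" has no recorded outcome: the
    # bank call may or may not have landed, so surface that explicitly.
    return {
        task_id: "submitted_without_response" if status == "submitted" else status
        for task_id, status in states.items()
    }
```

Running this over the output of `RunLedger.read_run_events()` before `inspect_run()` would let a restarted process show which transfers need reconciliation instead of silently forgetting them.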
P0
Design for delegation rather than direct manipulation
The code now models delegated work with `PaymentRun`, `PaymentTask`, and a `Policy` containing amount limits, vendor lists, due-window constraints, and approver-role intent. But the delegation contract is not yet authoritative: `draft_tasks()` accepts an arbitrary `policy` argument without checking `policy.hash() == run.policy_hash`, `Policy.max_run_total_pence` is never enforced, and `Policy.required_approver_role` is never checked. Delta: this improves the prior run/task/policy gap, but the current policy object is still partly advisory rather than a hard boundary.
Recommendation
Make the run-owned policy the single source of authority: persist the policy or signed policy envelope with the run, reject drafting/execution if the supplied policy hash differs from `run.policy_hash`, and enforce run-total, invoice, vendor, due-window, and approver-role constraints before any task can become executable.
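The hash check described above is small enough to show in full. In this sketch `RunPolicySketch`, `PolicyMismatch`, `require_run_policy`, and the two policy fields are hypothetical; the only thing taken from the finding is the comparison `policy.hash() == run.policy_hash`.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunPolicySketch:
    # Hypothetical fields; the real Policy has more constraints.
    max_invoice_pence: int
    allowed_vendors: tuple[str, ...]

    def hash(self) -> str:
        """Hash a canonical JSON form so equal policies hash equally."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()


class PolicyMismatch(Exception):
    """Raised when a caller supplies a policy the run was not created with."""


def require_run_policy(run_policy_hash: str, supplied: RunPolicySketch) -> RunPolicySketch:
    """Fail closed unless the supplied policy is exactly the run's policy."""
    if supplied.hash() != run_policy_hash:
        raise PolicyMismatch("supplied policy does not match run.policy_hash")
    return supplied
```

Calling a guard like this at the top of drafting and again before execution would also keep the audit trail honest, because the `policy_hash` written to the ledger would then always be the one used for classification.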
P0
Optimise for steering, not only initiating
The workflow has an initial steering primitive: `cancel_run()` sets `cancellation_requested`, marks not-yet-submitted tasks `CANCELLED`, and `execute_approved()` checks the flag before each call to `transfer_funds()`. Idempotency keys also reduce duplicate-payment risk. But steering is incomplete and partly non-durable: the cancellation flag is only in memory, there is no pause/resume or reprioritisation path, failed tasks cannot actually be retried even though the inbox message says to retry from the operator inbox, and `execute_approved()` ignores `TaskStatus.FAILED` rather than offering a controlled replay with the same idempotency key. Delta: this improves the prior no-interrupt design,…
Recommendation
Persist steering state and execution cursor outside the process, then add explicit lifecycle operations for pause, resume, cancel, and retry/requeue of failed or submitted-unknown tasks using the original idempotency key. Keep the worker separate from the approval and steering surfaces so operators can intervene safely between external actions.
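A controlled retry path might be sketched as below. Everything here is hypothetical (`TaskSketch`, `RETRYABLE`, `requeue_for_retry`, the `attempts` counter, and the `run_paused` flag), and state is kept in memory only to keep the example short. Returning a retried task straight to `approved` is also a design assumption; some teams would require a fresh approval instead.

```python
from dataclasses import dataclass


@dataclass
class TaskSketch:
    task_id: str
    idempotency_key: str
    status: str  # e.g. "approved", "failed", "submitted_without_response"
    attempts: int = 0


# Statuses from which an operator may request another attempt (assumed set).
RETRYABLE = {"failed", "submitted_without_response"}


def requeue_for_retry(task: TaskSketch, run_paused: bool, max_attempts: int = 3) -> TaskSketch:
    """Make a task executable again without minting a new idempotency key."""
    if run_paused:
        raise RuntimeError("run is paused; resume it before retrying tasks")
    if task.status not in RETRYABLE:
        raise ValueError(f"task {task.task_id} is {task.status}, not retryable")
    if task.attempts >= max_attempts:
        raise ValueError(f"task {task.task_id} exhausted {max_attempts} retries")
    # The idempotency key is deliberately left untouched so the bank can
    # deduplicate a transfer that actually succeeded the first time.
    task.status = "approved"
    task.attempts += 1
    return task
```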
P0
Align feedback with the user’s level of attention
The code separates routine progress from intervention paths by recording normal transitions in `_record()` and escalating `BankAPIError` plus policy-hash mismatch through `OperatorInbox.post()`. But `OperatorInbox` is only an in-memory `list`, and `run_daily_batch()` creates it locally and discards it, so a critical `auth_missing` alert may never reach an operator in a cron/background context. Delta: this addresses the prior swallowed-exception pattern, but the escalation channel is still not durable or operator-facing.
Recommendation
Move `OperatorInbox` behind a durable operator-facing notification surface, such as a database-backed inbox, dashboard, or incident channel. Return or persist alert IDs from `run_daily_batch()` so critical failures cannot disappear with the process.
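As a rough sketch of the database-backed option, the following stores alerts in SQLite and returns the alert ID to the caller. The table layout and the names `open_inbox`, `post_alert`, and `unacknowledged` are assumptions; a production system might equally use an existing incident channel.

```python
import sqlite3


def open_inbox(path: str) -> sqlite3.Connection:
    """Open (or create) a file-backed alert store that outlives the process."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS alerts (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            run_id TEXT NOT NULL,
            severity TEXT NOT NULL,
            message TEXT NOT NULL,
            acknowledged INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    conn.commit()
    return conn


def post_alert(conn: sqlite3.Connection, run_id: str, severity: str, message: str) -> int:
    """Persist an alert and return its ID so the batch job can report it."""
    cursor = conn.execute(
        "INSERT INTO alerts (run_id, severity, message) VALUES (?, ?, ?)",
        (run_id, severity, message),
    )
    conn.commit()
    return cursor.lastrowid


def unacknowledged(conn: sqlite3.Connection) -> list[tuple]:
    """List alerts that no operator has acknowledged yet, oldest first."""
    return conn.execute(
        "SELECT id, run_id, severity, message FROM alerts "
        "WHERE acknowledged = 0 ORDER BY id"
    ).fetchall()
```

Having `run_daily_batch()` return the IDs from `post_alert` would give the cron wrapper something concrete to log or page on, even if the worker process exits immediately afterwards.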
P0
Apply progressive disclosure to system agency
`inspect_run()` provides a concise top-level run view with `run_id`, `status`, `policy_hash`, cancellation fields, and a `summary` grouped by task status, then exposes per-task drill-down including `policy_decision`, invoice snapshot, approval identity/time, transfer ID, and failure details. Delta: this improves the prior lack of progressive inspection with a clear summary-plus-detail structure.
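The summary-plus-detail shape described here can be illustrated with a short sketch. It is not the submission's `inspect_run()`: the function name `summarise_run` and the dictionary shapes are assumptions chosen to mirror the fields named in the finding.

```python
from collections import Counter


def summarise_run(run: dict, tasks: list[dict]) -> dict:
    """Return a top-level view first, with per-task detail available below it."""
    return {
        "run_id": run["run_id"],
        "status": run["status"],
        "policy_hash": run["policy_hash"],
        # Concise summary: how many tasks sit in each status.
        "summary": dict(Counter(task["status"] for task in tasks)),
        # Drill-down: full task records keyed by task_id.
        "tasks": {task["task_id"]: task for task in tasks},
    }
```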
P0
Expose meaningful operational state, not internal complexity
The code exposes user-relevant lifecycle states through `RunStatus` values such as `awaiting_approval`, `executing`, `partially_completed`, and `failed`, plus `TaskStatus` values such as `awaiting_approval`, `approved`, `submitted`, `succeeded`, `failed`, `skipped`, and `cancelled`. Technical details are kept in diagnostic fields like `failure_class`, `failure_message`, and `bank_response` rather than replacing the operational state. Delta: this addresses the prior single-shot loop by making payment progress understandable at the run and task levels.
P0
Represent delegated work as a system, not merely as a conversation
The workflow is represented as a structured system rather than a conversation: `PaymentRun` owns child `PaymentTask`s, `RunStatus` and `TaskStatus` encode lifecycle, `_record()` emits timeline events, `execute_approved()` derives final run outcome from task states, and `inspect_run()` shows the run/task hierarchy. Delta: this resolves the prior lack of a system representation, although separate durability concerns are covered under P2 and P7.
Add to your README
Two embeddable variants: a small one and a richer card.
Score card (recommended)
[](https://aidesignblueprint.com/en/readiness-review/b8d61c00-2b86-45f0-a533-526202371592)
Flat badge
[](https://aidesignblueprint.com/en/readiness-review/b8d61c00-2b86-45f0-a533-526202371592)
Iteration delta
Improvements (9)
Run ID: b8d61c00-2b86-45f0-a533-526202371592 · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.