Evaluation complete; awaiting evidence review.
Evaluated on 10 May 2026 against the AI Design Blueprint doctrine
Emerging
Status: High Risk
60/100
Grade C
The submission adds strong delegation primitives: policy caps, HMAC approval envelopes, durable ledger/inbox records, task/run lifecycle states, and improved exposure accounting for SUBMITTED_UNKNOWN. However, SUBMITTED_UNKNOWN is only a blocker within the current execute pass; a later execute_approved_payments() call can still submit remaining APPROVED tasks before reconciliation, and cancellation can become terminal while bank exposure remains unresolved. Those are production-blocking handoff and steering failures for a payment workflow.
Iteration history
5 previous runs on this artifact. Each run_id opens its own readiness review.
Findings by principle
10 principles evaluated. Verdict, severity, evidence, and recommendation for each.
P0
Make hand-offs, approvals, and blockers explicit
The hard handoff blocker is not durable. In the current pass, execute_approved_payments() breaks when _submit_one() leaves a task SUBMITTED_UNKNOWN and posts run.execution_blocked_on_ambiguous_handoff. But there is no pre-execution guard for an already-existing SUBMITTED_UNKNOWN task. A subsequent execute_approved_payments() call will skip the unknown task, compute _projected_exposure_pence(), and submit another APPROVED task if the run cap still has room. That means unresolved bank exposure does not reliably block further external bank calls. Delta: the code improves the prior P8 finding by counting ambiguous exposure and halting within one pass, but the prior high-risk blocker persists acr…
Recommendation
Move the blocker to a durable run-level preflight before setting EXECUTING or calling transfer_funds(): if any task is SUBMITTED_UNKNOWN, append a typed blocker event, post/retain a critical alert, and return or raise without submitting any APPROVED task. Only reconcile_submitted_task(), confirm_mock_simulation(), or an explicit safe cancellation/reconciliation path should clear that blocker.
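A minimal sketch of the recommended preflight, under stated assumptions: the `Run`, `Task`, and `ExecutionBlocked` types, the `preflight_reconciliation_guard()` helper, and the in-memory `events` list standing in for the ledger are all hypothetical and are not taken from the reviewed code. Only the status names and the `run.execution_blocked_on_ambiguous_handoff` event type come from the findings above. The point it illustrates is that the check runs against persisted task state before any bank call, so it holds across separate `execute_approved_payments()` invocations.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(Enum):
    APPROVED = "APPROVED"
    SUBMITTED = "SUBMITTED"
    SUBMITTED_UNKNOWN = "SUBMITTED_UNKNOWN"
    SUCCEEDED = "SUCCEEDED"


@dataclass
class Task:
    task_id: str
    status: TaskStatus


@dataclass
class Run:
    run_id: str
    tasks: list[Task]
    # Hypothetical stand-in for the durable ledger.
    events: list[dict] = field(default_factory=list)


class ExecutionBlocked(Exception):
    """Raised when unresolved bank exposure forbids further submissions."""


def preflight_reconciliation_guard(run: Run) -> None:
    """Refuse execution while any task is still SUBMITTED_UNKNOWN."""
    unknown = [t.task_id for t in run.tasks
               if t.status is TaskStatus.SUBMITTED_UNKNOWN]
    if unknown:
        # Record a typed blocker event before refusing.
        run.events.append({
            "type": "run.execution_blocked_on_ambiguous_handoff",
            "task_ids": unknown,
        })
        raise ExecutionBlocked(f"reconcile {unknown} before executing")


def execute_approved_payments(run: Run, transfer_funds) -> list[str]:
    """Simplified execute pass: the guard runs before any bank call."""
    preflight_reconciliation_guard(run)
    submitted = []
    for task in run.tasks:
        if task.status is TaskStatus.APPROVED:
            transfer_funds(task.task_id)
            task.status = TaskStatus.SUBMITTED
            submitted.append(task.task_id)
    return submitted
```

Because the guard reads task state rather than a per-loop flag, a second call made after a restart is refused in the same way as the first.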
P0
Optimise for steering, not only initiating
The workflow has steering primitives, but they are not safe enough around in-flight bank exposure. cancel_run() improves the no-in-flight case by transitioning to CANCELLED immediately, but execute_approved_payments() later sets run.status = RunStatus.CANCELLED whenever run.cancellation_requested is true, before considering unfinished_submitted or unfinished_submitted_unknown. That can make a run terminal while money-at-risk still requires reconciliation; because _reconcile_run_status() returns early for terminal CANCELLED runs, later reconciliation will not repair the top-level status. Combined with the missing durable SUBMITTED_UNKNOWN execution guard, an operator can steer through or hide…
Recommendation
Make steering projection-based and nonterminal while bank exposure exists: cancellation should become CANCELLATION_PENDING_RECONCILIATION when any task is SUBMITTED or SUBMITTED_UNKNOWN, and reconciliation should be allowed to transition the run to the truthful final state after exposure is resolved.
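One way to read this recommendation is as a pair of transitions, sketched below. The function names `cancel_run()` and `reconcile_run_status()` echo the reviewed code, but their signatures and bodies here are hypothetical, and settling on `CANCELLED` once exposure clears is an assumption; a real workflow may need a different final state when some payments did succeed.

```python
from enum import Enum


class TaskStatus(Enum):
    APPROVED = "APPROVED"
    SUBMITTED = "SUBMITTED"
    SUBMITTED_UNKNOWN = "SUBMITTED_UNKNOWN"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


class RunStatus(Enum):
    EXECUTING = "EXECUTING"
    CANCELLATION_PENDING_RECONCILIATION = "CANCELLATION_PENDING_RECONCILIATION"
    CANCELLED = "CANCELLED"


IN_FLIGHT = frozenset({TaskStatus.SUBMITTED, TaskStatus.SUBMITTED_UNKNOWN})


def has_bank_exposure(task_statuses) -> bool:
    """True while any task may still have money moving at the bank."""
    return any(status in IN_FLIGHT for status in task_statuses)


def cancel_run(task_statuses) -> RunStatus:
    """Cancellation stays nonterminal while bank exposure exists."""
    if has_bank_exposure(task_statuses):
        return RunStatus.CANCELLATION_PENDING_RECONCILIATION
    return RunStatus.CANCELLED


def reconcile_run_status(run_status: RunStatus, task_statuses) -> RunStatus:
    """Let reconciliation finish a pending cancellation once exposure clears."""
    if (run_status is RunStatus.CANCELLATION_PENDING_RECONCILIATION
            and not has_bank_exposure(task_statuses)):
        return RunStatus.CANCELLED
    return run_status
```

The pending state is deliberately not terminal, so a status reconciler that returns early for terminal runs would still process it.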
P0
Expose meaningful operational state, not internal complexity
Task-level states are meaningful, but the run-level state can misrepresent the dominant operational condition. After an ambiguous handoff, execute_approved_payments() sets run.status = RunStatus.AWAITING_APPROVAL whenever actionable_total > 0, even if the key action is bank reconciliation. Separately, the close path checks if run.cancellation_requested before actionable_total and can set RunStatus.CANCELLED while unfinished_submitted_unknown remains nonzero; _reconcile_run_status() then refuses to update terminal runs. The summary.needs_reconciliation field mitigates this, but the primary state can still hide money-at-risk.
Recommendation
Derive the primary run status from the highest-risk task projection: SUBMITTED_UNKNOWN/SUBMITTED should dominate AWAITING_APPROVAL and CANCELLED. Use explicit user-facing states such as BLOCKED_ON_RECONCILIATION or CANCELLATION_PENDING_RECONCILIATION until bank exposure is resolved.
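The precedence described here can be sketched as a pure function over task statuses. Everything below is illustrative: `derive_run_status()` is a hypothetical helper, statuses are plain strings for brevity, `COMPLETED` is an assumed name for the no-work-left state, and treating both `AWAITING_APPROVAL` and `APPROVED` tasks as "actionable" is an assumption based on the finding's wording.

```python
EXPOSED = {"SUBMITTED", "SUBMITTED_UNKNOWN"}
ACTIONABLE = {"AWAITING_APPROVAL", "APPROVED"}


def derive_run_status(task_statuses, cancellation_requested=False):
    """Pick the run status from the highest-risk task condition present."""
    statuses = set(task_statuses)
    # 1. Unresolved bank exposure dominates everything else.
    if statuses & EXPOSED:
        if cancellation_requested:
            return "CANCELLATION_PENDING_RECONCILIATION"
        return "BLOCKED_ON_RECONCILIATION"
    # 2. With no exposure left, a requested cancellation can be terminal.
    if cancellation_requested:
        return "CANCELLED"
    # 3. Otherwise report remaining operator work, if any.
    if statuses & ACTIONABLE:
        return "AWAITING_APPROVAL"
    return "COMPLETED"
```

Because the function takes only the current task statuses and the cancellation flag, it can be re-evaluated after every reconciliation instead of being frozen by an earlier terminal write.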
P0
Replace implied magic with clear mental models
The mental model presented to the operator is stronger than the actual enforcement. The critical alert in execute_approved_payments() says no further approved tasks are submitted until reconcile_submitted_task() resolves SUBMITTED_UNKNOWN, but the method only sets ambiguous_halt when the current _submit_one() call creates SUBMITTED_UNKNOWN. On a later execute_approved_payments() call, an already-existing SUBMITTED_UNKNOWN is counted only in _projected_exposure_pence(); if current_exposure + the next APPROVED invoice is within max_run_total_pence, _submit_one() can still call transfer_funds(). RunStatus.AWAITING_APPROVAL also suggests an approval wait rather than a reconciliation blocker. Del…
Recommendation
Make SUBMITTED_UNKNOWN a durable execution-blocking state, not a per-loop note: before any transfer_funds() call, derive the run projection and refuse execution while any task is SUBMITTED_UNKNOWN, returning a user-facing blocked-on-reconciliation state and required action.
P0
Design for delegation rather than direct manipulation
PaymentWorkflow models delegated work through create_run(), draft_tasks(), approve(), execute_approved_payments(), pause_run(), resume_run(), and cancel_run(). Policy defines explicit authority constraints such as max_run_total_pence, max_invoice_amount_pence, allow/block lists, due_window_days, and required_approver_role; approve() verifies an HMAC ApprovalEnvelope plus policy_hash and invoice_snapshot_hash; execute_approved_payments() enforces caps using _projected_exposure_pence() over SUBMITTED, SUBMITTED_UNKNOWN, SUCCEEDED, and SIMULATED_SUCCEEDED. Delta: this addresses the prior P1 exposure-under-counting finding; the remaining durable handoff blocker gap is scored under P8/P10.
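To make the two mechanisms named above concrete, here is a hedged sketch of an HMAC-signed approval check and a cap check over projected exposure. It is not the submission's implementation: `sign_envelope()`, `verify_envelope()`, `within_run_cap()`, the envelope field layout, and the canonical-JSON serialisation are assumptions. Only `policy_hash`, `invoice_snapshot_hash`, `max_run_total_pence`, and the four counted statuses are taken from the finding.

```python
import hashlib
import hmac
import json


def sign_envelope(secret: bytes, fields: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the envelope fields."""
    message = json.dumps(fields, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_envelope(secret: bytes, fields: dict, signature: str,
                    expected_policy_hash: str,
                    expected_snapshot_hash: str) -> bool:
    """Accept only if the signature matches and both bound hashes agree."""
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(sign_envelope(secret, fields), signature):
        return False
    return (fields.get("policy_hash") == expected_policy_hash
            and fields.get("invoice_snapshot_hash") == expected_snapshot_hash)


# Statuses the finding says are counted toward exposure.
COUNTED = {"SUBMITTED", "SUBMITTED_UNKNOWN", "SUCCEEDED", "SIMULATED_SUCCEEDED"}


def projected_exposure_pence(tasks) -> int:
    """Sum amounts of (status, amount_pence) pairs that may have left the bank."""
    return sum(amount for status, amount in tasks if status in COUNTED)


def within_run_cap(tasks, next_amount_pence: int,
                   max_run_total_pence: int) -> bool:
    """Check the next submission against the run-level cap."""
    return projected_exposure_pence(tasks) + next_amount_pence <= max_run_total_pence
```

Binding the policy and invoice snapshot hashes into the signed fields means an approval cannot be replayed against a changed policy or an edited invoice, which is the property the finding credits.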
P0
Ensure that background work remains perceptible
Background work is perceptible through explicit RunStatus and TaskStatus values, RunLedger JSONL audit events, DurableOperatorInbox alerts, inspect_run() summaries, and replay_run_from_ledger() for continuity after process loss. inspect_run() exposes run status, updated_at, critical_alerts, by_status counts, submitted_amount_pence, and needs_reconciliation.
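As an illustration of the summary fields listed above, the following sketch builds `by_status`, `submitted_amount_pence`, and `needs_reconciliation` from task data. The `summarize()` helper and the `(status, amount_pence)` tuple shape are hypothetical. Two definitions are assumptions rather than facts about the reviewed code: that `submitted_amount_pence` covers both `SUBMITTED` and `SUBMITTED_UNKNOWN`, and that `needs_reconciliation` is true whenever any task is `SUBMITTED_UNKNOWN`.

```python
from collections import Counter

IN_FLIGHT = ("SUBMITTED", "SUBMITTED_UNKNOWN")


def summarize(tasks):
    """Build an operator-facing summary from (status, amount_pence) pairs."""
    by_status = Counter(status for status, _ in tasks)
    return {
        "by_status": dict(by_status),
        # Assumption: money counts as "submitted" until the bank outcome is known.
        "submitted_amount_pence": sum(
            amount for status, amount in tasks if status in IN_FLIGHT
        ),
        # Assumption: any ambiguous handoff requires operator reconciliation.
        "needs_reconciliation": by_status["SUBMITTED_UNKNOWN"] > 0,
    }
```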
P0
Align feedback with the user’s level of attention
Feedback is calibrated by risk: routine transitions are ledgered, policy/bank blockers post inbox alerts, ambiguous bank handoffs post critical alerts, rejected bank submissions use warning severity, and run-not-completed alerts include actionable counts for AWAITING_APPROVAL, APPROVED, SUBMITTED, and SUBMITTED_UNKNOWN. Delta: reconcile_submitted_task() now catches broad non-BankAPIError exceptions, appends task.reconcile_blocked, posts a critical alert, and keeps the task SUBMITTED_UNKNOWN, addressing the prior P3 feedback gap.
P0
Apply progressive disclosure to system agency
The code separates primary status from deeper inspection: inspect_run() returns top-level run_id/status/bank_mode/policy_hash/critical_alerts and a summary before per-task detail, while RunLedger.read_run_events() and verify_chain() provide deeper audit inspection when needed. Invoice snapshot hashes and approval envelope signatures are available for audit without making the ledger the only user-facing surface.
P0
Establish trust through inspectability
The workflow is inspectable through a tamper-evident RunLedger: AuditEvent includes sequence_no, prev_event_hash, event_hash, actor_user_id, payload, and created_at; verify_chain() detects divergence. Ledger events cover policy creation, task drafting/skipping, approvals with envelope metadata, submissions, ambiguous outcomes such as task.submitted_unknown_after_send and task.submission_blocked, reconciliation failures via task.reconcile_blocked, and terminal run transitions. Delta: the prior P7 recommendation is materially addressed by typed reconciliation boundary events and projection-derived submitted_amount_pence.
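The tamper-evidence property can be illustrated with a small hash chain. This is a sketch, not the `RunLedger` implementation: the `event_type` field, the all-zero genesis hash, the JSON canonicalisation, and the `append_event()` helper are assumptions. The other `AuditEvent` fields and the `verify_chain()` name follow the finding.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS_HASH = "0" * 64  # assumed sentinel for the first event's predecessor


@dataclass(frozen=True)
class AuditEvent:
    sequence_no: int
    prev_event_hash: str
    event_type: str
    actor_user_id: str
    payload: dict
    created_at: str
    event_hash: str


def compute_hash(sequence_no, prev_event_hash, event_type,
                 actor_user_id, payload, created_at) -> str:
    """SHA-256 over a canonical encoding of every field except the hash itself."""
    body = json.dumps(
        [sequence_no, prev_event_hash, event_type, actor_user_id, payload, created_at],
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(body.encode()).hexdigest()


def append_event(chain, event_type, actor_user_id, payload, created_at):
    """Append an event linked to the hash of the previous one."""
    sequence_no = len(chain) + 1
    prev_hash = chain[-1].event_hash if chain else GENESIS_HASH
    event_hash = compute_hash(sequence_no, prev_hash, event_type,
                              actor_user_id, payload, created_at)
    event = AuditEvent(sequence_no, prev_hash, event_type,
                       actor_user_id, payload, created_at, event_hash)
    chain.append(event)
    return event


def verify_chain(chain) -> bool:
    """Recompute every link; any edit, reorder, or gap breaks verification."""
    prev_hash = GENESIS_HASH
    for expected_seq, event in enumerate(chain, start=1):
        if event.sequence_no != expected_seq or event.prev_event_hash != prev_hash:
            return False
        recomputed = compute_hash(event.sequence_no, event.prev_event_hash,
                                  event.event_type, event.actor_user_id,
                                  event.payload, event.created_at)
        if recomputed != event.event_hash:
            return False
        prev_hash = event.event_hash
    return True
```

A plain hash chain like this detects edits only if the latest hash is anchored somewhere the editor cannot rewrite; whether the reviewed ledger does that is not stated in the findings.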
P0
Represent delegated work as a system, not merely as a conversation
Delegated work is represented as a structured system rather than a conversation: PaymentRun, PaymentTask, Policy, ApprovalEnvelope, AuditEvent, RunLedger, DurableOperatorInbox, and inspect_run() form explicit layers for work items, status, approvals, audit, alerts, and operator inspection. replay_run_from_ledger() reconstructs state from the event stream and promotes stranded SUBMITTED tasks to SUBMITTED_UNKNOWN for reconciliation.
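The replay behaviour described above, including promotion of stranded `SUBMITTED` tasks, might look roughly like the sketch below. The event dictionaries and the `EVENT_TO_STATUS` mapping are hypothetical; of the event types shown, only `task.submitted_unknown_after_send` appears in the findings, and the rest are invented for illustration.

```python
# Hypothetical mapping from ledger event type to resulting task status.
EVENT_TO_STATUS = {
    "task.drafted": "AWAITING_APPROVAL",
    "task.approved": "APPROVED",
    "task.submitted": "SUBMITTED",
    "task.submitted_unknown_after_send": "SUBMITTED_UNKNOWN",
    "task.succeeded": "SUCCEEDED",
}


def replay_task_statuses(events):
    """Fold the event stream into a task_id -> status projection."""
    statuses = {}
    for event in events:
        new_status = EVENT_TO_STATUS.get(event["type"])
        if new_status is not None:  # ignore run-level and unrelated events
            statuses[event["task_id"]] = new_status
    # After process loss, a task whose last recorded state is SUBMITTED has
    # no known bank outcome, so it is promoted for reconciliation.
    return {
        task_id: "SUBMITTED_UNKNOWN" if status == "SUBMITTED" else status
        for task_id, status in statuses.items()
    }
```

Promoting on replay is the conservative choice: it treats a missing outcome as possible exposure rather than assuming the submission never reached the bank.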
Iteration delta
Regressions (3)
Improvements (4)
Run ID: 039064f9-c4e9-45de-8c16-1c160a01fca1 · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.