Reviewed

Evaluation complete; awaiting evidence review.

Agent Architecture Review, Validation Snapshot

Evaluated on 10 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High Risk

60/100

Grade C

6 aligned · 4 production blockers · 2 high risk
Verdicts by principle

The submission adds strong delegation primitives: policy caps, HMAC approval envelopes, durable ledger/inbox records, task/run lifecycle states, and improved exposure accounting for SUBMITTED_UNKNOWN. However, SUBMITTED_UNKNOWN is only a blocker within the current execute pass; a later execute_approved_payments() call can still submit remaining APPROVED tasks before reconciliation, and cancellation can become terminal while bank exposure remains unresolved. Those are production-blocking handoff and steering failures for a payment workflow.

Iteration history

5 previous runs on this artifact. Each run_id opens its own readiness review.

| When | Score | Status | Run ID |
| --- | --- | --- | --- |
| 10 May 2026 (this run) | 60 / C | High Risk | 039064f9 |
| 10 May 2026 | 30 / F | High Risk | 5c1d5833 |
| 10 May 2026 | 40 / D | High Risk | b1195c34 |
| 10 May 2026 | 74 / C | Aligned | dd3a9348 |
| 10 May 2026 | 74 / C | Aligned | b4799966 |
| 10 May 2026 | 74 / C | Aligned | 7e9bc0f6 |

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

P0

Make hand-offs, approvals, and blockers explicit

high risk · production blocker · 90/100

The hard handoff blocker is not durable. In the current pass, execute_approved_payments() breaks when _submit_one() leaves a task SUBMITTED_UNKNOWN and posts run.execution_blocked_on_ambiguous_handoff. But there is no pre-execution guard for an already-existing SUBMITTED_UNKNOWN task. A subsequent execute_approved_payments() call will skip the unknown task, compute _projected_exposure_pence(), and submit another APPROVED task if the run cap still has room. That means unresolved bank exposure does not reliably block further external bank calls. Delta: the code improves the prior P8 finding by counting ambiguous exposure and halting within one pass, but the prior high-risk blocker persists acr…

Recommendation

Move the blocker to a durable run-level preflight before setting EXECUTING or calling transfer_funds(): if any task is SUBMITTED_UNKNOWN, append a typed blocker event, post/retain a critical alert, and return or raise without submitting any APPROVED task. Only reconcile_submitted_task(), confirm_mock_simulation(), or an explicit safe cancellation/reconciliation path should clear that blocker.
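
The recommended preflight could look roughly like the sketch below. It is a minimal illustration, not the submission's code: the status names and the `run.execution_blocked_on_ambiguous_handoff` event type come from the review, while the `Task` and `Run` dataclasses, the `ExecutionBlocked` exception, the in-memory `events` list, and the `submit` callback are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class TaskStatus(Enum):
    APPROVED = "APPROVED"
    SUBMITTED = "SUBMITTED"
    SUBMITTED_UNKNOWN = "SUBMITTED_UNKNOWN"
    SUCCEEDED = "SUCCEEDED"


@dataclass
class Task:  # hypothetical stand-in for PaymentTask
    task_id: str
    status: TaskStatus


@dataclass
class Run:  # hypothetical stand-in for PaymentRun
    tasks: List[Task]
    events: List[dict] = field(default_factory=list)  # stand-in for the ledger


class ExecutionBlocked(Exception):
    """Raised when unresolved bank exposure forbids further submissions."""


def preflight_reconciliation_guard(run: Run) -> None:
    # Runs on every call, so a SUBMITTED_UNKNOWN task left by an earlier
    # pass still blocks this one.
    unknown = [t.task_id for t in run.tasks
               if t.status is TaskStatus.SUBMITTED_UNKNOWN]
    if unknown:
        run.events.append({
            "type": "run.execution_blocked_on_ambiguous_handoff",
            "task_ids": unknown,
        })
        raise ExecutionBlocked(f"reconcile first: {unknown}")


def execute_approved_payments(run: Run, submit: Callable[[Task], None]) -> List[str]:
    preflight_reconciliation_guard(run)  # before any external bank call
    submitted = []
    for task in run.tasks:
        if task.status is TaskStatus.APPROVED:
            submit(task)
            task.status = TaskStatus.SUBMITTED
            submitted.append(task.task_id)
    return submitted
```

Because the guard inspects persisted task state rather than a per-loop flag, it holds across repeated calls and process restarts, provided the task statuses themselves are durable.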

P0

Optimise for steering, not only initiating

high risk · production blocker · 86/100

The workflow has steering primitives, but they are not safe enough around in-flight bank exposure. cancel_run() improves the no-in-flight case by transitioning to CANCELLED immediately, but execute_approved_payments() later sets run.status = RunStatus.CANCELLED whenever run.cancellation_requested is true, before considering unfinished_submitted or unfinished_submitted_unknown. That can make a run terminal while money-at-risk still requires reconciliation; because _reconcile_run_status() returns early for terminal CANCELLED runs, later reconciliation will not repair the top-level status. Combined with the missing durable SUBMITTED_UNKNOWN execution guard, an operator can steer through or hide…

Recommendation

Make steering projection-based and nonterminal while bank exposure exists: cancellation should become CANCELLATION_PENDING_RECONCILIATION when any task is SUBMITTED or SUBMITTED_UNKNOWN, and reconciliation should be allowed to transition the run to the truthful final state after exposure is resolved.
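
One way to picture the nonterminal cancellation path is the sketch below. `CANCELLATION_PENDING_RECONCILIATION`, `CANCELLED`, `SUBMITTED`, and `SUBMITTED_UNKNOWN` are named in the review; the two helper functions and the choice of `CANCELLED` as the final state are assumptions for illustration. A real workflow might end in a different truthful state, for example one that records partial payment.

```python
from enum import Enum
from typing import Collection


class TaskStatus(Enum):
    APPROVED = "APPROVED"
    SUBMITTED = "SUBMITTED"
    SUBMITTED_UNKNOWN = "SUBMITTED_UNKNOWN"
    SUCCEEDED = "SUCCEEDED"


class RunStatus(Enum):
    EXECUTING = "EXECUTING"
    CANCELLATION_PENDING_RECONCILIATION = "CANCELLATION_PENDING_RECONCILIATION"
    CANCELLED = "CANCELLED"


# Task states that mean money may be with the bank and unresolved.
IN_FLIGHT = frozenset({TaskStatus.SUBMITTED, TaskStatus.SUBMITTED_UNKNOWN})


def has_bank_exposure(task_statuses: Collection[TaskStatus]) -> bool:
    return any(status in IN_FLIGHT for status in task_statuses)


def status_on_cancel(task_statuses: Collection[TaskStatus]) -> RunStatus:
    # Hypothetical helper: cancellation only becomes terminal when
    # nothing is in flight.
    if has_bank_exposure(task_statuses):
        return RunStatus.CANCELLATION_PENDING_RECONCILIATION
    return RunStatus.CANCELLED


def status_after_reconcile(current: RunStatus,
                           task_statuses: Collection[TaskStatus]) -> RunStatus:
    # Hypothetical helper: the pending state is nonterminal, so
    # reconciliation can still move the run forward once exposure clears.
    if (current is RunStatus.CANCELLATION_PENDING_RECONCILIATION
            and not has_bank_exposure(task_statuses)):
        return RunStatus.CANCELLED
    return current
```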

P0

Expose meaningful operational state, not internal complexity

needs changes · production blocker · 72/100

Task-level states are meaningful, but the run-level state can misrepresent the dominant operational condition. After an ambiguous handoff, execute_approved_payments() sets run.status = RunStatus.AWAITING_APPROVAL whenever actionable_total > 0, even if the key action is bank reconciliation. Separately, the close path checks if run.cancellation_requested before actionable_total and can set RunStatus.CANCELLED while unfinished_submitted_unknown remains nonzero; _reconcile_run_status() then refuses to update terminal runs. The summary.needs_reconciliation field mitigates this, but the primary state can still hide money-at-risk.

Recommendation

Derive the primary run status from the highest-risk task projection: SUBMITTED_UNKNOWN/SUBMITTED should dominate AWAITING_APPROVAL and CANCELLED. Use explicit user-facing states such as BLOCKED_ON_RECONCILIATION or CANCELLATION_PENDING_RECONCILIATION until bank exposure is resolved.
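
A projection of this kind can be written as a pure function over task states with an explicit precedence order. The sketch below assumes a simplified model: the `BLOCKED_ON_RECONCILIATION` and `CANCELLATION_PENDING_RECONCILIATION` names are taken from the recommendation, while `derive_run_status`, the `COMPLETED` state, and the exact precedence below exposure are hypothetical.

```python
from enum import Enum
from typing import Iterable


class TaskStatus(Enum):
    AWAITING_APPROVAL = "AWAITING_APPROVAL"
    APPROVED = "APPROVED"
    SUBMITTED = "SUBMITTED"
    SUBMITTED_UNKNOWN = "SUBMITTED_UNKNOWN"
    SUCCEEDED = "SUCCEEDED"


class RunStatus(Enum):
    BLOCKED_ON_RECONCILIATION = "BLOCKED_ON_RECONCILIATION"
    CANCELLATION_PENDING_RECONCILIATION = "CANCELLATION_PENDING_RECONCILIATION"
    CANCELLED = "CANCELLED"
    AWAITING_APPROVAL = "AWAITING_APPROVAL"
    COMPLETED = "COMPLETED"  # hypothetical final state for this sketch


def derive_run_status(task_statuses: Iterable[TaskStatus],
                      cancellation_requested: bool) -> RunStatus:
    statuses = set(task_statuses)
    exposed = bool(statuses & {TaskStatus.SUBMITTED, TaskStatus.SUBMITTED_UNKNOWN})
    # Unresolved bank exposure is checked first, so it dominates both
    # cancellation and pending approvals.
    if exposed and cancellation_requested:
        return RunStatus.CANCELLATION_PENDING_RECONCILIATION
    if exposed:
        return RunStatus.BLOCKED_ON_RECONCILIATION
    if cancellation_requested:
        return RunStatus.CANCELLED
    if statuses & {TaskStatus.AWAITING_APPROVAL, TaskStatus.APPROVED}:
        return RunStatus.AWAITING_APPROVAL
    return RunStatus.COMPLETED
```

Deriving the status on every read, instead of storing it and patching it in place, avoids the case where a terminal value is written once and later reconciliation cannot correct it.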

P0

Replace implied magic with clear mental models

needs changes · production blocker · 70/100

The mental model presented to the operator is stronger than the actual enforcement. The critical alert in execute_approved_payments() says no further approved tasks are submitted until reconcile_submitted_task() resolves SUBMITTED_UNKNOWN, but the method only sets ambiguous_halt when the current _submit_one() call creates SUBMITTED_UNKNOWN. On a later execute_approved_payments() call, an already-existing SUBMITTED_UNKNOWN is counted only in _projected_exposure_pence(); if current_exposure + the next APPROVED invoice is within max_run_total_pence, _submit_one() can still call transfer_funds(). RunStatus.AWAITING_APPROVAL also suggests an approval wait rather than a reconciliation blocker. Del…

Recommendation

Make SUBMITTED_UNKNOWN a durable execution-blocking state, not a per-loop note: before any transfer_funds() call, derive the run projection and refuse execution while any task is SUBMITTED_UNKNOWN, returning a user-facing blocked-on-reconciliation state and required action.
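
The gap between the alert's promise and the enforcement can be seen with a small worked example. The amounts and cap below are invented for illustration, and `cap_only_allows` and `guarded_allows` are hypothetical helpers. Only the set of statuses counted toward exposure is taken from the review's description of `_projected_exposure_pence()`.

```python
from enum import Enum
from typing import Sequence, Tuple


class TaskStatus(Enum):
    APPROVED = "APPROVED"
    SUBMITTED = "SUBMITTED"
    SUBMITTED_UNKNOWN = "SUBMITTED_UNKNOWN"
    SUCCEEDED = "SUCCEEDED"
    SIMULATED_SUCCEEDED = "SIMULATED_SUCCEEDED"


# Statuses the review says are counted toward projected exposure.
EXPOSED = frozenset({
    TaskStatus.SUBMITTED,
    TaskStatus.SUBMITTED_UNKNOWN,
    TaskStatus.SUCCEEDED,
    TaskStatus.SIMULATED_SUCCEEDED,
})

TaskRow = Tuple[TaskStatus, int]  # (status, amount in pence)


def projected_exposure_pence(tasks: Sequence[TaskRow]) -> int:
    return sum(amount for status, amount in tasks if status in EXPOSED)


def cap_only_allows(tasks: Sequence[TaskRow], next_amount: int, cap: int) -> bool:
    # The check as described in the finding: exposure is counted, but an
    # existing SUBMITTED_UNKNOWN task does not block by itself.
    return projected_exposure_pence(tasks) + next_amount <= cap


def guarded_allows(tasks: Sequence[TaskRow], next_amount: int, cap: int) -> bool:
    # The recommended behaviour: refuse while any task is SUBMITTED_UNKNOWN.
    if any(status is TaskStatus.SUBMITTED_UNKNOWN for status, _ in tasks):
        return False
    return cap_only_allows(tasks, next_amount, cap)


# Hypothetical figures: one ambiguous 400.00 payment, one approved 250.00
# invoice, and a run cap of 1000.00.
tasks = [(TaskStatus.SUBMITTED_UNKNOWN, 40_000), (TaskStatus.APPROVED, 25_000)]
cap = 100_000
```

With these figures, 40,000 + 25,000 is within the 100,000 cap, so the cap-only check permits the next submission even though the first payment's outcome is unknown. The guarded check refuses it.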

P0

Design for delegation rather than direct manipulation

aligned

PaymentWorkflow models delegated work through create_run(), draft_tasks(), approve(), execute_approved_payments(), pause_run(), resume_run(), and cancel_run(). Policy defines explicit authority constraints such as max_run_total_pence, max_invoice_amount_pence, allow/block lists, due_window_days, and required_approver_role; approve() verifies an HMAC ApprovalEnvelope plus policy_hash and invoice_snapshot_hash; execute_approved_payments() enforces caps using _projected_exposure_pence() over SUBMITTED, SUBMITTED_UNKNOWN, SUCCEEDED, and SIMULATED_SUCCEEDED. Delta: this addresses the prior P1 exposure-under-counting finding; the remaining durable handoff blocker gap is scored under P8/P10.
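
As an illustration of the approval check described here, the sketch below signs and verifies an envelope with Python's standard `hmac` module. The field names `policy_hash` and `invoice_snapshot_hash` come from the review. The canonical JSON encoding, the choice of SHA-256, and the `sign_envelope` and `verify_envelope` helpers are assumptions, since the review does not show how the submission serialises or signs its `ApprovalEnvelope`.

```python
import hashlib
import hmac
import json


def _canonical(fields: dict) -> bytes:
    # Assumed encoding: sorted-key compact JSON so signer and verifier
    # hash identical bytes.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_envelope(secret: bytes, fields: dict) -> str:
    return hmac.new(secret, _canonical(fields), hashlib.sha256).hexdigest()


def verify_envelope(secret: bytes, fields: dict, signature: str,
                    expected_policy_hash: str,
                    expected_snapshot_hash: str) -> bool:
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sign_envelope(secret, fields), signature):
        return False
    # A valid signature is not enough: the approval must also refer to the
    # policy and invoice snapshot that are current at execution time.
    return (fields.get("policy_hash") == expected_policy_hash
            and fields.get("invoice_snapshot_hash") == expected_snapshot_hash)
```

Binding the signature to both hashes means an approval granted under one policy or invoice version cannot be replayed after either changes.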

P0

Ensure that background work remains perceptible

aligned

Background work is perceptible through explicit RunStatus and TaskStatus values, RunLedger JSONL audit events, DurableOperatorInbox alerts, inspect_run() summaries, and replay_run_from_ledger() for continuity after process loss. inspect_run() exposes run status, updated_at, critical_alerts, by_status counts, submitted_amount_pence, and needs_reconciliation.

P0

Align feedback with the user’s level of attention

aligned

Feedback is calibrated by risk: routine transitions are ledgered, policy/bank blockers post inbox alerts, ambiguous bank handoffs post critical alerts, rejected bank submissions use warning severity, and run not completed alerts include actionable counts for AWAITING_APPROVAL, APPROVED, SUBMITTED, and SUBMITTED_UNKNOWN. Delta: reconcile_submitted_task() now catches broad non-BankAPIError exceptions, appends task.reconcile_blocked, posts a critical alert, and keeps the task SUBMITTED_UNKNOWN, addressing the prior P3 feedback gap.

P0

Apply progressive disclosure to system agency

aligned

The code separates primary status from deeper inspection: inspect_run() returns top-level run_id/status/bank_mode/policy_hash/critical_alerts and a summary before per-task detail, while RunLedger.read_run_events() and verify_chain() provide deeper audit inspection when needed. Invoice snapshot hashes and approval envelope signatures are available for audit without making the ledger the only user-facing surface.

P0

Establish trust through inspectability

aligned

The workflow is inspectable through a tamper-evident RunLedger: AuditEvent includes sequence_no, prev_event_hash, event_hash, actor_user_id, payload, and created_at; verify_chain() detects divergence. Ledger events cover policy creation, task drafting/skipping, approvals with envelope metadata, submissions, ambiguous outcomes such as task.submitted_unknown_after_send and task.submission_blocked, reconciliation failures via task.reconcile_blocked, and terminal run transitions. Delta: the prior P7 recommendation is materially addressed by typed reconciliation boundary events and projection-derived submitted_amount_pence.
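
The tamper-evidence property can be sketched with a simple hash chain. The `AuditEvent` field names below are the ones listed in the review. The genesis value, the hashing scheme, and the `append_event` and `compute_hash` helpers are assumptions for illustration and may differ from the submission's `RunLedger`.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS_HASH = "0" * 64  # assumed value for the first event's prev_event_hash


@dataclass(frozen=True)
class AuditEvent:
    sequence_no: int
    prev_event_hash: str
    event_hash: str
    actor_user_id: str
    payload: dict
    created_at: str


def compute_hash(sequence_no: int, prev_event_hash: str, actor_user_id: str,
                 payload: dict, created_at: str) -> str:
    body = json.dumps(
        {"sequence_no": sequence_no, "prev_event_hash": prev_event_hash,
         "actor_user_id": actor_user_id, "payload": payload,
         "created_at": created_at},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(chain: List[AuditEvent], actor_user_id: str,
                 payload: dict, created_at: str) -> AuditEvent:
    prev = chain[-1].event_hash if chain else GENESIS_HASH
    seq = len(chain) + 1
    event = AuditEvent(seq, prev,
                       compute_hash(seq, prev, actor_user_id, payload, created_at),
                       actor_user_id, payload, created_at)
    chain.append(event)
    return event


def verify_chain(chain: List[AuditEvent]) -> bool:
    prev = GENESIS_HASH
    for expected_seq, event in enumerate(chain, start=1):
        if event.sequence_no != expected_seq or event.prev_event_hash != prev:
            return False  # gap, reordering, or broken link
        recomputed = compute_hash(event.sequence_no, event.prev_event_hash,
                                  event.actor_user_id, event.payload,
                                  event.created_at)
        if recomputed != event.event_hash:
            return False  # event contents changed after it was written
        prev = event.event_hash
    return True
```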

P0

Represent delegated work as a system, not merely as a conversation

aligned

Delegated work is represented as a structured system rather than a conversation: PaymentRun, PaymentTask, Policy, ApprovalEnvelope, AuditEvent, RunLedger, DurableOperatorInbox, and inspect_run() form explicit layers for work items, status, approvals, audit, alerts, and operator inspection. replay_run_from_ledger() reconstructs state from the event stream and promotes stranded SUBMITTED tasks to SUBMITTED_UNKNOWN for reconciliation.
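
The replay behaviour can be illustrated with a minimal sketch. It assumes each ledger event reduces to a `(task_id, status)` pair, which is a simplification; `replay_task_statuses` is a hypothetical helper, not the submission's `replay_run_from_ledger()`.

```python
from typing import Dict, Iterable, Tuple


def replay_task_statuses(events: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    # Fold the event stream: the last recorded status per task wins.
    latest: Dict[str, str] = {}
    for task_id, status in events:
        latest[task_id] = status
    # A task whose last event is SUBMITTED has no recorded bank outcome,
    # so after replay it is treated as ambiguous and needs reconciliation.
    return {
        task_id: "SUBMITTED_UNKNOWN" if status == "SUBMITTED" else status
        for task_id, status in latest.items()
    }
```

For example, replaying `[("t1", "APPROVED"), ("t1", "SUBMITTED"), ("t2", "SUBMITTED"), ("t2", "SUCCEEDED")]` leaves `t1` as `SUBMITTED_UNKNOWN` and `t2` as `SUCCEEDED`.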

Add to your README

Two embeddable variants: a small one and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/039064f9-c4e9-45de-8c16-1c160a01fca1/card.svg)](https://aidesignblueprint.com/en/readiness-review/039064f9-c4e9-45de-8c16-1c160a01fca1)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/039064f9-c4e9-45de-8c16-1c160a01fca1.svg)](https://aidesignblueprint.com/en/readiness-review/039064f9-c4e9-45de-8c16-1c160a01fca1)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Regressions (3)

P5 · Replace implied magic with clear mental models · needs_changes → needs_changes
P6 · Expose meaningful operational state, not internal complexity · needs_changes → needs_changes
P10 · Optimise for steering, not only initiating · needs_changes → high_risk

Improvements (4)

P1 · Design for delegation rather than direct manipulation · high_risk → aligned
P3 · Align feedback with the user’s level of attention · needs_changes → aligned
P7 · Establish trust through inspectability · needs_changes → aligned
P8 · Make hand-offs, approvals, and blockers explicit · high_risk → high_risk

Rubric: 2026-05-04

Run ID: 039064f9-c4e9-45de-8c16-1c160a01fca1 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.