Reviewed

Assessment complete; awaiting evidence review.

Agent Architecture Review, Validation Snapshot

Assessed on 10 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High risk

60/100

Grade C

6 aligned · 4 production blockers · 2 high risk

Blueprint Readiness measures alignment with the doctrine, not runtime correctness. A production-ready verdict means the architecture embodies the 10 principles; it does not run your tests or check your types. Use it alongside your test suite, not as a replacement for it.

Verdicts by principle

This is an autonomous payment workflow with meaningful delegation, policy, status, inbox, replay, and audit primitives. However, the irreversible bank-transfer boundary is still unsafe: approvals are accepted from raw caller-supplied identity/role fields rather than a verifiable approval authority, and recovery/steering around submitted or failed payments can strand runs.

Iteration history

2 previous runs on this artifact. Each run_id opens its own readiness review.

Scores can rise or fall between iterations: the validator's reasoning is not strictly deterministic, so the same artifact can receive different scores from one run to the next. The per-principle deltas further below show the substantive change.

When	Score	Tier	Run ID
10 May 2026 (this run)	60 / C	In development	15aa9649…
10 May 2026	30 / F	Draft	b8d61c00…
10 May 2026	0 / F	Draft	ac64d7d9…

Findings by principle

10 principles assessed. Verdict, severity, evidence, and recommendation for each.

High risk · Production blocker · 90/100

Make hand-offs, approvals, and blockers explicit

The code correctly separates drafting from execution: `cron_draft_daily_run()` only drafts, and `execute_approved_payments()` only submits `TaskStatus.APPROVED` tasks. However, the approval gate before an irreversible bank transfer is not trustworthy because `approve()` accepts `approver_user_id`, `approver_role`, and `approver_seen_policy_hash` as ordinary caller-provided strings. The comment says the caller must authenticate before calling, but the workflow itself does not verify a signed token or approval record. Any caller with access to this method can pass `approver_role='cfo'` and create a `task.approved` ledger event. Delta: this improves the prior P8 high-risk finding by removing in…

Recommendation

Put approval authority outside the execution object: require `approve()` to consume and verify a signed approval token or persisted approval record from an authenticated approval service, bound to the approver identity, role, `run_id`, `task_id`, `policy_hash`, and `invoice_snapshot_hash`, before setting `TaskStatus.APPROVED`.
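The recommendation above can be sketched as follows. This is a minimal illustration, not the reviewed workflow's code: `ApprovalClaims`, `sign_approval()`, `verify_approval()`, and the shared `APPROVAL_KEY` are hypothetical names, and an HMAC over a shared secret stands in for whatever signing scheme the approval service would actually use (asymmetric signatures are a likely choice in production).

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass

# Hypothetical secret shared by the approval service and the verifier (demo only).
APPROVAL_KEY = b"demo-only-secret"


@dataclass(frozen=True)
class ApprovalClaims:
    """Everything the approval is bound to (hypothetical envelope shape)."""
    approver_user_id: str
    approver_role: str
    run_id: str
    task_id: str
    policy_hash: str
    invoice_snapshot_hash: str


def _canonical(claims: ApprovalClaims) -> bytes:
    # Sorted keys and fixed separators give a stable byte string to sign.
    return json.dumps(asdict(claims), sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_approval(claims: ApprovalClaims, key: bytes = APPROVAL_KEY) -> str:
    """Run by the approval service after it has authenticated the approver."""
    return hmac.new(key, _canonical(claims), hashlib.sha256).hexdigest()


def verify_approval(claims: ApprovalClaims, signature: str, *, run_id: str, task_id: str,
                    policy_hash: str, invoice_snapshot_hash: str, required_role: str,
                    key: bytes = APPROVAL_KEY) -> bool:
    """Run by the workflow before it sets a task to approved."""
    expected = sign_approval(claims, key)
    # Constant-time comparison of the presented signature against the expected one.
    if not hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8")):
        return False
    # The signed claims must match the exact task being approved.
    bound = (claims.run_id, claims.task_id, claims.policy_hash,
             claims.invoice_snapshot_hash, claims.approver_role)
    return bound == (run_id, task_id, policy_hash, invoice_snapshot_hash, required_role)
```

With this shape, a caller who merely passes `approver_role='cfo'` cannot produce a valid signature, and a signature issued for one task cannot be replayed against another task or a different policy hash.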

P10

High risk · Production blocker · 84/100

Optimise for steering, not only initiating

The code adds steering methods, but several are not yet reliable at the payment boundary. `retry_failed_task()` changes a failed task back to `APPROVED`, yet `execute_approved_payments()` refuses to run when the run is already `FAILED` or `PARTIALLY_COMPLETED`, which is the normal state after execution failures. Pause/cancel checks inside `execute_approved_payments()` read only the in-memory `run.pause_requested`/`run.cancellation_requested` flags, so a separate operator process appending pause/cancel intent to the ledger would not interrupt an already-running executor. There is also no steering path for `TaskStatus.SUBMITTED` after a crash around `transfer_funds()`. Delta: this improves the…

Recommendation

Move steering state and execution cursor into a transactional store that the executor re-reads before each external submit; allow resumed execution from `FAILED`, `PARTIALLY_COMPLETED`, and `SUBMITTED_UNKNOWN` states through explicit reconciliation/requeue transitions that preserve the original `idempotency_key`.
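One way to picture the "re-read before each external submit" part of this recommendation is the sketch below. It assumes a SQLite table as the transactional store; `open_store()`, `request_pause()`, `read_steering()`, and `execute_tasks()` are hypothetical helpers, not functions from the reviewed code, and the in-memory database only stands in for a store shared between the operator and executor processes.

```python
import sqlite3
from typing import Callable, List, Tuple


def open_store() -> sqlite3.Connection:
    # In-memory for the demo; a shared database file or server would be used in practice.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE steering ("
        "run_id TEXT PRIMARY KEY, "
        "pause_requested INTEGER NOT NULL DEFAULT 0, "
        "cancellation_requested INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def request_pause(conn: sqlite3.Connection, run_id: str) -> None:
    """Operator side: record pause intent in one committed transaction."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO steering (run_id) VALUES (?)", (run_id,))
        conn.execute("UPDATE steering SET pause_requested = 1 WHERE run_id = ?", (run_id,))


def read_steering(conn: sqlite3.Connection, run_id: str) -> Tuple[bool, bool]:
    row = conn.execute(
        "SELECT pause_requested, cancellation_requested FROM steering WHERE run_id = ?",
        (run_id,),
    ).fetchone()
    return (bool(row[0]), bool(row[1])) if row else (False, False)


def execute_tasks(conn: sqlite3.Connection, run_id: str, task_ids: List[str],
                  submit: Callable[[str], None]) -> List[str]:
    """Executor side: consult the store before every external submit."""
    submitted: List[str] = []
    for task_id in task_ids:
        paused, cancelled = read_steering(conn, run_id)
        if paused or cancelled:
            break  # stop before the next irreversible call
        submit(task_id)
        submitted.append(task_id)
    return submitted
```

Because the flags are read from the store rather than from an in-memory run object, a pause recorded while the first payment is in flight takes effect before the second payment is submitted.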

Needs changes · Production blocker · 72/100

Ensure that background work remains perceptible

The code adds perceptible state via `RunStatus`, `TaskStatus`, `inspect_run()`, JSONL `RunLedger`, and `replay_run_from_ledger()`, but it can still strand background work around the irreversible side-effect boundary. `_submit_one()` appends `task.submitted` before calling `transfer_funds()`; if the process dies during or after the bank call but before `task.succeeded`/`task.failed`, replay leaves the task in `TaskStatus.SUBMITTED`, `execute_approved_payments()` skips non-`APPROVED` tasks, and `retry_failed_task()` only handles `FAILED` tasks. The user can return to a durable but unresolved `submitted`/`executing` state with no reconciliation path. Delta: the current code improves the prior d…

Recommendation

Move execution state and the bank submission cursor into a transactional durable store, and model `SUBMITTED_UNKNOWN`/reconciliation as a first-class state that can be safely resumed using the existing `idempotency_key` before any further payments are attempted.
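A reconciliation step along these lines might look like the sketch below. The `TaskStatus` and `PaymentTask` classes here are simplified stand-ins for the reviewed types, and `FakeBank.find_transfer()` is an assumed capability: it presumes the bank can look up a transfer by `idempotency_key`, which the review does not confirm.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class TaskStatus(Enum):  # simplified stand-in for the reviewed enum
    APPROVED = "approved"
    SUBMITTED = "submitted"
    SUBMITTED_UNKNOWN = "submitted_unknown"
    SUCCEEDED = "succeeded"


@dataclass
class PaymentTask:  # simplified stand-in for the reviewed task model
    task_id: str
    idempotency_key: str
    status: TaskStatus
    transfer_id: Optional[str] = None


class FakeBank:
    """Hypothetical bank client that deduplicates on the idempotency key."""

    def __init__(self) -> None:
        self._by_key: Dict[str, str] = {}

    def transfer_funds(self, idempotency_key: str) -> str:
        # Re-submitting the same key returns the original transfer.
        return self._by_key.setdefault(idempotency_key, f"tr-{len(self._by_key) + 1}")

    def find_transfer(self, idempotency_key: str) -> Optional[str]:
        return self._by_key.get(idempotency_key)


def mark_unknown_after_replay(tasks: List[PaymentTask]) -> None:
    # After a crash, a task still in SUBMITTED has an unknown outcome at the bank.
    for task in tasks:
        if task.status is TaskStatus.SUBMITTED:
            task.status = TaskStatus.SUBMITTED_UNKNOWN


def reconcile(task: PaymentTask, bank: FakeBank) -> None:
    """Resolve an unknown submission before any further payments are attempted."""
    if task.status is not TaskStatus.SUBMITTED_UNKNOWN:
        return
    transfer_id = bank.find_transfer(task.idempotency_key)
    if transfer_id is not None:
        task.transfer_id = transfer_id
        task.status = TaskStatus.SUCCEEDED  # the transfer landed before the crash
    else:
        task.status = TaskStatus.APPROVED   # requeue with the same idempotency_key
```

The key design point is that requeueing never mints a new `idempotency_key`, so a transfer that did reach the bank cannot be paid twice.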

Needs changes · Production blocker · 64/100

Replace implied magic with clear mental models

Most policy fields are now load-bearing: `classify_invoice()` enforces vendor blocklist, due window, amount range, allowlist, and per-invoice cap; `draft_tasks()` rejects a mismatched `policy.hash()`; `execute_approved_payments()` enforces `max_run_total_pence`; and `approve()` checks `approver_seen_policy_hash`. The remaining mental-model failure is that `Policy.required_approver_role` appears to authorize payment approval, but `approve()` merely compares it with caller-supplied `approver_role`; there is no signed approval envelope or identity-service proof that `approver_user_id` actually has that role. Delta: this improves the prior high-risk policy-enforcement finding, but the approval-a…

Recommendation

Replace raw `approver_user_id`/`approver_role` parameters with a verified approval envelope issued by the identity/approval service and bound to `run_id`, `task_id`, `policy_hash`, and `invoice_snapshot_hash`; make that envelope the only source of approval authority.

Aligned

Design for delegation rather than direct manipulation

The workflow is structured around delegated work rather than manual step execution: `Policy` captures operator constraints, `create_run()` binds a `policy_hash`, `draft_tasks()` classifies invoices, and `execute_approved_payments()` submits only approved tasks. Operators have explicit controls through `approve()`, `pause_run()`, `resume_run()`, `cancel_run()`, and `retry_failed_task()`. Delta: this improves the prior P1 finding by making the run-owned policy hash load-bearing in drafting and audit events.

Aligned

Align feedback with the user’s level of attention

Feedback is proportionate and durable: `DurableOperatorInbox.post()` persists intervention alerts to JSONL, bank and policy failures create warning/critical alerts, and `inspect_run()` surfaces `critical_alerts` plus task-level `failure_class` and `failure_message`. Routine state is summarized through `summary.by_status`, while attention-required failures carry alert IDs. Delta: this addresses the prior P3 recommendation by replacing an in-memory inbox with a durable operator-facing inbox.

Aligned

Apply progressive disclosure to system agency

`inspect_run()` separates a primary operational summary (`status`, `policy_hash`, `audit_chain_intact`, `critical_alerts`, and `summary.by_status`) from per-task details such as invoice snapshot hashes, approver fields, transfer IDs, and failure messages. The code therefore supports a default overview with deeper inspection when needed. Delta: this maintains the prior aligned P4 result.

Aligned

Expose meaningful operational state, not internal complexity

The workflow exposes user-relevant states through `RunStatus` values such as `AWAITING_APPROVAL`, `PAUSED`, `EXECUTING`, `PARTIALLY_COMPLETED`, `CANCELLED`, and `FAILED`, plus `TaskStatus` values such as `APPROVED`, `SUBMITTED`, `SUCCEEDED`, `FAILED`, and `SKIPPED`. `inspect_run()` presents these states with counts and actionable task failure fields rather than only low-level log entries. Delta: this maintains the prior aligned P6 result.

Aligned

Establish trust through inspectability

The audit path includes a real inspectability primitive: every `AuditEvent` carries `sequence_no`, `prev_event_hash`, and `event_hash`; `_hash_event_body()` canonicalizes the event body; `RunLedger.append()` chains per-run events; and `verify_chain()` detects sequence, previous-hash, or body-hash divergence. Events include policy hashes, invoice snapshots, approvals, submissions, bank responses, failures, cancellations, and retries, and `inspect_run()` reports `audit_chain_intact` plus the first divergent event ID. Delta: this addresses the prior P7 recommendation by adding a hash-chained ledger and verification path.
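For readers unfamiliar with the technique, a hash-chained ledger of this kind can be sketched as below. This is an independent illustration under assumptions, not the reviewed implementation: the `AuditEvent` fields mirror the names in the finding, but `hash_event_body()`, `append_event()`, the all-zero `GENESIS_HASH`, and the choice of SHA-256 over canonical JSON are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

GENESIS_HASH = "0" * 64  # assumed placeholder for the first event's predecessor


@dataclass
class AuditEvent:
    sequence_no: int
    prev_event_hash: str
    body: Dict[str, Any]
    event_hash: str


def hash_event_body(sequence_no: int, prev_event_hash: str, body: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys, fixed separators) so equal content hashes equally.
    payload = json.dumps(
        {"sequence_no": sequence_no, "prev_event_hash": prev_event_hash, "body": body},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(chain: List[AuditEvent], body: Dict[str, Any]) -> AuditEvent:
    prev_hash = chain[-1].event_hash if chain else GENESIS_HASH
    sequence_no = len(chain) + 1
    event = AuditEvent(sequence_no, prev_hash, body,
                       hash_event_body(sequence_no, prev_hash, body))
    chain.append(event)
    return event


def verify_chain(chain: List[AuditEvent]) -> Optional[int]:
    """Return the 1-based position of the first divergent event, or None if intact."""
    prev_hash = GENESIS_HASH
    for position, event in enumerate(chain, start=1):
        recomputed = hash_event_body(event.sequence_no, event.prev_event_hash, event.body)
        if (event.sequence_no != position
                or event.prev_event_hash != prev_hash
                or event.event_hash != recomputed):
            return position
        prev_hash = event.event_hash
    return None
```

Editing the body of any stored event changes its recomputed hash, so verification reports that event as the first point of divergence.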

Aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system: `PaymentRun` owns the lifecycle, `PaymentTask` models per-invoice work, `RunLedger` records ordered events, `DurableOperatorInbox` records intervention alerts, and `inspect_run()` returns a task list plus aggregate status counts. Execution state is separate from conversational or prompt-like input. Delta: this maintains the prior aligned P9 result.

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/15aa9649-84d8-4a75-aec7-fd101a5b0535/card.svg)](https://aidesignblueprint.com/en/readiness-review/15aa9649-84d8-4a75-aec7-fd101a5b0535)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/15aa9649-84d8-4a75-aec7-fd101a5b0535.svg)](https://aidesignblueprint.com/en/readiness-review/15aa9649-84d8-4a75-aec7-fd101a5b0535)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

5 closed in this iteration · 2 reopened · 2 high-risk findings still open

Regressions (2)

P2 · Ensure that background work remains perceptible · needs_changes → needs_changes

P10 · Optimise for steering, not only initiating · needs_changes → high_risk

Improvements (5)

P1 · Design for delegation rather than direct manipulation · needs_changes → aligned

P3 · Align feedback with the user’s level of attention · needs_changes → aligned

P5 · Replace implied magic with clear mental models · high_risk → needs_changes

P7 · Establish trust through inspectability · needs_changes → aligned

P8 · Make hand-offs, approvals, and blockers explicit · high_risk → high_risk

Rubric: 2026-05-04


Run ID: 15aa9649-84d8-4a75-aec7-fd101a5b0535 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.