Reviewed

Evaluation complete; awaiting evidence review.

Agent Architecture Review, Validation Snapshot

Evaluated on 10 May 2026 against the AI Design Blueprint doctrine

Status: High risk

30/100

Grade F

3 aligned · 7 production blockers · 2 high risk

Blueprint Readiness measures alignment with the doctrine, not runtime correctness. A production-ready verdict means the architecture embodies the 10 principles; it does not run your tests or check your types. Use it alongside your test suite, not in place of it.

Verdicts by principle

The submission is a substantial improvement over the prior single-shot payment loop: it introduces runs, tasks, explicit policy objects, task states, approval records, cancellation checks, and an audit ledger. However, the production trust boundary still fails for irreversible payments because approval authority is self-declared, the cron path auto-approves, key policy constraints are not enforced, state/alerts remain process-local, and the ledger is not tamper-evident.

Iteration history

1 previous run on this artifact. Each run_id opens its own readiness review.

Scores can rise or fall between iterations: the validator's reasoning is not strictly deterministic, so the same artifact can receive different scores from one run to the next. The per-principle deltas further down show the substantive change.

When	Score	Tier	Run ID
10 May 2026 (this run)	30 / F	Draft	b8d61c00…
10 May 2026	0 / F	Draft	ac64d7d9…

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

High risk · Production blocker · 94/100

Make hand-offs, approvals, and blockers explicit

There is a visible approval state: `_submit_one()` only runs for `TaskStatus.APPROVED`, and `approve()` records `approver_user_id`, `approved_policy_hash`, and `approved_at`. However, the approval boundary is not trustworthy for irreversible bank transfers. `approve()` accepts any caller-supplied `approver_user_id` and only checks a self-reported `approver_seen_policy_hash`; it never authenticates the actor or validates `Policy.required_approver_role`. Worse, `run_daily_batch()` auto-approves every `TaskStatus.AWAITING_APPROVAL` task in-process using the default `approver_user_id='cfo@example.com'`, so the cron path can call `transfer_funds()` without an actual human approval. Delta: the pri…

Recommendation

Separate approval from execution. Remove in-process auto-approval from `run_daily_batch()`, require an authenticated approval service to issue a signed approval token bound to `run_id`, `task_id`, invoice snapshot hash, amount, and `policy_hash`, and have the executor accept only those signed approvals from principals with the required role.
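As a rough illustration of the signed approval token described above, the sketch below binds an approval to `run_id`, `task_id`, invoice snapshot hash, amount, and `policy_hash` with an HMAC. The names `ApprovalClaim`, `sign_approval`, `verify_approval`, and `APPROVAL_KEY` are hypothetical and do not come from the reviewed code, and the sketch assumes the approver's identity and role were already authenticated by a separate approval service.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass

# Hypothetical demo key. In practice the key would live only in the
# approval service or a key-management system, never in the executor's code.
APPROVAL_KEY = b"demo-key-not-for-production"


@dataclass(frozen=True)
class ApprovalClaim:
    """Everything the approval is bound to (field names are assumptions)."""
    run_id: str
    task_id: str
    invoice_snapshot_hash: str
    amount_pence: int
    policy_hash: str
    approver_user_id: str
    approver_role: str


def _canonical(claim: ApprovalClaim) -> bytes:
    # Sorted keys give a stable byte encoding to sign.
    return json.dumps(asdict(claim), sort_keys=True).encode("utf-8")


def sign_approval(claim: ApprovalClaim, key: bytes = APPROVAL_KEY) -> str:
    """Issued by the (assumed) authenticated approval service."""
    return hmac.new(key, _canonical(claim), hashlib.sha256).hexdigest()


def verify_approval(claim: ApprovalClaim, token: str, required_role: str,
                    key: bytes = APPROVAL_KEY) -> bool:
    """Executor-side check: the token must match the claim and the role."""
    expected = hmac.new(key, _canonical(claim), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token) and claim.approver_role == required_role
```

Because any changed field changes the HMAC, a token issued for one amount or policy cannot be reused for another. A shared HMAC key still lets whoever verifies also sign, so an asymmetric signature would likely be a better fit where the executor must not be able to mint approvals.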

High risk · Production blocker · 86/100

Replace implied magic with clear mental models

The `Policy` dataclass gives the user a clear intended mental model, but the implementation violates that model in load-bearing ways. `max_run_total_pence` is documented as a run-level cap but is never checked; `required_approver_role` is stored but never enforced; `draft_tasks()` can classify invoices using a different `policy` from the one captured in `run.policy_hash`; and audit payloads then record `run.policy_hash`, which can make the decision appear authorised by a policy that was not actually used for classification. Delta: the explicit policy object improves the prior implied magic, but unenforced policy fields create a more dangerous false sense of control.

Recommendation

Convert `Policy` from descriptive configuration into enforced authority. Store the exact policy object on the run, classify only against that stored policy, enforce every declared constraint in code, and fail closed if any policy field is not enforceable. Remove or rename fields that are not actually honoured.
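One way to read "enforce every declared constraint" for the run-level cap is sketched below. Only `max_run_total_pence` is named in the finding; `max_invoice_pence`, `PolicyViolation`, and `enforce_run_total` are hypothetical names introduced for illustration, and the reviewed `Policy` class may be shaped differently.

```python
from dataclasses import dataclass
from typing import Iterable


class PolicyViolation(Exception):
    """Raised when a declared policy constraint would be exceeded."""


@dataclass(frozen=True)
class Policy:
    max_invoice_pence: int      # hypothetical per-invoice limit
    max_run_total_pence: int    # run-level cap named in the finding


def enforce_run_total(policy: Policy, amounts_pence: Iterable[int]) -> int:
    """Return the run total, or fail closed on the first violation."""
    total = 0
    for amount in amounts_pence:
        if amount <= 0:
            raise PolicyViolation(f"non-positive amount: {amount}")
        if amount > policy.max_invoice_pence:
            raise PolicyViolation(f"invoice {amount} exceeds per-invoice limit")
        total += amount
        if total > policy.max_run_total_pence:
            raise PolicyViolation(f"run total {total} exceeds run cap")
    return total
```

Raising instead of skipping the offending invoice is a deliberate choice in this sketch: a cap that silently drops items is as hard to reason about as one that is never checked.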

Needs changes · Production blocker · 70/100

Establish trust through inspectability

`RunLedger.append()` writes structured `AuditEvent` rows for run creation, approvals, submissions, successes, failures, cancellations, invoice snapshots, policy decisions, idempotency keys, and bank responses. That is materially more inspectable than the prior loop. But the ledger is only JSONL append-by-convention: there is no hash chain, HMAC/signature, immutable storage, sequence number, or integrity check, despite comments describing it as tamper-evident. It can also log a misleading `policy_hash` if `draft_tasks()` receives a different policy from the run policy. Delta: audit coverage improved, but audit integrity and policy-decision provenance remain insufficient for payment accountabi…

Recommendation

Move the audit ledger outside the execution loop into an append-only durable store with tamper evidence, such as a hash-chained event table or signed ledger. Include previous-event hash, sequence number, actor, exact policy hash used for classification, invoice snapshot hash, approval witness, transfer request hash, and response hash.
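The hash-chained ledger mentioned above can be sketched in a few lines. This is a minimal in-memory illustration, not the reviewed `RunLedger`: `append_event`, `verify_chain`, and the event layout are assumptions, and a real ledger would also need durable append-only storage.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first event


def _event_hash(seq: int, prev_hash: str, payload: Dict[str, Any]) -> str:
    # Hash the sequence number, the previous hash, and the payload together.
    body = json.dumps(
        {"seq": seq, "prev_hash": prev_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> Dict[str, Any]:
    """Append an event whose hash commits to the entire prior history."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    seq = len(chain)
    event = {
        "seq": seq,
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": _event_hash(seq, prev_hash, payload),
    }
    chain.append(event)
    return event


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited, reordered, or removed middle event fails."""
    prev_hash = GENESIS_HASH
    for seq, event in enumerate(chain):
        if event["seq"] != seq or event["prev_hash"] != prev_hash:
            return False
        if event["hash"] != _event_hash(seq, prev_hash, event["payload"]):
            return False
        prev_hash = event["hash"]
    return True
```

A chain like this detects edits to past events, but truncating the tail goes unnoticed unless the latest hash is also anchored somewhere the writer cannot modify.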

Needs changes · Production blocker · 68/100

Ensure that background work remains perceptible

The workflow has visible states via `RunStatus`, `TaskStatus`, and JSONL `AuditEvent`s, and `_submit_one()` records `task.submitted` before calling the bank. However, operational continuity is still process-local: `PaymentWorkflow.runs` and `PaymentWorkflow.tasks` are in-memory dictionaries, `inspect_run()` only reads those projections, and there is no replay path from `RunLedger.read_run_events()` after restart. A crash after `task.submitted` but before `task.succeeded`/`task.failed` would leave the persisted ledger inspectable only manually, not as a resumable workflow. Delta: this is a major improvement over the prior opaque loop, but persistence is not yet a usable lifecycle primitive.

Recommendation

Move run/task projections and execution cursor into a durable store or implement deterministic ledger replay before `inspect_run()` and `execute_approved()`. Treat `submitted_without_response` as an explicit recoverable state and persist cancellation flags outside the process.
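Deterministic ledger replay could look roughly like the sketch below, which folds persisted events into task states and surfaces the crash window as `submitted_without_response`. The event names `task.submitted`, `task.succeeded`, and `task.failed` appear in the finding; the other event names, the event dictionary layout, and `replay_task_states` are assumptions.

```python
from typing import Dict, Iterable

# Mapping from ledger event type to projected task status (partly assumed).
EVENT_TO_STATUS = {
    "task.approved": "approved",
    "task.submitted": "submitted",
    "task.succeeded": "succeeded",
    "task.failed": "failed",
    "task.cancelled": "cancelled",
}


def replay_task_states(events: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Rebuild task states from ledger events, assumed to be in append order."""
    states: Dict[str, str] = {}
    for event in events:
        status = EVENT_TO_STATUS.get(event["type"])
        if status is not None:
            states[event["task_id"]] = status
    # A task whose last event is "submitted" has no recorded bank outcome:
    # make that an explicit recoverable state instead of a silent gap.
    return {
        task_id: "submitted_without_response" if status == "submitted" else status
        for task_id, status in states.items()
    }
```

Running a replay like this before `inspect_run()` and `execute_approved()` would let a restarted process see the same picture the crashed one had, including the tasks whose outcome is unknown.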

Needs changes · Production blocker · 62/100

Design for delegation rather than direct manipulation

The code now models delegated work with `PaymentRun`, `PaymentTask`, and a `Policy` containing amount limits, vendor lists, due-window constraints, and approver-role intent. But the delegation contract is not yet authoritative: `draft_tasks()` accepts an arbitrary `policy` argument without checking `policy.hash() == run.policy_hash`, `Policy.max_run_total_pence` is never enforced, and `Policy.required_approver_role` is never checked. Delta: this improves the prior run/task/policy gap, but the current policy object is still partly advisory rather than a hard boundary.

Recommendation

Make the run-owned policy the single source of authority: persist the policy or signed policy envelope with the run, reject drafting/execution if the supplied policy hash differs from `run.policy_hash`, and enforce run-total, invoice, vendor, due-window, and approver-role constraints before any task can become executable.
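The hash check on drafting can be sketched as a small guard. `policy.hash()` and `run.policy_hash` are named in the finding, but how the reviewed code computes the hash is not shown, so the SHA-256-over-JSON scheme here is an assumption, as are `PolicyMismatch` and `require_run_policy`.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


class PolicyMismatch(Exception):
    """Raised when a supplied policy is not the one the run was created with."""


@dataclass(frozen=True)
class Policy:
    max_run_total_pence: int
    required_approver_role: str

    def hash(self) -> str:
        # Assumed scheme: SHA-256 over a canonical JSON encoding of the fields.
        encoded = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()


def require_run_policy(supplied: Policy, run_policy_hash: str) -> Policy:
    """Return the policy only if it is exactly the run-owned one; fail closed otherwise."""
    if supplied.hash() != run_policy_hash:
        raise PolicyMismatch("supplied policy does not match run.policy_hash")
    return supplied
```

Calling a guard like this at the top of drafting and execution means a task can only ever be classified against the policy recorded on the run.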

P10

Needs changes · Production blocker · 58/100

Optimise for steering, not only initiating

The workflow has an initial steering primitive: `cancel_run()` sets `cancellation_requested`, marks not-yet-submitted tasks `CANCELLED`, and `execute_approved()` checks the flag before each call to `transfer_funds()`. Idempotency keys also reduce duplicate-payment risk. But steering is incomplete and partly non-durable: the cancellation flag is only in memory, there is no pause/resume or reprioritisation path, failed tasks cannot actually be retried even though the inbox message says to retry from the operator inbox, and `execute_approved()` ignores `TaskStatus.FAILED` rather than offering a controlled replay with the same idempotency key. Delta: this improves the prior no-interrupt design,…

Recommendation

Persist steering state and execution cursor outside the process, then add explicit lifecycle operations for pause, resume, cancel, and retry/requeue of failed or submitted-unknown tasks using the original idempotency key. Keep the worker separate from the approval and steering surfaces so operators can intervene safely between external actions.
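A controlled retry that reuses the original idempotency key might be sketched as follows. `Task`, `TransferError`, `retry_task`, and the `paused` flag are hypothetical stand-ins; the `transfer` callable represents the bank call and is assumed to accept the idempotency key and return a transfer ID.

```python
from dataclasses import dataclass
from typing import Callable, Optional

RETRYABLE_STATES = {"failed", "submitted_without_response"}


class TransferError(Exception):
    """Hypothetical stand-in for the bank client's error type."""


@dataclass
class Task:
    task_id: str
    status: str
    idempotency_key: str
    transfer_id: Optional[str] = None
    attempts: int = 0


def retry_task(task: Task, transfer: Callable[[str], str], paused: bool = False) -> str:
    """Retry a failed or unknown-outcome task with its original idempotency key."""
    if paused:
        # A paused run leaves the task untouched so an operator can intervene.
        return task.status
    if task.status not in RETRYABLE_STATES:
        raise ValueError(f"task {task.task_id} is not retryable from {task.status!r}")
    task.attempts += 1
    task.status = "submitted"
    try:
        # Same key as the first attempt, so the bank can deduplicate.
        task.transfer_id = transfer(task.idempotency_key)
    except TransferError:
        task.status = "failed"
        return task.status
    task.status = "succeeded"
    return task.status
```

In a durable design the `paused` flag and the task record would be read from the persisted store, not passed in from process memory.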

Needs changes · Production blocker · 50/100

Align feedback with the user’s level of attention

The code separates routine progress from intervention paths by recording normal transitions in `_record()` and escalating `BankAPIError` plus policy-hash mismatch through `OperatorInbox.post()`. But `OperatorInbox` is only an in-memory `list`, and `run_daily_batch()` creates it locally and discards it, so a critical `auth_missing` alert may never reach an operator in a cron/background context. Delta: this addresses the prior swallowed-exception pattern, but the escalation channel is still not durable or operator-facing.

Recommendation

Move `OperatorInbox` behind a durable operator-facing notification surface, such as a database-backed inbox, dashboard, or incident channel. Return or persist alert IDs from `run_daily_batch()` so critical failures cannot disappear with the process.
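A database-backed inbox that returns alert IDs can be sketched with the standard-library `sqlite3` module. The table layout and the function names `open_inbox`, `post_alert`, and `open_alerts` are assumptions for illustration, not the reviewed `OperatorInbox` API.

```python
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import List, Tuple


def open_inbox(path: str) -> sqlite3.Connection:
    """Open (or create) the alert store; pass a file path for real durability."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS alerts (
            alert_id     TEXT PRIMARY KEY,
            run_id       TEXT NOT NULL,
            severity     TEXT NOT NULL,
            message      TEXT NOT NULL,
            created_at   TEXT NOT NULL,
            acknowledged INTEGER NOT NULL DEFAULT 0
        )
        """
    )
    conn.commit()
    return conn


def post_alert(conn: sqlite3.Connection, run_id: str, severity: str, message: str) -> str:
    """Persist an alert and return its ID so the caller can surface it."""
    alert_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO alerts (alert_id, run_id, severity, message, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (alert_id, run_id, severity, message, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return alert_id


def open_alerts(conn: sqlite3.Connection, run_id: str) -> List[Tuple[str, str, str]]:
    """Unacknowledged alerts for a run, oldest first."""
    return conn.execute(
        "SELECT alert_id, severity, message FROM alerts "
        "WHERE run_id = ? AND acknowledged = 0 ORDER BY created_at",
        (run_id,),
    ).fetchall()
```

If `run_daily_batch()` returned the IDs from `post_alert`, a cron wrapper could exit non-zero or page an operator whenever the list is non-empty.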

Aligned

Apply progressive disclosure to system agency

`inspect_run()` provides a concise top-level run view with `run_id`, `status`, `policy_hash`, cancellation fields, and a `summary` grouped by task status, then exposes per-task drill-down including `policy_decision`, invoice snapshot, approval identity/time, transfer ID, and failure details. Delta: this improves the prior lack of progressive inspection with a clear summary-plus-detail structure.
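The summary-plus-detail shape described here can be illustrated with a short sketch. It is not the reviewed `inspect_run()`: the function name `summarise_run` and the dictionary-based task records are assumptions chosen to keep the example self-contained.

```python
from collections import Counter
from typing import Any, Dict, List


def summarise_run(run_id: str, status: str, tasks: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Top-level counts first, with per-task detail available for drill-down."""
    return {
        "run_id": run_id,
        "status": status,
        # Concise view: how many tasks sit in each status.
        "summary": dict(Counter(task["status"] for task in tasks)),
        # Detailed view: the full record for each task, keyed by ID.
        "tasks": {task["task_id"]: task for task in tasks},
    }
```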

Aligned

Expose meaningful operational state, not internal complexity

The code exposes user-relevant lifecycle states through `RunStatus` values such as `awaiting_approval`, `executing`, `partially_completed`, and `failed`, plus `TaskStatus` values such as `awaiting_approval`, `approved`, `submitted`, `succeeded`, `failed`, `skipped`, and `cancelled`. Technical details are kept in diagnostic fields like `failure_class`, `failure_message`, and `bank_response` rather than replacing the operational state. Delta: this addresses the prior single-shot loop by making payment progress understandable at the run and task levels.

Aligned

Represent delegated work as a system, not merely as a conversation

The workflow is represented as a structured system rather than a conversation: `PaymentRun` owns child `PaymentTask`s, `RunStatus` and `TaskStatus` encode lifecycle, `_record()` emits timeline events, `execute_approved()` derives final run outcome from task states, and `inspect_run()` shows the run/task hierarchy. Delta: this resolves the prior lack of a system representation, although separate durability concerns are covered under P2 and P7.

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/b8d61c00-2b86-45f0-a533-526202371592/card.svg)](https://aidesignblueprint.com/en/readiness-review/b8d61c00-2b86-45f0-a533-526202371592)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/b8d61c00-2b86-45f0-a533-526202371592.svg)](https://aidesignblueprint.com/en/readiness-review/b8d61c00-2b86-45f0-a533-526202371592)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

9 closed in this iteration · 0 reopened · 2 high-risk findings still open

Improvements (9)

P1 · Design for delegation rather than direct manipulation · needs_changes → needs_changes

P2 · Ensure that background work remains perceptible · high_risk → needs_changes

P3 · Align feedback with the user’s level of attention · high_risk → needs_changes

P4 · Apply progressive disclosure to system agency · needs_changes → aligned

P6 · Expose meaningful operational state, not internal complexity · high_risk → aligned

P7 · Establish trust through inspectability · high_risk → needs_changes

P8 · Make hand-offs, approvals, and blockers explicit · high_risk → high_risk

P9 · Represent delegated work as a system, not merely as a conversation · needs_changes → aligned

P10 · Optimise for steering, not only initiating · high_risk → needs_changes

Rubric: 2026-05-04

Run ID: b8d61c00-2b86-45f0-a533-526202371592 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.