Alignment with the doctrine confirmed.
Evaluated on 10 May 2026 against the AI Design Blueprint doctrine
Emerging
Status: Aligned
74/100
Grade C
Blueprint Readiness measures alignment with the doctrine, not runtime correctness. A production-ready verdict means the architecture embodies the 10 principles; it does not run your tests or check your types. Use it alongside your test suite, not as a replacement for it.
The submission is an autonomous invoice-payment workflow with load-bearing primitives for delegation, approval gating, auditability, perceptible background state, steering, retry, and reconciliation. The iteration-6 terminal-status logic now derives completion from the whole task projection, so the previously described false-COMPLETED path is closed in the reviewed code.
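To illustrate what deriving completion from the whole task projection can look like, here is a minimal sketch. It is not the reviewed code: the `derive_run_status()` helper, the `ACTIONABLE` set, and the exact mapping to `PARTIAL` and `FAILED` are assumptions made for illustration. Only the status names and the three counts come from the findings.

```python
from collections import Counter
from enum import Enum


class TaskStatus(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    SUBMITTED = "submitted"
    SUBMITTED_UNKNOWN = "submitted_unknown"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    SKIPPED = "skipped"
    CANCELLED = "cancelled"


class RunStatus(Enum):
    EXECUTING = "executing"
    COMPLETED = "completed"
    PARTIAL = "partial"
    FAILED = "failed"


# Assumption: these statuses still require an approval, a submit, or a reconciliation.
ACTIONABLE = frozenset({
    TaskStatus.AWAITING_APPROVAL,
    TaskStatus.APPROVED,
    TaskStatus.SUBMITTED,
    TaskStatus.SUBMITTED_UNKNOWN,
})


def derive_run_status(task_statuses):
    """Derive the run status from every task, not only the tasks touched last."""
    counts = Counter(task_statuses)
    actionable_total = sum(counts[s] for s in ACTIONABLE)
    failed_total = counts[TaskStatus.FAILED]
    succeeded_total = counts[TaskStatus.SUCCEEDED]

    if actionable_total:
        return RunStatus.EXECUTING  # not terminal while work remains
    if failed_total:
        return RunStatus.PARTIAL if succeeded_total else RunStatus.FAILED
    return RunStatus.COMPLETED
```

Because the counts cover the full projection, a single leftover `FAILED` or `SUBMITTED_UNKNOWN` task is enough to keep the run out of `COMPLETED`.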
Iteration history
5 previous runs on this artifact. Each run_id opens its own readiness review.
Scores can rise or fall between iterations: the validator's reasoning is not strictly deterministic, so the same artifact can receive different scores from one run to the next. The per-principle deltas further down show the substantive change.
Downgrade: the submitted bank adapter fabricates transfer and reconciliation success, and `_submit_one()` records tasks as `SUCCEEDED`, so a production user could be told invoices were paid when no external bank call occurred.
The downgrade is deliberate. The cert agent is a second, independent adversarial reviewer, deliberately stricter than the first pass. It found a production_blocker that the first pass had missed. Resolve the finding below, re-validate to obtain a new run_id, and recertify for the badge.
Blockers missed by the first pass
P5
Needs changes · Production blocker · 10/100 · Bank payment execution is stubbed but recorded as successful
`transfer_funds()` only checks that `BANK_API_KEY` exists and then immediately returns a fabricated response with `transfer_id: f"BANK-{idempotency_key}"` and `status: "submitted"`; it never uses the API key to call a bank or otherwise performs an external transfer. `_submit_one()` treats any returned dict as a completed payment by setting `task.status = TaskStatus.SUCCEEDED` and appending `task.succeeded`. The same pattern exists in `check_transfer_status()`, which returns `found: True` for any idempotency key, causing `reconcile_submitted_task()` to mark unknown submissions as `SUCCEEDED`.
Recommendation
Fail closed unless a real bank client confirms the transfer state. Make mock mode explicit and non-production, use the API key in an actual external call, validate the returned bank status, and keep tasks in `SUBMITTED`/`SUBMITTED_UNKNOWN` or `FAILED` rather than `SUCCEEDED` until genuine bank confirmation is received.
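The recommendation can be sketched as follows. This is an illustration under stated assumptions, not the submission's `_submit_one()`: the `submit_one()` signature, the `mock_mode` and `production` flags, and the bank status strings `"settled"` and `"rejected"` are hypothetical, since the review does not say which statuses the real bank returns.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(Enum):
    APPROVED = "approved"
    SUBMITTED = "submitted"
    SUBMITTED_UNKNOWN = "submitted_unknown"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class PaymentTask:
    task_id: str
    idempotency_key: str
    status: TaskStatus = TaskStatus.APPROVED
    events: list = field(default_factory=list)


# Hypothetical bank vocabulary; replace with the statuses the real bank documents.
CONFIRMED_STATUSES = frozenset({"settled"})
REJECTED_STATUSES = frozenset({"rejected"})


def submit_one(task, bank_transfer, *, mock_mode=False, production=True):
    """Submit one task and record SUCCEEDED only on genuine bank confirmation."""
    if mock_mode and production:
        # Mock adapters are explicit and refused outright in production.
        raise RuntimeError("mock bank adapter must not run in production")
    try:
        response = bank_transfer(task.idempotency_key)
    except Exception as exc:
        # The call may or may not have reached the bank: reconcile later.
        task.status = TaskStatus.SUBMITTED_UNKNOWN
        task.events.append(("task.submitted_unknown", repr(exc)))
        return task

    status = response.get("status") if isinstance(response, dict) else None
    if status in CONFIRMED_STATUSES:
        task.status = TaskStatus.SUCCEEDED
    elif status in REJECTED_STATUSES:
        task.status = TaskStatus.FAILED
    elif status == "submitted":
        task.status = TaskStatus.SUBMITTED  # accepted, not yet confirmed
    else:
        task.status = TaskStatus.SUBMITTED_UNKNOWN  # fail closed on anything else
    task.events.append((f"task.{task.status.value}", status))
    return task
```

With this shape, the stubbed response described in the finding (`status: "submitted"`) leaves the task in `SUBMITTED` rather than promoting it to `SUCCEEDED`.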
Findings by principle
10 principles evaluated. Verdict, severity, evidence, and recommendation for each.
P1
Aligned · Design for delegation rather than direct manipulation
Delegation is modeled explicitly through `Policy`, `create_run()`, `draft_tasks()`, and `execute_approved_payments()`: the user sets constraints such as caps, allowlists, due windows, and required approver role, while the workflow classifies invoices and executes only within that delegated scope. Authority is bounded by `policy_hash`, `invoice_snapshot_hash`, and per-task statuses rather than hidden prompt-like behavior.
P2
Aligned · Ensure that background work remains perceptible
Background work remains perceptible through durable `RunLedger` events, `DurableOperatorInbox` alerts, `RunStatus` / `TaskStatus`, and `inspect_run()`, which exposes current status, task counts, critical alerts, reconciliation needs, and audit-chain health. `replay_run_from_ledger()` preserves continuity after process interruption and promotes unresolved submitted tasks to `SUBMITTED_UNKNOWN`.
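A replay of this kind can be pictured as a fold over ledger events. The sketch below is hypothetical: the event shape (`type`, `task_id`) and the `replay_task_statuses()` helper are assumptions, and the submission's `replay_run_from_ledger()` may work differently.

```python
def replay_task_statuses(events):
    """Rebuild {task_id: status} from ledger events after an interruption.

    Assumes each task event looks like {"type": "task.<status>", "task_id": ...}.
    """
    statuses = {}
    for event in events:
        kind = event.get("type", "")
        if kind.startswith("task."):
            # The last event seen for a task wins.
            statuses[event["task_id"]] = kind.split(".", 1)[1].upper()

    # A submission with no recorded outcome cannot be assumed paid or unpaid,
    # so it is promoted to SUBMITTED_UNKNOWN and left for reconciliation.
    return {
        task_id: "SUBMITTED_UNKNOWN" if status == "SUBMITTED" else status
        for task_id, status in statuses.items()
    }
```

For example, a task whose last event is `task.submitted` replays as `SUBMITTED_UNKNOWN`, while one followed by `task.succeeded` replays as `SUCCEEDED`.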
P3
Aligned · Align feedback with the user’s level of attention
Feedback is calibrated by separating routine audit events from attention-demanding inbox alerts: normal transitions are appended to `RunLedger`, while failures, forged approvals, insufficient roles, unfinished tasks, and reconciliation problems are posted via `DurableOperatorInbox.post()` with warning or critical severity. `inspect_run()` surfaces only `critical_alerts` in the primary view while retaining detailed events for inspection.
P4
Aligned · Apply progressive disclosure to system agency
The primary inspection surface in `inspect_run()` gives a concise summary first: run status, policy hash, cancellation/pause state, audit-chain integrity, critical alerts, status counts, submitted amount, and reconciliation count. Deeper detail remains available through the per-task list and the ledger’s `read_run_events()` / `verify_chain()` paths, so agency is disclosed progressively rather than as an opaque black box or raw log dump only.
P5
Aligned · Replace implied magic with clear mental models
The code uses explicit mental-model primitives: `RunStatus` distinguishes pending, drafting, awaiting approval, paused, executing, completed, partial, cancelled, and failed; `TaskStatus` distinguishes awaiting approval, approved, submitted, submitted unknown, succeeded, failed, skipped, and cancelled. `classify_invoice()` records concrete `policy_decision` reasons, and `ApprovalEnvelope` makes clear that payment execution requires a signed authorization bound to run, task, policy, invoice snapshot, role, and expiry.
P6
Aligned · Expose meaningful operational state, not internal complexity
Operational state is exposed in user-relevant terms through `inspect_run()` fields such as `status`, `critical_alerts`, `summary.by_status`, `submitted_amount_pence`, and `needs_reconciliation`. Low-level mechanics like the hash chain remain available as `audit_chain_intact` / `audit_chain_first_divergence_event_id` and ledger events, while the primary status model stays oriented around approval, execution, blocking, reconciliation, and completion.
P7
Aligned · Establish trust through inspectability
Inspectability is supported by architectural primitives rather than wrappers: `RunLedger` writes append-only JSONL events, `_hash_event_body()` creates a per-run hash chain, `verify_chain()` detects tampering, and `replay_run_from_ledger()` reconstructs state. Decisions and actions are traceable via `policy_hash`, invoice `snapshot_hash()`, approval envelope signatures, bank responses, transfer IDs, task status transitions, and actor IDs in audit events.
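The hash-chain idea can be shown in a few lines. This is a generic sketch, not the submission's `_hash_event_body()` or `verify_chain()`: the event layout, the all-zero genesis hash, and the choice of SHA-256 over canonical JSON are assumptions.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed starting value for the first event


def hash_event_body(prev_hash, body):
    """Hash the previous hash together with a canonical JSON form of the body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain, body):
    """Append an event whose hash commits to everything before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    event = {
        "event_id": len(chain) + 1,
        "body": body,
        "prev_hash": prev_hash,
        "hash": hash_event_body(prev_hash, body),
    }
    chain.append(event)
    return event


def verify_chain(chain):
    """Return (intact, first_divergence_event_id)."""
    prev_hash = GENESIS_HASH
    for event in chain:
        expected = hash_event_body(prev_hash, event["body"])
        if event["prev_hash"] != prev_hash or event["hash"] != expected:
            return False, event["event_id"]
        prev_hash = event["hash"]
    return True, None
```

Editing the body of any stored event changes its expected hash, so verification reports that event as the first divergence.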
P8
Aligned · Make hand-offs, approvals, and blockers explicit
Approvals and blockers are explicit and load-bearing: `approve()` verifies the HMAC-signed `ApprovalEnvelope`, expiry, policy hash, invoice snapshot hash, task status, terminal-run state, and required approver role before setting `TaskStatus.APPROVED`. Execution blockers are surfaced through inbox alerts for forged envelopes, expired or insufficient approvals, policy violations, bank failures, unfinished tasks, and `SUBMITTED_UNKNOWN`; the iteration-6 whole-projection counts (`failed_total`, `succeeded_total`, `actionable_total`) prevent `COMPLETED` while any failed or actionable task remains.
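The envelope checks listed above could look roughly like the sketch below. The field names follow the finding, but the `sign()` and `check_envelope()` helpers, the canonical JSON encoding, and the blocker labels are assumptions, not the submission's `approve()`. Task-status and terminal-run checks are omitted for brevity.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ApprovalEnvelope:
    run_id: str
    task_id: str
    policy_hash: str
    invoice_snapshot_hash: str
    approver_role: str
    expires_at: float
    signature: str = ""


def _canonical(envelope):
    """Serialize every field except the signature in a stable order."""
    fields = asdict(envelope)
    fields.pop("signature")
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(envelope, key):
    """Return a copy of the envelope carrying an HMAC-SHA256 signature."""
    digest = hmac.new(key, _canonical(envelope), hashlib.sha256).hexdigest()
    return replace(envelope, signature=digest)


def check_envelope(envelope, key, *, now, policy_hash, invoice_snapshot_hash, required_role):
    """Return a list of blockers; an empty list means the approval may proceed."""
    expected = hmac.new(key, _canonical(envelope), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope.signature):
        return ["forged_envelope"]  # nothing else in the envelope can be trusted
    blockers = []
    if now >= envelope.expires_at:
        blockers.append("expired")
    if envelope.policy_hash != policy_hash:
        blockers.append("policy_changed")
    if envelope.invoice_snapshot_hash != invoice_snapshot_hash:
        blockers.append("invoice_changed")
    if envelope.approver_role != required_role:
        blockers.append("insufficient_role")
    return blockers
```

Returning named blockers instead of a bare boolean keeps each refusal reason available for an inbox alert.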
P9
Aligned · Represent delegated work as a system, not merely as a conversation
Delegated work is represented as a structured system using `PaymentRun`, `PaymentTask`, `AuditEvent`, `RunLedger`, and `DurableOperatorInbox`, all keyed by `run_id` and `task_id`. The workflow separates execution state from narrative output and supports multi-step dependencies across drafting, approval, submission, reconciliation, retry, pause, cancel, and terminal resolution.
P10
Aligned · Optimise for steering, not only initiating
The workflow supports steering after initiation through `pause_run()`, `resume_run()`, `cancel_run()`, `retry_failed_task()`, `resume_after_failure()`, and `reconcile_submitted_task()`. Most importantly, `execute_approved_payments()` checks `ledger.latest_steering_intent(run_id)` before each external submit, so pause or cancellation requests recorded in the ledger can interrupt execution before additional bank calls are made.
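The check-before-each-submit pattern is sketched below with simplified stand-ins. The function names mirror the ones cited in the finding, but the bodies, the list-based ledger, and the `steer.*` and `run.halted.*` event types are assumptions made for illustration.

```python
def latest_steering_intent(ledger, run_id):
    """Return "pause", "cancel", or None from the run's steering events."""
    intent = None
    for event in ledger:
        if event["run_id"] != run_id:
            continue
        if event["type"] == "steer.cancel":
            return "cancel"  # assumed sticky: a cancel cannot be resumed
        if event["type"] == "steer.pause":
            intent = "pause"
        elif event["type"] == "steer.resume":
            intent = None
    return intent


def execute_approved_payments(ledger, run_id, task_ids, submit):
    """Submit tasks one by one, re-reading steering intent before every call."""
    submitted = []
    for task_id in task_ids:
        intent = latest_steering_intent(ledger, run_id)
        if intent is not None:
            # Record why execution stopped and leave the remaining tasks untouched.
            ledger.append({"run_id": run_id, "type": f"run.halted.{intent}", "task_id": task_id})
            break
        submit(task_id)
        ledger.append({"run_id": run_id, "type": "task.submitted", "task_id": task_id})
        submitted.append(task_id)
    return submitted
```

Because the intent is read inside the loop rather than once at the start, a pause recorded while the second task is in flight stops the third task from being submitted.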
Add to your README
Two embeddable variants: a small one and a richer card.
Score card (recommended)
[](https://aidesignblueprint.com/en/readiness-review/dd3a9348-7c1b-488e-930a-7f77d433aa6c)
Flat badge
[](https://aidesignblueprint.com/en/readiness-review/dd3a9348-7c1b-488e-930a-7f77d433aa6c)
Run ID: dd3a9348-7c1b-488e-930a-7f77d433aa6c · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.