Governed

Alignment with the doctrine confirmed.

Agent Architecture Review, validation snapshot

Evaluated on 10 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: Aligned

74/100

Grade C

10 aligned

Blueprint Readiness measures alignment with the doctrine, not runtime correctness. A production-ready verdict means the architecture embodies the 10 principles; it does not run your tests or check your types. Use it alongside your test suite, not as a replacement for it.

Per-principle verdicts

The submission is an autonomous invoice-payment workflow with load-bearing primitives for delegation, approval gating, auditability, perceptible background state, steering, retry, and reconciliation. The iteration-6 terminal-status logic now derives completion from the whole task projection, so the previously described false-COMPLETED path is closed in the reviewed code.

Iteration history

5 previous runs on this artifact. Each run_id opens its own readiness review.

Scores can rise or fall between iterations: the validator's reasoning is not strictly deterministic, so the same artifact can receive different scores from one run to the next. The per-principle deltas below show the substantive change.

| When | Score | Tier | Run ID |
| --- | --- | --- | --- |
| 10 May 2026 (this run) | 74 / C | In development | dd3a9348 |
| 10 May 2026 | 74 / C | In development | b4799966 |
| 10 May 2026 | 74 / C | In development | 7e9bc0f6 |
| 10 May 2026 | 60 / C | In development | 15aa9649 |
| 10 May 2026 | 30 / F | Draft | b8d61c00 |
| 10 May 2026 | 0 / F | Draft | ac64d7d9 |

Cert downgraded to emerging. Attempt 1/3

Downgrade: the submitted bank adapter fabricates transfer and reconciliation success, and `_submit_one()` records tasks as `SUCCEEDED`, so a production user could be told invoices were paid when no external bank call occurred.

The downgrade is intentional. The cert agent is a second, independent, adversarial reviewer, deliberately stricter than the first pass. It found a production_blocker that the first pass missed. Resolve the finding below, re-validate to get a new run_id, and recertify for the badge.

Blockers missed by the first pass

P5

Changes required · Production blocker · 10/100

Bank payment execution is stubbed but recorded as successful

`transfer_funds()` only checks that `BANK_API_KEY` exists and then immediately returns a fabricated response with `transfer_id: f"BANK-{idempotency_key}"` and `status: "submitted"`; it never uses the API key to call a bank or otherwise performs an external transfer. `_submit_one()` treats any returned dict as a completed payment by setting `task.status = TaskStatus.SUCCEEDED` and appending `task.succeeded`. The same pattern exists in `check_transfer_status()`, which returns `found: True` for any idempotency key, causing `reconcile_submitted_task()` to mark unknown submissions as `SUCCEEDED`.

Recommendation

Fail closed unless a real bank client confirms the transfer state. Make mock mode explicit and non-production, use the API key in an actual external call, validate the returned bank status, and keep tasks in `SUBMITTED`/`SUBMITTED_UNKNOWN` or `FAILED` rather than `SUCCEEDED` until genuine bank confirmation is received.
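Below is a minimal sketch of the fail-closed rule. It assumes a hypothetical `BankClient` with a `transfer_status(idempotency_key)` method and an illustrative bank status vocabulary (`settled`, `rejected`, and so on). Neither comes from the reviewed code. Only an explicit confirmation maps to `SUCCEEDED`. Missing, unrecognised, or errored responses stay in `SUBMITTED` or `SUBMITTED_UNKNOWN`, and mock mode refuses to run when flagged as production.

```python
from enum import Enum
from typing import Optional, Protocol


class TaskStatus(Enum):
    SUBMITTED = "submitted"
    SUBMITTED_UNKNOWN = "submitted_unknown"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


class BankClient(Protocol):
    """Hypothetical client; a real one would call the bank's API with the key."""

    def transfer_status(self, idempotency_key: str) -> Optional[str]:
        ...


# Hypothetical bank status vocabulary; map it to whatever the real bank returns.
CONFIRMED = {"settled", "completed"}
REJECTED = {"rejected", "returned", "cancelled"}


def resolve_status(client: Optional[BankClient], idempotency_key: str,
                   mock_mode: bool = False, production: bool = True) -> TaskStatus:
    """Fail closed: only a confirmed bank status yields SUCCEEDED."""
    if mock_mode and production:
        # Mock mode must never silently run in production.
        raise RuntimeError("mock bank adapter is not allowed in production")
    if client is None:
        return TaskStatus.SUBMITTED_UNKNOWN
    try:
        status = client.transfer_status(idempotency_key)
    except Exception:
        # Transport errors leave the outcome unknown, not successful.
        return TaskStatus.SUBMITTED_UNKNOWN
    if status is None:
        return TaskStatus.SUBMITTED_UNKNOWN
    if status in CONFIRMED:
        return TaskStatus.SUCCEEDED
    if status in REJECTED:
        return TaskStatus.FAILED
    # "submitted", "pending", or anything unrecognised stays non-terminal.
    return TaskStatus.SUBMITTED
```

The design choice is that success is an allowlisted bank state, not the mere presence of a response. That is the gap the cert agent flagged in `_submit_one()` and `check_transfer_status()`.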

Per-principle findings

10 principles evaluated, each with its verdict, severity, evidence, and recommendation.

P1

Aligned

Design for delegation rather than direct manipulation

Delegation is modeled explicitly through `Policy`, `create_run()`, `draft_tasks()`, and `execute_approved_payments()`: the user sets constraints such as caps, allowlists, due windows, and required approver role, while the workflow classifies invoices and executes only within that delegated scope. Authority is bounded by `policy_hash`, `invoice_snapshot_hash`, and per-task statuses rather than hidden prompt-like behavior.
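The sketch below makes the bounded-authority idea concrete. It shows a policy that hashes deterministically and a classifier that records a reason for every decision. The field names (`max_amount_pence`, `payee_allowlist`, `due_within_days`) and the `(decision, reason)` return shape are assumptions. They are not the reviewed `Policy` or `classify_invoice()` signatures.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass(frozen=True)
class Policy:
    # Field names are illustrative; the reviewed Policy may differ.
    max_amount_pence: int
    payee_allowlist: frozenset = field(default_factory=frozenset)
    due_within_days: int = 14
    required_approver_role: str = "finance_manager"

    def policy_hash(self) -> str:
        # Canonical JSON so the same policy always produces the same hash.
        body = asdict(self)
        body["payee_allowlist"] = sorted(self.payee_allowlist)
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode()).hexdigest()


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    payee: str
    amount_pence: int
    due: date


def classify_invoice(inv: Invoice, policy: Policy, today: date) -> tuple[str, str]:
    """Return (decision, reason); the agent only acts inside the delegated scope."""
    if inv.payee not in policy.payee_allowlist:
        return "skip", f"payee {inv.payee!r} not in allowlist"
    if inv.amount_pence > policy.max_amount_pence:
        return "skip", f"amount {inv.amount_pence} exceeds cap {policy.max_amount_pence}"
    days_left = (inv.due - today).days
    if days_left > policy.due_within_days:
        return "skip", f"due in {days_left} days, outside {policy.due_within_days}-day window"
    return "needs_approval", f"within policy; requires {policy.required_approver_role}"
```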

P2

Aligned

Ensure that background work remains perceptible

Background work remains perceptible through durable `RunLedger` events, `DurableOperatorInbox` alerts, `RunStatus` / `TaskStatus`, and `inspect_run()`, which exposes current status, task counts, critical alerts, reconciliation needs, and audit-chain health. `replay_run_from_ledger()` preserves continuity after process interruption and promotes unresolved submitted tasks to `SUBMITTED_UNKNOWN`.
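The replay step can be sketched as a fold over JSONL events. The event schema (`type`, `task_id`, `status`) is hypothetical. The point illustrated is that a task last seen as `SUBMITTED` comes back as `SUBMITTED_UNKNOWN`, because after a crash nobody knows whether the bank received it.

```python
import json
from typing import Iterable


def replay_task_statuses(jsonl_lines: Iterable[str]) -> dict[str, str]:
    """Rebuild per-task status from append-only JSONL events (hypothetical schema)."""
    statuses: dict[str, str] = {}
    for line in jsonl_lines:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        event = json.loads(line)
        if event.get("type") == "task.status_changed":
            statuses[event["task_id"]] = event["status"]
    # After an interruption we cannot know whether the bank received a
    # SUBMITTED transfer, so surface it as SUBMITTED_UNKNOWN for reconciliation.
    return {
        task_id: ("SUBMITTED_UNKNOWN" if status == "SUBMITTED" else status)
        for task_id, status in statuses.items()
    }
```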

P3

Aligned

Align feedback with the user’s level of attention

Feedback is calibrated by separating routine audit events from attention-demanding inbox alerts: normal transitions are appended to `RunLedger`, while failures, forged approvals, insufficient roles, unfinished tasks, and reconciliation problems are posted via `DurableOperatorInbox.post()` with warning or critical severity. `inspect_run()` surfaces only `critical_alerts` in the primary view while retaining detailed events for inspection.
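Here is a sketch of that routing split. An in-memory `Router` stands in for `RunLedger` and `DurableOperatorInbox`. The event types and their severities in `ALERT_SEVERITY` are illustrative assumptions. Every event is audited, only some raise an alert, and only critical alerts reach the primary view.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from event type to inbox severity; routine events are absent.
ALERT_SEVERITY = {
    "approval.forged": "critical",
    "approval.insufficient_role": "warning",
    "bank.failed": "critical",
    "task.submitted_unknown": "critical",
    "run.unfinished_tasks": "warning",
}


@dataclass
class Router:
    ledger: list = field(default_factory=list)
    inbox: list = field(default_factory=list)

    def record(self, event_type: str, detail: str) -> None:
        # Every event is audited; only some demand the operator's attention.
        self.ledger.append((event_type, detail))
        severity = ALERT_SEVERITY.get(event_type)
        if severity is not None:
            self.inbox.append({"severity": severity, "type": event_type, "detail": detail})

    def critical_alerts(self) -> list:
        # The primary view shows only critical alerts; the rest stay inspectable.
        return [a for a in self.inbox if a["severity"] == "critical"]
```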

P4

Aligned

Apply progressive disclosure to system agency

The primary inspection surface in `inspect_run()` gives a concise summary first: run status, policy hash, cancellation/pause state, audit-chain integrity, critical alerts, status counts, submitted amount, and reconciliation count. Deeper detail remains available through the per-task list and the ledger’s `read_run_events()` / `verify_chain()` paths, so agency is disclosed progressively rather than as an opaque black box or raw log dump only.
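The two disclosure layers can be sketched as a summary function and a drill-down function. Both are hypothetical stand-ins for `inspect_run()`. Which statuses count toward `submitted_amount_pence`, and reporting `needs_reconciliation` as a count, are assumptions.

```python
from collections import Counter


def inspect_summary(tasks: list[dict], alerts: list[dict], chain_ok: bool) -> dict:
    """First layer: a concise view using field names cited in the review."""
    by_status = Counter(t["status"] for t in tasks)
    return {
        "audit_chain_intact": chain_ok,
        "critical_alerts": [a for a in alerts if a["severity"] == "critical"],
        "summary": {"by_status": dict(by_status)},
        # Assumption: anything that reached the bank counts as submitted.
        "submitted_amount_pence": sum(
            t["amount_pence"] for t in tasks
            if t["status"] in ("SUBMITTED", "SUBMITTED_UNKNOWN", "SUCCEEDED")
        ),
        "needs_reconciliation": by_status.get("SUBMITTED_UNKNOWN", 0),
    }


def inspect_detail(tasks: list[dict], task_id: str) -> dict:
    # Second layer: the full per-task record, only when the operator drills down.
    return next(t for t in tasks if t["task_id"] == task_id)
```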

P5

Aligned

Replace implied magic with clear mental models

The code uses explicit mental-model primitives: `RunStatus` distinguishes pending, drafting, awaiting approval, paused, executing, completed, partial, cancelled, and failed; `TaskStatus` distinguishes awaiting approval, approved, submitted, submitted unknown, succeeded, failed, skipped, and cancelled. `classify_invoice()` records concrete `policy_decision` reasons, and `ApprovalEnvelope` makes clear that payment execution requires a signed authorization bound to run, task, policy, invoice snapshot, role, and expiry.
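The sketch below shows one way to bind an envelope to its context: an HMAC over a canonical JSON payload, using only the standard library `hmac` and `hashlib` modules. The field names and the exact-match role check are assumptions, since the review does not say whether roles form a hierarchy. The task-status and terminal-run checks that `approve()` also performs are omitted.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ApprovalEnvelope:
    # Field set follows the review's description; the names are assumptions.
    run_id: str
    task_id: str
    policy_hash: str
    invoice_snapshot_hash: str
    approver_role: str
    expires_at: int  # Unix seconds
    signature: str = ""

    def payload(self) -> bytes:
        # Sign everything except the signature itself, in canonical form.
        body = asdict(self)
        body.pop("signature")
        return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign(env: ApprovalEnvelope, key: bytes) -> ApprovalEnvelope:
    sig = hmac.new(key, env.payload(), hashlib.sha256).hexdigest()
    return replace(env, signature=sig)


def verify(env: ApprovalEnvelope, key: bytes, now: int, expected_policy_hash: str,
           expected_snapshot_hash: str, required_role: str) -> Optional[str]:
    """Return None if the approval is valid, otherwise the rejection reason."""
    expected = hmac.new(key, env.payload(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, env.signature):
        return "forged"  # any edited field, or the wrong key, breaks the MAC
    if now >= env.expires_at:
        return "expired"
    if env.policy_hash != expected_policy_hash:
        return "policy_changed"
    if env.invoice_snapshot_hash != expected_snapshot_hash:
        return "invoice_changed"
    if env.approver_role != required_role:
        return "insufficient_role"
    return None
```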

P6

Aligned

Expose meaningful operational state, not internal complexity

Operational state is exposed in user-relevant terms through `inspect_run()` fields such as `status`, `critical_alerts`, `summary.by_status`, `submitted_amount_pence`, and `needs_reconciliation`. Low-level mechanics like the hash chain remain available as `audit_chain_intact` / `audit_chain_first_divergence_event_id` and ledger events, while the primary status model stays oriented around approval, execution, blocking, reconciliation, and completion.

P7

Aligned

Establish trust through inspectability

Inspectability is supported by architectural primitives rather than wrappers: `RunLedger` writes append-only JSONL events, `_hash_event_body()` creates a per-run hash chain, `verify_chain()` detects tampering, and `replay_run_from_ledger()` reconstructs state. Decisions and actions are traceable via `policy_hash`, invoice `snapshot_hash()`, approval envelope signatures, bank responses, transfer IDs, task status transitions, and actor IDs in audit events.
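Here is a minimal hash-chain sketch in the spirit of `_hash_event_body()` and `verify_chain()`. The event layout, the genesis value, and the choice of SHA-256 over canonical JSON are assumptions. Because each hash covers the previous one, editing any event is detected at that event.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed starting value for the first link


def hash_event_body(prev_hash: str, body: dict) -> str:
    # Each hash covers the previous hash, so editing any event breaks the chain there.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode()).hexdigest()


def append(chain: list[dict], event_id: str, body: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event_id": event_id, "body": body, "prev_hash": prev,
                  "hash": hash_event_body(prev, body)})


def verify_chain(chain: list[dict]) -> Optional[str]:
    """Return the event_id of the first divergent event, or None if intact."""
    prev = GENESIS
    for event in chain:
        if event["prev_hash"] != prev or event["hash"] != hash_event_body(prev, event["body"]):
            return event["event_id"]
        prev = event["hash"]
    return None
```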

P8

Aligned

Make hand-offs, approvals, and blockers explicit

Approvals and blockers are explicit and load-bearing: `approve()` verifies the HMAC-signed `ApprovalEnvelope`, expiry, policy hash, invoice snapshot hash, task status, terminal-run state, and required approver role before setting `TaskStatus.APPROVED`. Execution blockers are surfaced through inbox alerts for forged envelopes, expired or insufficient approvals, policy violations, bank failures, unfinished tasks, and `SUBMITTED_UNKNOWN`; the iteration-6 whole-projection counts (`failed_total`, `succeeded_total`, `actionable_total`) prevent `COMPLETED` while any failed or actionable task remains.
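The whole-projection rule can be sketched as counting every task's status before choosing a terminal run status. The set of actionable statuses and the `PARTIAL` versus `FAILED` split are assumptions. What the sketch keeps from the review is that `COMPLETED` requires zero failed and zero actionable tasks.

```python
from collections import Counter

# Assumed set of statuses that still need someone (human or agent) to act.
ACTIONABLE = {"AWAITING_APPROVAL", "APPROVED", "SUBMITTED", "SUBMITTED_UNKNOWN"}


def derive_terminal_status(task_statuses: list[str]) -> str:
    """Derive run status from every task, not just the ones touched in this pass."""
    counts = Counter(task_statuses)
    failed_total = counts["FAILED"]
    succeeded_total = counts["SUCCEEDED"]
    actionable_total = sum(counts[s] for s in ACTIONABLE)
    if failed_total == 0 and actionable_total == 0:
        # Only SUCCEEDED, SKIPPED or CANCELLED tasks remain.
        return "COMPLETED"
    if succeeded_total == 0 and actionable_total == 0:
        # Nothing succeeded and nothing is left to do, so the problems are failures.
        return "FAILED"
    return "PARTIAL"
```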

P9

Aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system using `PaymentRun`, `PaymentTask`, `AuditEvent`, `RunLedger`, and `DurableOperatorInbox`, all keyed by `run_id` and `task_id`. The workflow separates execution state from narrative output and supports multi-step dependencies across drafting, approval, submission, reconciliation, retry, pause, cancel, and terminal resolution.

P10

Aligned

Optimise for steering, not only initiating

The workflow supports steering after initiation through `pause_run()`, `resume_run()`, `cancel_run()`, `retry_failed_task()`, `resume_after_failure()`, and `reconcile_submitted_task()`. Most importantly, `execute_approved_payments()` checks `ledger.latest_steering_intent(run_id)` before each external submit, so pause or cancellation requests recorded in the ledger can interrupt execution before additional bank calls are made.
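Below is a sketch of the check-before-each-submit loop. `latest_steering_intent` and `submit` are passed in as callables that stand in for the ledger lookup and the bank call. The intent values `"pause"` and `"cancel"` are assumptions about how the ledger encodes steering.

```python
from typing import Callable, Optional


def execute_approved(tasks: list[str],
                     latest_steering_intent: Callable[[], Optional[str]],
                     submit: Callable[[str], None]) -> tuple[list[str], Optional[str]]:
    """Submit tasks one by one, re-reading the durable steering intent before each call."""
    submitted: list[str] = []
    for task_id in tasks:
        intent = latest_steering_intent()  # read from durable storage, not an in-memory flag
        if intent in ("pause", "cancel"):
            # Stop before the next bank call; remaining tasks are left for a later resume.
            return submitted, intent
        submit(task_id)
        submitted.append(task_id)
    return submitted, None
```

Because the intent is re-read on every iteration, a pause recorded by another process between two submissions still takes effect before the next bank call.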

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/dd3a9348-7c1b-488e-930a-7f77d433aa6c/card.svg)](https://aidesignblueprint.com/en/readiness-review/dd3a9348-7c1b-488e-930a-7f77d433aa6c)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/dd3a9348-7c1b-488e-930a-7f77d433aa6c.svg)](https://aidesignblueprint.com/en/readiness-review/dd3a9348-7c1b-488e-930a-7f77d433aa6c)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear
Rubric: 2026-05-04

Run ID: dd3a9348-7c1b-488e-930a-7f77d433aa6c · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.