Governed

Alignment with the doctrine confirmed.

Agent Architecture Review, Validation Snapshot

Evaluated on 10 May 2026 against the AI Design Blueprint doctrine

Production-ready

Status: Aligned

100/100

Grade A

10 aligned

Blueprint Readiness measures alignment with the doctrine, not runtime correctness. A production-ready verdict means the architecture embodies the 10 principles; it does not run your tests or check your types. Use it alongside your test suite, not in place of it.

Verdicts by principle

The submission implements an autonomous invoice-payment workflow with load-bearing trust primitives: persistent run/task identifiers, HMAC-signed approval envelopes, policy and invoice snapshot hashes, a hash-chained audit ledger, a durable operator inbox, explicit bank-handoff states, and pause/cancel/reconcile controls. The iteration materially addresses the prior high-risk gaps around SUBMITTED_UNKNOWN durability, reconciliation-first status, explicit blockers, and cancellation while bank exposure remains.

Iteration history

5 previous runs on this artifact. Each run_id opens its own readiness review.

Scores can rise or fall between iterations: the validator's reasoning is not strictly deterministic, so the same artifact can receive different scores from one run to the next. The per-principle deltas further below show the substantive change.

| When | Score | Tier | Run ID |
| --- | --- | --- | --- |
| 10 May 2026 (this run) | 100 / A | Production-ready | 29b080f4 |
| 10 May 2026 | 60 / C | In development | 039064f9 |
| 10 May 2026 | 30 / F | Draft | 5c1d5833 |
| 10 May 2026 | 40 / D | Draft | b1195c34 |
| 10 May 2026 | 74 / C | In development | dd3a9348 |
| 10 May 2026 | 74 / C | In development | b4799966 |
Certified production-ready · Attempt 1/3

Confirmed: the code visibly implements durable reconciliation blocking for ambiguous bank handoffs, explicit cancellation-pending states, signed approval/hash checks, audit/inbox inspectability, and no specific missed production-blocking crash, silent wrong-result path, or trust-boundary bypass is evidenced.

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

P1

Aligned

Design for delegation rather than direct manipulation

The workflow is structured around delegated work rather than manual step execution: `create_run()` captures owner and `Policy`, `draft_tasks()` classifies invoices against constraints, `approve()` gates execution through a signed envelope, and `execute_approved_payments()` carries out approved payments. Control surfaces such as `pause_run()`, `cancel_run()`, `resume_after_failure()`, `retry_failed_task()`, and reconciliation methods make delegated authority bounded and governable.
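
As a minimal sketch of policy-bounded delegation, the block below classifies an invoice against a policy and returns a reason alongside the outcome. The review names `Policy` and `classify_invoice()` but not their contents, so the fields `auto_approve_limit` and `allowed_vendors`, the `Invoice` type, and the outcome labels are all assumptions made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Policy:
    # Hypothetical constraints; the reviewed Policy may hold different fields.
    auto_approve_limit: Decimal
    allowed_vendors: frozenset


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    vendor: str
    amount: Decimal


def classify_invoice(invoice: Invoice, policy: Policy) -> tuple:
    """Return (outcome, reason) so every policy decision can be explained."""
    if invoice.vendor not in policy.allowed_vendors:
        return ("rejected", f"vendor {invoice.vendor!r} is not on the allow-list")
    if invoice.amount > policy.auto_approve_limit:
        return ("needs_approval", f"amount {invoice.amount} exceeds limit {policy.auto_approve_limit}")
    return ("auto_eligible", "within policy limits")


policy = Policy(Decimal("1000"), frozenset({"acme"}))
outcome, reason = classify_invoice(Invoice("inv-1", "acme", Decimal("2500")), policy)
```

Returning the reason with the outcome keeps the delegated decision reviewable by the owner who set the policy.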

P2

Aligned

Ensure that background work remains perceptible

Background work remains perceptible through durable state and alerts: `RunStatus` and `TaskStatus` model lifecycle, `RunLedger.append()` persists audit events to JSONL, `DurableOperatorInbox.post()` stores operator-visible alerts, and `inspect_run()` exposes status, critical alerts, audit-chain integrity, and task summaries. Users can leave and return because `replay_run_from_ledger()` reconstructs runs from persisted events.
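
The append-then-replay idea can be sketched as follows. This is not the reviewed `RunLedger`: the function names, the event dictionary shape, and the convention of deriving a status from the event type suffix are assumptions. Only the event names `run.executing` and `task.approved` come from the review.

```python
import json
import os
import tempfile


def append_event(path: str, event: dict) -> None:
    """Append one event as a JSON line and force it to disk."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def replay_run(path: str, run_id: str) -> dict:
    """Rebuild the latest known run and task statuses from persisted events."""
    state = {"run_status": None, "tasks": {}}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            event = json.loads(line)
            if event["run_id"] != run_id:
                continue  # other runs share the same ledger file
            kind, _, status = event["type"].partition(".")
            if kind == "run":
                state["run_status"] = status
            elif kind == "task":
                state["tasks"][event["task_id"]] = status
    return state


ledger_path = os.path.join(tempfile.mkdtemp(), "ledger.jsonl")
append_event(ledger_path, {"run_id": "r1", "type": "run.executing"})
append_event(ledger_path, {"run_id": "r1", "type": "task.approved", "task_id": "t1"})
append_event(ledger_path, {"run_id": "r2", "type": "run.executing"})
state = replay_run(ledger_path, "r1")
```

Because state is derived only from what was written, a process restart or a returning user sees the same picture the ledger recorded.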

P3

Aligned

Align feedback with the user’s level of attention

Feedback is calibrated by risk: routine transitions are recorded as ledger events such as `run.executing` and `task.approved`, while attention-demanding conditions post inbox alerts with severity, e.g. `Execution refused: unresolved bank exposure`, `Execution halted: ambiguous bank handoff`, and bank submission/reconciliation failures. `inspect_run()` filters `critical_alerts` for foreground review without forcing users through the full ledger by default.
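
One way to picture risk-calibrated feedback is a recorder that writes every event to the ledger but raises an inbox alert only above a severity threshold. The `Severity` levels, the `Channels` class, and the threshold are hypothetical; the alert text and event names are taken from the finding above.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


@dataclass
class Channels:
    ledger: list = field(default_factory=list)  # every event, routine or not
    inbox: list = field(default_factory=list)   # attention-demanding alerts only

    def record(self, event_type: str, message: str,
               severity: Severity = Severity.INFO) -> None:
        self.ledger.append({"type": event_type, "message": message})
        if severity >= Severity.WARNING:
            self.inbox.append({"message": message, "severity": severity})

    def critical_alerts(self) -> list:
        """Foreground view: only the alerts that need immediate review."""
        return [a["message"] for a in self.inbox if a["severity"] == Severity.CRITICAL]


channels = Channels()
channels.record("run.executing", "Run started")
channels.record("task.approved", "Task t1 approved")
channels.record("run.execution_blocked_on_unresolved_exposure",
                "Execution refused: unresolved bank exposure", Severity.CRITICAL)
```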

P4

Aligned

Apply progressive disclosure to system agency

The code supports progressive disclosure: `inspect_run()` gives a primary view with run status, bank mode, critical alerts, summary counts, and per-task state, while deeper accountability remains available through `RunLedger.read_run_events()` and `verify_chain()`. Policy decisions, invoice snapshots, approval metadata, and bank responses are available for inspection without making the audit log the only interface.

P5

Aligned

Replace implied magic with clear mental models

The workflow gives users a clear mental model for what can proceed and what cannot: `classify_invoice()` explains policy outcomes, approval checks bind `policy_hash` and `invoice_snapshot_hash`, and `execute_approved_payments()` now refuses new bank calls when any task is `TaskStatus.SUBMITTED_UNKNOWN`, posting a specific required action. Delta: this improves the prior P5 finding by making the SUBMITTED_UNKNOWN blocker durable across execution calls instead of a per-pass condition.
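
The durable blocker described here can be sketched as a preflight that reads stored task state on every call. The signature below is an assumption: `submit` and `post_alert` are hypothetical stand-ins for the bank client and the operator inbox, and the `TaskStatus` members other than `SUBMITTED_UNKNOWN` and `APPROVED` are invented for the example.

```python
from enum import Enum


class TaskStatus(Enum):
    APPROVED = "approved"
    SUBMITTED_UNKNOWN = "submitted_unknown"
    SUCCEEDED = "succeeded"


def execute_approved_payments(tasks: dict, submit, post_alert) -> list:
    """Submit approved tasks unless any task has an unknown bank outcome.

    `tasks` maps task_id -> TaskStatus. Returns the task ids handed to the bank.
    """
    unknown = sorted(t for t, s in tasks.items() if s is TaskStatus.SUBMITTED_UNKNOWN)
    if unknown:
        # The check reads persisted state, so it blocks every later call
        # until the listed tasks are reconciled.
        post_alert("Execution refused: unresolved bank exposure; "
                   f"reconcile {', '.join(unknown)} first")
        return []
    submitted = []
    for task_id, status in list(tasks.items()):
        if status is not TaskStatus.APPROVED:
            continue
        tasks[task_id] = submit(task_id)
        submitted.append(task_id)
        if tasks[task_id] is TaskStatus.SUBMITTED_UNKNOWN:
            post_alert("Execution halted: ambiguous bank handoff")
            break  # no further bank calls in this pass
    return submitted


alerts = []
tasks = {"t1": TaskStatus.APPROVED, "t2": TaskStatus.APPROVED}
first = execute_approved_payments(tasks, lambda t: TaskStatus.SUBMITTED_UNKNOWN, alerts.append)
second = execute_approved_payments(tasks, lambda t: TaskStatus.SUCCEEDED, alerts.append)
```

In this usage the first call stops after one ambiguous handoff, and the second call makes no bank call at all even though `t2` is still approved.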

P6

Aligned

Expose meaningful operational state, not internal complexity

Operational state is user-meaningful rather than merely internal: `RunStatus.BLOCKED_ON_RECONCILIATION` distinguishes reconciliation work from approval waiting, and `RunStatus.CANCELLATION_PENDING_RECONCILIATION` distinguishes a requested cancel from a terminal cancellation while bank exposure remains. `inspect_run()` surfaces `needs_reconciliation`, real vs simulated success counts, and status-by-task summaries. Delta: this addresses the prior P6 recommendation by letting unresolved exposure dominate `AWAITING_APPROVAL`/`CANCELLED` in the close path.
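
A summary of this kind might be computed as shown below. The task record keys (`status`, `bank_mode`), the status strings, and the output key names other than `needs_reconciliation` are assumptions; treating both `SUBMITTED` and `SUBMITTED_UNKNOWN` as needing reconciliation is also an assumption based on how exposure is described elsewhere in this review.

```python
from collections import Counter


def summarize_tasks(tasks: list) -> dict:
    """Build an operator-facing summary from per-task records."""
    by_status = Counter(t["status"] for t in tasks)
    succeeded = [t for t in tasks if t["status"] == "SUCCEEDED"]
    return {
        "by_status": dict(by_status),
        # Keep real and simulated outcomes apart so a mock run never reads as paid.
        "real_successes": sum(1 for t in succeeded if t["bank_mode"] == "real"),
        "simulated_successes": sum(1 for t in succeeded if t["bank_mode"] == "mock"),
        "needs_reconciliation": by_status["SUBMITTED"] + by_status["SUBMITTED_UNKNOWN"] > 0,
    }


summary = summarize_tasks([
    {"status": "SUCCEEDED", "bank_mode": "real"},
    {"status": "SUCCEEDED", "bank_mode": "mock"},
    {"status": "SUBMITTED_UNKNOWN", "bank_mode": "real"},
])
```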

P7

Aligned

Establish trust through inspectability

Inspectability is backed by concrete audit primitives: `AuditEvent` includes sequence numbers, actor, payload, `prev_event_hash`, and `event_hash`; `_hash_event_body()` and `RunLedger.verify_chain()` make tampering detectable; approval events store approver role, policy hash, invoice snapshot hash, envelope signature, and expiry. `inspect_run()` exposes `audit_chain_intact` and the first divergent event id if verification fails.
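
The hash-chain technique can be illustrated with a short sketch. It mirrors the fields named above (`prev_event_hash`, `event_hash`, sequence number, actor, payload), but the canonical JSON encoding, the SHA-256 choice, the all-zero genesis value, and the function signatures are assumptions rather than the reviewed implementation.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first event's predecessor


def hash_event_body(body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_chained_event(chain: list, actor: str, payload: dict) -> dict:
    body = {
        "seq": len(chain),
        "actor": actor,
        "payload": payload,
        "prev_event_hash": chain[-1]["event_hash"] if chain else GENESIS,
    }
    event = {**body, "event_hash": hash_event_body(body)}
    chain.append(event)
    return event


def verify_chain(chain: list) -> tuple:
    """Return (True, None) if intact, else (False, seq of first divergent event)."""
    prev = GENESIS
    for event in chain:
        body = {k: v for k, v in event.items() if k != "event_hash"}
        if event["prev_event_hash"] != prev or hash_event_body(body) != event["event_hash"]:
            return (False, event["seq"])
        prev = event["event_hash"]
    return (True, None)


chain = []
append_chained_event(chain, "system", {"type": "run.executing"})
append_chained_event(chain, "approver", {"type": "task.approved", "amount": "100"})
append_chained_event(chain, "system", {"type": "task.submitted"})
intact = verify_chain(chain)
chain[1]["payload"]["amount"] = "999"  # simulate tampering with a stored event
after_tamper = verify_chain(chain)
```

Each event commits to its predecessor's hash, so editing any stored event changes its recomputed hash and verification reports the first event that no longer matches.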

P8

Aligned

Make hand-offs, approvals, and blockers explicit

Hand-offs, approvals, and blockers are explicit. `ApprovalEnvelope` is HMAC-signed and validated for signature, expiry, policy hash, invoice snapshot hash, task state, and required approver role before a task becomes `APPROVED`. Bank handoff ambiguity is represented as `SUBMITTED_UNKNOWN`, execution halts on ambiguous handoff, and the preflight in `execute_approved_payments()` posts `run.execution_blocked_on_unresolved_exposure` plus a critical inbox alert before returning without any new bank call. Delta: this resolves the prior P8 high-risk gap by moving the unresolved-exposure blocker to a durable run-level preflight.
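
A signed approval envelope of this kind could be validated roughly as follows. The field names, the return labels, the canonical JSON encoding, and the hard-coded demo key are assumptions; the sketch covers the signature, expiry, hash, and role checks and leaves out the task-state check for brevity.

```python
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # assumption: a real deployment loads this from a secret store


def _canonical(fields: dict) -> bytes:
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_envelope(fields: dict, key: bytes = SECRET) -> dict:
    signature = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    return {**fields, "signature": signature}


def validate_envelope(envelope: dict, policy_hash: str, invoice_hash: str,
                      required_role: str, now: float, key: bytes = SECRET) -> str:
    """Return "ok" or the name of the first failed check."""
    fields = {k: v for k, v in envelope.items() if k != "signature"}
    expected = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(expected, envelope.get("signature", "")):
        return "bad_signature"
    if now >= fields["expires_at"]:
        return "expired"
    if fields["policy_hash"] != policy_hash:
        return "policy_changed"
    if fields["invoice_snapshot_hash"] != invoice_hash:
        return "invoice_changed"
    if fields["approver_role"] != required_role:
        return "wrong_role"
    return "ok"


now = 1000.0  # fixed clock value so the example is deterministic
envelope = sign_envelope({
    "task_id": "t1",
    "approver_role": "finance_manager",
    "policy_hash": "p1",
    "invoice_snapshot_hash": "i1",
    "expires_at": now + 300,
})
result = validate_envelope(envelope, "p1", "i1", "finance_manager", now)
tampered = validate_envelope({**envelope, "invoice_snapshot_hash": "i2"},
                             "p1", "i2", "finance_manager", now)
```

Binding the policy and invoice hashes into the signed fields means an approval cannot be replayed against a changed invoice or a changed policy.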

P9

Aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system: `PaymentRun`, `PaymentTask`, `RunStatus`, `TaskStatus`, `AuditEvent`, `RunLedger`, and `DurableOperatorInbox` separate execution state, task state, audit history, and operator alerts. Multi-invoice work is modeled as per-task records under a run rather than as an unstructured conversation or message stream.

P10

Aligned

Optimise for steering, not only initiating

The workflow supports steering during execution through `pause_run()`, `resume_run()`, `cancel_run()`, `resume_after_failure()`, `retry_failed_task()`, `reconcile_submitted_task()`, and `confirm_mock_simulation()`. Cancellation is projection-aware: `cancel_run()` and the execution close path use `CANCELLATION_PENDING_RECONCILIATION` when `SUBMITTED` or `SUBMITTED_UNKNOWN` exposure remains, and `_reconcile_run_status()` transitions out only after exposure clears. Delta: this addresses the prior P10 high-risk finding by preventing terminal cancellation from hiding unresolved bank exposure.
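
The cancellation behaviour can be sketched as two small state functions. The status names `CANCELLATION_PENDING_RECONCILIATION`, `SUBMITTED`, and `SUBMITTED_UNKNOWN` come from the review; the function signatures, the plain-string task statuses, and the remaining enum members are assumptions.

```python
from enum import Enum


class RunStatus(Enum):
    EXECUTING = "executing"
    CANCELLATION_PENDING_RECONCILIATION = "cancellation_pending_reconciliation"
    CANCELLED = "cancelled"


EXPOSED = {"SUBMITTED", "SUBMITTED_UNKNOWN"}


def has_exposure(task_statuses: dict) -> bool:
    return any(status in EXPOSED for status in task_statuses.values())


def cancel_run(task_statuses: dict) -> RunStatus:
    """Cancel only becomes terminal when no payment is in flight at the bank."""
    if has_exposure(task_statuses):
        return RunStatus.CANCELLATION_PENDING_RECONCILIATION
    return RunStatus.CANCELLED


def reconcile_run_status(run_status: RunStatus, task_statuses: dict) -> RunStatus:
    """Leave the pending state only once all exposure has cleared."""
    if (run_status is RunStatus.CANCELLATION_PENDING_RECONCILIATION
            and not has_exposure(task_statuses)):
        return RunStatus.CANCELLED
    return run_status


task_statuses = {"t1": "SUCCEEDED", "t2": "SUBMITTED_UNKNOWN"}
after_cancel = cancel_run(task_statuses)
still_pending = reconcile_run_status(after_cancel, task_statuses)
task_statuses["t2"] = "FAILED"  # operator reconciled t2 against the bank
final = reconcile_run_status(still_pending, task_statuses)
```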

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/29b080f4-b9a3-439d-bb07-cf666a18300d/card.svg)](https://aidesignblueprint.com/en/readiness-review/29b080f4-b9a3-439d-bb07-cf666a18300d)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/29b080f4-b9a3-439d-bb07-cf666a18300d.svg)](https://aidesignblueprint.com/en/readiness-review/29b080f4-b9a3-439d-bb07-cf666a18300d)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

4 closed in this iteration · 0 reopened · 0 high-risk findings still open

Improvements (4)

P5 · Replace implied magic with clear mental models · needs_changes → aligned
P6 · Expose meaningful operational state, not internal complexity · needs_changes → aligned
P8 · Make hand-offs, approvals, and blockers explicit · high_risk → aligned
P10 · Optimise for steering, not only initiating · high_risk → aligned
Rubric: 2026-05-04

Run ID: 29b080f4-b9a3-439d-bb07-cf666a18300d · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.