Governed

Alignment confirmed with the doctrine.

Agent Architecture Review, Validation snapshot

Evaluated 10 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: Aligned

74/100

Grade C

10 aligned

Blueprint Readiness measures doctrine alignment, not runtime correctness. A production-ready verdict means the architecture embodies the 10 principles; it does not run your tests or type checks. Layer it on top of your test suite, not in place of it.

Per-principle verdicts

The submission is an autonomous invoice-payment workflow with explicit run/task lifecycle, durable audit and alert surfaces, signed approval gates, idempotent bank submission, reconciliation for uncertain external side effects, and steering controls. The iteration materially addresses the prior high-risk gaps around approval authority, submitted-unknown recovery, and mid-run interruption.

Iteration history

3 prior runs on this artifact. Each run_id opens its own readiness review.

Scores can move up or down between iterations: the validator's reasoning is not strictly deterministic, so the same artifact can score differently across runs. The per-principle deltas below show the substantive change.

When	Score	Tier	Run ID
10 May 2026 (this run)	74 / C	Emerging	7e9bc0f6…
10 May 2026	60 / C	Emerging	15aa9649…
10 May 2026	30 / F	Draft	b8d61c00…
10 May 2026	0 / F	Draft	ac64d7d9…

Cert downgraded to emerging · Attempt 1/3

Downgrade: `execute_approved_payments()` can mark a run `COMPLETED` while payment tasks are still `AWAITING_APPROVAL`, leaving invoices unpaid and later unexecutable without any blocker alert.

The downgrade is by design. The cert reviewer is an adversarial, independent, deliberately stricter second pass. It surfaced a production_blocker the first pass missed. Address the finding below, re-validate to get a fresh run_id, then re-certify for the badge.

Blockers the first pass missed

needs changes · production blocker · 88/100

Execution can complete a run while approvals are still pending

In `execute_approved_payments()`, the loop silently skips every task whose status is not `TaskStatus.APPROVED` via `if task.status != TaskStatus.APPROVED: continue`. After that, if there were no failures, cancellation, or pause, the final status logic unconditionally sets `run.status = RunStatus.COMPLETED`. A normal path is: `draft_tasks()` creates eligible tasks as `TaskStatus.AWAITING_APPROVAL` and sets the run to `RunStatus.AWAITING_APPROVAL`; calling `execute_approved_payments()` before all tasks are approved skips those pending tasks and records the run as completed. `approve()` also does not reject approvals on a completed run, so a task can later become `APPROVED`, but `execute_approv…

Recommendation

Before setting a run to `COMPLETED`, require that no actionable tasks remain in `AWAITING_APPROVAL` or `APPROVED`; if approvals are still pending, keep the run in `AWAITING_APPROVAL` or a clear blocked/partially-executed state and surface the required approval action. Also prevent `approve()` from approving tasks on terminal runs unless the run is explicitly reopened.
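The recommendation above can be sketched as a completion guard plus a terminal-run check in `approve()`. This is a minimal illustration, not the submission's code: the `Run` and `Task` dataclasses, the `finalize_run()` helper, the set of terminal states, and the choice to fall back to `EXECUTING` when approved tasks remain unsubmitted are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    SUCCEEDED = "succeeded"


class RunStatus(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    EXECUTING = "executing"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


# Assumption: which run states count as terminal is not stated in the review.
TERMINAL_RUN_STATES = {RunStatus.COMPLETED, RunStatus.CANCELLED}


@dataclass
class Task:  # hypothetical stand-in for PaymentTask
    task_id: str
    status: TaskStatus


@dataclass
class Run:  # hypothetical stand-in for PaymentRun
    status: RunStatus
    tasks: list = field(default_factory=list)


def finalize_run(run: Run) -> RunStatus:
    """Mark the run COMPLETED only when no actionable task remains."""
    statuses = {task.status for task in run.tasks}
    if TaskStatus.AWAITING_APPROVAL in statuses:
        # Approvals are still pending: keep the required action visible.
        run.status = RunStatus.AWAITING_APPROVAL
    elif TaskStatus.APPROVED in statuses:
        # Approved but not yet submitted: the run still has work to do.
        run.status = RunStatus.EXECUTING
    else:
        run.status = RunStatus.COMPLETED
    return run.status


def approve(run: Run, task: Task) -> None:
    """Refuse approvals on terminal runs unless the run is reopened first."""
    if run.status in TERMINAL_RUN_STATES:
        raise ValueError("run is terminal; reopen it before approving tasks")
    task.status = TaskStatus.APPROVED
```

With this guard, calling the executor early leaves the run in `AWAITING_APPROVAL` rather than `COMPLETED`, so a later approval can still be executed. Failure, pause, and cancellation handling are omitted from the sketch.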

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

aligned

Design for delegation rather than direct manipulation

`PaymentWorkflow.create_run()` accepts an operator-owned `Policy`, `draft_tasks()` classifies invoices against explicit constraints, and `execute_approved_payments()` only submits tasks that moved through `AWAITING_APPROVAL` to `APPROVED`. The workflow exposes delegation controls through `pause_run()`, `resume_run()`, `cancel_run()`, `retry_failed_task()`, and `resume_after_failure()` rather than requiring manual execution of every invoice step.

aligned

Ensure that background work remains perceptible

Background continuity is represented through durable JSONL primitives: `RunLedger` records `run.created`, task status events, `run.executing`, and terminal run events; `DurableOperatorInbox` persists intervention alerts; `inspect_run()` exposes current status, task counts, critical alerts, and `needs_reconciliation`. The prior P2 gap is improved by `TaskStatus.SUBMITTED_UNKNOWN`, `replay_run_from_ledger()`, and `reconcile_submitted_task()`, which make crash-after-submit states visible instead of silently losing continuity.

aligned

Align feedback with the user’s level of attention

The code separates routine status from attention-demanding conditions: normal progress is summarized by `inspect_run()['summary']`, while material intervention paths post alerts via `DurableOperatorInbox.post()` for forged approvals, expired envelopes, insufficient roles, policy violations, bank submission failures, and `SUBMITTED_UNKNOWN` reconciliation. Severity is calibrated with values such as `critical` for `auth_missing` and reconciliation blockers, and `warning` for non-fatal bank submission failures.

aligned

Apply progressive disclosure to system agency

`inspect_run()` provides a layered view: top-level run status, owner, policy hash, pause/cancel flags, audit-chain health, critical alerts, and aggregate task counts are available first, while per-task details such as invoice snapshot hash, approver identity, transfer id, failure class, and idempotency key are available in the nested `tasks` list. Deeper forensic detail remains in `RunLedger.read_run_events()` and `verify_chain()` rather than being forced into the default summary.

aligned

Replace implied magic with clear mental models

The system’s mental model is explicit: `Policy` names amount caps, allow/block lists, due window, and required approver role; `classify_invoice()` returns concrete policy decisions such as `vendor_blocked` or `outside_due_window`; and `ApprovalEnvelope` binds approval to `run_id`, `task_id`, `policy_hash`, and `invoice_snapshot_hash`. The prior P5 issue is improved because `approve()` no longer trusts raw approver strings; it verifies an HMAC envelope before recording approval authority.
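As a rough illustration of how a policy with explicit constraints can yield named decisions, the sketch below classifies one invoice. Only the decision labels `vendor_blocked` and `outside_due_window` come from the review; the `Policy` field names, the function signature, and the other labels (`vendor_not_allowed`, `amount_over_cap`, `eligible`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Policy:
    # All field names are assumptions for this sketch.
    max_amount_minor: int          # amount cap in minor currency units
    blocked_vendors: frozenset
    allowed_vendors: frozenset     # empty set means "no allow list"
    due_window_days: int
    required_approver_role: str


def classify_invoice(policy, vendor, amount_minor, due_date, today):
    """Return a named policy decision instead of a bare True/False."""
    if vendor in policy.blocked_vendors:
        return "vendor_blocked"
    if policy.allowed_vendors and vendor not in policy.allowed_vendors:
        return "vendor_not_allowed"   # hypothetical label
    if amount_minor > policy.max_amount_minor:
        return "amount_over_cap"      # hypothetical label
    window_end = today + timedelta(days=policy.due_window_days)
    if not (today <= due_date <= window_end):
        return "outside_due_window"
    return "eligible"                 # hypothetical label
```

Returning a concrete reason string means the same value can be written to the ledger and shown to the operator, which is what keeps the policy decision legible.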

aligned

Expose meaningful operational state, not internal complexity

`RunStatus` and `TaskStatus` use user-relevant operational states such as `AWAITING_APPROVAL`, `PAUSED`, `EXECUTING`, `PARTIALLY_COMPLETED`, `FAILED`, `SUBMITTED_UNKNOWN`, `SUCCEEDED`, and `CANCELLED`. `inspect_run()` translates these into actionable summaries, including counts by task status, submitted amount, critical alerts, and reconciliation count, rather than exposing only low-level ledger mechanics.

aligned

Establish trust through inspectability

The inspectability primitive is load-bearing: `RunLedger.append()` creates an append-only hash chain using `prev_event_hash` and `event_hash`, while `verify_chain()` detects sequence or hash divergence. Task events persist invoice snapshots, `invoice_snapshot_hash`, policy hash, idempotency key, approval signature, bank responses, and failure classes, allowing reviewers to reconstruct how a payment decision and submission occurred.
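The append-only hash chain described above can be sketched in a few lines. This is an in-memory illustration under stated assumptions: the real `RunLedger` persists JSONL, and the event layout, the SHA-256 choice, the genesis value, and the `InMemoryLedger` class name are all hypothetical. Only the `prev_event_hash` and `event_hash` field names and the `append()` / `verify_chain()` method names come from the review.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumption: placeholder hash for the first event


def compute_event_hash(seq, prev_event_hash, payload):
    """Hash a canonical JSON encoding of the event's chained fields."""
    body = json.dumps(
        {"seq": seq, "prev_event_hash": prev_event_hash, "payload": payload},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class InMemoryLedger:
    def __init__(self):
        self.events = []

    def append(self, payload):
        """Link each new event to the hash of the previous one."""
        seq = len(self.events)
        prev = self.events[-1]["event_hash"] if self.events else GENESIS_HASH
        event = {
            "seq": seq,
            "prev_event_hash": prev,
            "payload": payload,
            "event_hash": compute_event_hash(seq, prev, payload),
        }
        self.events.append(event)
        return event

    def verify_chain(self):
        """Return False on any sequence gap, broken link, or altered event."""
        prev = GENESIS_HASH
        for index, event in enumerate(self.events):
            if event["seq"] != index or event["prev_event_hash"] != prev:
                return False
            recomputed = compute_event_hash(
                event["seq"], event["prev_event_hash"], event["payload"]
            )
            if event["event_hash"] != recomputed:
                return False
            prev = event["event_hash"]
        return True
```

Because each hash covers the previous hash, editing or removing any earlier event invalidates every event after it, which is what lets a reviewer trust the reconstructed timeline.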

aligned

Make hand-offs, approvals, and blockers explicit

Approval and blocker boundaries are explicit. `approve()` verifies `ApprovalEnvelope` with `hmac.compare_digest()` before trusting role or identity, checks expiry, policy hash, invoice snapshot hash, and required role, and records approved tasks only after those gates pass. Execution cannot submit unless a task is `APPROVED`; bank auth failures, policy violations, invalid signatures, insufficient roles, and reconciliation blockers are surfaced through ledger events and/or durable inbox alerts. The prior high-risk P8 approval-forgery gap is addressed by the signed envelope primitive.
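The gate ordering described above (signature first, then expiry, hashes, and role) can be sketched with the standard library. The envelope field names beyond `run_id`, `task_id`, `policy_hash`, and `invoice_snapshot_hash`, the JSON canonicalisation, the helper names, and the rejection labels are assumptions for illustration; the submission's actual `ApprovalEnvelope` layout is not shown in this review.

```python
import hashlib
import hmac
import json
import time


def sign_envelope(secret: bytes, fields: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the envelope fields."""
    message = json.dumps(fields, sort_keys=True).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_envelope(secret, fields, signature, *, expected_policy_hash,
                    expected_snapshot_hash, required_role, now=None):
    """Return a rejection reason, or None when every gate passes."""
    expected = sign_envelope(secret, fields)
    # Constant-time comparison; nothing in `fields` is trusted before this.
    if not hmac.compare_digest(expected.encode("utf-8"),
                               signature.encode("utf-8")):
        return "invalid_signature"
    now = time.time() if now is None else now
    if fields["expires_at"] <= now:
        return "expired"
    if fields["policy_hash"] != expected_policy_hash:
        return "policy_hash_mismatch"
    if fields["invoice_snapshot_hash"] != expected_snapshot_hash:
        return "snapshot_hash_mismatch"
    if fields["role"] != required_role:
        return "insufficient_role"
    return None
```

Checking the signature before reading `role` or `expires_at` matters: a forged envelope that claims a privileged role fails at the first gate, so the claimed identity is never acted on.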

aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system: `PaymentRun` owns run-level state and policy hash, `PaymentTask` owns per-invoice lifecycle and idempotency key, `RunLedger` records timeline/history, and `DurableOperatorInbox` records intervention-required alerts. Conversation or prompt text is not used as the execution state; the workflow is governed by typed records, enums, policy checks, and ledger events.

aligned

Optimise for steering, not only initiating

The workflow supports steering after initiation through `pause_run()`, `resume_run()`, `cancel_run()`, `retry_failed_task()`, `resume_after_failure()`, and `reconcile_submitted_task()`. `execute_approved_payments()` re-reads `RunLedger.latest_steering_intent()` before each external `transfer_funds()` call, so pause/cancel intents can interrupt a running executor before the next irreversible submit. The prior high-risk P10 gap is improved by durable steering events, same-key retry, and explicit `SUBMITTED_UNKNOWN` reconciliation.
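The interrupt-before-submit behaviour can be sketched as a loop that re-reads the steering intent immediately before each irreversible call. This is a simplified illustration: the `execute_tasks()` function, its callable parameters, and the `"pause"` / `"cancel"` intent strings are assumptions, standing in for `RunLedger.latest_steering_intent()` and the bank's `transfer_funds()`.

```python
def execute_tasks(tasks, latest_steering_intent, transfer_funds):
    """Submit tasks one by one, stopping before the next submit if steered.

    Returns (submitted_tasks, stop_reason); stop_reason is None when the
    loop ran to the end without a pause or cancel intent.
    """
    submitted = []
    for task in tasks:
        # Re-read durable steering state right before the side effect,
        # not once at the start of the run.
        intent = latest_steering_intent()
        if intent in ("pause", "cancel"):
            return submitted, intent
        transfer_funds(task)
        submitted.append(task)
    return submitted, None
```

For example, if the intent source yields `None` for the first task and `"pause"` before the second, exactly one transfer is submitted and the remaining tasks stay untouched for a later resume.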

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/7e9bc0f6-8fd3-4586-851f-d26b720ac767/card.svg)](https://aidesignblueprint.com/en/readiness-review/7e9bc0f6-8fd3-4586-851f-d26b720ac767)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/7e9bc0f6-8fd3-4586-851f-d26b720ac767.svg)](https://aidesignblueprint.com/en/readiness-review/7e9bc0f6-8fd3-4586-851f-d26b720ac767)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

4 closed this pass · 0 reopened · 0 high-risk findings still open

Improvements (4)

P2 · Ensure that background work remains perceptible · needs_changes → aligned

P5 · Replace implied magic with clear mental models · needs_changes → aligned

P8 · Make hand-offs, approvals, and blockers explicit · high_risk → aligned

P10 · Optimise for steering, not only initiating · high_risk → aligned

Rubric: 2026-05-04

Run ID: 7e9bc0f6-8fd3-4586-851f-d26b720ac767 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.