Governed

Alignment confirmed with the doctrine.

Agent Architecture Review, Validation snapshot

Evaluated 10 May 2026 against the AI Design Blueprint doctrine

Tier: Emerging

Status: Aligned

Score: 74/100 (Grade C)

Principles aligned: 10 of 10

Blueprint Readiness measures doctrine alignment, not runtime correctness. A production-ready verdict means the architecture embodies the 10 principles; it does not run your tests or check your types. Layer it on top of your test suite, not in place of it.

Per-principle verdicts

The submission is an autonomous invoice-payment workflow with load-bearing primitives for delegation, approval gating, auditability, perceptible background state, steering, retry, and reconciliation. The iteration-6 terminal-status logic now derives completion from the whole task projection, so the previously described false-COMPLETED path is closed in the reviewed code.
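The whole-projection rule can be sketched as a pure function over task statuses. This is a minimal illustration under stated assumptions, not the reviewed code: the name `derive_terminal_status`, the set of statuses treated as actionable, and returning `None` to mean "keep the run open" are all hypothetical.

```python
from enum import Enum
from typing import Iterable, Optional


class TaskStatus(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    SUBMITTED = "submitted"
    SUBMITTED_UNKNOWN = "submitted_unknown"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    SKIPPED = "skipped"
    CANCELLED = "cancelled"


class RunStatus(Enum):
    # Only the terminal subset is needed for this sketch.
    COMPLETED = "completed"
    PARTIAL = "partial"
    FAILED = "failed"


# Assumed grouping: tasks that still need approval, submission or reconciliation.
ACTIONABLE = {
    TaskStatus.AWAITING_APPROVAL,
    TaskStatus.APPROVED,
    TaskStatus.SUBMITTED,
    TaskStatus.SUBMITTED_UNKNOWN,
}


def derive_terminal_status(statuses: Iterable[TaskStatus]) -> Optional[RunStatus]:
    """Return a terminal RunStatus, or None if the run must stay open."""
    statuses = list(statuses)
    failed_total = sum(s is TaskStatus.FAILED for s in statuses)
    succeeded_total = sum(s is TaskStatus.SUCCEEDED for s in statuses)
    actionable_total = sum(s in ACTIONABLE for s in statuses)

    if actionable_total:
        return None  # work is still pending, so COMPLETED is never reported
    if failed_total == 0:
        return RunStatus.COMPLETED
    if succeeded_total:
        return RunStatus.PARTIAL
    return RunStatus.FAILED
```

Because the counts are taken over every task rather than over the batch just executed, a single `SUBMITTED_UNKNOWN` task keeps the run open no matter how many others succeeded.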

Iteration history

5 prior runs on this artifact. Each run_id opens its own readiness review.

Scores can move up or down between iterations: the validator's reasoning is not strictly deterministic, so the same artifact can score differently across runs. The per-principle deltas below show the substantive change.

When	Score / Grade	Tier	Run ID
10 May 2026 (this run)	74 / C	Emerging	dd3a9348…
10 May 2026	74 / C	Emerging	b4799966…
10 May 2026	74 / C	Emerging	7e9bc0f6…
10 May 2026	60 / C	Emerging	15aa9649…
10 May 2026	30 / F	Draft	b8d61c00…
10 May 2026	0 / F	Draft	ac64d7d9…

Cert downgraded to Emerging (attempt 1/3)

Downgrade: the submitted bank adapter fabricates transfer and reconciliation success, and `_submit_one()` records tasks as `SUCCEEDED`, so a production user could be told invoices were paid when no external bank call occurred.

The downgrade is by design. The cert reviewer is an adversarial, independent, and deliberately stricter second pass, and it surfaced a production blocker that the first pass missed. Address the finding below, re-validate to get a fresh run_id, then re-certify for the badge.

Blockers the first pass missed

Needs changes · Production blocker · 10/100

Bank payment execution is stubbed but recorded as successful

`transfer_funds()` only checks that `BANK_API_KEY` exists and then immediately returns a fabricated response with `transfer_id: f"BANK-{idempotency_key}"` and `status: "submitted"`; it never uses the API key to call a bank or otherwise performs an external transfer. `_submit_one()` treats any returned dict as a completed payment by setting `task.status = TaskStatus.SUCCEEDED` and appending `task.succeeded`. The same pattern exists in `check_transfer_status()`, which returns `found: True` for any idempotency key, causing `reconcile_submitted_task()` to mark unknown submissions as `SUCCEEDED`.

Recommendation

Fail closed unless a real bank client confirms the transfer state. Make mock mode explicit and non-production, use the API key in an actual external call, validate the returned bank status, and keep tasks in `SUBMITTED`/`SUBMITTED_UNKNOWN` or `FAILED` rather than `SUCCEEDED` until genuine bank confirmation is received.
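A fail-closed status mapping might look like the sketch below. It assumes a hypothetical `BankConfirmation` shape and an assumed bank status vocabulary (`settled`, `submitted`, `pending`, `rejected`); a real bank API will have its own. The point it illustrates is that a missing, unrecognised, or mock response can never produce `SUCCEEDED` in production.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TaskStatus(Enum):
    SUBMITTED = "submitted"
    SUBMITTED_UNKNOWN = "submitted_unknown"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass(frozen=True)
class BankConfirmation:
    """Hypothetical response shape from a real bank client."""
    transfer_id: str
    status: str            # assumed vocabulary, see _BANK_TO_TASK
    is_mock: bool = False  # set by a mock adapter so it can be detected


# Only explicit bank states map to a terminal task status; anything else stays open.
_BANK_TO_TASK = {
    "settled": TaskStatus.SUCCEEDED,
    "rejected": TaskStatus.FAILED,
    "submitted": TaskStatus.SUBMITTED,
    "pending": TaskStatus.SUBMITTED,
}


def status_after_submit(
    confirmation: Optional[BankConfirmation], production: bool
) -> TaskStatus:
    """Fail closed: never return SUCCEEDED without a genuine bank confirmation."""
    if confirmation is None:
        # No response (timeout, crash): the transfer may or may not exist.
        return TaskStatus.SUBMITTED_UNKNOWN
    if confirmation.is_mock and production:
        # A mock adapter must never be able to mark real invoices as paid.
        raise RuntimeError("mock bank adapter used in production mode")
    return _BANK_TO_TASK.get(confirmation.status, TaskStatus.SUBMITTED_UNKNOWN)
```

Under this mapping, the stub's `status: "submitted"` response would leave the task in `SUBMITTED` for reconciliation instead of marking it paid.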

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

aligned

Design for delegation rather than direct manipulation

Delegation is modeled explicitly through `Policy`, `create_run()`, `draft_tasks()`, and `execute_approved_payments()`: the user sets constraints such as caps, allowlists, due windows, and required approver role, while the workflow classifies invoices and executes only within that delegated scope. Authority is bounded by `policy_hash`, `invoice_snapshot_hash`, and per-task statuses rather than hidden prompt-like behavior.
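As a rough illustration of bounded delegation, the sketch below pins a policy with a content hash and classifies one invoice against it, recording a concrete reason either way. The `Policy` fields and the `classify_invoice` signature here are hypothetical simplifications of the names cited above, and SHA-256 over canonical JSON is an assumed hashing scheme.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class Policy:
    """Hypothetical subset of the delegated constraints described above."""
    max_amount_pence: int
    payee_allowlist: tuple[str, ...]
    due_within_days: int
    required_approver_role: str


def policy_hash(policy: Policy) -> str:
    # Canonical JSON (sorted keys, no whitespace) so equal policies hash equally.
    canonical = json.dumps(asdict(policy), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def classify_invoice(policy: Policy, payee: str, amount_pence: int,
                     due: date, today: date) -> tuple[bool, str]:
    """Return (in_scope, reason); the reason is recorded, never implied."""
    if payee not in policy.payee_allowlist:
        return False, f"payee {payee!r} not on allowlist"
    if amount_pence > policy.max_amount_pence:
        return False, f"amount {amount_pence} exceeds cap {policy.max_amount_pence}"
    if (due - today).days > policy.due_within_days:
        return False, f"due {due.isoformat()} outside {policy.due_within_days}-day window"
    return True, "within delegated scope"
```

Any change to the policy changes its hash, so approvals bound to the old hash stop matching.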

aligned

Ensure that background work remains perceptible

Background work remains perceptible through durable `RunLedger` events, `DurableOperatorInbox` alerts, `RunStatus` / `TaskStatus`, and `inspect_run()`, which exposes current status, task counts, critical alerts, reconciliation needs, and audit-chain health. `replay_run_from_ledger()` preserves continuity after process interruption and promotes unresolved submitted tasks to `SUBMITTED_UNKNOWN`.
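The replay-and-promote step can be sketched over raw JSONL lines. This assumes a simplified event shape of `{"task_id": ..., "status": ...}`, which is not necessarily the `RunLedger` schema, and the function name is hypothetical.

```python
import json
from typing import Iterable


def replay_task_statuses(jsonl_lines: Iterable[str]) -> dict[str, str]:
    """Rebuild per-task status from an append-only JSONL ledger (sketch).

    Assumes each event looks like {"task_id": ..., "status": ...}; the real
    RunLedger event schema may differ.
    """
    statuses: dict[str, str] = {}
    for line in jsonl_lines:
        if not line.strip():
            continue  # tolerate blank lines such as a trailing newline
        event = json.loads(line)
        statuses[event["task_id"]] = event["status"]  # last event wins

    # A task last seen as "submitted" may or may not have reached the bank before
    # the process stopped, so it is promoted to "submitted_unknown" for reconciliation.
    for task_id, status in statuses.items():
        if status == "submitted":
            statuses[task_id] = "submitted_unknown"
    return statuses
```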

aligned

Align feedback with the user’s level of attention

Feedback is calibrated by separating routine audit events from attention-demanding inbox alerts: normal transitions are appended to `RunLedger`, while failures, forged approvals, insufficient roles, unfinished tasks, and reconciliation problems are posted via `DurableOperatorInbox.post()` with warning or critical severity. `inspect_run()` surfaces only `critical_alerts` in the primary view while retaining detailed events for inspection.
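One way to express this two-channel routing is sketched below. The event kinds and their severities in `ALERT_SEVERITY` are hypothetical, and plain lists stand in for `RunLedger` and `DurableOperatorInbox`.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping from event kind to inbox severity; unlisted kinds are routine.
ALERT_SEVERITY = {
    "task.failed": "warning",
    "approval.insufficient_role": "warning",
    "approval.forged": "critical",
    "task.submitted_unknown": "critical",
}


@dataclass
class Channels:
    ledger: list = field(default_factory=list)  # stands in for RunLedger
    inbox: list = field(default_factory=list)   # stands in for DurableOperatorInbox

    def record(self, kind: str, detail: str) -> None:
        # Every event is audited; only attention-worthy ones also reach the inbox.
        self.ledger.append((kind, detail))
        severity: Optional[str] = ALERT_SEVERITY.get(kind)
        if severity is not None:
            self.inbox.append((severity, kind, detail))

    def critical_alerts(self) -> list:
        # The primary view shows only critical alerts; warnings stay in the inbox.
        return [alert for alert in self.inbox if alert[0] == "critical"]
```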

aligned

Apply progressive disclosure to system agency

The primary inspection surface in `inspect_run()` gives a concise summary first: run status, policy hash, cancellation/pause state, audit-chain integrity, critical alerts, status counts, submitted amount, and reconciliation count. Deeper detail remains available through the per-task list and the ledger’s `read_run_events()` / `verify_chain()` paths, so agency is disclosed progressively instead of being hidden in an opaque black box or dumped as a raw log.
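A layered summary of this kind can be sketched as follows. The field names mirror those quoted in this review, but the `Task` record, which statuses count toward `submitted_amount_pence`, and counting `submitted_unknown` tasks as `needs_reconciliation` are assumptions.

```python
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    status: str
    amount_pence: int


# Assumption: money counts as "submitted" once it may have left the account.
_SUBMITTED_LIKE = ("submitted", "submitted_unknown", "succeeded")


def inspect_summary(run_status: str, tasks: list[Task], critical_alerts: list[str]) -> dict:
    """Concise first layer; per-task detail is nested rather than flattened into it."""
    by_status = Counter(t.status for t in tasks)
    return {
        "status": run_status,
        "critical_alerts": critical_alerts,
        "summary": {
            "by_status": dict(by_status),
            "submitted_amount_pence": sum(
                t.amount_pence for t in tasks if t.status in _SUBMITTED_LIKE
            ),
            "needs_reconciliation": by_status["submitted_unknown"],
        },
        # Second layer: available on demand, kept out of the headline fields.
        "tasks": [asdict(t) for t in tasks],
    }
```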

aligned

Replace implied magic with clear mental models

The code uses explicit mental-model primitives: `RunStatus` distinguishes pending, drafting, awaiting approval, paused, executing, completed, partial, cancelled, and failed; `TaskStatus` distinguishes awaiting approval, approved, submitted, submitted unknown, succeeded, failed, skipped, and cancelled. `classify_invoice()` records concrete `policy_decision` reasons, and `ApprovalEnvelope` makes clear that payment execution requires a signed authorization bound to run, task, policy, invoice snapshot, role, and expiry.

aligned

Expose meaningful operational state, not internal complexity

Operational state is exposed in user-relevant terms through `inspect_run()` fields such as `status`, `critical_alerts`, `summary.by_status`, `submitted_amount_pence`, and `needs_reconciliation`. Low-level mechanics like the hash chain remain available as `audit_chain_intact` / `audit_chain_first_divergence_event_id` and ledger events, while the primary status model stays oriented around approval, execution, blocking, reconciliation, and completion.

aligned

Establish trust through inspectability

Inspectability is supported by architectural primitives rather than wrappers: `RunLedger` writes append-only JSONL events, `_hash_event_body()` creates a per-run hash chain, `verify_chain()` detects tampering, and `replay_run_from_ledger()` reconstructs state. Decisions and actions are traceable via `policy_hash`, invoice `snapshot_hash()`, approval envelope signatures, bank responses, transfer IDs, task status transitions, and actor IDs in audit events.
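A per-run hash chain with first-divergence detection can be sketched in a few lines. This is an illustrative analogue of `_hash_event_body()` and `verify_chain()`, not their actual code: the helper names, the all-zero genesis value, and SHA-256 over canonical JSON are assumptions.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed starting value for each run's chain


def _hash_event(prev_hash: str, event_id: str, body: dict) -> str:
    # Each hash commits to the previous one, so editing an event breaks its link.
    payload = json.dumps({"event_id": event_id, "body": body},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(chain: list[dict], event_id: str, body: dict) -> None:
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event_id": event_id, "body": body, "prev": prev,
                  "hash": _hash_event(prev, event_id, body)})


def first_divergence(chain: list[dict]) -> Optional[str]:
    """Return the event_id of the first event that fails verification, or None."""
    prev = GENESIS
    for event in chain:
        expected = _hash_event(prev, event["event_id"], event["body"])
        if event["prev"] != prev or event["hash"] != expected:
            return event["event_id"]
        prev = event["hash"]
    return None
```

Returning the first failing event id, rather than a bare boolean, is what makes a field like `audit_chain_first_divergence_event_id` possible.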

aligned

Make hand-offs, approvals, and blockers explicit

Approvals and blockers are explicit and load-bearing: `approve()` verifies the HMAC-signed `ApprovalEnvelope`, expiry, policy hash, invoice snapshot hash, task status, terminal-run state, and required approver role before setting `TaskStatus.APPROVED`. Execution blockers are surfaced through inbox alerts for forged envelopes, expired or insufficient approvals, policy violations, bank failures, unfinished tasks, and `SUBMITTED_UNKNOWN`; the iteration-6 whole-projection counts (`failed_total`, `succeeded_total`, `actionable_total`) prevent `COMPLETED` while any failed or actionable task remains.
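The envelope checks can be sketched with the standard-library `hmac` module. The field set, the `ROLE_RANK` hierarchy, Unix-second expiry, and returning a reason string instead of raising are all assumptions made for illustration.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass

ROLE_RANK = {"clerk": 1, "finance_lead": 2, "cfo": 3}  # hypothetical hierarchy


@dataclass(frozen=True)
class ApprovalEnvelope:
    run_id: str
    task_id: str
    policy_hash: str
    invoice_snapshot_hash: str
    approver_role: str
    expires_at: int  # Unix seconds


def _message(env: ApprovalEnvelope) -> bytes:
    return json.dumps(asdict(env), sort_keys=True, separators=(",", ":")).encode()


def sign(env: ApprovalEnvelope, key: bytes) -> str:
    return hmac.new(key, _message(env), hashlib.sha256).hexdigest()


def verify(env: ApprovalEnvelope, signature: str, key: bytes, *,
           policy_hash: str, snapshot_hash: str, required_role: str, now: int) -> str:
    """Return "ok" or the blocker reason to post to the operator inbox."""
    if not hmac.compare_digest(sign(env, key), signature):
        return "forged envelope"
    if now >= env.expires_at:
        return "approval expired"
    if env.policy_hash != policy_hash or env.invoice_snapshot_hash != snapshot_hash:
        return "policy or invoice changed since approval"
    if ROLE_RANK.get(env.approver_role, 0) < ROLE_RANK[required_role]:
        return "insufficient approver role"
    return "ok"
```

`hmac.compare_digest` is used for the signature comparison to avoid leaking timing information.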

aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system using `PaymentRun`, `PaymentTask`, `AuditEvent`, `RunLedger`, and `DurableOperatorInbox`, all keyed by `run_id` and `task_id`. The workflow separates execution state from narrative output and supports multi-step dependencies across drafting, approval, submission, reconciliation, retry, pause, cancel, and terminal resolution.


aligned

Optimise for steering, not only initiating

The workflow supports steering after initiation through `pause_run()`, `resume_run()`, `cancel_run()`, `retry_failed_task()`, `resume_after_failure()`, and `reconcile_submitted_task()`. Most importantly, `execute_approved_payments()` checks `ledger.latest_steering_intent(run_id)` before each external submit, so pause or cancellation requests recorded in the ledger can interrupt execution before additional bank calls are made.
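The check-before-each-submit pattern is sketched below. Here `latest_intent` is a hypothetical callable standing in for `ledger.latest_steering_intent(run_id)`, assumed to return `None`, `"pause"`, or `"cancel"`, and `submit` stands in for the external bank call.

```python
from typing import Callable, Iterable, Optional


def execute_with_steering(
    task_ids: Iterable[str],
    latest_intent: Callable[[], Optional[str]],
    submit: Callable[[str], None],
) -> tuple[list[str], Optional[str]]:
    """Submit tasks one by one, re-reading the steering intent before each call.

    Returns the submitted task ids and the intent that stopped execution, if any.
    """
    submitted: list[str] = []
    for task_id in task_ids:
        intent = latest_intent()  # read fresh each time, never cached
        if intent in ("pause", "cancel"):
            return submitted, intent  # stop before any further bank call
        submit(task_id)
        submitted.append(task_id)
    return submitted, None
```

Reading the intent inside the loop, rather than once at the start, is what lets a pause recorded mid-run take effect before the next bank call.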

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/dd3a9348-7c1b-488e-930a-7f77d433aa6c/card.svg)](https://aidesignblueprint.com/en/readiness-review/dd3a9348-7c1b-488e-930a-7f77d433aa6c)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/dd3a9348-7c1b-488e-930a-7f77d433aa6c.svg)](https://aidesignblueprint.com/en/readiness-review/dd3a9348-7c1b-488e-930a-7f77d433aa6c)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear

Rubric: 2026-05-04


Run ID: dd3a9348-7c1b-488e-930a-7f77d433aa6c · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.