Alignment confirmed with the doctrine.
Evaluated 10 May 2026 against the AI Design Blueprint doctrine
Emerging
Status: Aligned
74/100
Grade C
The submission is an autonomous invoice-payment workflow with explicit run/task lifecycle, durable audit and alert surfaces, signed approval gates, idempotent bank submission, reconciliation for uncertain external side effects, and steering controls. The iteration materially addresses the prior high-risk gaps around approval authority, submitted-unknown recovery, and mid-run interruption.
Iteration history
3 prior runs on this artifact. Each run_id opens its own readiness review.
Downgrade: `execute_approved_payments()` can mark a run `COMPLETED` while payment tasks are still `AWAITING_APPROVAL`, leaving invoices unpaid and later unexecutable without any blocker alert.
The downgrade is by design. The cert reviewer is an adversarial, independent, deliberately stricter second pass. It surfaced a production_blocker the first pass missed. Address the finding below, re-validate to get a fresh run_id, then re-certify for the badge.
Blockers the first pass missed
P8
Execution can complete a run while approvals are still pending
In `execute_approved_payments()`, the loop silently skips every task whose status is not `TaskStatus.APPROVED` via `if task.status != TaskStatus.APPROVED: continue`. After that, if there were no failures, cancellation, or pause, the final status logic unconditionally sets `run.status = RunStatus.COMPLETED`. A normal path is: `draft_tasks()` creates eligible tasks as `TaskStatus.AWAITING_APPROVAL` and sets the run to `RunStatus.AWAITING_APPROVAL`; calling `execute_approved_payments()` before all tasks are approved skips those pending tasks and records the run as completed. `approve()` also does not reject approvals on a completed run, so a task can later become `APPROVED`, but `execute_approv…
Recommendation
Before setting a run to `COMPLETED`, require that no actionable tasks remain in `AWAITING_APPROVAL` or `APPROVED`; if approvals are still pending, keep the run in `AWAITING_APPROVAL` or a clear blocked/partially-executed state and surface the required approval action. Also prevent `approve()` from approving tasks on terminal runs unless the run is explicitly reopened.
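The recommendation above can be sketched as two guards. This is a minimal illustration, not the submission's code: the `Task` and `Run` dataclasses, the `finalize_run()` helper, the set of terminal statuses, and the choice of `PARTIALLY_COMPLETED` for approved-but-unexecuted tasks are all assumptions, and signature verification inside `approve()` is omitted for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskStatus(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    SUCCEEDED = "succeeded"


class RunStatus(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    EXECUTING = "executing"
    PARTIALLY_COMPLETED = "partially_completed"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumption: these are the run statuses treated as terminal.
TERMINAL_RUN_STATUSES = frozenset(
    {RunStatus.COMPLETED, RunStatus.FAILED, RunStatus.CANCELLED}
)


@dataclass
class Task:
    task_id: str
    status: TaskStatus


@dataclass
class Run:
    run_id: str
    status: RunStatus
    tasks: list = field(default_factory=list)


def finalize_run(run: Run) -> RunStatus:
    """Pick the run status after an execution pass with no failure, pause, or cancel."""
    statuses = {task.status for task in run.tasks}
    if TaskStatus.AWAITING_APPROVAL in statuses:
        # Approvals are still pending: the run is not done.
        run.status = RunStatus.AWAITING_APPROVAL
    elif TaskStatus.APPROVED in statuses:
        # Approved but not yet executed: still actionable.
        run.status = RunStatus.PARTIALLY_COMPLETED
    else:
        run.status = RunStatus.COMPLETED
    return run.status


def approve(run: Run, task_id: str) -> Task:
    """Reject approvals on terminal runs (envelope verification omitted here)."""
    if run.status in TERMINAL_RUN_STATUSES:
        raise RuntimeError(f"run {run.run_id} is {run.status.value}; reopen it first")
    for task in run.tasks:
        if task.task_id == task_id:
            task.status = TaskStatus.APPROVED
            return task
    raise KeyError(task_id)
```

With these guards, an early call to the executor leaves the run in `AWAITING_APPROVAL`, so the pending approval stays visible instead of being hidden behind a `COMPLETED` status.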
Per-principle findings
10 principles evaluated. Verdict, severity, evidence and recommendation for each.
P1
Design for delegation rather than direct manipulation
`PaymentWorkflow.create_run()` accepts an operator-owned `Policy`, `draft_tasks()` classifies invoices against explicit constraints, and `execute_approved_payments()` only submits tasks that moved through `AWAITING_APPROVAL` to `APPROVED`. The workflow exposes delegation controls through `pause_run()`, `resume_run()`, `cancel_run()`, `retry_failed_task()`, and `resume_after_failure()` rather than requiring manual execution of every invoice step.
P2
Ensure that background work remains perceptible
Background continuity is represented through durable JSONL primitives: `RunLedger` records `run.created`, task status events, `run.executing`, and terminal run events; `DurableOperatorInbox` persists intervention alerts; `inspect_run()` exposes current status, task counts, critical alerts, and `needs_reconciliation`. The prior P2 gap is improved by `TaskStatus.SUBMITTED_UNKNOWN`, `replay_run_from_ledger()`, and `reconcile_submitted_task()`, which make crash-after-submit states visible instead of silently losing continuity.
P3
Align feedback with the user’s level of attention
The code separates routine status from attention-demanding conditions: normal progress is summarized by `inspect_run()['summary']`, while material intervention paths post alerts via `DurableOperatorInbox.post()` for forged approvals, expired envelopes, insufficient roles, policy violations, bank submission failures, and `SUBMITTED_UNKNOWN` reconciliation. Severity is calibrated with values such as `critical` for `auth_missing` and reconciliation blockers, and `warning` for non-fatal bank submission failures.
P4
Apply progressive disclosure to system agency
`inspect_run()` provides a layered view: top-level run status, owner, policy hash, pause/cancel flags, audit-chain health, critical alerts, and aggregate task counts are available first, while per-task details such as invoice snapshot hash, approver identity, transfer id, failure class, and idempotency key are available in the nested `tasks` list. Deeper forensic detail remains in `RunLedger.read_run_events()` and `verify_chain()` rather than being forced into the default summary.
P5
Replace implied magic with clear mental models
The system’s mental model is explicit: `Policy` names amount caps, allow/block lists, due window, and required approver role; `classify_invoice()` returns concrete policy decisions such as `vendor_blocked` or `outside_due_window`; and `ApprovalEnvelope` binds approval to `run_id`, `task_id`, `policy_hash`, and `invoice_snapshot_hash`. The prior P5 issue is improved because `approve()` no longer trusts raw approver strings; it verifies an HMAC envelope before recording approval authority.
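To make the policy model concrete, here is a minimal sketch of a classifier that returns named decisions. Only `vendor_blocked` and `outside_due_window` appear in the review; the field names, the `Invoice` type, the other decision strings, and the check order are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Policy:
    # All field names are hypothetical; the review only lists the concepts.
    max_amount_cents: int
    allowed_vendors: frozenset
    blocked_vendors: frozenset
    due_window_days: int
    required_approver_role: str


@dataclass(frozen=True)
class Invoice:
    vendor_id: str
    amount_cents: int
    due_date: date


def classify_invoice(invoice: Invoice, policy: Policy, today: date) -> str:
    """Return a named policy decision instead of a bare boolean."""
    if invoice.vendor_id in policy.blocked_vendors:
        return "vendor_blocked"
    # An empty allow list is treated as "no allow-list restriction" (assumption).
    if policy.allowed_vendors and invoice.vendor_id not in policy.allowed_vendors:
        return "vendor_not_allowed"
    if invoice.amount_cents > policy.max_amount_cents:
        return "amount_over_cap"
    days_until_due = (invoice.due_date - today).days
    if not 0 <= days_until_due <= policy.due_window_days:
        return "outside_due_window"
    return "eligible"
```

Returning a named reason for every rejection is what lets an operator see why an invoice was excluded without reading the classifier.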
P6
Expose meaningful operational state, not internal complexity
`RunStatus` and `TaskStatus` use user-relevant operational states such as `AWAITING_APPROVAL`, `PAUSED`, `EXECUTING`, `PARTIALLY_COMPLETED`, `FAILED`, `SUBMITTED_UNKNOWN`, `SUCCEEDED`, and `CANCELLED`. `inspect_run()` translates these into actionable summaries, including counts by task status, submitted amount, critical alerts, and reconciliation count, rather than exposing only low-level ledger mechanics.
P7
Establish trust through inspectability
The inspectability primitive is load-bearing: `RunLedger.append()` creates an append-only hash chain using `prev_event_hash` and `event_hash`, while `verify_chain()` detects sequence or hash divergence. Task events persist invoice snapshots, `invoice_snapshot_hash`, policy hash, idempotency key, approval signature, bank responses, and failure classes, allowing reviewers to reconstruct how a payment decision and submission occurred.
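The hash-chain idea can be illustrated with a short sketch. It assumes an in-memory list rather than the submission's JSONL file, SHA-256 over canonical JSON, and an all-zero genesis hash; only the names `append`, `verify_chain`, `prev_event_hash`, and `event_hash` come from the review.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed sentinel for the first event


def _hash_event(body: dict) -> str:
    """Hash a canonical JSON encoding so key order cannot change the digest."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append(chain: list, event_type: str, payload: dict) -> dict:
    """Append an event that commits to the hash of the previous event."""
    prev_hash = chain[-1]["event_hash"] if chain else GENESIS_HASH
    body = {
        "seq": len(chain),
        "type": event_type,
        "payload": payload,
        "prev_event_hash": prev_hash,
    }
    event = dict(body, event_hash=_hash_event(body))
    chain.append(event)
    return event


def verify_chain(chain: list) -> bool:
    """Detect sequence gaps, broken links, or edited event bodies."""
    prev_hash = GENESIS_HASH
    for index, event in enumerate(chain):
        body = {k: v for k, v in event.items() if k != "event_hash"}
        if event["seq"] != index or event["prev_event_hash"] != prev_hash:
            return False
        if _hash_event(body) != event["event_hash"]:
            return False
        prev_hash = event["event_hash"]
    return True
```

Because each event commits to its predecessor's hash, editing or dropping any earlier event invalidates every later link, which is what makes the ledger tamper-evident rather than merely append-only by convention.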
P8
Make hand-offs, approvals, and blockers explicit
Approval and blocker boundaries are explicit. `approve()` verifies `ApprovalEnvelope` with `hmac.compare_digest()` before trusting role or identity, checks expiry, policy hash, invoice snapshot hash, and required role, and records approved tasks only after those gates pass. Execution cannot submit unless a task is `APPROVED`; bank auth failures, policy violations, invalid signatures, insufficient roles, and reconciliation blockers are surfaced through ledger events and/or durable inbox alerts. The prior high-risk P8 approval-forgery gap is addressed by the signed envelope primitive.
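The gate order described above (signature first, then expiry, hashes, and role) might look like the following sketch. The envelope fields beyond `run_id`, `task_id`, `policy_hash`, and `invoice_snapshot_hash`, the canonical-JSON signing scheme, the `sign_envelope()` and `verify_envelope()` helpers, and the returned reason strings are assumptions, not the submission's API.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ApprovalEnvelope:
    run_id: str
    task_id: str
    policy_hash: str
    invoice_snapshot_hash: str
    approver_id: str       # hypothetical field
    approver_role: str     # hypothetical field
    expires_at: float      # hypothetical field: Unix timestamp
    signature: str = ""


def _signed_bytes(envelope: ApprovalEnvelope) -> bytes:
    """Canonical encoding of every field except the signature itself."""
    fields = asdict(envelope)
    fields.pop("signature")
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_envelope(envelope: ApprovalEnvelope, key: bytes) -> ApprovalEnvelope:
    digest = hmac.new(key, _signed_bytes(envelope), hashlib.sha256).hexdigest()
    return replace(envelope, signature=digest)


def verify_envelope(envelope, key, *, policy_hash, invoice_snapshot_hash,
                    required_role, now) -> str:
    """Return "ok" or a reason string; no field is trusted before the HMAC check."""
    expected = hmac.new(key, _signed_bytes(envelope), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope.signature):
        return "invalid_signature"
    if now >= envelope.expires_at:
        return "expired"
    if envelope.policy_hash != policy_hash:
        return "policy_hash_mismatch"
    if envelope.invoice_snapshot_hash != invoice_snapshot_hash:
        return "invoice_snapshot_mismatch"
    if envelope.approver_role != required_role:
        return "insufficient_role"
    return "ok"
```

Checking the signature before reading the role or expiry matters: until the HMAC matches, those fields are attacker-controlled input.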
P9
Represent delegated work as a system, not merely as a conversation
Delegated work is represented as a structured system: `PaymentRun` owns run-level state and policy hash, `PaymentTask` owns per-invoice lifecycle and idempotency key, `RunLedger` records timeline/history, and `DurableOperatorInbox` records intervention-required alerts. Conversation or prompt text is not used as the execution state; the workflow is governed by typed records, enums, policy checks, and ledger events.
P10
Optimise for steering, not only initiating
The workflow supports steering after initiation through `pause_run()`, `resume_run()`, `cancel_run()`, `retry_failed_task()`, `resume_after_failure()`, and `reconcile_submitted_task()`. `execute_approved_payments()` re-reads `RunLedger.latest_steering_intent()` before each external `transfer_funds()` call, so pause/cancel intents can interrupt a running executor before the next irreversible submit. The prior high-risk P10 gap is improved by durable steering events, same-key retry, and explicit `SUBMITTED_UNKNOWN` reconciliation.
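The interruption pattern, re-reading the steering intent before every irreversible call, can be sketched as below. The `SteeringIntent` enum, the dict-shaped tasks, the callable parameters, and the return shape are hypothetical; the real `execute_approved_payments()` and `transfer_funds()` signatures are not shown in the review.

```python
from enum import Enum


class SteeringIntent(Enum):
    CONTINUE = "continue"
    PAUSE = "pause"
    CANCEL = "cancel"


def execute_approved_payments(tasks, latest_steering_intent, transfer_funds):
    """Submit tasks one by one, checking for pause/cancel before each submit.

    `latest_steering_intent` is a zero-argument callable standing in for a
    ledger read; `transfer_funds` stands in for the bank call.
    Returns (submitted task ids, intent that stopped the loop).
    """
    submitted = []
    for task in tasks:
        # Re-read on every iteration: an intent recorded mid-run must win
        # before the next irreversible side effect.
        intent = latest_steering_intent()
        if intent is not SteeringIntent.CONTINUE:
            return submitted, intent
        transfer_funds(
            task["invoice_id"],
            task["amount_cents"],
            idempotency_key=task["idempotency_key"],
        )
        submitted.append(task["task_id"])
    return submitted, SteeringIntent.CONTINUE
```

Reading the intent once at the top of the function would not be enough: a pause recorded after the first transfer would then only take effect on the next run.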
Iteration delta
Improvements (4)
Run ID: 7e9bc0f6-8fd3-4586-851f-d26b720ac767 · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.