Assessment complete; awaiting evidence revision.
Evaluated 7 May 2026 against the AI Design Blueprint doctrine
Emerging · Status: High Risk · 68/100 · Grade C
Blueprint Readiness measures doctrine alignment, not runtime correctness. A production-ready verdict means the architecture embodies the 10 principles; the review does not run your tests or check your types. Layer it on top of your test suite, not in place of it.
This is clearly an autonomous agentic workflow and it includes several strong governance primitives: a typed `GovernedPolicy`, a single `Runner.run` call site, enforced `RunState` transitions, durable steering commands, unconditional approval gates, action digests, and a hash-chained audit ledger. The main production blockers are lifecycle durability and sensitive-data handling: execution is still an in-process `asyncio.Task` that can strand runs after process death, and the audit trail records raw policy/action/result payloads that may contain form-fill secrets or PII.
Per-principle findings
10 principles evaluated. Verdict, severity, evidence and recommendation for each.
P7
needs changes · production blocker · 75/100 · Establish trust through inspectability
Inspectability is strong structurally: `append_event` creates a hash-chained ledger, `run_genesis` commits the policy digest, tool calls and approval events are recorded, and `verify_audit` recomputes the chain. The blocker is that inspectability is not bounded for sensitive form-fill data: `start_governed_run` records full `policy_json`; `GovernedActionDispatcher.dispatch` logs raw `action_payload`, `target_text`, tool `result`, and error reprs; `transition` persists full `final_output` / `failure_reason`; and `timeline(...)` returns those fields directly. For a browser/form-fill agent, those payloads can contain credentials, PII, or regulated form contents.
Recommendation
Move sensitive payloads behind a redaction-aware audit boundary: keep the hash-chained ledger focused on typed summaries, rule IDs, action digests, and safe metadata, and store raw payloads only in an encrypted/redacted evidence store with explicit sensitivity classification and access control.
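The boundary recommended above can be sketched as follows. This is a minimal illustration, not the reviewed code: `SENSITIVE_KEYS`, `summarize_action`, `append_summary`, and `verify_chain` are hypothetical names, the ledger is a plain in-memory list, and the key classification is an assumption. The ledger entry keeps only safe metadata and a digest of the full payload, so the chain stays verifiable without storing raw form values.

```python
import hashlib
import json

# Hypothetical classification: keys whose values must never reach the ledger.
SENSITIVE_KEYS = {"value", "target_text", "result", "password"}
GENESIS_HASH = "0" * 64


def canonical(obj) -> bytes:
    """Deterministic JSON encoding so digests are reproducible."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def summarize_action(payload: dict) -> dict:
    """Keep safe metadata plus a digest of the full payload; drop raw sensitive values."""
    return {
        "safe_fields": {k: v for k, v in payload.items() if k not in SENSITIVE_KEYS},
        "redacted_keys": sorted(k for k in payload if k in SENSITIVE_KEYS),
        "action_digest": hashlib.sha256(canonical(payload)).hexdigest(),
    }


def append_summary(ledger: list, event_type: str, payload: dict) -> dict:
    """Append a redacted, hash-chained entry to the ledger."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    body = {
        "seq": len(ledger),
        "type": event_type,
        "summary": summarize_action(payload),
        "prev_hash": prev_hash,
    }
    entry = dict(body, hash=hashlib.sha256(canonical(body)).hexdigest())
    ledger.append(entry)
    return entry


def verify_chain(ledger: list) -> bool:
    """Recompute every entry hash and check each link to its predecessor."""
    prev = GENESIS_HASH
    for entry in ledger:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev:
            return False
        if hashlib.sha256(canonical(body)).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

A plain SHA-256 of a low-entropy value can be guessed by brute force, so a keyed digest (for example via Python's `hmac` module) may be a better fit where payloads are short or predictable. Raw payloads would live in the separate encrypted evidence store, which this sketch does not model.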
P2
needs changes · production blocker · 70/100 · Ensure that background work remains perceptible
The code provides perceptibility during a healthy process via `status`, `timeline`, heartbeat events in `GovernedRunHooks._checkpoint`, and an immediate `runs` row from `create_run`. However, the actual execution lifecycle is not durable: `start_governed_run` uses `asyncio.create_task(...)` and stores the live task only in `GovernedRunHandle.outcome`; `timeout_seconds` is enforced only by in-memory `asyncio.wait_for` inside `_drive_agent_loop`. If the process/container dies, a run can remain `IN_PROGRESS` or `AWAITING_APPROVAL` forever with no worker lease, stale-heartbeat detector, recovery worker, or persisted terminal transition.
Recommendation
Move execution ownership to a durable worker/job primitive: persist a queued/running lease with worker identity and heartbeat timestamp, and run a separate watchdog/reaper that transitions expired leases to `TIMED_OUT` or a recoverable state. Keep `status` backed by that durable lifecycle rather than by an in-process task handle.
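One possible shape for the lease and reaper described above, as a sketch under stated assumptions: the `runs` table layout, the `worker_id` and `heartbeat_at` columns, `LEASE_TTL_SECONDS`, and the function names are all hypothetical, and an in-memory SQLite database stands in for the real persistence layer. Timestamps are passed in explicitly so the behaviour is easy to reason about.

```python
import sqlite3

LEASE_TTL_SECONDS = 30.0  # hypothetical lease duration


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with a lease-bearing runs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs (run_id TEXT PRIMARY KEY, state TEXT NOT NULL, "
        "worker_id TEXT, heartbeat_at REAL)"
    )
    return conn


def claim_run(conn, run_id: str, worker_id: str, now: float) -> None:
    """Record which worker owns the run and when it last proved liveness."""
    with conn:
        conn.execute(
            "INSERT INTO runs VALUES (?, 'IN_PROGRESS', ?, ?)",
            (run_id, worker_id, now),
        )


def heartbeat(conn, run_id: str, worker_id: str, now: float) -> None:
    """Renew the lease; only the owning worker can do so."""
    with conn:
        conn.execute(
            "UPDATE runs SET heartbeat_at = ? WHERE run_id = ? AND worker_id = ?",
            (now, run_id, worker_id),
        )


def reap_expired(conn, now: float, ttl: float = LEASE_TTL_SECONDS) -> list:
    """Move runs with a stale heartbeat to TIMED_OUT and return their ids.

    Runs from a separate watchdog process, so it still works after the
    original worker has died.
    """
    with conn:
        rows = conn.execute(
            "SELECT run_id FROM runs "
            "WHERE state IN ('IN_PROGRESS', 'AWAITING_APPROVAL') "
            "AND heartbeat_at < ? ORDER BY run_id",
            (now - ttl,),
        ).fetchall()
        expired = [row[0] for row in rows]
        conn.executemany(
            "UPDATE runs SET state = 'TIMED_OUT' WHERE run_id = ?",
            [(run_id,) for run_id in expired],
        )
    return expired
```

Whether an expired lease should become `TIMED_OUT` or a recoverable state is a policy decision the sketch does not settle; it only shows that the terminal transition no longer depends on the process that started the run.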
P3
needs changes · production blocker · 50/100 · Align feedback with the user’s level of attention
The system mostly separates routine and attention-required feedback: heartbeats are written to the ledger, while `_stream_attention_required` surfaces `AWAITING_APPROVAL`, `PAUSED`, `FAILED`, and `TIMED_OUT`. But the CLI stream deduplicates by `RunState` using `seen.add(info.state)`, so a second or later `AWAITING_APPROVAL` transition in the same run will not be announced even though the agent is blocked for a new user decision.
Recommendation
Track and surface attention-required events by event sequence or transition identity, not only by state enum. Each new approval, pause, failure, or timeout event should produce a fresh operator-visible signal.
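A small sketch of sequence-based tracking, assuming each transition carries a monotonically increasing `seq`. `TransitionEvent` and `attention_signals` are hypothetical names, and states are plain strings here rather than the `RunState` enum. The consumer keeps a cursor instead of a set of seen states, so a repeated `AWAITING_APPROVAL` is announced again.

```python
from dataclasses import dataclass

ATTENTION_STATES = {"AWAITING_APPROVAL", "PAUSED", "FAILED", "TIMED_OUT"}


@dataclass(frozen=True)
class TransitionEvent:
    seq: int    # position in the run's event ledger (assumed monotonic)
    state: str  # state entered by this transition


def attention_signals(events, last_seen_seq: int = -1):
    """Return (events to announce, new cursor).

    Deduplication is by sequence number, so two approvals in the same run
    are two distinct signals.
    """
    fresh = [
        e for e in events
        if e.seq > last_seen_seq and e.state in ATTENTION_STATES
    ]
    cursor = max([last_seen_seq] + [e.seq for e in events])
    return fresh, cursor
```

Polling again with the returned cursor yields nothing until a new event arrives, which keeps routine heartbeats out of the operator's view without suppressing later approvals.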
P10
needs changes · hardening recommended · 35/100 · Optimise for steering, not only initiating
The code has a real steering primitive: `abort_run`, `pause_run`, and `resume_run` enqueue durable commands; `claim_next_steering_command` gives `abort` priority; and `GovernedRunHooks._drain_steering` applies those commands at checkpoints. The remaining gap is that steering is interrupt/resume-only: the canonical `GovernedPolicy` is committed at genesis and there is no audited primitive for adding a correction, revising constraints, or reprioritising work while preserving continuity.
Recommendation
Add one narrow, audited policy-revision or operator-instruction command if live redirection is in scope; otherwise explicitly declare runs immutable after genesis and present abort/restart as the supported correction model. Avoid adding a routing wrapper around the agent loop.
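If live redirection is in scope, a narrow revision command might look like the sketch below. Everything here is an assumption for illustration: `REVISABLE_FIELDS`, `RunRecord`, `revise_policy`, and the event shape are hypothetical, and the policy is a plain dict rather than the typed `GovernedPolicy`. The point is that only an allow-listed subset of fields can change, and every change records the policy digest before and after.

```python
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical allow-list: only these policy fields may change after genesis.
REVISABLE_FIELDS = frozenset({"instructions", "max_turns"})


def policy_digest(policy: dict) -> str:
    """Digest over a deterministic encoding of the policy."""
    encoded = json.dumps(policy, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


@dataclass
class RunRecord:
    policy: dict
    events: list = field(default_factory=list)


def revise_policy(run: RunRecord, operator: str, changes: dict) -> str:
    """Apply an allow-listed revision and record it; return the new digest."""
    blocked = sorted(set(changes) - REVISABLE_FIELDS)
    if blocked:
        # Scope and approval gates stay fixed: widening them needs a new run.
        raise ValueError(f"fields not revisable after genesis: {blocked}")
    before = policy_digest(run.policy)
    run.policy = {**run.policy, **changes}
    after = policy_digest(run.policy)
    run.events.append({
        "type": "policy_revised",
        "operator": operator,
        "changed_fields": sorted(changes),
        "digest_before": before,
        "digest_after": after,
    })
    return after
```

Keeping `permitted_action_scope` and `approval_gates` off the allow-list is a design choice in this sketch: a correction can redirect the task without weakening the guarantees committed at genesis.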
P1
aligned · Design for delegation rather than direct manipulation
Delegation is modeled through a bounded `GovernedPolicy` containing `task`, `instructions`, `max_turns`, `timeout_seconds`, `permitted_action_scope`, and `approval_gates`. Callers cannot inject an arbitrary `Agent` or tools: `_build_governed_agent` constructs the agent internally and `build_governed_tools` exposes only `perform_action`, while `start_governed_run` returns a `GovernedRunHandle` with a persistent `run_id`.
P4
aligned · Apply progressive disclosure to system agency
The read side applies progressive disclosure cleanly: `status(...)` returns a compact `RunStatus` with state, latest message, final output, and failure reason; `timeline(...)` exposes detailed event data only when requested; and `verify_audit(...)` is a separate diagnostic operation. This separates primary operational understanding from deeper inspection.
P5
aligned · Replace implied magic with clear mental models
The code replaces implied magic with explicit operating rules. `GovernedPolicy` makes scope, timeout, turn limits, and approval rules concrete; `SUBMISSION_CAPABLE_ACTIONS` names actions requiring approval; `build_governed_tools` tells the model the permitted action names; and `_build_governed_agent` sets `handoffs=[]`, making unsupported delegation explicit rather than implicit.
P6
aligned · Expose meaningful operational state, not internal complexity
Operational state is represented with a closed `RunState` enum (`INITIALISED`, `IN_PROGRESS`, `AWAITING_APPROVAL`, `PAUSED`, `COMPLETE`, `FAILED`, `ABORTED_BY_USER`, `TIMED_OUT`) and an explicit `VALID_TRANSITIONS` graph. The persistence-layer `transition(...)` enforces legal transitions under `BEGIN IMMEDIATE`, and `status(...)` exposes user-relevant state rather than raw SDK internals.
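The pattern described above can be sketched as follows. The state names come from the review, but the contents of `VALID_TRANSITIONS` shown here are a guess at a plausible graph, not the reviewed one, and `open_runs_db`, `IllegalTransition`, and the table layout are hypothetical. `BEGIN IMMEDIATE` takes SQLite's write lock before the state is read, so two writers cannot both validate against the same stale state.

```python
import sqlite3
from enum import Enum


class RunState(str, Enum):
    INITIALISED = "INITIALISED"
    IN_PROGRESS = "IN_PROGRESS"
    AWAITING_APPROVAL = "AWAITING_APPROVAL"
    PAUSED = "PAUSED"
    COMPLETE = "COMPLETE"
    FAILED = "FAILED"
    ABORTED_BY_USER = "ABORTED_BY_USER"
    TIMED_OUT = "TIMED_OUT"


S = RunState
# Hypothetical graph: terminal states have no outgoing edges.
VALID_TRANSITIONS = {
    S.INITIALISED: {S.IN_PROGRESS, S.ABORTED_BY_USER},
    S.IN_PROGRESS: {S.AWAITING_APPROVAL, S.PAUSED, S.COMPLETE, S.FAILED,
                    S.ABORTED_BY_USER, S.TIMED_OUT},
    S.AWAITING_APPROVAL: {S.IN_PROGRESS, S.ABORTED_BY_USER, S.TIMED_OUT},
    S.PAUSED: {S.IN_PROGRESS, S.ABORTED_BY_USER, S.TIMED_OUT},
    S.COMPLETE: set(),
    S.FAILED: set(),
    S.ABORTED_BY_USER: set(),
    S.TIMED_OUT: set(),
}


class IllegalTransition(Exception):
    pass


def open_runs_db() -> sqlite3.Connection:
    # isolation_level=None puts the connection in autocommit mode so that
    # transactions are controlled explicitly with BEGIN/COMMIT/ROLLBACK.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE runs (run_id TEXT PRIMARY KEY, state TEXT NOT NULL)")
    return conn


def transition(conn, run_id: str, target: RunState) -> None:
    """Read, validate, and write the state inside one write-locked transaction."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT state FROM runs WHERE run_id = ?", (run_id,)
        ).fetchone()
        if row is None:
            raise KeyError(run_id)
        current = RunState(row[0])
        if target not in VALID_TRANSITIONS[current]:
            raise IllegalTransition(f"{current.value} -> {target.value}")
        conn.execute(
            "UPDATE runs SET state = ? WHERE run_id = ?", (target.value, run_id)
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```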
P8
aligned · Make hand-offs, approvals, and blockers explicit
Approvals and blockers are explicit at the execution boundary. `GovernedPolicy._require_unconditional_approval_for_submission_capable_scope` rejects `click`, `submit`, or `keypress` scope without an unconditional `ApprovalGateRule`; `wait_for_decision` transitions to `RunState.AWAITING_APPROVAL` with action details and rule IDs; `resume` records approval, `abort` records decline/abort, and `ApprovalBindingMismatch` prevents execution if the approved action digest differs from the executable action. Unsupported handoffs are failed explicitly in `GovernedRunHooks.on_handoff`.
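The two checks described above, scope validation and approval binding, can be illustrated with a short sketch. The action names and `ApprovalBindingMismatch` come from the review; the gate representation (a dict with an `unconditional` flag), `validate_scope`, `action_digest`, and `execute_if_approved` are hypothetical stand-ins for the real typed models.

```python
import hashlib
import hmac
import json

SUBMISSION_CAPABLE_ACTIONS = frozenset({"click", "submit", "keypress"})


class ApprovalBindingMismatch(Exception):
    pass


def validate_scope(permitted_action_scope, approval_gates) -> None:
    """Reject a scope with submission-capable actions but no unconditional gate."""
    risky = SUBMISSION_CAPABLE_ACTIONS & set(permitted_action_scope)
    has_unconditional = any(gate.get("unconditional") for gate in approval_gates)
    if risky and not has_unconditional:
        raise ValueError(f"unconditional approval gate required for: {sorted(risky)}")


def action_digest(action: dict) -> str:
    """Digest over a deterministic encoding of the action."""
    encoded = json.dumps(action, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def execute_if_approved(action: dict, approved_digest: str, perform):
    """Run `perform(action)` only if it is exactly the action that was approved."""
    if not hmac.compare_digest(action_digest(action), approved_digest):
        raise ApprovalBindingMismatch("executable action differs from approved action")
    return perform(action)
```

Binding the approval to a digest rather than to an action name means a change to any field between approval and execution, such as a different selector, invalidates the approval.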
P9
aligned · Represent delegated work as a system, not merely as a conversation
Delegated work is represented as a structured system, not merely a conversation. Persistence separates `runs`, `events`, and `steering_commands`; `RunState` separates execution state from chat output; `timeline(...)` exposes ordered operational history; and tools are mediated through `GovernedActionDispatcher` rather than free-form conversational instructions.
Run ID: e78f225f-32ce-4426-a90b-4143212302be · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.