Assessment complete; awaiting evidence revision.
Evaluated 7 May 2026 against the AI Design Blueprint doctrine
Emerging
Status: High Risk
68/100
Grade C
This is an autonomous agentic workflow with several strong governance primitives: a typed `GovernedPolicy`, a single `Runner.run` call site, enforced `RunState` transitions, durable steering commands, unconditional approval gates, action digests, and a hash-chained audit ledger. The main production blockers are lifecycle durability and sensitive-data handling: execution still runs as an in-process `asyncio.Task`, which can strand runs after process death, and the audit trail records raw policy, action, and result payloads that may contain form-fill secrets or PII.
Per-principle findings
10 principles evaluated. Verdict, severity, evidence and recommendation for each.
P0
Establish trust through inspectability
Inspectability is strong structurally: `append_event` creates a hash-chained ledger, `run_genesis` commits the policy digest, tool calls and approval events are recorded, and `verify_audit` recomputes the chain. The blocker is that inspectability is not bounded for sensitive form-fill data: `start_governed_run` records full `policy_json`; `GovernedActionDispatcher.dispatch` logs raw `action_payload`, `target_text`, tool `result`, and error reprs; `transition` persists full `final_output` / `failure_reason`; and `timeline(...)` returns those fields directly. For a browser/form-fill agent, those payloads can contain credentials, PII, or regulated form contents.
Recommendation
Move sensitive payloads behind a redaction-aware audit boundary: keep the hash-chained ledger focused on typed summaries, rule IDs, action digests, and safe metadata, and store raw payloads only in an encrypted/redacted evidence store with explicit sensitivity classification and access control.
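As a rough illustration of this recommendation, the sketch below keeps a hash-chained ledger but writes only redacted summaries into it. It is not the reviewed code: `SENSITIVE_KEYS`, `summarise_payload`, `append_safe_event`, and `verify_chain` are hypothetical names, and a real system would classify sensitivity from policy rather than from a fixed key list.

```python
import hashlib
import json

# Hypothetical set of payload keys treated as sensitive (an assumption for this sketch).
SENSITIVE_KEYS = {"password", "ssn", "card_number", "target_text"}

def digest(value) -> str:
    """SHA-256 over a canonical JSON encoding, so equal values hash equally."""
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def summarise_payload(payload: dict) -> dict:
    """Replace sensitive values with a digest so the ledger never stores them raw."""
    safe = {}
    for key, value in payload.items():
        if key in SENSITIVE_KEYS:
            safe[key] = {"redacted": True, "sha256": digest(value)}
        else:
            safe[key] = value
    return safe

def append_safe_event(ledger: list, event_type: str, payload: dict) -> dict:
    """Append a hash-chained event that contains only the redacted summary."""
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    body = {
        "seq": len(ledger),
        "type": event_type,
        "payload": summarise_payload(payload),
        "prev_hash": prev_hash,
    }
    event = {**body, "hash": digest(body)}
    ledger.append(event)
    return event

def verify_chain(ledger: list) -> bool:
    """Recompute every hash and check each event links to its predecessor."""
    prev = "0" * 64
    for event in ledger:
        body = {k: v for k, v in event.items() if k != "hash"}
        if event["prev_hash"] != prev or digest(body) != event["hash"]:
            return False
        prev = event["hash"]
    return True
```

Raw payloads would live in a separate encrypted evidence store keyed by the same digest, so an auditor with access can still tie a ledger entry to its original content.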
P0
Ensure that background work remains perceptible
The code provides perceptibility during a healthy process via `status`, `timeline`, heartbeat events in `GovernedRunHooks._checkpoint`, and an immediate `runs` row from `create_run`. However, the actual execution lifecycle is not durable: `start_governed_run` uses `asyncio.create_task(...)` and stores the live task only in `GovernedRunHandle.outcome`; `timeout_seconds` is enforced only by in-memory `asyncio.wait_for` inside `_drive_agent_loop`. If the process/container dies, a run can remain `IN_PROGRESS` or `AWAITING_APPROVAL` forever with no worker lease, stale-heartbeat detector, recovery worker, or persisted terminal transition.
Recommendation
Move execution ownership to a durable worker/job primitive: persist a queued/running lease with worker identity and heartbeat timestamp, and run a separate watchdog/reaper that transitions expired leases to `TIMED_OUT` or a recoverable state. Keep `status` backed by that durable lifecycle rather than by an in-process task handle.
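One way to picture the lease-and-reaper pattern is the minimal `sqlite3` sketch below. The table layout and the helpers `open_store`, `claim_run`, `heartbeat`, and `reap_expired` are assumptions made for illustration, not the project's schema. Timestamps are passed in explicitly so the behaviour is easy to reason about.

```python
import sqlite3

def open_store() -> sqlite3.Connection:
    """Create an in-memory store; a real deployment would use a durable file or server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs ("
        " run_id TEXT PRIMARY KEY,"
        " state TEXT NOT NULL,"
        " worker_id TEXT,"
        " heartbeat_at REAL)"
    )
    return conn

def claim_run(conn, run_id: str, worker_id: str, now: float) -> None:
    """Record which worker owns the run and when it last proved liveness."""
    conn.execute(
        "INSERT INTO runs VALUES (?, 'IN_PROGRESS', ?, ?)",
        (run_id, worker_id, now),
    )
    conn.commit()

def heartbeat(conn, run_id: str, worker_id: str, now: float) -> None:
    """Refresh the lease; only the owning worker may do so."""
    conn.execute(
        "UPDATE runs SET heartbeat_at = ? WHERE run_id = ? AND worker_id = ?",
        (now, run_id, worker_id),
    )
    conn.commit()

def reap_expired(conn, now: float, lease_seconds: float) -> int:
    """Watchdog pass: move runs with a stale heartbeat to TIMED_OUT.

    Returns the number of runs reaped. It runs outside the worker process,
    so it still fires after the worker has died.
    """
    cutoff = now - lease_seconds
    cur = conn.execute(
        "UPDATE runs SET state = 'TIMED_OUT' "
        "WHERE state IN ('IN_PROGRESS', 'AWAITING_APPROVAL') AND heartbeat_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

In this sketch a worker that is waiting on an approval is expected to keep sending heartbeats, so a stale `AWAITING_APPROVAL` row means the worker is gone rather than that the operator is slow.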
P0
Align feedback with the user’s level of attention
The system mostly separates routine and attention-required feedback: heartbeats are written to the ledger, while `_stream_attention_required` surfaces `AWAITING_APPROVAL`, `PAUSED`, `FAILED`, and `TIMED_OUT`. But the CLI stream deduplicates by `RunState` using `seen.add(info.state)`, so a second or later `AWAITING_APPROVAL` transition in the same run will not be announced even though the agent is blocked for a new user decision.
Recommendation
Track and surface attention-required events by event sequence or transition identity, not only by state enum. Each new approval, pause, failure, or timeout event should produce a fresh operator-visible signal.
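The difference between deduplicating by state and by event sequence can be sketched as follows. `TransitionEvent` and `new_attention_events` are hypothetical names, and the sketch assumes each ledger event carries a monotonically increasing `seq`.

```python
from dataclasses import dataclass

ATTENTION_STATES = frozenset({"AWAITING_APPROVAL", "PAUSED", "FAILED", "TIMED_OUT"})

@dataclass(frozen=True)
class TransitionEvent:
    seq: int    # assumed: monotonically increasing ledger sequence number
    state: str  # name of the RunState entered by this transition

def new_attention_events(events, last_seen_seq: int):
    """Return attention-required events newer than the cursor, plus the new cursor.

    Deduplication is by sequence number, so a second AWAITING_APPROVAL
    transition in the same run is surfaced again.
    """
    fresh = [
        e for e in events
        if e.seq > last_seen_seq and e.state in ATTENTION_STATES
    ]
    cursor = max([last_seen_seq] + [e.seq for e in events])
    return fresh, cursor
```

A CLI stream would keep `cursor` between polls in place of a `seen` set of states.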
P0
Optimise for steering, not only initiating
The code has a real steering primitive: `abort_run`, `pause_run`, and `resume_run` enqueue durable commands; `claim_next_steering_command` gives `abort` priority; and `GovernedRunHooks._drain_steering` applies those commands at checkpoints. The remaining gap is that steering is interrupt/resume-only: the canonical `GovernedPolicy` is committed at genesis and there is no audited primitive for adding a correction, revising constraints, or reprioritising work while preserving continuity.
Recommendation
Add one narrow, audited policy-revision or operator-instruction command if live redirection is in scope; otherwise explicitly declare runs immutable after genesis and present abort/restart as the supported correction model. Avoid adding a routing wrapper around the agent loop.
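For orientation, the abort-first claiming behaviour described in the finding might look roughly like the sketch below. The `steering_commands` schema and the helpers `open_queue`, `enqueue`, and `claim_next` are assumptions for illustration and may differ from the reviewed `claim_next_steering_command`.

```python
import sqlite3

def open_queue() -> sqlite3.Connection:
    """In-memory queue for the sketch; durability would need a file-backed database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE steering_commands ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " run_id TEXT NOT NULL,"
        " command TEXT NOT NULL,"
        " claimed INTEGER NOT NULL DEFAULT 0)"
    )
    return conn

def enqueue(conn, run_id: str, command: str) -> None:
    """Persist an operator command so it survives until a checkpoint drains it."""
    with conn:
        conn.execute(
            "INSERT INTO steering_commands (run_id, command) VALUES (?, ?)",
            (run_id, command),
        )

def claim_next(conn, run_id: str):
    """Claim the next unclaimed command: 'abort' first, otherwise oldest first."""
    with conn:  # commits on success, rolls back on exception
        row = conn.execute(
            "SELECT id, command FROM steering_commands "
            "WHERE run_id = ? AND claimed = 0 "
            "ORDER BY CASE command WHEN 'abort' THEN 0 ELSE 1 END, id "
            "LIMIT 1",
            (run_id,),
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE steering_commands SET claimed = 1 WHERE id = ?", (row[0],)
        )
        return row[1]
```

If a policy-revision command were added, it could travel through the same queue as one more `command` value, which keeps redirection on the audited path and avoids a separate routing layer.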
P0
Design for delegation rather than direct manipulation
Delegation is modelled through a bounded `GovernedPolicy` containing `task`, `instructions`, `max_turns`, `timeout_seconds`, `permitted_action_scope`, and `approval_gates`. Callers cannot inject an arbitrary `Agent` or tools: `_build_governed_agent` constructs the agent internally and `build_governed_tools` exposes only `perform_action`, while `start_governed_run` returns a `GovernedRunHandle` with a persistent `run_id`.
P0
Apply progressive disclosure to system agency
The read side applies progressive disclosure cleanly: `status(...)` returns a compact `RunStatus` with state, latest message, final output, and failure reason; `timeline(...)` exposes detailed event data only when requested; and `verify_audit(...)` is a separate diagnostic operation. This separates primary operational understanding from deeper inspection.
P0
Replace implied magic with clear mental models
The code replaces implied magic with explicit operating rules. `GovernedPolicy` makes scope, timeout, turn limits, and approval rules concrete; `SUBMISSION_CAPABLE_ACTIONS` names actions requiring approval; `build_governed_tools` tells the model the permitted action names; and `_build_governed_agent` sets `handoffs=[]`, making unsupported delegation explicit rather than implicit.
P0
Expose meaningful operational state, not internal complexity
Operational state is represented with a closed `RunState` enum (`INITIALISED`, `IN_PROGRESS`, `AWAITING_APPROVAL`, `PAUSED`, `COMPLETE`, `FAILED`, `ABORTED_BY_USER`, `TIMED_OUT`) and an explicit `VALID_TRANSITIONS` graph. The persistence-layer `transition(...)` enforces legal transitions under `BEGIN IMMEDIATE`, and `status(...)` exposes user-relevant state rather than raw SDK internals.
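A closed state enum with an explicit transition graph can be sketched as below. The state names come from the finding above, but the edges in `ASSUMED_TRANSITIONS` are a guess made for illustration, and `IllegalTransition` and `check_transition` are hypothetical names. The reviewed `VALID_TRANSITIONS` graph may differ.

```python
from enum import Enum

class RunState(Enum):
    INITIALISED = "INITIALISED"
    IN_PROGRESS = "IN_PROGRESS"
    AWAITING_APPROVAL = "AWAITING_APPROVAL"
    PAUSED = "PAUSED"
    COMPLETE = "COMPLETE"
    FAILED = "FAILED"
    ABORTED_BY_USER = "ABORTED_BY_USER"
    TIMED_OUT = "TIMED_OUT"

S = RunState

# Assumed edges, for illustration only. Terminal states have no outgoing edges.
ASSUMED_TRANSITIONS = {
    S.INITIALISED: {S.IN_PROGRESS, S.FAILED, S.ABORTED_BY_USER},
    S.IN_PROGRESS: {S.AWAITING_APPROVAL, S.PAUSED, S.COMPLETE,
                    S.FAILED, S.ABORTED_BY_USER, S.TIMED_OUT},
    S.AWAITING_APPROVAL: {S.IN_PROGRESS, S.FAILED, S.ABORTED_BY_USER, S.TIMED_OUT},
    S.PAUSED: {S.IN_PROGRESS, S.ABORTED_BY_USER, S.TIMED_OUT},
    S.COMPLETE: set(),
    S.FAILED: set(),
    S.ABORTED_BY_USER: set(),
    S.TIMED_OUT: set(),
}

class IllegalTransition(Exception):
    """Raised when a requested state change is not an edge in the graph."""

def check_transition(current: RunState, target: RunState) -> RunState:
    """Return the target state if the edge is allowed, otherwise raise."""
    if target not in ASSUMED_TRANSITIONS[current]:
        raise IllegalTransition(f"{current.name} -> {target.name}")
    return target
```

In a persistence layer this check would run inside the same write transaction that updates the row, so two concurrent writers cannot both move a run out of the same state.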
P0
Make hand-offs, approvals, and blockers explicit
Approvals and blockers are explicit at the execution boundary. `GovernedPolicy._require_unconditional_approval_for_submission_capable_scope` rejects `click`, `submit`, or `keypress` scope without an unconditional `ApprovalGateRule`; `wait_for_decision` transitions to `RunState.AWAITING_APPROVAL` with action details and rule IDs; `resume` records approval, `abort` records decline/abort, and `ApprovalBindingMismatch` prevents execution if the approved action digest differs from the executable action. Unsupported handoffs are failed explicitly in `GovernedRunHooks.on_handoff`.
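The approval-binding idea can be illustrated with a small sketch: the operator approves a digest of the proposed action, and execution is refused if the action about to run hashes differently. The helpers `action_digest` and `execute_if_bound` are hypothetical, and the `ApprovalBindingMismatch` class here is a stand-in defined for the example rather than the project's own.

```python
import hashlib
import hmac
import json

class ApprovalBindingMismatch(Exception):
    """Raised when the action about to run is not the one that was approved."""

def action_digest(action_name: str, payload: dict) -> str:
    """Digest of the action name and payload over a canonical JSON encoding."""
    canonical = json.dumps(
        {"action": action_name, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def execute_if_bound(approved_digest: str, action_name: str, payload: dict, perform):
    """Run perform(action_name, payload) only if it matches the approved digest."""
    actual = action_digest(action_name, payload)
    if not hmac.compare_digest(actual, approved_digest):
        raise ApprovalBindingMismatch(
            f"approved {approved_digest[:12]}, got {actual[:12]}"
        )
    return perform(action_name, payload)
```

Because the digest covers the payload as well as the action name, an approval for one form submission cannot be reused for a different one.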
P0
Represent delegated work as a system, not merely as a conversation
Delegated work is represented as a structured system, not merely a conversation. Persistence separates `runs`, `events`, and `steering_commands`; `RunState` separates execution state from chat output; `timeline(...)` exposes ordered operational history; and tools are mediated through `GovernedActionDispatcher` rather than free-form conversational instructions.
Run ID: e78f225f-32ce-4426-a90b-4143212302be · Results expire after 90 days