Assessment complete; awaiting evidence revision.
Evaluated 7 May 2026 against the AI Design Blueprint doctrine
Emerging
Status: High Risk
74/100
Grade C
This is an autonomous governed-agent workflow with strong delegation, approval, steering, lease, heartbeat, and audit primitives. Iter3 materially improves on the prior blockers by making unclaimed INITIALISED runs reapable and by recomputing evidence digests from raw sidecar values. The audit verifier is still not fully bidirectional, however: it certifies the evidence rows that exist, but it does not require every non-policy redaction marker in the ledger to have a matching sidecar evidence row. This remaining inspectability gap should be fixed before production use where audit completeness matters.
Iteration history
2 prior runs on this artifact. Each run_id opens its own readiness review.
Per-principle findings
10 principles evaluated. Verdict, severity, evidence and recommendation for each.
P0
Establish trust through inspectability
Iter3 adds important inspectability primitives: `runs` now carries `policy_digest` instead of a `policy_json` slot, `verify_audit` recomputes `sha256(ev.value_json.encode("utf-8"))`, checks it against `ev.value_digest`, checks the ledger marker digest, and requires policy evidence at `(seq=1, field="policy_json")`. However, the verification is still one-way for non-policy evidence: it loops over existing `evidence_rows` with `for ev in evidence_rows` and only requires `policy_evidence_seen`; it does not derive every redacted marker from `events` and require a corresponding evidence row for each marker. A deleted or omitted non-policy sidecar row could therefore leave the hash chain, terminal…
Recommendation
Make audit verification bidirectional: derive required `(seq, field, digest)` evidence keys from all redaction markers in ledger `event.data`, require exactly one matching sidecar row for each required marker, recompute each raw value digest, and fail certification on missing evidence. Add an explicit migration or startup assertion that removes/quarantines legacy `runs.policy_json` columns so the digest-only row boundary holds across upgrades.
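The recommendation above can be sketched as follows. This is a minimal illustration, not the reviewed implementation: the marker key `redacted_sha256`, the dictionary shapes for events and evidence rows, and the helper names `required_evidence_keys` and `verify_evidence_bidirectional` are all assumptions made for the example. Only the digest rule `sha256(value_json.encode("utf-8"))` and the field names `seq`, `field`, `value_json`, and `value_digest` are taken from the finding.

```python
import hashlib
from collections import Counter

# Hypothetical marker key; the reviewed ledger may encode redactions differently.
REDACTION_KEY = "redacted_sha256"


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def required_evidence_keys(events):
    """Derive a (seq, field, digest) key for every redaction marker in the ledger."""
    required = set()
    for event in events:
        for field, value in event["data"].items():
            if isinstance(value, dict) and REDACTION_KEY in value:
                required.add((event["seq"], field, value[REDACTION_KEY]))
    return required


def verify_evidence_bidirectional(events, evidence_rows):
    """Return a list of problems; an empty list means the sidecar is complete."""
    problems = []
    required = required_evidence_keys(events)
    seen = Counter()

    # Direction 1: every evidence row must be internally consistent and expected.
    for ev in evidence_rows:
        key = (ev["seq"], ev["field"], ev["value_digest"])
        if sha256_hex(ev["value_json"]) != ev["value_digest"]:
            problems.append(f"digest mismatch: seq={ev['seq']} field={ev['field']}")
        if key not in required:
            problems.append(f"orphan evidence: seq={ev['seq']} field={ev['field']}")
        seen[key] += 1

    # Direction 2: every ledger marker must have exactly one evidence row.
    for seq, field, digest in sorted(required):
        count = seen[(seq, field, digest)]
        if count == 0:
            problems.append(f"missing evidence: seq={seq} field={field}")
        elif count > 1:
            problems.append(f"duplicate evidence: seq={seq} field={field}")
    return problems
```

With this shape, deleting a non-policy sidecar row produces a `missing evidence` problem even though every remaining row still verifies, which is the case the one-way loop cannot detect.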
P0
Design for delegation rather than direct manipulation
The workflow is designed around delegated intent and constraints rather than step-by-step manual manipulation: `GovernedPolicy`, frozen `ApprovalGateRule`, `build_governed_tools`, `SUBMISSION_CAPABLE_ACTIONS`, and `start_governed_run` establish scoped authority before execution. Approval-sensitive actions are routed through the governed dispatcher rather than left as free-form prompt behavior. Delta: this maintains the prior aligned delegation verdict.
P0
Ensure that background work remains perceptible
Background work has durable perceptibility primitives: `create_run` persists an `INITIALISED` row with `lease_expires_at = now + DEFAULT_QUEUE_TIMEOUT_SECONDS`, workers take ownership through `claim_lease`, liveness is refreshed through `heartbeat`, and `reap_stale_leases` is documented to transition both expired worker leases and never-claimed queue entries to `TIMED_OUT`. `status`, `timeline`, and the CLI `reap` verb provide return-and-inspect continuity. Delta: this improves the prior P2 blocker by giving unclaimed INITIALISED runs a terminal timeout path.
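The lease lifecycle described above can be illustrated with a small in-memory sketch. The function names `create_run`, `claim_lease`, `heartbeat`, and `reap_stale_leases` come from the finding; the `Run` dataclass, the explicit `now` parameter, the 30-second worker lease, and the 300-second queue timeout are assumptions chosen for the example, not values from the reviewed code.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_QUEUE_TIMEOUT_SECONDS = 300.0  # assumed value for illustration
WORKER_LEASE_SECONDS = 30.0            # assumed value for illustration
REAPABLE_STATES = ("INITIALISED", "IN_PROGRESS", "AWAITING_APPROVAL")


@dataclass
class Run:
    run_id: str
    state: str
    lease_expires_at: float
    worker_id: Optional[str] = None


def create_run(run_id: str, now: float) -> Run:
    # A never-claimed run still carries a deadline, so it can be reaped.
    return Run(run_id, "INITIALISED", now + DEFAULT_QUEUE_TIMEOUT_SECONDS)


def claim_lease(run: Run, worker_id: str, now: float) -> bool:
    if run.state != "INITIALISED" or now >= run.lease_expires_at:
        return False
    run.state = "IN_PROGRESS"
    run.worker_id = worker_id
    run.lease_expires_at = now + WORKER_LEASE_SECONDS
    return True


def heartbeat(run: Run, worker_id: str, now: float) -> bool:
    live = run.state in ("IN_PROGRESS", "AWAITING_APPROVAL")
    if not live or run.worker_id != worker_id or now >= run.lease_expires_at:
        return False
    run.lease_expires_at = now + WORKER_LEASE_SECONDS
    return True


def reap_stale_leases(runs, now: float):
    """Move expired worker leases and never-claimed queue entries to TIMED_OUT."""
    reaped = []
    for run in runs:
        if run.state in REAPABLE_STATES and now >= run.lease_expires_at:
            run.state = "TIMED_OUT"
            reaped.append(run.run_id)
    return reaped
```

Because `INITIALISED` is in the reapable set and `create_run` sets a deadline, a run that no worker ever claims reaches a terminal state instead of remaining queued indefinitely.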
P0
Align feedback with the user’s level of attention
Feedback is calibrated by attention level: routine progress is represented through persisted state, heartbeat, and timeline events, while intervention points surface as explicit `AWAITING_APPROVAL` events. The CLI stream deduplicates by `(kind, seq)`, so a repeated approval-required state with a new `seq` is not collapsed into an earlier one, and user attention is requested whenever it is materially needed. Delta: this maintains the prior aligned feedback verdict.
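Deduplicating on `(kind, seq)` rather than on `kind` alone can be sketched as below. The generator name `stream_new_events` and the dictionary event shape are hypothetical; only the `(kind, seq)` key comes from the finding.

```python
def stream_new_events(events, seen):
    """Yield events not yet shown, keyed by (kind, seq).

    `seen` is a set owned by the caller and shared across polls, so a
    re-read of the timeline does not print the same event twice.
    """
    for event in events:
        key = (event["kind"], event["seq"])
        if key in seen:
            continue
        seen.add(key)
        yield event
```

A second `AWAITING_APPROVAL` event at a later `seq` has a different key, so it is still emitted; keying on `kind` alone would suppress it.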
P0
Apply progressive disclosure to system agency
The system separates default summary from deeper inspection: `RunStatus`/`status` expose operational status and digests, `timeline` exposes event history, `evidence` exposes raw sidecar material when authorized, and `verify_audit` provides a certification path. The Iter3 digest-only `runs` row also supports progressive disclosure by avoiding raw policy exposure in the canonical status surface. Delta: this maintains the prior aligned progressive-disclosure verdict.
P0
Replace implied magic with clear mental models
The code replaces implicit agent magic with explicit operating rules: `RunState` is a closed enum with `VALID_TRANSITIONS`, policy is frozen through `ConfigDict(frozen=True)` plus a runtime policy digest check, externally visible actions are governed by `ApprovalGateRule`, and violations are represented as `ApprovalBindingMismatch` or `scope_violation` events. The documented correction model is abort-and-restart for policy changes rather than silent mid-run mutation. Delta: this maintains the prior aligned mental-model verdict.
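A closed state enum with a transition table, plus a frozen policy guarded by a digest check, might look like the sketch below. The state names are those listed in this review, but the contents of `VALID_TRANSITIONS` are an assumption, and the frozen dataclass `Policy` is a standard-library stand-in for the reviewed `GovernedPolicy` (which the finding says uses `ConfigDict(frozen=True)`). The helpers `transition`, `policy_digest`, and `assert_policy_unchanged` are hypothetical names.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from enum import Enum


class RunState(str, Enum):
    INITIALISED = "INITIALISED"
    IN_PROGRESS = "IN_PROGRESS"
    AWAITING_APPROVAL = "AWAITING_APPROVAL"
    COMPLETE = "COMPLETE"
    FAILED = "FAILED"
    TIMED_OUT = "TIMED_OUT"
    ABORTED_BY_USER = "ABORTED_BY_USER"


S = RunState
# Assumed transition table; terminal states have no outgoing edges.
VALID_TRANSITIONS = {
    S.INITIALISED: {S.IN_PROGRESS, S.TIMED_OUT, S.ABORTED_BY_USER},
    S.IN_PROGRESS: {S.AWAITING_APPROVAL, S.COMPLETE, S.FAILED, S.TIMED_OUT, S.ABORTED_BY_USER},
    S.AWAITING_APPROVAL: {S.IN_PROGRESS, S.FAILED, S.TIMED_OUT, S.ABORTED_BY_USER},
    S.COMPLETE: set(),
    S.FAILED: set(),
    S.TIMED_OUT: set(),
    S.ABORTED_BY_USER: set(),
}


def transition(current: RunState, target: RunState) -> RunState:
    if target not in VALID_TRANSITIONS[current]:
        raise ValueError(f"invalid transition {current.value} -> {target.value}")
    return target


@dataclass(frozen=True)
class Policy:  # stand-in for GovernedPolicy
    allowed_actions: tuple
    max_steps: int


def policy_digest(policy: Policy) -> str:
    # Canonical JSON so the same policy always hashes to the same digest.
    canonical = json.dumps(asdict(policy), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def assert_policy_unchanged(policy: Policy, expected_digest: str) -> None:
    if policy_digest(policy) != expected_digest:
        raise RuntimeError("policy digest changed mid-run; abort and restart")
```

The two guards are complementary: freezing blocks in-place mutation of the object, while the digest check catches a swapped or rebuilt policy object.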
P0
Expose meaningful operational state, not internal complexity
Operational state is exposed in user-relevant terms rather than raw scheduler mechanics: states such as `INITIALISED`, `IN_PROGRESS`, `AWAITING_APPROVAL`, `COMPLETE`, `FAILED`, `TIMED_OUT`, and `ABORTED_BY_USER` describe what the user can act on. Technical details such as hash chains, `worker_id`, and leases are retained for diagnostics through timeline/evidence surfaces instead of being the only status model. Delta: this maintains the prior aligned operational-state verdict.
P0
Make hand-offs, approvals, and blockers explicit
Approvals, hand-offs, and blockers are explicit primitives rather than implicit chat turns: `wait_for_decision` blocks on approval with heartbeat and `worker_id`, `ApprovalGateRule` defines approval requirements, `ApprovalBindingMismatch` and `scope_violation` are recorded, and `on_handoff` rejects unsupported handoffs. `AWAITING_APPROVAL`, `FAILED`, `TIMED_OUT`, and `ABORTED_BY_USER` distinguish interruption, gating, and failure. Delta: this maintains the prior aligned blocker/approval verdict.
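One plausible reading of the approval-binding behaviour is sketched below: an approval is bound to a digest of the exact action arguments, and a dispatch with different arguments is rejected. The names `ApprovalGateRule` and `ApprovalBindingMismatch` appear in the finding, but their fields, the `Approval` record, the `dispatch` function, and the `"EXECUTED"` outcome label are assumptions for illustration and may not match the reviewed code.

```python
import hashlib
import json
from dataclasses import dataclass


class ApprovalBindingMismatch(Exception):
    """Raised when an approval does not match the action being dispatched."""


@dataclass(frozen=True)
class ApprovalGateRule:  # assumed shape
    action: str
    requires_approval: bool = True


@dataclass(frozen=True)
class Approval:  # hypothetical record of a human decision
    action: str
    args_digest: str


def args_digest(args: dict) -> str:
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dispatch(rules, action, args, approval, execute):
    """Return (outcome, result); gated actions need a matching approval."""
    rule = rules.get(action)
    if rule is not None and rule.requires_approval:
        if approval is None:
            # Surface the blocker instead of executing.
            return ("AWAITING_APPROVAL", None)
        if approval.action != action or approval.args_digest != args_digest(args):
            raise ApprovalBindingMismatch(action)
    return ("EXECUTED", execute(**args))
```

Binding the approval to the argument digest means an approval granted for one payload cannot be reused for a different payload of the same action.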
P0
Represent delegated work as a system, not merely as a conversation
Delegated work is represented as a structured system: persistent `runs`, append-only `events`, segregated `evidence`, `steering_commands`, `run_lease_history`, lease ownership, and state transitions separate execution state from conversational output. `load_timeline`, `status`, and `evidence` provide system views over the run rather than reducing the workflow to a message transcript. Delta: this maintains the prior aligned system-representation verdict.
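The append-only `events` ledger with a hash chain can be illustrated with a short in-memory sketch. The reviewed code's actual hashing scheme is not shown in this report, so the canonical-JSON encoding, the all-zero genesis value, and the helper names `event_hash`, `append_event`, and `verify_chain` are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed predecessor hash for the first event


def event_hash(prev_hash, seq, kind, data):
    body = json.dumps(
        {"prev": prev_hash, "seq": seq, "kind": kind, "data": data},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_event(events, kind, data):
    """Append an event whose hash commits to its predecessor."""
    prev = events[-1]["hash"] if events else GENESIS
    seq = len(events) + 1
    event = {
        "seq": seq,
        "kind": kind,
        "data": data,
        "prev_hash": prev,
        "hash": event_hash(prev, seq, kind, data),
    }
    events.append(event)
    return event


def verify_chain(events):
    """True only if sequence numbers are gapless and every hash recomputes."""
    prev = GENESIS
    for expected_seq, event in enumerate(events, start=1):
        if event["seq"] != expected_seq or event["prev_hash"] != prev:
            return False
        if event["hash"] != event_hash(prev, event["seq"], event["kind"], event["data"]):
            return False
        prev = event["hash"]
    return True
```

A chain check of this kind detects edits and deletions inside the ledger itself, but it says nothing about whether sidecar evidence rows exist, which is why the bidirectional evidence check is a separate requirement.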
P0
Optimise for steering, not only initiating
The workflow supports steering during execution through durable `pause`, `resume`, and `abort` verbs, `queue_steering_command`, `claim_next_steering_command`, and `GovernedRunHooks` that drain steering at checkpoints. The pause path continues heartbeating, aborts transition cleanly, and policy changes are handled through an explicit abort/restart model rather than unsafe mutable authority. Delta: this maintains the prior aligned steering verdict.
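Durable steering can be sketched with a small SQLite-backed queue. The table name `steering_commands` and the function names `queue_steering_command` and `claim_next_steering_command` come from the finding; the column layout, the in-memory database, and the `drain_at_checkpoint` helper (standing in for what `GovernedRunHooks` does at a checkpoint) are assumptions.

```python
import sqlite3

VERBS = ("pause", "resume", "abort")


def open_store():
    conn = sqlite3.connect(":memory:")  # a real deployment would use a file
    conn.execute(
        "CREATE TABLE steering_commands ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "run_id TEXT NOT NULL, verb TEXT NOT NULL, claimed_by TEXT)"
    )
    return conn


def queue_steering_command(conn, run_id, verb):
    if verb not in VERBS:
        raise ValueError(f"unsupported steering verb: {verb}")
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO steering_commands (run_id, verb) VALUES (?, ?)",
            (run_id, verb),
        )
    return cur.lastrowid


def claim_next_steering_command(conn, run_id, worker_id):
    """Claim the oldest unclaimed command for this run, or return None."""
    with conn:
        row = conn.execute(
            "SELECT id, verb FROM steering_commands "
            "WHERE run_id = ? AND claimed_by IS NULL ORDER BY id LIMIT 1",
            (run_id,),
        ).fetchone()
        if row is None:
            return None
        # The claimed_by guard keeps the claim safe if another worker raced us.
        updated = conn.execute(
            "UPDATE steering_commands SET claimed_by = ? "
            "WHERE id = ? AND claimed_by IS NULL",
            (worker_id, row[0]),
        )
        if updated.rowcount != 1:
            return None
    return row[1]


def drain_at_checkpoint(conn, run_id, worker_id, paused=False):
    """Apply queued commands in order; return (paused, aborted)."""
    while True:
        verb = claim_next_steering_command(conn, run_id, worker_id)
        if verb is None:
            return (paused, False)
        if verb == "abort":
            return (paused, True)
        paused = verb == "pause"
```

Draining stops at the first `abort`, so commands queued after it remain unclaimed in the table and stay visible for later inspection.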
Iteration delta
Improvements (2)
Run ID: 2067531c-e05d-4605-ac9a-55a596e59cc2 · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.