Reviewed

Assessment complete; awaiting evidence revision.

Agent Architecture Review, Validation snapshot

Evaluated 7 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High Risk

74/100

Grade C

9 aligned · 1 production blocker
Per-principle verdicts

This is an autonomous governed-agent workflow with strong delegation, approval, steering, lease, heartbeat, and audit primitives. Iter3 materially addresses the prior blockers by making unclaimed INITIALISED runs reapable and by recomputing evidence digests from raw sidecar values, but the audit verifier is still not fully bidirectional: it certifies existing evidence rows, yet does not require every non-policy redaction marker in the ledger to have a matching sidecar evidence row. That remaining inspectability gap should be fixed before production use where audit completeness matters.

Iteration history

2 prior runs on this artifact. Each run_id opens its own readiness review.

| When | Score / Grade | Status | Run ID |
|---|---|---|---|
| 7 May 2026 (this run) | 74 / C | High Risk | 2067531c |
| 7 May 2026 | 74 / C | High Risk | 742680ee |
| 7 May 2026 | 68 / C | High Risk | e78f225f |

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

P0

Establish trust through inspectability

needs changes · production blocker · 55/100

Iter3 adds important inspectability primitives: `runs` now carries `policy_digest` instead of a `policy_json` slot, `verify_audit` recomputes `sha256(ev.value_json.encode("utf-8"))`, checks it against `ev.value_digest`, checks the ledger marker digest, and requires policy evidence at `(seq=1, field="policy_json")`. However, the verification is still one-way for non-policy evidence: it loops over existing `evidence_rows` with `for ev in evidence_rows` and only requires `policy_evidence_seen`; it does not derive every redacted marker from `events` and require a corresponding evidence row for each marker. A deleted or omitted non-policy sidecar row could therefore leave the hash chain, terminal…
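
A minimal sketch of that one-way pattern, using hypothetical in-memory stand-ins rather than the reviewed code: the `Evidence` dataclass, `verify_one_way`, and a `markers` map from `(seq, field)` to the ledger marker digest are illustrative names. It recomputes each sidecar digest, compares it with the ledger marker, and requires the policy row, yet it still certifies a ledger whose non-policy evidence row has been dropped.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    # Hypothetical stand-in for one sidecar evidence row.
    seq: int
    field: str
    value_json: str
    value_digest: str


def digest(value_json: str) -> str:
    # The recomputation the review describes: sha256 over the UTF-8 bytes.
    return hashlib.sha256(value_json.encode("utf-8")).hexdigest()


def verify_one_way(markers: dict[tuple[int, str], str], evidence_rows: list[Evidence]) -> bool:
    """Certify the rows that exist; never ask which rows are missing."""
    policy_evidence_seen = False
    for ev in evidence_rows:
        if digest(ev.value_json) != ev.value_digest:
            return False  # raw value no longer matches its stored digest
        if markers.get((ev.seq, ev.field)) != ev.value_digest:
            return False  # sidecar row disagrees with the ledger marker
        if (ev.seq, ev.field) == (1, "policy_json"):
            policy_evidence_seen = True
    return policy_evidence_seen


policy_json, secret_json = '{"max_spend": 10}', '{"token": "x"}'
policy = Evidence(1, "policy_json", policy_json, digest(policy_json))
secret = Evidence(4, "api_response", secret_json, digest(secret_json))
markers = {(1, "policy_json"): policy.value_digest, (4, "api_response"): secret.value_digest}

print(verify_one_way(markers, [policy, secret]))  # True
print(verify_one_way(markers, [policy]))          # True: the omitted row goes unnoticed
```

The second call passes because nothing walks the ledger markers looking for rows that should exist.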

Recommendation

Make audit verification bidirectional: derive required `(seq, field, digest)` evidence keys from all redaction markers in ledger `event.data`, require exactly one matching sidecar row for each required marker, recompute each raw value digest, and fail certification on missing evidence. Add an explicit migration or startup assertion that removes/quarantines legacy `runs.policy_json` columns so the digest-only row boundary holds across upgrades.
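
One way the bidirectional check could look, as a sketch under stated assumptions: the `Event`/`Evidence` shapes, the marker layout `{"redacted": True, "digest": ...}` inside `event.data`, and the `verify_bidirectional` name are all hypothetical. Required `(seq, field, digest)` keys are derived from the ledger first, then each must be matched by exactly one sidecar row, and every row must in turn match a marker and its own raw value.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    # Hypothetical sidecar evidence row.
    seq: int
    field: str
    value_json: str
    value_digest: str


@dataclass(frozen=True)
class Event:
    # Hypothetical ledger event; redaction markers live in `data`.
    seq: int
    data: dict


def digest(value_json: str) -> str:
    return hashlib.sha256(value_json.encode("utf-8")).hexdigest()


def required_keys(events: list[Event]) -> set[tuple[int, str, str]]:
    # Assumed marker shape: data[field] == {"redacted": True, "digest": "<hex>"}.
    return {
        (event.seq, name, value["digest"])
        for event in events
        for name, value in event.data.items()
        if isinstance(value, dict) and value.get("redacted")
    }


def verify_bidirectional(events: list[Event], evidence_rows: list[Evidence]) -> list[str]:
    """Return certification problems; an empty list means the audit passes."""
    problems = []
    required = required_keys(events)
    counts = Counter((ev.seq, ev.field, ev.value_digest) for ev in evidence_rows)
    # Ledger -> sidecar: every redaction marker needs exactly one evidence row.
    for key in sorted(required):
        if counts[key] != 1:
            problems.append(f"marker {key[:2]} has {counts[key]} evidence rows, expected 1")
    # Sidecar -> ledger: every row must match a marker and its own raw value.
    for ev in evidence_rows:
        key = (ev.seq, ev.field, ev.value_digest)
        if key not in required:
            problems.append(f"orphan evidence row {key[:2]}")
        if digest(ev.value_json) != ev.value_digest:
            problems.append(f"raw value digest mismatch at {key[:2]}")
    if not any(key[:2] == (1, "policy_json") for key in required):
        problems.append("no policy marker at (seq=1, field='policy_json')")
    return problems


policy_json, secret_json = '{"max_spend": 10}', '{"token": "x"}'
events = [
    Event(1, {"policy_json": {"redacted": True, "digest": digest(policy_json)}}),
    Event(4, {"api_response": {"redacted": True, "digest": digest(secret_json)}}),
]
rows = [
    Evidence(1, "policy_json", policy_json, digest(policy_json)),
    Evidence(4, "api_response", secret_json, digest(secret_json)),
]
print(verify_bidirectional(events, rows))      # []
print(verify_bidirectional(events, rows[:1]))  # one problem: marker (4, 'api_response') has 0 rows
```

Returning a list of problems rather than a bare boolean keeps the certification output inspectable: an operator sees which marker lost its evidence, not just that verification failed.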

P0

Design for delegation rather than direct manipulation

aligned

The workflow is designed around delegated intent and constraints rather than step-by-step manual manipulation: `GovernedPolicy`, frozen `ApprovalGateRule`, `build_governed_tools`, `SUBMISSION_CAPABLE_ACTIONS`, and `start_governed_run` establish scoped authority before execution. Approval-sensitive actions are routed through the governed dispatcher rather than left as free-form prompt behavior. Delta: this maintains the prior aligned delegation verdict.
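
A minimal sketch of scoped authority fixed before execution, assuming simplified shapes: the fields on `GovernedPolicy` and `ApprovalGateRule`, the `dispatch` function, and the action names in `SUBMISSION_CAPABLE_ACTIONS` are hypothetical, since the review does not show their definitions. Every tool call goes through the dispatcher, which returns a scope violation, an approval hold, or permission to execute.

```python
from dataclasses import dataclass

# Hypothetical action names; the reviewed set's contents are not shown.
SUBMISSION_CAPABLE_ACTIONS = frozenset({"submit_form", "send_email"})


@dataclass(frozen=True)
class ApprovalGateRule:
    action: str
    requires_approval: bool = True


@dataclass(frozen=True)
class GovernedPolicy:
    # Scoped authority, fixed before the run starts.
    allowed_actions: frozenset[str]
    gates: tuple[ApprovalGateRule, ...] = ()

    def needs_approval(self, action: str) -> bool:
        if action in SUBMISSION_CAPABLE_ACTIONS:
            return True  # externally visible actions are always gated
        return any(g.action == action and g.requires_approval for g in self.gates)


def dispatch(policy: GovernedPolicy, action: str, approved: bool) -> str:
    """Route every tool call through the policy instead of trusting the prompt."""
    if action not in policy.allowed_actions:
        return "scope_violation"
    if policy.needs_approval(action) and not approved:
        return "AWAITING_APPROVAL"
    return "EXECUTED"


policy = GovernedPolicy(frozenset({"search", "submit_form"}))
print(dispatch(policy, "search", approved=False))       # EXECUTED
print(dispatch(policy, "submit_form", approved=False))  # AWAITING_APPROVAL
print(dispatch(policy, "delete_repo", approved=True))   # scope_violation
```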

P0

Ensure that background work remains perceptible

aligned

Background work has durable perceptibility primitives: `create_run` persists an `INITIALISED` row with `lease_expires_at = now + DEFAULT_QUEUE_TIMEOUT_SECONDS`, workers take ownership through `claim_lease`, liveness is refreshed through `heartbeat`, and `reap_stale_leases` is documented to transition both expired worker leases and never-claimed queue entries to `TIMED_OUT`. `status`, `timeline`, and the CLI `reap` verb provide return-and-inspect continuity. Delta: this resolves the prior P2 blocker by giving unclaimed INITIALISED runs a terminal timeout path.
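
A sketch of the lease lifecycle against an in-memory SQLite table, assuming a simplified `runs` schema and hypothetical values for `DEFAULT_QUEUE_TIMEOUT_SECONDS` and `LEASE_SECONDS`; the reviewed schema and constants are not shown. The key point is that `create_run` gives even an unclaimed row a deadline, so one reaper sweep covers both failure modes.

```python
import sqlite3

DEFAULT_QUEUE_TIMEOUT_SECONDS = 300  # assumed value; the reviewed constant is not shown
LEASE_SECONDS = 60                   # hypothetical worker lease length


def connect() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE runs (run_id TEXT PRIMARY KEY, state TEXT NOT NULL,"
        " worker_id TEXT, lease_expires_at REAL NOT NULL)"
    )
    return db


def create_run(db: sqlite3.Connection, run_id: str, now: float) -> None:
    # An unclaimed queue entry still carries a deadline, so it can be reaped.
    db.execute(
        "INSERT INTO runs VALUES (?, 'INITIALISED', NULL, ?)",
        (run_id, now + DEFAULT_QUEUE_TIMEOUT_SECONDS),
    )


def claim_lease(db: sqlite3.Connection, run_id: str, worker_id: str, now: float) -> bool:
    cur = db.execute(
        "UPDATE runs SET state = 'IN_PROGRESS', worker_id = ?, lease_expires_at = ?"
        " WHERE run_id = ? AND state = 'INITIALISED' AND lease_expires_at > ?",
        (worker_id, now + LEASE_SECONDS, run_id, now),
    )
    return cur.rowcount == 1


def heartbeat(db: sqlite3.Connection, run_id: str, worker_id: str, now: float) -> bool:
    # Only the owning worker can extend its lease.
    cur = db.execute(
        "UPDATE runs SET lease_expires_at = ? WHERE run_id = ? AND worker_id = ?"
        " AND state IN ('IN_PROGRESS', 'AWAITING_APPROVAL')",
        (now + LEASE_SECONDS, run_id, worker_id),
    )
    return cur.rowcount == 1


def reap_stale_leases(db: sqlite3.Connection, now: float) -> int:
    # One sweep covers expired worker leases and never-claimed INITIALISED rows.
    cur = db.execute(
        "UPDATE runs SET state = 'TIMED_OUT'"
        " WHERE state IN ('INITIALISED', 'IN_PROGRESS', 'AWAITING_APPROVAL')"
        " AND lease_expires_at <= ?",
        (now,),
    )
    return cur.rowcount


db = connect()
create_run(db, "queued", now=0)
create_run(db, "claimed", now=0)
claim_lease(db, "claimed", "w1", now=10)  # lease now expires at 70
print(reap_stale_leases(db, now=100))     # 1: the worker stopped heartbeating
print(reap_stale_leases(db, now=300))     # 1: the queue entry was never claimed
```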

P0

Align feedback with the user’s level of attention

aligned

Feedback is calibrated by attention level: routine progress is represented through persisted state, heartbeat, and timeline events, while intervention points surface as explicit `AWAITING_APPROVAL` events. The CLI stream deduplicates by `(kind, seq)`, so repeated approval-required states are not collapsed incorrectly and user attention is requested when materially needed. Delta: this maintains the prior aligned feedback verdict.
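
A small sketch of why the `(kind, seq)` key matters, using a hypothetical `TimelineEvent` shape and `dedupe_stream` helper rather than the reviewed CLI code: a poll that re-delivers the same ledger event is dropped, while a second, distinct approval request survives.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class TimelineEvent:
    kind: str
    seq: int


def dedupe_stream(events: Iterable[TimelineEvent]) -> Iterator[TimelineEvent]:
    """Drop re-delivered events while keeping distinct approval requests."""
    seen: set[tuple[str, int]] = set()
    for event in events:
        key = (event.kind, event.seq)
        if key in seen:
            continue  # the same ledger event seen again by a later poll
        seen.add(key)
        yield event


polled = [
    TimelineEvent("heartbeat", 2),
    TimelineEvent("AWAITING_APPROVAL", 3),
    TimelineEvent("AWAITING_APPROVAL", 3),  # re-delivered
    TimelineEvent("AWAITING_APPROVAL", 7),  # a second, distinct approval request
]
print([(e.kind, e.seq) for e in dedupe_stream(polled)])
# [('heartbeat', 2), ('AWAITING_APPROVAL', 3), ('AWAITING_APPROVAL', 7)]
```

Keying on `kind` alone would silently swallow the seq=7 request, which is exactly the case where the user's attention is needed again.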

P0

Apply progressive disclosure to system agency

aligned

The system separates default summary from deeper inspection: `RunStatus`/`status` expose operational status and digests, `timeline` exposes event history, `evidence` exposes raw sidecar material when authorized, and `verify_audit` provides a certification path. The Iter3 digest-only `runs` row also supports progressive disclosure by avoiding raw policy exposure in the canonical status surface. Delta: this maintains the prior aligned progressive-disclosure verdict.
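
A sketch of how the digest-only `runs` row could be guarded at startup and kept out of the summary surface, assuming a simplified SQLite schema; `assert_digest_only_runs` and this `status` shape are hypothetical, not the reviewed functions. The guard refuses to start against a legacy `policy_json` column, which also covers the upgrade concern raised under inspectability.

```python
import sqlite3


def assert_digest_only_runs(db: sqlite3.Connection) -> None:
    """Startup guard: refuse to run against a legacy runs.policy_json column."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in db.execute("PRAGMA table_info(runs)")}
    if "policy_json" in columns:
        raise RuntimeError("legacy runs.policy_json present; migrate or quarantine it first")
    if "policy_digest" not in columns:
        raise RuntimeError("runs.policy_digest missing")


def status(db: sqlite3.Connection, run_id: str) -> dict:
    # Summary surface: state and digest only; raw policy stays in the evidence sidecar.
    state, policy_digest = db.execute(
        "SELECT state, policy_digest FROM runs WHERE run_id = ?", (run_id,)
    ).fetchone()
    return {"run_id": run_id, "state": state, "policy_digest": policy_digest}


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE runs (run_id TEXT PRIMARY KEY, state TEXT, policy_digest TEXT)")
db.execute("INSERT INTO runs VALUES ('r1', 'IN_PROGRESS', 'ab12')")
assert_digest_only_runs(db)
print(status(db, "r1"))
# {'run_id': 'r1', 'state': 'IN_PROGRESS', 'policy_digest': 'ab12'}
```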

P0

Replace implied magic with clear mental models

aligned

The code replaces implicit agent magic with explicit operating rules: `RunState` is a closed enum with `VALID_TRANSITIONS`, policy is frozen through `ConfigDict(frozen=True)` plus a runtime policy digest check, externally visible actions are governed by `ApprovalGateRule`, and violations are represented as `ApprovalBindingMismatch` or `scope_violation` events. The documented correction model is abort-and-restart for policy changes rather than silent mid-run mutation. Delta: this maintains the prior aligned mental-model verdict.
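
A sketch of the closed state machine plus a runtime policy digest check. The transition table is an assumption (the reviewed `VALID_TRANSITIONS` is not shown), and a frozen stdlib dataclass stands in for pydantic's `ConfigDict(frozen=True)`; `transition`, `Policy`, `policy_digest`, and `check_policy` are hypothetical names.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from enum import Enum


class RunState(str, Enum):
    INITIALISED = "INITIALISED"
    IN_PROGRESS = "IN_PROGRESS"
    AWAITING_APPROVAL = "AWAITING_APPROVAL"
    COMPLETE = "COMPLETE"
    FAILED = "FAILED"
    TIMED_OUT = "TIMED_OUT"
    ABORTED_BY_USER = "ABORTED_BY_USER"


S = RunState
# Assumed transition table; terminal states have no outgoing edges.
VALID_TRANSITIONS: dict[RunState, frozenset[RunState]] = {
    S.INITIALISED: frozenset({S.IN_PROGRESS, S.TIMED_OUT, S.ABORTED_BY_USER}),
    S.IN_PROGRESS: frozenset({S.AWAITING_APPROVAL, S.COMPLETE, S.FAILED, S.TIMED_OUT, S.ABORTED_BY_USER}),
    S.AWAITING_APPROVAL: frozenset({S.IN_PROGRESS, S.FAILED, S.TIMED_OUT, S.ABORTED_BY_USER}),
}


def transition(current: RunState, target: RunState) -> RunState:
    if target not in VALID_TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


@dataclass(frozen=True)
class Policy:
    # A frozen dataclass stands in for pydantic's ConfigDict(frozen=True).
    max_spend: int
    allowed_tools: tuple[str, ...]


def policy_digest(policy: Policy) -> str:
    canonical = json.dumps(asdict(policy), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_policy(policy: Policy, recorded_digest: str) -> None:
    # Runtime guard: a changed policy means abort-and-restart, never mutation.
    if policy_digest(policy) != recorded_digest:
        raise RuntimeError("policy digest changed mid-run; abort and restart")


policy = Policy(max_spend=10, allowed_tools=("search",))
recorded = policy_digest(policy)
check_policy(policy, recorded)                # passes
state = transition(S.INITIALISED, S.IN_PROGRESS)
```

The two mechanisms cover different failure modes: freezing blocks in-place edits (`policy.max_spend = 20` raises), while the digest check catches a swapped-in policy object that freezing alone cannot see.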

P0

Expose meaningful operational state, not internal complexity

aligned

Operational state is exposed in user-relevant terms rather than raw scheduler mechanics: states such as `INITIALISED`, `IN_PROGRESS`, `AWAITING_APPROVAL`, `COMPLETE`, `FAILED`, `TIMED_OUT`, and `ABORTED_BY_USER` describe what the user can act on. Technical details such as hash chains, `worker_id`, and leases are retained for diagnostics through timeline/evidence surfaces instead of being the only status model. Delta: this maintains the prior aligned operational-state verdict.

P0

Make hand-offs, approvals, and blockers explicit

aligned

Approvals, hand-offs, and blockers are explicit primitives rather than implicit chat turns: `wait_for_decision` blocks on approval with heartbeat and `worker_id`, `ApprovalGateRule` defines approval requirements, `ApprovalBindingMismatch` and `scope_violation` are recorded, and `on_handoff` rejects unsupported handoffs. `AWAITING_APPROVAL`, `FAILED`, `TIMED_OUT`, and `ABORTED_BY_USER` distinguish interruption, gating, and failure. Delta: this maintains the prior aligned blocker/approval verdict.
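
The review does not show how `ApprovalBindingMismatch` is detected. One common approach, sketched here as an assumption, binds each approval to a digest of the exact action and arguments the human reviewed, so a call with altered arguments cannot reuse it. `Approval`, `action_digest`, `execute_with_approval`, and the `"REJECTED"` outcome are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


class ApprovalBindingMismatch(Exception):
    """The approval on record does not cover the call being executed."""


def action_digest(action: str, args: dict) -> str:
    canonical = json.dumps({"action": action, "args": args}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    run_id: str
    seq: int            # ledger seq of the AWAITING_APPROVAL event
    bound_digest: str   # digest of the exact action the human reviewed
    approved: bool


def execute_with_approval(approval: Approval, run_id: str, action: str, args: dict) -> str:
    if approval.run_id != run_id or approval.bound_digest != action_digest(action, args):
        raise ApprovalBindingMismatch(f"approval seq={approval.seq} does not cover this call")
    return "EXECUTED" if approval.approved else "REJECTED"


args = {"to": "ops@example.com", "subject": "weekly report"}
approval = Approval("r1", 5, action_digest("send_email", args), approved=True)
print(execute_with_approval(approval, "r1", "send_email", args))  # EXECUTED
try:
    execute_with_approval(approval, "r1", "send_email", {**args, "to": "x@example.com"})
except ApprovalBindingMismatch as exc:
    print(exc)  # approval seq=5 does not cover this call
```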

P0

Represent delegated work as a system, not merely as a conversation

aligned

Delegated work is represented as a structured system: persistent `runs`, append-only `events`, segregated `evidence`, `steering_commands`, `run_lease_history`, lease ownership, and state transitions separate execution state from conversational output. `load_timeline`, `status`, and `evidence` provide system views over the run rather than reducing the workflow to a message transcript. Delta: this maintains the prior aligned system-representation verdict.
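
A minimal sketch of an append-only, hash-chained event log of the kind the review refers to; `EventLedger`, `GENESIS`, and the event dict layout are assumptions, not the reviewed `events` table. Each event commits to its predecessor's hash, so `verify_chain` detects edits, reorders, and splices in the history.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64


def _event_hash(seq: int, kind: str, data: dict, prev: str) -> str:
    body = json.dumps({"seq": seq, "kind": kind, "data": data, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class EventLedger:
    """Append-only run history: each event commits to the hash of the one before it."""
    events: list[dict] = field(default_factory=list)

    def append(self, kind: str, data: dict) -> dict:
        prev = self.events[-1]["hash"] if self.events else GENESIS
        seq = len(self.events) + 1
        event = {"seq": seq, "kind": kind, "data": data, "prev": prev,
                 "hash": _event_hash(seq, kind, data, prev)}
        self.events.append(event)
        return event

    def verify_chain(self) -> bool:
        prev = GENESIS
        for expected_seq, ev in enumerate(self.events, start=1):
            if ev["seq"] != expected_seq or ev["prev"] != prev:
                return False  # gap, reorder, or splice
            if _event_hash(ev["seq"], ev["kind"], ev["data"], ev["prev"]) != ev["hash"]:
                return False  # event body edited after the fact
            prev = ev["hash"]
        return True


ledger = EventLedger()
ledger.append("state_change", {"to": "IN_PROGRESS"})
ledger.append("AWAITING_APPROVAL", {"action": "send_email"})
ledger.append("state_change", {"to": "COMPLETE"})
print(ledger.verify_chain())  # True
ledger.events[1]["data"] = {"action": "delete_repo"}
print(ledger.verify_chain())  # False
```

A chain like this detects edits and splices, but not removal of the newest events unless the latest hash is also recorded somewhere outside the chain.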

P0

Optimise for steering, not only initiating

aligned

The workflow supports steering during execution through durable `pause`, `resume`, and `abort` verbs, `queue_steering_command`, `claim_next_steering_command`, and `GovernedRunHooks` that drain steering at checkpoints. The pause path continues heartbeating, aborts transition cleanly, and policy changes are handled through an explicit abort/restart model rather than unsafe mutable authority. Delta: this maintains the prior aligned steering verdict.
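
A sketch of a durable steering queue drained at checkpoints, assuming a simplified SQLite `steering_commands` schema; `drain_at_checkpoint` is a hypothetical stand-in for what `GovernedRunHooks` might do. Pause is modeled as a flag rather than a `RunState`, since the review's state list has no paused state, and heartbeating while paused is not shown.

```python
import sqlite3
from typing import Optional


def connect() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE steering_commands (id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " run_id TEXT NOT NULL, verb TEXT NOT NULL CHECK (verb IN ('pause', 'resume', 'abort')),"
        " claimed_at REAL)"
    )
    return db


def queue_steering_command(db: sqlite3.Connection, run_id: str, verb: str) -> int:
    cur = db.execute(
        "INSERT INTO steering_commands (run_id, verb) VALUES (?, ?)", (run_id, verb)
    )
    return cur.lastrowid


def claim_next_steering_command(db: sqlite3.Connection, run_id: str, now: float) -> Optional[str]:
    # Oldest unclaimed command first; the guarded UPDATE lets only one claimer win.
    row = db.execute(
        "SELECT id, verb FROM steering_commands"
        " WHERE run_id = ? AND claimed_at IS NULL ORDER BY id LIMIT 1",
        (run_id,),
    ).fetchone()
    if row is None:
        return None
    cur = db.execute(
        "UPDATE steering_commands SET claimed_at = ? WHERE id = ? AND claimed_at IS NULL",
        (now, row[0]),
    )
    return row[1] if cur.rowcount == 1 else None


def drain_at_checkpoint(db: sqlite3.Connection, run_id: str, paused: bool, now: float) -> tuple[bool, bool]:
    """Apply pending commands in FIFO order; return (aborted, paused)."""
    while (verb := claim_next_steering_command(db, run_id, now)) is not None:
        if verb == "abort":
            return True, paused  # caller transitions the run to ABORTED_BY_USER
        paused = verb == "pause"
    return False, paused


db = connect()
queue_steering_command(db, "r1", "pause")
print(drain_at_checkpoint(db, "r1", paused=False, now=1.0))  # (False, True)
queue_steering_command(db, "r1", "resume")
queue_steering_command(db, "r1", "abort")
print(drain_at_checkpoint(db, "r1", paused=True, now=2.0))   # (True, False)
```

Because commands persist in a table rather than in the conversation, a steering request survives a worker restart and is applied at the next safe checkpoint.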

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/2067531c-e05d-4605-ac9a-55a596e59cc2/card.svg)](https://aidesignblueprint.com/en/readiness-review/2067531c-e05d-4605-ac9a-55a596e59cc2)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/2067531c-e05d-4605-ac9a-55a596e59cc2.svg)](https://aidesignblueprint.com/en/readiness-review/2067531c-e05d-4605-ac9a-55a596e59cc2)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Improvements (2)

P2 · Ensure that background work remains perceptible · needs_changes → aligned
P7 · Establish trust through inspectability · needs_changes → needs_changes
Rubric: 2026-05-04 · Grade limited by 0 high-risk findings

Run ID: 2067531c-e05d-4605-ac9a-55a596e59cc2 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.