Governed

Alignment confirmed with the doctrine.

Agent Architecture Review, Validation snapshot

Evaluated 7 May 2026 against the AI Design Blueprint doctrine

Production-ready

Status: Aligned

100/100

Grade A

10 aligned
Per-principle verdicts

The submission is an autonomous governed-agent workflow with persistent run state, approval gates, steering commands, leases, and an auditable event/evidence ledger. The Iter4 changes add load-bearing primitives rather than wrappers: LeaseLost now aborts pause waits, a pre-executor lease/state fence prevents actions under stolen/reaped leases, and audit verification is bidirectional across redaction markers and evidence rows. All applicable principles are aligned.

Iteration history

4 prior runs on this artifact. Each run_id opens its own readiness review.

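| When | Score | Status | Run ID |
| --- | --- | --- | --- |
| 7 May 2026 (this run) | 100 / A | Aligned | ffcc637e |
| 7 May 2026 | 74 / C | High Risk | 37cc23be |
| 7 May 2026 | 74 / C | High Risk | 2067531c |
| 7 May 2026 | 74 / C | High Risk | 742680ee |
| 7 May 2026 | 68 / C | High Risk | e78f225f |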
Certified production-ready · Attempt 1/3

Confirmed: the visible Iter4 code addresses the prior high-risk seams through LeaseLost propagation during pause waits, a pre-executor lease fence, and bidirectional audit/evidence verification. I found no specific, code-cited production blocker that was missed.

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

P0

Design for delegation rather than direct manipulation

aligned

Delegation is explicit: `GovernedActionDispatcher.dispatch` executes bounded `GovernedAction` objects through the single `perform_action` tool seam, while `GovernedPolicy.is_action_permitted` and `permitted_action_scope` constrain authority. Users/operators govern the work through durable pause/resume/abort commands drained in `GovernedRunHooks._drain_steering`, rather than manually executing each step.
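The fail-closed delegation seam described above can be illustrated with a minimal sketch. The class and method names come from the review text, but every signature, field, and the synchronous executor are assumptions made for illustration; this is not the submission's code.

```python
from dataclasses import dataclass, field


class ScopeViolation(Exception):
    """Raised when an action falls outside the permitted scope."""


@dataclass
class GovernedAction:
    name: str
    args: dict = field(default_factory=dict)  # assumed shape of an action


@dataclass
class GovernedPolicy:
    permitted_action_scope: frozenset  # assumed: a set of permitted action names

    def is_action_permitted(self, action: GovernedAction) -> bool:
        return action.name in self.permitted_action_scope


class GovernedActionDispatcher:
    def __init__(self, policy, executor):
        self._policy = policy
        self._executor = executor  # the single tool seam (hypothetical callable)

    def dispatch(self, action: GovernedAction):
        # Fail closed: anything not explicitly permitted never reaches the executor.
        if not self._policy.is_action_permitted(action):
            raise ScopeViolation(f"action {action.name!r} is out of scope")
        return self._executor(action)
```

Routing every action through one `dispatch` call means the policy check cannot be bypassed by adding a new tool; the operator governs the scope, not each step.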

P0

Ensure that background work remains perceptible

aligned

Background work remains perceptible through persistent state and heartbeat primitives: `_checkpoint` calls `heartbeat(...)` before work and emits `heartbeat` events, `_wait_for_resume` continues heartbeating while paused, and `reap_stale_runs` delegates to `reap_stale_leases` for watchdog recovery. `status()` and `timeline()` expose current state and event history after the user returns.
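As a rough illustration of the heartbeat and watchdog pattern named above, the sketch below keeps leases in an in-memory dict. The function names `heartbeat` and `reap_stale_leases` appear in the review; their parameters, the `Lease` record, and the TTL rule are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    run_id: str
    worker_id: str
    last_heartbeat: float  # seconds on whatever clock the caller supplies


def heartbeat(leases, run_id, worker_id, now):
    """Refresh the lease; return False if this worker no longer owns it."""
    lease = leases.get(run_id)
    if lease is None or lease.worker_id != worker_id:
        return False
    lease.last_heartbeat = now
    return True


def reap_stale_leases(leases, now, ttl_seconds):
    """Drop leases whose last heartbeat is older than the TTL; return their run ids."""
    stale = [
        run_id
        for run_id, lease in leases.items()
        if now - lease.last_heartbeat > ttl_seconds
    ]
    for run_id in stale:
        del leases[run_id]
    return stale
```

A worker that keeps heartbeating, including while paused, keeps its lease; a worker that goes silent is reaped, and its next heartbeat reports the loss.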

P0

Align feedback with the user’s level of attention

aligned

Feedback is calibrated by event type and attention need: routine progress is throttled via `heartbeat_every_n_turns`, while high-attention conditions produce explicit events such as `scope_violation`, `approval_binding_mismatch`, `tool_call_failed`, pause/resume transitions, and abort messages. The submitted summary of the unchanged CLI notes that attention-required announcements are de-duplicated by `(kind, seq)`, which preserves signal without noisy repeats.
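The two attention controls mentioned here, throttling routine heartbeats and de-duplicating announcements by `(kind, seq)`, might look like the following sketch. `AttentionFilter` and its methods are hypothetical names; only `heartbeat_every_n_turns` and the `(kind, seq)` key come from the review.

```python
class AttentionFilter:
    """Hypothetical helper: throttle routine events, announce urgent ones once."""

    def __init__(self, heartbeat_every_n_turns):
        if heartbeat_every_n_turns < 1:
            raise ValueError("heartbeat_every_n_turns must be at least 1")
        self._every = heartbeat_every_n_turns
        self._announced = set()

    def should_emit_heartbeat(self, turn):
        # Routine progress: emit only on every n-th turn.
        return turn % self._every == 0

    def should_announce(self, kind, seq):
        # Attention-required events: announce each (kind, seq) pair only once.
        key = (kind, seq)
        if key in self._announced:
            return False
        self._announced.add(key)
        return True
```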

P0

Apply progressive disclosure to system agency

aligned

The observability layer separates default status from deeper inspection: `status()` returns a compact `RunStatus` with `state`, `is_terminal`, `latest_message`, and final/failure digests, while `timeline()` exposes detailed event rows and `evidence()` exposes sidecar evidence when deeper audit is required. This is progressive disclosure rather than forcing raw ledger detail into the primary status path.

P0

Replace implied magic with clear mental models

aligned

The code replaces magic with explicit operating rules: the `perform_action` tool description lists permitted action names and states that submission-capable actions require operator approval, `ScopeViolation` makes out-of-scope execution fail closed, `ApprovalBindingMismatch` explains approval/action drift, and `on_handoff` transitions to `FAILED` with `reason_kind: unsupported_handoff`.

P0

Expose meaningful operational state, not internal complexity

aligned

Operational state is represented with user-relevant run states rather than only internal mechanics: transitions use `RunState.IN_PROGRESS`, `PAUSED`, `ABORTED_BY_USER`, and `FAILED`, while `status()` returns `state`, `is_terminal`, and `latest_message`. Lower-level details such as `entry_hash`, evidence digests, and raw sidecar rows are reserved for `timeline()`, `evidence()`, and `verify_audit()`.
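A minimal sketch of this split between a compact status and a detailed timeline follows. The state names and the `state`, `is_terminal`, and `latest_message` fields come from the review; the enum values, the choice of which states are terminal, and the event dict layout are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class RunState(Enum):
    IN_PROGRESS = "in_progress"  # string values are assumed
    PAUSED = "paused"
    ABORTED_BY_USER = "aborted_by_user"
    FAILED = "failed"


# Assumption: aborted and failed runs are terminal; the others are not.
TERMINAL_STATES = {RunState.ABORTED_BY_USER, RunState.FAILED}


@dataclass(frozen=True)
class RunStatus:
    state: RunState
    is_terminal: bool
    latest_message: str


def status(events):
    """Compact view: only what an operator needs at a glance."""
    last = events[-1]
    state = RunState(last["state"])
    return RunStatus(state, state in TERMINAL_STATES, last["message"])


def timeline(events):
    """Detailed view: full event rows, including low-level fields."""
    return [dict(event) for event in events]
```

Keeping `entry_hash` and raw event data out of `RunStatus` is the point of the design: the default path answers "what is happening", and the deeper calls answer "how do we know".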

P0

Establish trust through inspectability

aligned

Inspectability is backed by concrete audit primitives. `verify_audit` replays the hash chain using `prev_hash` and `_recompute_entry_hash`, validates genesis `policy_digest` against `runs.policy_digest`, recomputes `sha256(ev.value_json)` for each evidence row, requires marker digest agreement, derives `required_markers` by scanning every redacted marker in every `event.data`, and rejects unless `seen_markers == set(required_markers.keys())`. Delta: this improves the prior P7 finding by adding the bidirectional marker/evidence reconciliation that was previously missing.

P0

Make hand-offs, approvals, and blockers explicit

aligned

Approvals, handoffs, and blockers are explicit. `evaluate_approval` and `wait_for_decision` gate governed actions, `compute_action_digest` binds the approved action to the executable action, and `ApprovalBindingMismatch` stops drift. Unsupported handoffs are not silently followed: `on_handoff` records a `FAILED` transition with from/to agent names and raises `UnsupportedHandoff`. Scope and lease failures emit typed events such as `scope_violation` and `lease_fence_failed`.
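One plausible way to bind an approval to the exact action that later executes is a digest over a canonical serialization, sketched below. `compute_action_digest` and `ApprovalBindingMismatch` are named in the review; the signature, the canonical JSON form, and the helper `check_approval_binding` are assumptions.

```python
import hashlib
import json


class ApprovalBindingMismatch(Exception):
    """Raised when the action to execute differs from the action that was approved."""


def compute_action_digest(name, args):
    # Canonical JSON (sorted keys, fixed separators) so equal actions hash equally.
    canonical = json.dumps(
        {"name": name, "args": args}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_approval_binding(approved_digest, name, args):
    """Hypothetical gate: recompute the digest just before execution."""
    actual = compute_action_digest(name, args)
    if actual != approved_digest:
        raise ApprovalBindingMismatch(
            f"approved {approved_digest[:8]} but about to execute {actual[:8]}"
        )
    return actual
```

Because the digest is recomputed at execution time, any change to the action after approval, even a single argument, stops the run instead of silently inheriting the earlier approval.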

P0

Represent delegated work as a system, not merely as a conversation

aligned

Delegated work is modeled as a system: the submission uses persistent `run_id` state, a run ledger, `state_transition` events, evidence sidecars, leases, watchdog reaping, steering commands, and a single governed tool seam. `timeline()` returns structured event records with `seq`, `kind`, `state`, `message`, `data`, and `entry_hash`, so execution is inspectable as stateful orchestration rather than merely as a chat transcript.

P0

Optimise for steering, not only initiating

aligned

Steering is supported by durable safe-boundary controls. `GovernedRunHooks._drain_steering` handles `abort` and `pause`, `_wait_for_resume` accepts `resume` or `abort` while continuing heartbeats, and transitions use `expected_worker_id` to prevent phantom writes. The Iter4 changes close the previous steering race: `LeaseLost` in `_wait_for_resume` now raises `AbortRequested` instead of returning silently, and `GovernedActionDispatcher.dispatch` performs a `heartbeat(...)` lease/state fence immediately before `await self._executor(action)`, recording `lease_fence_failed` and raising `ScopeViolation` if the lease is invalid. Delta: this improves the prior P10 finding by addressing both the sw…
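The two Iter4 behaviours described in this finding, a lease fence immediately before the executor and `LeaseLost` aborting a pause wait, can be sketched as below. The exception and event names come from the review; the constructor arguments, the callable shapes for `heartbeat` and `next_command`, and the polling loop are assumptions, and `wait_for_resume` is a standalone stand-in for the `_wait_for_resume` method.

```python
import asyncio


class LeaseLost(Exception):
    pass


class AbortRequested(Exception):
    pass


class ScopeViolation(Exception):
    pass


class GovernedActionDispatcher:
    def __init__(self, executor, heartbeat, record_event):
        self._executor = executor          # async callable that performs the action
        self._heartbeat = heartbeat        # assumed to raise LeaseLost when the lease is gone
        self._record_event = record_event  # hypothetical ledger hook

    async def dispatch(self, action):
        # Fence: confirm the lease right before acting, not only at checkpoints.
        try:
            self._heartbeat()
        except LeaseLost as exc:
            self._record_event("lease_fence_failed")
            raise ScopeViolation("lease is no longer valid") from exc
        return await self._executor(action)


async def wait_for_resume(heartbeat, next_command, poll_seconds=0.0):
    """Keep heartbeating while paused; a lost lease aborts instead of returning."""
    while True:
        try:
            heartbeat()
        except LeaseLost as exc:
            raise AbortRequested("lease lost while paused") from exc
        command = next_command()
        if command == "resume":
            return
        if command == "abort":
            raise AbortRequested("operator aborted while paused")
        await asyncio.sleep(poll_seconds)
```

Placing the fence directly before `await self._executor(action)` narrows the window in which a worker with a stolen or reaped lease could still act.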

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/ffcc637e-1280-4f34-8fed-57c7359d7466/card.svg)](https://aidesignblueprint.com/en/readiness-review/ffcc637e-1280-4f34-8fed-57c7359d7466)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/ffcc637e-1280-4f34-8fed-57c7359d7466.svg)](https://aidesignblueprint.com/en/readiness-review/ffcc637e-1280-4f34-8fed-57c7359d7466)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Improvements (2)

P7 · Establish trust through inspectability · needs_changes → aligned
P10 · Optimise for steering, not only initiating · needs_changes → aligned
Rubric: 2026-05-04

Run ID: ffcc637e-1280-4f34-8fed-57c7359d7466 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.