Alignment with the doctrine confirmed.
Evaluated on 7 May 2026 against the AI Design Blueprint doctrine
Production-ready
Status: Aligned
100/100
Grade A
The submission is an autonomous governed-agent workflow with persistent run state, approval gates, steering commands, leases, and an auditable event/evidence ledger. The Iter4 changes add load-bearing primitives rather than wrappers: LeaseLost now aborts pause waits, a pre-executor lease/state fence prevents actions under stolen/reaped leases, and audit verification is bidirectional across redaction markers and evidence rows. All applicable principles are aligned.
Iteration history
4 previous runs on this artifact. Each run_id opens its own readiness review.
Confirmed: the visible Iter4 code addresses the prior high-risk seams with LeaseLost propagation during pause waits, a pre-executor lease fence, and bidirectional audit/evidence verification, and I found no specific code-cited missed production blocker.
Findings by principle
10 principles evaluated. Verdict, severity, evidence, and recommendation for each.
P0
Design for delegation rather than direct manipulation
Delegation is explicit: `GovernedActionDispatcher.dispatch` executes bounded `GovernedAction` objects through the single `perform_action` tool seam, while `GovernedPolicy.is_action_permitted` and `permitted_action_scope` constrain authority. Users/operators govern the work through durable pause/resume/abort commands drained in `GovernedRunHooks._drain_steering`, rather than manually executing each step.
P0
Ensure that background work remains perceptible
Background work remains perceptible through persistent state and heartbeat primitives: `_checkpoint` calls `heartbeat(...)` before work and emits `heartbeat` events, `_wait_for_resume` continues heartbeating while paused, and `reap_stale_runs` delegates to `reap_stale_leases` for watchdog recovery. `status()` and `timeline()` expose current state and event history after the user returns.
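The watchdog recovery described above can be illustrated with a minimal sketch. This is not the submission's `reap_stale_leases`; the `Lease` record, the `reap_stale` function, and the `ttl_seconds` parameter are hypothetical names, and an in-memory dict stands in for whatever store the submission actually uses.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    # Hypothetical lease row: which worker holds the run, and when it last heartbeated.
    run_id: str
    worker_id: str
    last_heartbeat: float


def reap_stale(leases: dict, now: float, ttl_seconds: float) -> list:
    """Remove leases whose last heartbeat is older than ttl_seconds.

    Returns the run_ids that were reaped so a caller could record events for them.
    """
    stale = [
        run_id
        for run_id, lease in leases.items()
        if now - lease.last_heartbeat > ttl_seconds
    ]
    for run_id in stale:
        del leases[run_id]
    return stale
```

A paused worker that keeps heartbeating, as `_wait_for_resume` is reported to do, would refresh `last_heartbeat` and so would not be reaped under this scheme.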
P0
Align feedback with the user’s level of attention
Feedback is calibrated by event type and attention need: routine progress is throttled via `heartbeat_every_n_turns`, while high-attention conditions produce explicit events such as `scope_violation`, `approval_binding_mismatch`, `tool_call_failed`, pause/resume transitions, and abort messages. The submitted unchanged CLI summary notes event de-duplication by `(kind, seq)` for attention-required announcements, preserving signal without noisy repeats.
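To make the attention calibration concrete, here is a small sketch of heartbeat throttling plus de-duplication by `(kind, seq)`. The `Announcer` class, the `Event` record, and the `ATTENTION_KINDS` set are assumptions for illustration; only `heartbeat_every_n_turns` and the event kind names come from the review text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    seq: int
    kind: str
    message: str


# Hypothetical set of event kinds that should interrupt the operator.
ATTENTION_KINDS = {"scope_violation", "approval_binding_mismatch", "tool_call_failed"}


@dataclass
class Announcer:
    heartbeat_every_n_turns: int = 5
    _seen: set = field(default_factory=set)

    def should_emit_heartbeat(self, turn: int) -> bool:
        # Routine progress is only surfaced every N turns.
        return turn % self.heartbeat_every_n_turns == 0

    def announce(self, events) -> list:
        # Attention-required events are announced once per (kind, seq) pair.
        out = []
        for ev in events:
            key = (ev.kind, ev.seq)
            if ev.kind in ATTENTION_KINDS and key not in self._seen:
                self._seen.add(key)
                out.append(f"[{ev.kind}] {ev.message}")
        return out
```

Polling the same timeline twice then announces each high-attention event only once, while routine heartbeats never reach the announcement path.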
P0
Apply progressive disclosure to system agency
The observability layer separates default status from deeper inspection: `status()` returns a compact `RunStatus` with `state`, `is_terminal`, `latest_message`, and final/failure digests, while `timeline()` exposes detailed event rows and `evidence()` exposes sidecar evidence when deeper audit is required. This is progressive disclosure rather than forcing raw ledger detail into the primary status path.
P0
Replace implied magic with clear mental models
The code replaces magic with explicit operating rules: the `perform_action` tool description lists permitted action names and states that submission-capable actions require operator approval, `ScopeViolation` makes out-of-scope execution fail closed, `ApprovalBindingMismatch` explains approval/action drift, and `on_handoff` transitions to `FAILED` with `reason_kind: unsupported_handoff`.
P0
Expose meaningful operational state, not internal complexity
Operational state is represented with user-relevant run states rather than only internal mechanics: transitions use `RunState.IN_PROGRESS`, `PAUSED`, `ABORTED_BY_USER`, and `FAILED`, while `status()` returns `state`, `is_terminal`, and `latest_message`. Lower-level details such as `entry_hash`, evidence digests, and raw sidecar rows are reserved for `timeline()`, `evidence()`, and `verify_audit()`.
P0
Establish trust through inspectability
Inspectability is backed by concrete audit primitives. `verify_audit` replays the hash chain using `prev_hash` and `_recompute_entry_hash`, validates genesis `policy_digest` against `runs.policy_digest`, recomputes `sha256(ev.value_json)` for each evidence row, requires marker digest agreement, derives `required_markers` by scanning every redacted marker in every `event.data`, and rejects unless `seen_markers == set(required_markers.keys())`. Delta: this improves the prior P7 finding by adding the bidirectional marker/evidence reconciliation that was previously missing.
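The bidirectional check can be sketched as follows. This is an illustrative reconstruction, not the submission's `verify_audit`: the event and evidence row shapes, the marker format `{"redacted": ..., "sha256": ...}`, the `GENESIS` constant, and the helper names are all assumptions, and the genesis `policy_digest` check is omitted for brevity.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash of the first entry


def entry_hash(prev_hash: str, seq: int, kind: str, data: dict) -> str:
    # Canonical JSON so the same entry always hashes to the same value.
    payload = json.dumps(
        {"prev": prev_hash, "seq": seq, "kind": kind, "data": data}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(events: list, kind: str, data: dict) -> None:
    prev = events[-1]["entry_hash"] if events else GENESIS
    seq = len(events) + 1
    events.append({"seq": seq, "kind": kind, "data": data, "prev_hash": prev,
                   "entry_hash": entry_hash(prev, seq, kind, data)})


def find_markers(value, found: dict) -> dict:
    # Recursively collect redaction markers: marker id -> expected digest.
    if isinstance(value, dict):
        if "redacted" in value and "sha256" in value:
            found[value["redacted"]] = value["sha256"]
        else:
            for v in value.values():
                find_markers(v, found)
    elif isinstance(value, list):
        for v in value:
            find_markers(v, found)
    return found


def verify(events: list, evidence_rows: list) -> bool:
    prev, required = GENESIS, {}
    for ev in events:
        if ev["prev_hash"] != prev:
            return False  # chain link broken
        if entry_hash(prev, ev["seq"], ev["kind"], ev["data"]) != ev["entry_hash"]:
            return False  # entry content tampered
        find_markers(ev["data"], required)
        prev = ev["entry_hash"]
    seen = set()
    for row in evidence_rows:
        digest = hashlib.sha256(row["value_json"].encode("utf-8")).hexdigest()
        if required.get(row["marker"]) != digest:
            return False  # orphan evidence row or digest mismatch
        seen.add(row["marker"])
    return seen == set(required)  # every marker must have evidence
```

The final set comparison is what makes the check bidirectional: an evidence row with no marker fails inside the loop, and a marker with no evidence row fails at the end.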
P0
Make hand-offs, approvals, and blockers explicit
Approvals, handoffs, and blockers are explicit. `evaluate_approval` and `wait_for_decision` gate governed actions, `compute_action_digest` binds the approved action to the executable action, and `ApprovalBindingMismatch` stops drift. Unsupported handoffs are not silently followed: `on_handoff` records a `FAILED` transition with from/to agent names and raises `UnsupportedHandoff`. Scope and lease failures emit typed events such as `scope_violation` and `lease_fence_failed`.
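One way to picture the approval binding is a digest over a canonical form of the action, compared again at execution time. The sketch below is an assumption about how such a binding could work, not the submission's `compute_action_digest`; the `Action` dataclass, `action_digest`, and `check_binding` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass


class ApprovalBindingMismatch(Exception):
    """Raised when the action to execute differs from the one that was approved."""


@dataclass(frozen=True)
class Action:
    name: str
    args: dict


def action_digest(action: Action) -> str:
    # Sorted keys and fixed separators give a canonical encoding, so
    # argument order does not change the digest.
    canonical = json.dumps(
        {"name": action.name, "args": action.args},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_binding(approved_digest: str, action: Action) -> None:
    # Fail closed: any drift between approval and execution stops the action.
    if action_digest(action) != approved_digest:
        raise ApprovalBindingMismatch(
            f"approved digest does not match action {action.name!r}"
        )
```

Under this scheme an operator approves a digest rather than a name, so changing any argument after approval invalidates the approval.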
P0
Represent delegated work as a system, not merely as a conversation
Delegated work is modeled as a system: the submission uses persistent `run_id` state, a run ledger, `state_transition` events, evidence sidecars, leases, watchdog reaping, steering commands, and a single governed tool seam. `timeline()` returns structured event records with `seq`, `kind`, `state`, `message`, `data`, and `entry_hash`, so execution is inspectable as stateful orchestration rather than merely as a chat transcript.
P0
Optimise for steering, not only initiating
Steering is supported by durable safe-boundary controls. `GovernedRunHooks._drain_steering` handles `abort` and `pause`, `_wait_for_resume` accepts `resume` or `abort` while continuing heartbeats, and transitions use `expected_worker_id` to prevent phantom writes. The Iter4 changes close the previous steering race: `LeaseLost` in `_wait_for_resume` now raises `AbortRequested` instead of returning silently, and `GovernedActionDispatcher.dispatch` performs a `heartbeat(...)` lease/state fence immediately before `await self._executor(action)`, recording `lease_fence_failed` and raising `ScopeViolation` if the lease is invalid. Delta: this improves the prior P10 finding by addressing both the sw…
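The pre-executor fence can be sketched with an in-memory lease table. The exception and event names follow the review text, but the `LeaseStore` and `Dispatcher` classes, their constructor parameters, and the `events` list are hypothetical stand-ins rather than the submission's `GovernedActionDispatcher`.

```python
import asyncio


class LeaseLost(Exception):
    pass


class ScopeViolation(Exception):
    pass


class LeaseStore:
    """In-memory stand-in for a lease table: run_id -> owning worker_id."""

    def __init__(self):
        self.owner = {}

    def heartbeat(self, run_id: str, worker_id: str) -> None:
        # Refuses to renew when another worker (or nobody) holds the lease.
        if self.owner.get(run_id) != worker_id:
            raise LeaseLost(run_id)


class Dispatcher:
    def __init__(self, store, run_id, worker_id, executor):
        self._store = store
        self._run_id = run_id
        self._worker_id = worker_id
        self._executor = executor
        self.events = []  # stand-in for the run ledger

    async def dispatch(self, action):
        # Fence: confirm the lease immediately before the side effect.
        try:
            self._store.heartbeat(self._run_id, self._worker_id)
        except LeaseLost as exc:
            self.events.append("lease_fence_failed")
            raise ScopeViolation("lease no longer held") from exc
        return await self._executor(action)
```

Because the fence runs in the same call as the executor, a worker whose lease was stolen or reaped is stopped before the action runs rather than after it.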
Add to your README
Two embeddable variants: a small badge and a richer card.
Score card (recommended)
[](https://aidesignblueprint.com/en/readiness-review/ffcc637e-1280-4f34-8fed-57c7359d7466)
Flat badge
[](https://aidesignblueprint.com/en/readiness-review/ffcc637e-1280-4f34-8fed-57c7359d7466)
Iteration delta
Improvements (2)
Run ID: ffcc637e-1280-4f34-8fed-57c7359d7466 · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.