Governed

Alignment with the doctrine confirmed.

Agent Architecture Review, Validation Snapshot

Evaluated on 7 May 2026 against the AI Design Blueprint doctrine

Production-ready

Status: Aligned

100/100

Grade A

10 aligned
Verdicts by principle

The submission is an autonomous governed-agent workflow with persistent run state, approval gates, steering commands, leases, and an auditable event/evidence ledger. The Iter4 changes add load-bearing primitives rather than wrappers: LeaseLost now aborts pause waits, a pre-executor lease/state fence prevents actions under stolen/reaped leases, and audit verification is bidirectional across redaction markers and evidence rows. All applicable principles are aligned.

Iteration history

4 previous runs on this artifact. Each run_id opens its own readiness review.

| When | Score | Status | Run ID |
| --- | --- | --- | --- |
| 7 May 2026 (this run) | 100 / A | Aligned | ffcc637e |
| 7 May 2026 | 74 / C | High Risk | 37cc23be |
| 7 May 2026 | 74 / C | High Risk | 2067531c |
| 7 May 2026 | 74 / C | High Risk | 742680ee |
| 7 May 2026 | 68 / C | High Risk | e78f225f |
Certified production-ready · Attempt 1/3

Confirmed: the visible Iter4 code addresses the prior high-risk seams with LeaseLost propagation during pause waits, a pre-executor lease fence, and bidirectional audit/evidence verification, and I found no specific code-cited missed production blocker.

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

P0

Design for delegation rather than direct manipulation

aligned

Delegation is explicit: `GovernedActionDispatcher.dispatch` executes bounded `GovernedAction` objects through the single `perform_action` tool seam, while `GovernedPolicy.is_action_permitted` and `permitted_action_scope` constrain authority. Users/operators govern the work through durable pause/resume/abort commands drained in `GovernedRunHooks._drain_steering`, rather than manually executing each step.
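The single-seam pattern described above can be sketched as follows. This is a minimal illustration, not the submission's code: the class and method names are taken from the finding, but their fields, signatures, and the synchronous executor are assumptions made to keep the example short.

```python
from dataclasses import dataclass, field


class ScopeViolation(Exception):
    """Raised when an action falls outside the permitted scope."""


@dataclass
class GovernedAction:
    # Hypothetical shape: the review does not show the real fields.
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class GovernedPolicy:
    permitted_action_scope: frozenset

    def is_action_permitted(self, action):
        return action.name in self.permitted_action_scope


class GovernedActionDispatcher:
    """Every action passes through dispatch(), the only path to the executor."""

    def __init__(self, policy, executor):
        self._policy = policy
        self._executor = executor

    def dispatch(self, action):
        # Fail closed: anything not explicitly permitted is rejected.
        if not self._policy.is_action_permitted(action):
            raise ScopeViolation(f"action {action.name!r} is out of scope")
        return self._executor(action)
```

Because the executor is reachable only through `dispatch`, the policy check cannot be bypassed by calling a tool directly.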

P0

Ensure that background work remains perceptible

aligned

Background work remains perceptible through persistent state and heartbeat primitives: `_checkpoint` calls `heartbeat(...)` before work and emits `heartbeat` events, `_wait_for_resume` continues heartbeating while paused, and `reap_stale_runs` delegates to `reap_stale_leases` for watchdog recovery. `status()` and `timeline()` expose current state and event history after the user returns.
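As a rough illustration of the watchdog side of this finding, the sketch below reaps leases whose last heartbeat is older than a time-to-live. Only the name `reap_stale_leases` comes from the review; the lease record layout, the `ttl_seconds` parameter, and the in-memory dictionary are assumptions.

```python
def reap_stale_leases(leases, now, ttl_seconds):
    """Release leases whose holder has stopped heartbeating.

    `leases` maps run_id -> {"worker_id": ..., "last_heartbeat": ...}
    (a hypothetical layout). Returns the run_ids that were reaped.
    """
    reaped = []
    for run_id, lease in leases.items():
        held = lease["worker_id"] is not None
        stale = now - lease["last_heartbeat"] > ttl_seconds
        if held and stale:
            lease["worker_id"] = None  # the run can now be recovered
            reaped.append(run_id)
    return reaped
```

A worker that keeps heartbeating, including while paused, never crosses the threshold, so only abandoned runs are reclaimed.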

P0

Align feedback with the user’s level of attention

aligned

Feedback is calibrated by event type and attention need: routine progress is throttled via `heartbeat_every_n_turns`, while high-attention conditions produce explicit events such as `scope_violation`, `approval_binding_mismatch`, `tool_call_failed`, pause/resume transitions, and abort messages. The submitted unchanged CLI summary notes event de-duplication by `(kind, seq)` for attention-required announcements, preserving signal without noisy repeats.
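De-duplication by `(kind, seq)` could look roughly like the sketch below. The function name `announce_new`, the event dictionaries, and the choice of which kinds require attention are illustrative assumptions; the review only states that the CLI de-duplicates on that key.

```python
# Assumption: these kinds are treated as attention-required.
ATTENTION_KINDS = {
    "scope_violation",
    "approval_binding_mismatch",
    "tool_call_failed",
}


def announce_new(events, seen):
    """Return attention-required events not announced before.

    `seen` is a set of (kind, seq) keys that persists across polls,
    so re-reading the same timeline does not repeat announcements.
    """
    announced = []
    for event in events:
        key = (event["kind"], event["seq"])
        if event["kind"] in ATTENTION_KINDS and key not in seen:
            seen.add(key)
            announced.append(event)
    return announced
```

Routine kinds such as `heartbeat` are never announced here, and an attention event is surfaced once no matter how often the timeline is polled.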

P0

Apply progressive disclosure to system agency

aligned

The observability layer separates default status from deeper inspection: `status()` returns a compact `RunStatus` with `state`, `is_terminal`, `latest_message`, and final/failure digests, while `timeline()` exposes detailed event rows and `evidence()` exposes sidecar evidence when deeper audit is required. This is progressive disclosure rather than forcing raw ledger detail into the primary status path.
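The split between a compact default view and a detailed view might be sketched as below. `RunStatus`, `status()`, and `timeline()` are named in the finding; the `RunLedger` class, its `append` method, and the set of terminal states are hypothetical stand-ins for the real persistence layer.

```python
from dataclasses import dataclass

# Assumption: the real system may define further terminal states.
TERMINAL_STATES = {"ABORTED_BY_USER", "FAILED"}


@dataclass
class RunStatus:
    state: str
    is_terminal: bool
    latest_message: str


class RunLedger:
    """Hypothetical in-memory ledger standing in for persistent run state."""

    def __init__(self):
        self._events = []

    def append(self, kind, state, message, data=None):
        self._events.append({
            "seq": len(self._events) + 1,
            "kind": kind,
            "state": state,
            "message": message,
            "data": data or {},
        })

    def status(self):
        # Compact view: only what an operator needs at a glance.
        last = self._events[-1]
        return RunStatus(last["state"], last["state"] in TERMINAL_STATES, last["message"])

    def timeline(self):
        # Detailed view: every event row, returned as copies.
        return [dict(event) for event in self._events]
```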

P0

Replace implied magic with clear mental models

aligned

The code replaces magic with explicit operating rules: the `perform_action` tool description lists permitted action names and states that submission-capable actions require operator approval, `ScopeViolation` makes out-of-scope execution fail closed, `ApprovalBindingMismatch` explains approval/action drift, and `on_handoff` transitions to `FAILED` with `reason_kind: unsupported_handoff`.

P0

Expose meaningful operational state, not internal complexity

aligned

Operational state is represented with user-relevant run states rather than only internal mechanics: transitions use `RunState.IN_PROGRESS`, `PAUSED`, `ABORTED_BY_USER`, and `FAILED`, while `status()` returns `state`, `is_terminal`, and `latest_message`. Lower-level details such as `entry_hash`, evidence digests, and raw sidecar rows are reserved for `timeline()`, `evidence()`, and `verify_audit()`.

P0

Establish trust through inspectability

aligned

Inspectability is backed by concrete audit primitives. `verify_audit` replays the hash chain using `prev_hash` and `_recompute_entry_hash`, validates genesis `policy_digest` against `runs.policy_digest`, recomputes `sha256(ev.value_json)` for each evidence row, requires marker digest agreement, derives `required_markers` by scanning every redacted marker in every `event.data`, and rejects unless `seen_markers == set(required_markers.keys())`. Delta: this improves the prior P7 finding by adding the bidirectional marker/evidence reconciliation that was previously missing.
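The verification steps listed above can be sketched as one function. This is an illustrative reconstruction under stated assumptions, not the submission's `verify_audit`: the record layouts, the `redacted_marker` key, the hash input format, and the helper names are hypothetical, and the genesis `policy_digest` check is omitted for brevity.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash of the first entry


def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def entry_hash(prev_hash, event):
    body = json.dumps(
        {"seq": event["seq"], "kind": event["kind"], "data": event["data"]},
        sort_keys=True,
    )
    return sha256_hex(prev_hash + body)


def find_markers(value, found):
    """Collect {marker_id: digest} from redaction markers at any depth."""
    if isinstance(value, dict):
        if "redacted_marker" in value:
            found[value["redacted_marker"]] = value["digest"]
        for child in value.values():
            find_markers(child, found)
    elif isinstance(value, list):
        for child in value:
            find_markers(child, found)


def verify_audit(events, evidence_rows):
    prev = GENESIS
    required_markers = {}
    for event in events:
        # Replay the chain: each entry must extend the previous hash.
        if event["prev_hash"] != prev or event["entry_hash"] != entry_hash(prev, event):
            return False
        find_markers(event["data"], required_markers)
        prev = event["entry_hash"]

    seen_markers = set()
    for ev in evidence_rows:
        digest = sha256_hex(ev["value_json"])
        if ev["digest"] != digest:
            return False  # evidence content was altered
        if required_markers.get(ev["marker"]) != digest:
            return False  # orphan evidence row, or marker digest disagrees
        seen_markers.add(ev["marker"])

    # Bidirectional: every marker must also have an evidence row.
    return seen_markers == set(required_markers.keys())
```

The final set comparison is what makes the check bidirectional: a marker without evidence fails it, and evidence without a marker fails earlier in the loop.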

P0

Make hand-offs, approvals, and blockers explicit

aligned

Approvals, handoffs, and blockers are explicit. `evaluate_approval` and `wait_for_decision` gate governed actions, `compute_action_digest` binds the approved action to the executable action, and `ApprovalBindingMismatch` stops drift. Unsupported handoffs are not silently followed: `on_handoff` records a `FAILED` transition with from/to agent names and raises `UnsupportedHandoff`. Scope and lease failures emit typed events such as `scope_violation` and `lease_fence_failed`.
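Binding an approval to the exact action it covers can be illustrated with a content digest. The names `compute_action_digest` and `ApprovalBindingMismatch` come from the finding; the signature, the canonical JSON encoding, and the `check_approval_binding` helper are assumptions for this sketch.

```python
import hashlib
import json


class ApprovalBindingMismatch(Exception):
    """The action about to run is not the action that was approved."""


def compute_action_digest(name, args):
    # Canonical encoding: sorted keys and fixed separators, so that
    # equal actions always produce the same digest.
    canonical = json.dumps(
        {"name": name, "args": args},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_approval_binding(approved_digest, name, args):
    """Hypothetical helper: raise if the action drifted after approval."""
    actual = compute_action_digest(name, args)
    if actual != approved_digest:
        raise ApprovalBindingMismatch(
            f"approved {approved_digest[:8]}, about to execute {actual[:8]}"
        )
```

If the agent changes any argument between approval and execution, the digests diverge and the action is stopped instead of running under a stale approval.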

P0

Represent delegated work as a system, not merely as a conversation

aligned

Delegated work is modeled as a system: the submission uses persistent `run_id` state, a run ledger, `state_transition` events, evidence sidecars, leases, watchdog reaping, steering commands, and a single governed tool seam. `timeline()` returns structured event records with `seq`, `kind`, `state`, `message`, `data`, and `entry_hash`, so execution is inspectable as stateful orchestration rather than merely as a chat transcript.

P0

Optimise for steering, not only initiating

aligned

Steering is supported by durable safe-boundary controls. `GovernedRunHooks._drain_steering` handles `abort` and `pause`, `_wait_for_resume` accepts `resume` or `abort` while continuing heartbeats, and transitions use `expected_worker_id` to prevent phantom writes. The Iter4 changes close the previous steering race: `LeaseLost` in `_wait_for_resume` now raises `AbortRequested` instead of returning silently, and `GovernedActionDispatcher.dispatch` performs a `heartbeat(...)` lease/state fence immediately before `await self._executor(action)`, recording `lease_fence_failed` and raising `ScopeViolation` if the lease is invalid. Delta: this improves the prior P10 finding by addressing both the sw…
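The pre-executor fence described in this finding might look like the sketch below. The exception names and the `lease_fence_failed` event kind are taken from the review; `LeaseStore`, `FencedDispatcher`, the in-memory owner field, and the event list are hypothetical simplifications of the real lease and ledger primitives.

```python
import asyncio


class LeaseLost(Exception):
    """The worker no longer owns the run's lease."""


class ScopeViolation(Exception):
    """The action may not be executed."""


class LeaseStore:
    """Hypothetical in-memory stand-in for the persistent lease table."""

    def __init__(self, owner):
        self.owner = owner

    def heartbeat(self, worker_id):
        if self.owner != worker_id:
            raise LeaseLost(f"lease is held by {self.owner!r}")


class FencedDispatcher:
    def __init__(self, store, worker_id, executor, events):
        self._store = store
        self._worker_id = worker_id
        self._executor = executor
        self._events = events

    async def dispatch(self, action):
        # Fence: re-validate the lease immediately before executing, so a
        # stolen or reaped lease cannot result in a side effect.
        try:
            self._store.heartbeat(self._worker_id)
        except LeaseLost as exc:
            self._events.append({"kind": "lease_fence_failed", "action": action})
            raise ScopeViolation("lease no longer held; action not executed") from exc
        return await self._executor(action)
```

Placing the heartbeat directly before the `await` keeps the window between validation and execution as small as a single-process check allows.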

Add to your README

Two embeddable variants: a small one and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/ffcc637e-1280-4f34-8fed-57c7359d7466/card.svg)](https://aidesignblueprint.com/en/readiness-review/ffcc637e-1280-4f34-8fed-57c7359d7466)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/ffcc637e-1280-4f34-8fed-57c7359d7466.svg)](https://aidesignblueprint.com/en/readiness-review/ffcc637e-1280-4f34-8fed-57c7359d7466)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Improvements (2)

P7 · Establish trust through inspectability · needs_changes → aligned
P10 · Optimise for steering, not only initiating · needs_changes → aligned
Rubric: 2026-05-04

Run ID: ffcc637e-1280-4f34-8fed-57c7359d7466 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.