Assessment complete; awaiting evidence review.
Assessed on 7 May 2026 against the AI Design Blueprint doctrine
Emerging
Status: High Risk
74/100
Grade C
The submission is an autonomous governed agent workflow and now contains several real architectural primitives: typed policy envelopes, approval gates, durable run state, lease/heartbeat/reaper primitives, steering commands, and a hash-chained audit ledger. Two production-blocking trust-boundary gaps remain: execution ownership is still an in-process asyncio task rather than a durable job/worker lifecycle, and the audit/evidence boundary is incomplete because raw policy_json remains in the runs table and evidence verification does not recompute hashes from stored raw evidence.
Iteration history
1 previous run on this artifact. Each run_id opens its own readiness review.
| When | Score | Status | Run ID |
|---|---|---|---|
| 7 May 2026 (this run) | 74 / C | High Risk | 742680ee… |
| 7 May 2026 | 68 / C | High Risk | e78f225f… |
Findings by principle
10 principles assessed. Verdict, severity, evidence, and recommendation for each.
P0
Establish trust through inspectability
The redaction/evidence split is a substantial primitive: append_event calls redact_payload and stores raw fields in the evidence sidecar while the event ledger receives digest/size/type markers. But the trust boundary is incomplete. First, ensure_schema defines runs.policy_json TEXT NOT NULL and create_run stores the full policy_json in the canonical runs table, so raw task/instructions/policy data are not confined to the evidence sidecar. Second, verify_audit checks marker.get('digest') == ev.value_digest but never recomputes the digest from ev.value_json, so a modified evidence.value_json with the old value_digest can still pass audit verification. status and _outcome also read final_outpu…
Recommendation
Make the segregated evidence store or a secure blob/secret store the only raw-payload location: keep runs to policy_digest plus non-sensitive summary fields. Update verify_audit, status, and _outcome to recompute SHA-256 over evidence.value_json, compare it to evidence.value_digest, the ledger marker digest, and the runs-row terminal digest before returning raw evidence-derived outputs.
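The recomputation step in this recommendation can be illustrated with a minimal sketch. It assumes evidence rows expose `value_json` and `value_digest` as described in the finding and that digests are SHA-256 hex strings; the `EvidenceRow` class and `evidence_is_intact` helper are hypothetical names, not the submission's code.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRow:
    """Hypothetical shape of one evidence-sidecar row."""
    value_json: str    # raw payload as stored
    value_digest: str  # SHA-256 hex digest recorded at write time


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def evidence_is_intact(ev: EvidenceRow, marker: dict) -> bool:
    """Recompute the digest from the raw payload rather than trusting the stored one."""
    recomputed = sha256_hex(ev.value_json)
    # All three values must agree: recomputed, stored, and the ledger marker.
    return recomputed == ev.value_digest == marker.get("digest")


payload = json.dumps({"final_output": "ok"}, sort_keys=True)
good = EvidenceRow(value_json=payload, value_digest=sha256_hex(payload))
marker = {"digest": good.value_digest}

# Tampered payload that keeps the old digest: a marker-only comparison would pass.
tampered = EvidenceRow(
    value_json=payload.replace("ok", "changed"),
    value_digest=good.value_digest,
)
```

In this sketch the tampered row still satisfies `marker["digest"] == tampered.value_digest`, which is the comparison the finding attributes to verify_audit, while `evidence_is_intact` rejects it.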
P0
Ensure that background work remains perceptible
The code adds meaningful perceptibility primitives: runs rows, RunState, claim_lease, heartbeat, release_lease, reap_stale_leases, and status/timeline read paths. However execution ownership still lives in start_governed_run as asyncio.create_task(_drive_agent_loop(...)); there is no durable job queue or external worker that can claim and continue a run after process death. There is also a stranding window because create_run inserts an INITIALISED row with no lease, while reap_stale_leases only reaps rows where lease_expires_at IS NOT NULL, so a crash before claim_lease can leave a run permanently INITIALISED. Delta: this improves the prior finding with lease/heartbeat/reaper primitives, but…
Recommendation
Move execution ownership out of the request/process-local asyncio task into a durable worker/job primitive: persist a queued run, have an external worker claim by run_id, and give unclaimed INITIALISED/QUEUED runs an explicit expiry path so the reaper can also terminalize jobs that were accepted but never claimed.
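One way to picture the claim-and-expiry path is the SQLite sketch below. The table layout, the `QUEUED` state, the choice of `FAILED` as the terminal state, and the function names `claim_run` and `reap` are all assumptions made for illustration; the submission's actual schema and reaper may differ.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs ("
        "run_id TEXT PRIMARY KEY, state TEXT NOT NULL, created_at REAL NOT NULL, "
        "lease_owner TEXT, lease_expires_at REAL)"
    )
    return conn


def claim_run(conn, run_id, worker_id, now, lease_seconds=30.0) -> bool:
    """Atomically claim a queued run; only one worker can win the UPDATE."""
    cur = conn.execute(
        "UPDATE runs SET state='IN_PROGRESS', lease_owner=?, lease_expires_at=? "
        "WHERE run_id=? AND state='QUEUED' AND lease_owner IS NULL",
        (worker_id, now + lease_seconds, run_id),
    )
    conn.commit()
    return cur.rowcount == 1


def reap(conn, now, claim_deadline_seconds=60.0) -> int:
    """Terminalize expired leases AND runs that were accepted but never claimed."""
    cur = conn.execute(
        "UPDATE runs SET state='FAILED', lease_owner=NULL, lease_expires_at=NULL "
        "WHERE (state='IN_PROGRESS' AND lease_expires_at IS NOT NULL "
        "       AND lease_expires_at < ?) "
        "   OR (state='QUEUED' AND lease_expires_at IS NULL AND created_at < ?)",
        (now, now - claim_deadline_seconds),
    )
    conn.commit()
    return cur.rowcount
```

The second branch of the `WHERE` clause is the part that closes the stranding window: a run with no lease is still reaped once it has waited longer than the claim deadline.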
P0
Design for delegation rather than direct manipulation
The delegated authority envelope is explicit in GovernedPolicy: task, instructions, max_turns, timeout_seconds, permitted_action_scope, and approval_gates. start_governed_run rejects free-form dicts, constructs the Agent and governed tool list internally, and exposes only an ActionExecutor that is called after GovernedActionDispatcher enforces scope, approval, and audit.
P0
Align feedback with the user’s level of attention
Foreground attention is now calibrated by _stream_attention_required polling timeline(run_id) and escalating only state_transition events in AWAITING_APPROVAL, PAUSED, FAILED, TIMED_OUT, or ABORTED_BY_USER. Deduplication is keyed by (kind, seq), so repeated approval blockers are surfaced instead of being swallowed by state-level deduplication. Delta: this directly addresses the prior feedback-alignment issue.
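The deduplication behaviour described here can be sketched as follows. The event dictionaries and their `seq`, `kind`, and `state` keys are assumed shapes, and `attention_events` is a hypothetical helper rather than the actual `_stream_attention_required` implementation.

```python
ATTENTION_STATES = frozenset(
    {"AWAITING_APPROVAL", "PAUSED", "FAILED", "TIMED_OUT", "ABORTED_BY_USER"}
)


def attention_events(timeline, seen):
    """Return state transitions that need foreground attention and were not yet surfaced.

    `seen` is a set of (kind, seq) keys that persists between polls, so each
    event is surfaced once, but a later event in the same state is still surfaced.
    """
    fresh = []
    for ev in timeline:
        if ev["kind"] != "state_transition":
            continue
        if ev.get("state") not in ATTENTION_STATES:
            continue
        key = (ev["kind"], ev["seq"])
        if key in seen:
            continue
        seen.add(key)
        fresh.append(ev)
    return fresh


timeline = [
    {"seq": 1, "kind": "state_transition", "state": "IN_PROGRESS"},
    {"seq": 2, "kind": "state_transition", "state": "AWAITING_APPROVAL"},
    {"seq": 3, "kind": "state_transition", "state": "IN_PROGRESS"},
    {"seq": 4, "kind": "tool_call"},
    {"seq": 5, "kind": "state_transition", "state": "AWAITING_APPROVAL"},
]
seen = set()
first_poll = attention_events(timeline, seen)
second_poll = attention_events(timeline, seen)
```

Keying on the state alone would have dropped the second approval blocker at `seq` 5; keying on `(kind, seq)` keeps it while still suppressing repeats across polls.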
P0
Apply progressive disclosure to system agency
The read model applies progressive disclosure: status(run_id) returns concise state, latest_message, final_output/failure digests, and terminality; timeline(run_id) returns the redacted event ledger; evidence(run_id) is a separate raw-payload surface; verify_audit(run_id) is an explicit diagnostic integrity check.
P0
Replace implied magic with clear mental models
The system presents a clear mental model through typed constraints and documented behaviour: GovernedPolicy is frozen, permitted_action_scope is explicit, SUBMISSION_CAPABLE_ACTIONS require unconditional ApprovalGateRule coverage, build_governed_tools exposes a single perform_action tool with permitted actions in its description, and CLI run creates approval gates rather than offering a disable-approval flag.
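As a rough illustration of the constraint described above, the sketch below uses frozen standard-library dataclasses as a stand-in for the frozen Pydantic models the review mentions. The members of `SUBMISSION_CAPABLE_ACTIONS`, the field names on `ApprovalGateRule`, and the reduced set of `GovernedPolicy` fields are assumptions for illustration only.

```python
from dataclasses import dataclass

# Hypothetical members; the real set is defined by the submission.
SUBMISSION_CAPABLE_ACTIONS = frozenset({"submit_form", "send_email"})


@dataclass(frozen=True)
class ApprovalGateRule:
    rule_id: str
    action: str
    unconditional: bool = True


@dataclass(frozen=True)
class GovernedPolicy:
    task: str
    permitted_action_scope: frozenset
    approval_gates: tuple

    def __post_init__(self):
        # Every submission-capable action in scope needs an always-on gate.
        covered = {g.action for g in self.approval_gates if g.unconditional}
        uncovered = (self.permitted_action_scope & SUBMISSION_CAPABLE_ACTIONS) - covered
        if uncovered:
            raise ValueError(
                f"submission-capable actions lack unconditional approval: {sorted(uncovered)}"
            )


policy = GovernedPolicy(
    task="file the report",
    permitted_action_scope=frozenset({"read_page", "submit_form"}),
    approval_gates=(ApprovalGateRule(rule_id="r1", action="submit_form"),),
)
```

Because the dataclass is frozen, assigning to `policy.task` after construction raises an error, which mirrors the immutability the review attributes to the policy envelope.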
P0
Expose meaningful operational state, not internal complexity
Operational state is exposed through a closed RunState enum with user-relevant values such as IN_PROGRESS, AWAITING_APPROVAL, PAUSED, COMPLETE, FAILED, ABORTED_BY_USER, and TIMED_OUT. transition enforces VALID_TRANSITIONS, terminal states are absorbing, and RunStatus returns state/is_terminal/latest_message rather than leaking low-level loop internals.
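A closed state machine of this kind might look like the sketch below. The state names come from the review (with `INITIALISED` taken from the background-work finding); the specific edges in `VALID_TRANSITIONS` are assumptions, since the review does not list them.

```python
from enum import Enum


class RunState(str, Enum):
    INITIALISED = "INITIALISED"
    IN_PROGRESS = "IN_PROGRESS"
    AWAITING_APPROVAL = "AWAITING_APPROVAL"
    PAUSED = "PAUSED"
    COMPLETE = "COMPLETE"
    FAILED = "FAILED"
    ABORTED_BY_USER = "ABORTED_BY_USER"
    TIMED_OUT = "TIMED_OUT"


S = RunState
TERMINAL = frozenset({S.COMPLETE, S.FAILED, S.ABORTED_BY_USER, S.TIMED_OUT})

# Assumed edges; terminal states have no entry, so they are absorbing.
VALID_TRANSITIONS = {
    S.INITIALISED: frozenset({S.IN_PROGRESS, S.FAILED, S.ABORTED_BY_USER}),
    S.IN_PROGRESS: frozenset(
        {S.AWAITING_APPROVAL, S.PAUSED, S.COMPLETE, S.FAILED, S.ABORTED_BY_USER, S.TIMED_OUT}
    ),
    S.AWAITING_APPROVAL: frozenset({S.IN_PROGRESS, S.FAILED, S.ABORTED_BY_USER, S.TIMED_OUT}),
    S.PAUSED: frozenset({S.IN_PROGRESS, S.FAILED, S.ABORTED_BY_USER, S.TIMED_OUT}),
}


def transition(current: RunState, target: RunState) -> RunState:
    """Return the new state, or raise if the edge is not allowed."""
    if target not in VALID_TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"invalid transition {current.value} -> {target.value}")
    return target


def is_terminal(state: RunState) -> bool:
    return state in TERMINAL
```

Leaving terminal states out of the transition table makes "absorbing" a structural property: there is no edge to forget to forbid.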
P0
Make hand-offs, approvals, and blockers explicit
Approval and blocker states are explicit. GovernedPolicy._require_unconditional_approval_for_submission_capable_scope rejects submission-capable actions without always-on approval rules. wait_for_decision transitions to AWAITING_APPROVAL, records action_digest/rule_ids, heartbeats while waiting, and proceeds only on resume or aborts on abort. GovernedActionDispatcher recomputes the action digest before execution, and GovernedRunHooks.on_handoff transitions to FAILED for unsupported handoffs.
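The digest recomputation before execution can be sketched like this. It assumes the action is a JSON-serializable dict and that the digest is SHA-256 over a canonical JSON encoding; `action_digest`, `execute_if_approved`, and `DigestMismatch` are hypothetical names and not the GovernedActionDispatcher API.

```python
import hashlib
import json


class DigestMismatch(Exception):
    """Raised when the action about to run differs from the one that was approved."""


def action_digest(action: dict) -> str:
    # Canonical encoding: sorted keys and fixed separators, so key order is irrelevant.
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def execute_if_approved(action: dict, approved_digest: str, executor):
    """Recompute the digest at execution time and refuse to run on mismatch."""
    if action_digest(action) != approved_digest:
        raise DigestMismatch("action changed after approval")
    return executor(action)


approved = {"name": "submit_form", "args": {"form_id": "f-1"}}
approved_digest = action_digest(approved)

# Same content, different key order: still matches the approved digest.
same_action = {"args": {"form_id": "f-1"}, "name": "submit_form"}
result = execute_if_approved(same_action, approved_digest, lambda a: f"ran {a['name']}")
```

Recomputing at the point of execution means an approval covers exactly the action the approver saw, not whatever the agent later substitutes under the same name.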
P0
Represent delegated work as a system, not merely as a conversation
Delegated work is represented as a structured system rather than a chat transcript: runs holds lifecycle state, events is an append-only hash-chained timeline, evidence segregates raw payloads, steering_commands is a durable control plane, run_lease_history records lease events, and observability exposes status/timeline/evidence/verify_audit surfaces.
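For readers unfamiliar with hash-chained timelines, here is a minimal in-memory sketch. The record fields (`seq`, `kind`, `payload`, `prev_hash`, `hash`), the all-zero genesis value, and the helper names are assumptions; the submission's events table may chain differently.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous event"


def event_hash(prev_hash: str, seq: int, kind: str, payload: dict) -> str:
    body = json.dumps(
        {"prev": prev_hash, "seq": seq, "kind": kind, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_chained_event(ledger: list, kind: str, payload: dict) -> dict:
    """Append an event whose hash commits to the previous event's hash."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    seq = len(ledger) + 1
    ev = {
        "seq": seq,
        "kind": kind,
        "payload": payload,
        "prev_hash": prev,
        "hash": event_hash(prev, seq, kind, payload),
    }
    ledger.append(ev)
    return ev


def verify_chain(ledger: list) -> bool:
    """Walk the ledger and recompute every hash from its stored fields."""
    prev = GENESIS
    for expected_seq, ev in enumerate(ledger, start=1):
        if ev["seq"] != expected_seq or ev["prev_hash"] != prev:
            return False
        if ev["hash"] != event_hash(prev, ev["seq"], ev["kind"], ev["payload"]):
            return False
        prev = ev["hash"]
    return True
```

Because each hash covers the previous one, editing or removing any earlier event invalidates every later link, which is what makes the timeline tamper-evident rather than merely append-only by convention.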
P0
Optimise for steering, not only initiating
The code supports mid-run steering through durable abort_run, pause_run, and resume_run commands, which are consumed at hook checkpoints and during approval waits. Policy mutation is deliberately disallowed through frozen Pydantic models and the _drive_agent_loop digest check; the documented correction model is abort/restart rather than live policy rewrite. Delta: this addresses the prior steering finding by adding an explicit immutability contract plus audited pause/resume/abort primitives.
Add to your README
Two embeddable variants: a small one and a richer card.
Score card (recommended)
[](https://aidesignblueprint.com/en/readiness-review/742680ee-26e9-4e7b-9bf9-0fbd4e6ead8c)
Flat badge
[](https://aidesignblueprint.com/en/readiness-review/742680ee-26e9-4e7b-9bf9-0fbd4e6ead8c)
Iteration delta
Improvements (4)
Run ID: 742680ee-26e9-4e7b-9bf9-0fbd4e6ead8c · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.