Reviewed

Evaluation complete; awaiting evidence review.

Agent Architecture Review, Validation Snapshot

Evaluated on 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High risk

74/100

Grade C

9 aligned · 1 production blocker

Verdicts by principle

This is an autonomous cohort-validation workflow: it persists jobs, fetches repositories, selects code, calls an external validator/LLM service, sends external onboarding emails, and now exposes operator steering commands. Most delegation, perceptibility, audit, and blocker primitives are structurally sound. Iter12 materially improves steering, but P10 still needs changes because onboarding retry is not truly step-aware/idempotent and validation retry can reset state while a job may still be active, creating duplicate-run/race risk.

Iteration history

5 previous runs on this artifact. Each run_id opens its own readiness review.

| When | Score | Status | Run ID |
| --- | --- | --- | --- |
| 12 May 2026 (this run) | 74 / C | High risk | 4760459e |
| 12 May 2026 | 74 / C | High risk | 9caf9385 |
| 12 May 2026 | 74 / C | Aligned | 093809b5 |
| 12 May 2026 | 98 / A | Aligned | 4128f700 |
| 12 May 2026 | 98 / A | Aligned | 270e7ca6 |
| 12 May 2026 | 74 / C | High risk | 8364019d |

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

P0

Needs changes · Production blocker · 60/100

Optimise for steering, not only initiating

Iter12 adds a real durable steering primitive in `steer_cohort_validation.py`: `request_abort()` persists `abort_requested=True`, `retry_failed_validation_job()` resets application validation state, and `retry_onboarding_handoff()` clears typed onboarding failures. However, onboarding recovery is not actually step-aware/idempotent: `retry_onboarding_handoff()` sets `onboarding_state='pending'`, and `approve()` then unconditionally calls `create_firebase_user_for_email(app.email, ...)` before checking `existing_user` or branching on the prior failed step. A retry from `sign_in_link_failed` or `approval_email_failed` can therefore re-run Firebase user creation after the user was already create…

Recommendation

Move retry authority into a small transactional state-transition boundary that dispatches only the next missing side effect. For onboarding, persist or derive the last successful external step and use an idempotent get-or-create Firebase/user operation before sign-in-link/email retries; do not reset every recoverable failure back to `pending`. For validation retry, require the latest job to be terminal and `retry_eligible` before clearing `validation_run_id`, and record the old/new job relationship atomically.
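The recommendation above can be sketched roughly as follows. This is a minimal in-memory sketch under stated assumptions, not the project's code: `Application`, `Job`, `ONBOARDING_STEPS`, `next_onboarding_step`, `run_onboarding`, `can_retry_validation`, and the `onboarded` state are hypothetical names, persistence and transactions are omitted, and the side effects are passed in as plain callables. Only the `*_failed` state names are taken from the review.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical ordered steps; "<step>_failed" matches the typed failures in the review.
ONBOARDING_STEPS = ["firebase_user", "sign_in_link", "approval_email"]
# Assumed terminal set, based on the terminal states the review names.
TERMINAL_STATUSES = {"completed", "failed", "blocked", "aborted"}


@dataclass
class Application:
    email: str
    completed_steps: List[str] = field(default_factory=list)
    onboarding_state: str = "pending"


@dataclass
class Job:
    status: str
    retry_eligible: bool


def next_onboarding_step(app: Application) -> Optional[str]:
    """Return the first step that has not succeeded yet, or None when all are done."""
    for step in ONBOARDING_STEPS:
        if step not in app.completed_steps:
            return step
    return None


def run_onboarding(app: Application, effects: Dict[str, Callable[[str], None]]) -> None:
    """Run only the missing side effects, recording each success as it happens."""
    while (step := next_onboarding_step(app)) is not None:
        try:
            effects[step](app.email)
        except Exception:
            # Typed failure; earlier successes stay recorded, so a retry resumes here.
            app.onboarding_state = f"{step}_failed"
            return
        app.completed_steps.append(step)
    app.onboarding_state = "onboarded"  # hypothetical final state name


def can_retry_validation(latest_job: Optional[Job]) -> bool:
    """Allow a retry only when the latest job is terminal and marked retry-eligible."""
    return (
        latest_job is not None
        and latest_job.status in TERMINAL_STATUSES
        and latest_job.retry_eligible
    )
```

With this shape, a retry after `approval_email_failed` calls only the email step again; user creation is never repeated because its success is already recorded. In a real system the step record and the state change would need to be committed in the same transaction.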

P0

Aligned

Design for delegation rather than direct manipulation

Delegated work is represented as an assigned cohort-validation task rather than manual step execution: `approve()` transitions the application into `validation_queued`, `run_cohort_validate.run()` creates a persistent `CohortValidationJob`, and `_execute_with_job()` carries out clone/select/bundle/validate steps from `repo_url`. The scope is explicit through `ValidationContext(repository=namespace, files=list(selection.selected_paths))`, and operator controls exist via `request_abort()`, `retry_failed_validation_job()`, and `retry_onboarding_handoff()`.

P0

Aligned

Ensure that background work remains perceptible

Background work remains perceptible through persisted state on `CohortValidationJob`: `status`, `queued_at`, per-step timestamps such as `cloning_started_at` and `validating_started_at`, `terminal_at`, `failure_kind`, `safe_display_message`, `retry_eligible`, and `abort_requested`. `_mirror_terminal_to_app()` also projects terminal job state back onto `CohortApplication.onboarding_state`, allowing the application record to preserve continuity after the runner exits.
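As an illustration of the persisted-state idea described above, here is a small sketch of a job record whose progress can be read back without access to the running process. `JobRecord`, `start_step`, and `describe` are hypothetical names, and a single `step_started_at` mapping stands in for the per-step timestamp columns the review mentions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional

STEPS = ("cloning", "selecting", "bundling", "validating")  # step names from the review


@dataclass
class JobRecord:
    status: str = "queued"
    queued_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    step_started_at: Dict[str, datetime] = field(default_factory=dict)
    terminal_at: Optional[datetime] = None
    abort_requested: bool = False


def start_step(job: JobRecord, step: str) -> None:
    """Record the step and its start time so an observer can see progress."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    job.status = step
    job.step_started_at[step] = datetime.now(timezone.utc)


def describe(job: JobRecord) -> str:
    """Render a one-line status summary from stored fields only."""
    started = job.step_started_at.get(job.status)
    since = f" since {started.isoformat()}" if started else ""
    flag = " (abort requested)" if job.abort_requested else ""
    return f"{job.status}{since}{flag}"
```

Because `describe` reads only stored fields, the same summary is available after the runner exits.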

P0

Aligned

Align feedback with the user’s level of attention

The workflow separates routine status from higher-attention failure detail. Normal progress is encoded in concise states such as `queued`, `cloning`, `selecting`, `bundling`, and `validating`; failures carry `failure_kind`, `safe_display_message`, and `retry_eligible`; and `_summarize_validate_log()` deliberately reduces validator internals to a summary of presence/type/key counts instead of flooding the operator with raw logs.
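The log-reduction behaviour described above might look something like the sketch below. It is not the project's `_summarize_validate_log()`; `summarize_log` and its output keys are assumptions chosen to show the presence/type/key-count idea.

```python
from typing import Any, Dict


def summarize_log(log: Any) -> Dict[str, Any]:
    """Reduce a validator log to presence, type name, and a size figure."""
    summary: Dict[str, Any] = {
        "present": log is not None,
        "type": type(log).__name__,
    }
    if isinstance(log, dict):
        summary["key_count"] = len(log)  # number of top-level keys, not their contents
    elif isinstance(log, (list, tuple, str)):
        summary["length"] = len(log)
    return summary
```

The summary never includes the log contents, so raw validator internals stay out of the operator-facing view regardless of how large the log is.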

P0

Aligned

Apply progressive disclosure to system agency

The code uses progressive disclosure between primary operational state and deeper audit detail. The primary records expose compact states on `CohortApplication.onboarding_state` and `CohortValidationJob.status`, while `_execute_with_job()` builds a detailed `audit_object` containing source commit, selected file hashes, skipped files, bundle hash, validator latency, usage presence, envelope schema, and envelope hash for deeper inspection when needed.

P0

Aligned

Replace implied magic with clear mental models

The workflow gives operators a concrete mental model through typed lifecycle states and explicit constraints: `STEPS`, `TERMINAL_STATUSES`, and `FAILURE_KINDS` define the runner’s state space; `BOUNDARY_HEADER` and `ENVELOPE_ADVISORY` state that repository code is untrusted inert input; and the steering CLI documents the known limitation that an in-flight `validate_code_against_principles()` call is not cancellable until it returns.

P0

Aligned

Expose meaningful operational state, not internal complexity

Meaningful operational states are exposed without requiring users to interpret raw stack traces. `mark_blocked()`, `mark_failed()`, and `mark_aborted()` store user-relevant terminal statuses plus bounded `safe_display_message` values, while `_mirror_terminal_to_app()` converts job outcomes into application-level states such as `validation_complete` and `validation_failed`. Deeper mechanics remain available in audit JSON rather than being the primary state model.

P0

Aligned

Establish trust through inspectability

The implementation has strong inspectability primitives: `build_file_envelope()` creates a typed `bridge.files.v2` envelope with an `envelope_hash`; `_build_implementation_context()` records per-file SHA-256 hashes; `fetch_public_repo()` captures `commit_sha`; and `_execute_with_job()` persists an `audit_object` with source, selection, bundle hash, validator log summary, and boundary contract metadata inside `UserValidationRun.result_json`. This makes the validator result traceable to a specific repository snapshot and selected file set.
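To make the traceability claim above concrete, the following sketch hashes each file with SHA-256 and then hashes a canonical JSON form of the whole envelope. The real `build_file_envelope()` layout is not shown in the review, so `build_envelope` and every field name except `envelope_hash` and the `bridge.files.v2` schema label are assumptions.

```python
import hashlib
import json
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def build_envelope(files: Dict[str, bytes], commit_sha: str) -> dict:
    """Build an envelope whose hash depends only on content, not on input order."""
    entries = [
        {"path": path, "sha256": sha256_hex(content)}
        for path, content in sorted(files.items())  # sorted for a stable ordering
    ]
    body = {"schema": "bridge.files.v2", "commit_sha": commit_sha, "files": entries}
    # Canonical JSON: sorted keys and fixed separators give a reproducible byte string.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {**body, "envelope_hash": sha256_hex(canonical)}
```

Two envelopes built from the same commit and file contents get the same `envelope_hash`, while changing any file byte changes it, which is what lets a stored result be tied back to one specific snapshot.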

P0

Aligned

Make hand-offs, approvals, and blockers explicit

Approvals, handoffs, and blockers are explicit. `approve()` requires an interactive confirmation unless `--yes` is supplied, external onboarding steps have typed failure states (`firebase_user_failed`, `sign_in_link_failed`, `approval_email_failed`), repository/selector failures are marked through `mark_blocked()` with concrete `failure_kind` values, and validation/tool failures are marked through `mark_failed()` with retry eligibility. The code distinguishes blocked, failed, aborted, and completed terminal states instead of stalling silently.
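The distinction between blocked and failed terminal states can be sketched as below. The function names `mark_blocked` and `mark_failed` appear in the review, but their signatures here, the `JobState` class, the `_finish` helper, the 200-character bound, and the example `failure_kind` values in the usage note are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

MAX_MESSAGE_LEN = 200  # assumed bound; the review does not state the real limit
TERMINAL = {"completed", "blocked", "failed", "aborted"}


@dataclass
class JobState:
    status: str = "queued"
    failure_kind: Optional[str] = None
    safe_display_message: Optional[str] = None
    retry_eligible: bool = False


def _finish(job: JobState, status: str, kind: str, message: str, retry_eligible: bool) -> None:
    """Move a job into a terminal state exactly once, with a bounded message."""
    if job.status in TERMINAL:
        raise ValueError(f"job already terminal: {job.status}")
    job.status = status
    job.failure_kind = kind
    job.safe_display_message = message[:MAX_MESSAGE_LEN]
    job.retry_eligible = retry_eligible


def mark_blocked(job: JobState, kind: str, message: str) -> None:
    """Blocked: the input needs a human fix, so retrying unchanged would not help."""
    _finish(job, "blocked", kind, message, retry_eligible=False)


def mark_failed(job: JobState, kind: str, message: str, retry_eligible: bool) -> None:
    """Failed: a tool or validator error, which may or may not be worth retrying."""
    _finish(job, "failed", kind, message, retry_eligible)
```

For example, `mark_blocked(job, "repo_unreachable", "Repository could not be fetched")` leaves the job in `blocked` with `retry_eligible=False`, and any later attempt to mark the same job raises instead of silently overwriting the outcome.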

P0

Aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system rather than a message stream. `CohortValidationJob` is the durable task object, `CohortApplication` carries onboarding state, `UserValidationRun` stores the validator result, and `_execute_with_job()` advances through explicit clone/select/bundle/validate phases. Dependencies and artifacts are captured as structured fields such as `commit_sha`, `bundle_sha256`, `selected_count`, `skipped_count`, and `validation_run_local_id`.

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/4760459e-5af0-4927-834c-9fac4c5c3bd2/card.svg)](https://aidesignblueprint.com/en/readiness-review/4760459e-5af0-4927-834c-9fac4c5c3bd2)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/4760459e-5af0-4927-834c-9fac4c5c3bd2.svg)](https://aidesignblueprint.com/en/readiness-review/4760459e-5af0-4927-834c-9fac4c5c3bd2)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear
Rubric: 2026-05-04 · Grade capped by 0 high-risk findings

Run ID: 4760459e-5af0-4927-834c-9fac4c5c3bd2 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.