Reviewed

Evaluation complete; awaiting evidence review.

Agent Architecture Review, Validation Snapshot

Evaluated on 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High risk

74/100

Grade C

8 aligned · 1 production blocker · 1 hardening
Verdicts by principle

The submission is an autonomous validation workflow with durable job rows, typed lifecycle states, explicit source/audit metadata, and a strong untrusted-code envelope. The prior silent `validation_queued`/no-job path is substantially improved by the new `OPENAI_API_KEY` guard and no-job/non-zero fallback, but one production blocker remains: the advertised recovery/manual validation path can complete a job without moving `CohortApplication.onboarding_state` out of `validation_failed`. Steering also remains limited, but that is a hardening gap rather than a production blocker.

Iteration history

5 previous runs on this artifact. Each run_id opens its own readiness review.

| When | Score | Status | Run ID |
| --- | --- | --- | --- |
| 12 May 2026 (this run) | 74 / C | High risk | 8364019d |
| 12 May 2026 | 74 / C | High risk | 14a3456f |
| 12 May 2026 | 74 / C | High risk | 3f3bb587 |
| 12 May 2026 | 74 / C | High risk | 659a695a |
| 12 May 2026 | 67 / C | High risk | e476247c |
| 12 May 2026 | 30 / F | High risk | 86b2c59d |

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

P0

Changes required · Production blocker · 45/100

Make hand-offs, approvals, and blockers explicit

The explicit approval and blocker model is mostly strong: `approve()` prompts unless `--yes`, job blockers are persisted with `mark_blocked()`/`mark_failed()`, and the new `OPENAI_API_KEY` guard prevents the prior deterministic case where `validation_queued` was set before `run_cohort_validate()` returned `2` without creating a job. Delta: this improves the prior P8 finding, and the `latest_job is None and exit_code != 0` branch addresses the earlier no-job/non-zero-return path. However, the new recovery path is still not closed: on missing `OPENAI_API_KEY`, `approve()` writes `onboarding_state = "validation_failed"` and tells the operator to rerun `app.scripts.run_cohort_validate`, but `run…

Recommendation

Move application onboarding-state mirroring into the durable validation completion boundary, not only the `approve()` caller. When `_execute_with_job()` marks a job `completed`, `blocked`, `failed`, or `aborted`, update the corresponding `CohortApplication.onboarding_state` and failure reason in the same transaction, so direct recovery/retry invocations cannot complete a job while leaving the application in a stale blocker state.
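
The recommendation above can be sketched as follows. This is a minimal illustration, assuming a SQL-backed store: the table layout, the `finish_job()` helper, and the status-to-state mapping are hypothetical and are not taken from the submission. The only point it makes is that the job row and the application row change inside one transaction, so a direct recovery run cannot leave the application in a stale `validation_failed` state.

```python
from __future__ import annotations

import sqlite3

# Hypothetical mapping from terminal job status to application onboarding state.
# The state names appear in the review; the mapping itself is an assumption.
ONBOARDING_STATE_FOR = {
    "completed": "validation_complete",
    "blocked": "validation_failed",
    "failed": "validation_failed",
    "aborted": "validation_failed",
}


def finish_job(conn: sqlite3.Connection, job_id: int, status: str,
               failure_reason: str | None = None) -> None:
    """Mark a job terminal and mirror the outcome onto its application atomically."""
    if status not in ONBOARDING_STATE_FOR:
        raise ValueError(f"not a terminal status: {status}")
    # `with conn` commits on success and rolls back if any statement raises,
    # so the two rows cannot drift apart.
    with conn:
        row = conn.execute(
            "SELECT application_id FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"unknown job: {job_id}")
        conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
        conn.execute(
            "UPDATE applications SET onboarding_state = ?, "
            "onboarding_failure_reason = ? WHERE id = ?",
            (ONBOARDING_STATE_FOR[status], failure_reason, row[0]),
        )


def demo() -> tuple[str, str]:
    """Complete a job directly, without any approve()-style caller."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE applications (id INTEGER PRIMARY KEY, "
                 "onboarding_state TEXT, onboarding_failure_reason TEXT)")
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, "
                 "application_id INTEGER, status TEXT)")
    # The application starts in the stale blocker state described in the finding.
    conn.execute("INSERT INTO applications VALUES (1, 'validation_failed', 'missing key')")
    conn.execute("INSERT INTO jobs VALUES (10, 1, 'validating')")
    conn.commit()
    finish_job(conn, 10, "completed")
    job_status = conn.execute("SELECT status FROM jobs WHERE id = 10").fetchone()[0]
    app_state = conn.execute(
        "SELECT onboarding_state FROM applications WHERE id = 1").fetchone()[0]
    return job_status, app_state
```

Calling `demo()` completes the job through the hypothetical boundary and returns the job status together with the mirrored application state, which is the pairing the finding says can currently diverge.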

P0

Changes required · Hardening recommended · 35/100

Optimise for steering, not only initiating

The workflow has some steering hooks but not a full steering surface. `CohortValidationJob.abort_requested` is checked in `mark_step_started()` and once before `validating`, and `retry_count` is tracked, but there is no service/API/CLI command in this submission to request abort, retry a failed job, requeue/resume work, or adjust constraints while the job is running. Cancellation also is not observed during the long `git clone` subprocess or the `validate_code_against_principles()` call. Delta: this maintains the prior P10 hardening gap; because the workflow has bounded steps, terminal states, and retryable failure records, this is not the current production blocker.

Recommendation

Expose a small steering boundary outside the execution loop: commands such as `request_abort(job_id)`, `retry_failed_job(job_id)`, and `requeue_blocked_job(job_id)` should update persisted job state, and clone/validation boundaries should observe cancellation before launching external work and after returning from it.
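
One possible shape for that steering boundary is sketched below. The command names come from the recommendation; everything else (the in-memory `JOBS` store standing in for the database, the `Job` dataclass, `run_external()`, and `AbortRequested`) is a hypothetical illustration rather than the submission's code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass
class Job:
    """Stand-in for a persisted job row; field names follow the review text."""
    status: str = "queued"
    abort_requested: bool = False
    retry_count: int = 0


class AbortRequested(Exception):
    """Raised when a persisted abort request is observed at a step boundary."""


JOBS: dict[int, Job] = {}  # hypothetical store; a real system would use the database


def request_abort(job_id: int) -> None:
    JOBS[job_id].abort_requested = True


def retry_failed_job(job_id: int) -> None:
    job = JOBS[job_id]
    if job.status != "failed":
        raise ValueError("only failed jobs can be retried")
    job.status = "queued"
    job.retry_count += 1
    job.abort_requested = False


def requeue_blocked_job(job_id: int) -> None:
    job = JOBS[job_id]
    if job.status != "blocked":
        raise ValueError("only blocked jobs can be requeued")
    job.status = "queued"


def _check_abort(job: Job) -> None:
    if job.abort_requested:
        job.status = "aborted"
        raise AbortRequested()


def run_external(job_id: int, step: str, work: Callable[[], T]) -> T:
    """Observe cancellation before launching external work and after it returns."""
    job = JOBS[job_id]
    _check_abort(job)   # do not start a clone or validation that was already cancelled
    job.status = step
    result = work()
    _check_abort(job)   # an abort requested during the call is honoured on return
    return result
```

This sketch does not interrupt work that is already running; it only narrows the window in which an abort request goes unobserved to the duration of a single external call.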

P0

Aligned

Design for delegation rather than direct manipulation

The workflow is structured around delegation of a cohort validation task rather than manual step execution: `approve()` captures the operator’s approval decision, then delegates repo scanning to `run_cohort_validate(app.id)`, which creates a durable `CohortValidationJob` via `create_job()`. Scope and constraints are explicit through `repo_url`, public GitHub URL validation, language detection, file selection, and the `ValidationRequest` built with repository/file context.

P0

Aligned

Ensure that background work remains perceptible

Background work is made perceptible through persistent job state: `CohortValidationJob.status` records `queued`, `cloning`, `selecting`, `bundling`, `validating`, `completed`, `blocked`, `failed`, and `aborted`, while `mark_step_started()`, `mark_completed()`, `mark_blocked()`, `mark_failed()`, and `mark_aborted()` commit timestamps and terminal state to the database. The normal approval path mirrors terminal validation results back to `CohortApplication.onboarding_state`.
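
As a rough illustration of the pattern described above, the sketch below records a timestamp each time a step starts and when the job reaches a terminal state. `JobRecord` and `describe()` are hypothetical names; the submission's helpers commit to a database, which this in-memory stand-in does not attempt to model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class JobRecord:
    """Hypothetical stand-in for a job row; a real model would commit each change."""
    status: str = "queued"
    step_started_at: dict[str, datetime] = field(default_factory=dict)
    finished_at: datetime | None = None

    def mark_step_started(self, step: str) -> None:
        self.status = step
        self.step_started_at[step] = _now()

    def mark_completed(self) -> None:
        self.status = "completed"
        self.finished_at = _now()


def describe(job: JobRecord) -> str:
    """Render a one-line progress summary that an operator could poll."""
    steps = ", ".join(job.step_started_at)
    return f"status={job.status}; steps started: {steps or 'none'}"
```

Because every transition leaves a timestamped trace, an operator can tell a job that is still `cloning` from one that never left `queued` without reading logs.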

P0

Aligned

Align feedback with the user’s level of attention

Feedback is calibrated by layer: routine progress is coarse-grained through job statuses and step timestamps, while intervention-worthy outcomes use typed `failure_kind`, `safe_display_message`, and `onboarding_failure_reason`. Deeper diagnostic detail is kept in the persisted audit object (`log_signals`, `usage_present`, `latency_ms`) rather than being forced into the primary operational state.

P0

Aligned

Apply progressive disclosure to system agency

The code separates summary state from detailed inspection. Primary state is represented by simple fields such as `onboarding_state`, `CohortValidationJob.status`, `failure_kind`, and `safe_display_message`; detailed evidence is available through the `audit_object` containing `source`, `selection`, `validate`, and `job` sections, including selected file hashes, bundle hash, commit SHA, skipped reads, and validator log summary.

P0

Aligned

Replace implied magic with clear mental models

The workflow exposes a clear mental model: `Step` is a literal state sequence (`cloning`, `selecting`, `bundling`, `validating`), terminal statuses are explicit in `TERMINAL_STATUSES`, and failure causes are enumerated in `FAILURE_KINDS`. The untrusted-code boundary is also made explicit through `BOUNDARY_HEADER`, `ENVELOPE_SCHEMA`, `BOUNDARY_CONTRACT_VERSION`, and `ENVELOPE_ADVISORY`, which tell the validator that bundled file contents are inert user-supplied code.
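
A literal state sequence of this kind might look like the sketch below. The step and status names are the ones quoted in the finding; `STEP_ORDER` and `next_status()` are hypothetical helpers added for illustration, and the submission's actual definitions may differ.

```python
from typing import Literal, Optional, get_args

# Step names as listed in the finding, in execution order.
Step = Literal["cloning", "selecting", "bundling", "validating"]
STEP_ORDER = get_args(Step)  # tuple of the literal values, in declaration order
TERMINAL_STATUSES = frozenset({"completed", "blocked", "failed", "aborted"})


def next_status(current: str) -> Optional[str]:
    """Return the status that follows `current` on the happy path, or None if terminal."""
    if current in TERMINAL_STATUSES:
        return None
    if current == "queued":
        return STEP_ORDER[0]
    index = STEP_ORDER.index(current)  # raises ValueError for unknown statuses
    if index + 1 < len(STEP_ORDER):
        return STEP_ORDER[index + 1]
    return "completed"
```

Deriving the order from the `Literal` keeps the type checker's view of valid steps and the runtime sequence in one place.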

P0

Aligned

Expose meaningful operational state, not internal complexity

Operational state is expressed in user/operator-relevant terms rather than raw execution internals: `queued`, `blocked`, `failed`, `aborted`, `validation_complete`, and `validation_failed` are meaningful lifecycle states. Job failures use typed `failure_kind` values such as `invalid_repo_url`, `repo_too_large`, `selector_rejected`, `validate_error`, and `persist_error`, with concise `safe_display_message` values instead of stack traces.
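
The split between a typed failure and its raw cause could be sketched like this. The `failure_kind` values are those named in the finding; the message texts, the `failure_record()` helper, and the `diagnostic` field are invented for illustration.

```python
from __future__ import annotations

# Failure kinds as named in the finding; the message texts are made up.
SAFE_DISPLAY_MESSAGES = {
    "invalid_repo_url": "The repository URL is not a valid public GitHub URL.",
    "repo_too_large": "The repository exceeds the size limit for validation.",
    "selector_rejected": "No suitable files were found to validate.",
    "validate_error": "The validator could not complete. Please retry.",
    "persist_error": "The result could not be saved. Please retry.",
}


def failure_record(kind: str, exc: BaseException) -> dict[str, str]:
    """Split a failure into an operator-facing summary and a diagnostic detail."""
    if kind not in SAFE_DISPLAY_MESSAGES:
        raise ValueError(f"unknown failure_kind: {kind}")
    return {
        "failure_kind": kind,
        "safe_display_message": SAFE_DISPLAY_MESSAGES[kind],
        # Intended for the audit trail only, never for the primary state display.
        "diagnostic": f"{type(exc).__name__}: {exc}",
    }
```

Rejecting unknown kinds at construction time keeps the set of operator-visible failure states closed, so a new failure mode has to be named before it can be surfaced.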

P0

Aligned

Establish trust through inspectability

The result is inspectable through multiple load-bearing audit primitives. `build_file_envelope()` creates canonical JSON with `envelope_schema`, `boundary_contract`, file metadata, and a SHA-256 `envelope_hash`; `_build_implementation_context()` records per-file `content_sha256`; and `_execute_with_job()` persists an `audit_object` with `commit_sha`, selected files, skipped files, `bundle_sha256`, `envelope_hash`, latency, and validator log summary inside `UserValidationRun.result_json`.
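
To illustrate why canonical JSON matters for a hash like `envelope_hash`, here is a minimal sketch. It is not the submission's `build_file_envelope()`: the `build_envelope()` name, the schema identifier, and the field layout are assumptions. It shows only the general technique of serialising deterministically before hashing.

```python
from __future__ import annotations

import hashlib
import json


def build_envelope(files: dict[str, str]) -> dict:
    """Build an envelope whose hash is stable regardless of input ordering."""
    body = {
        "envelope_schema": "example-envelope/1",  # hypothetical schema identifier
        "files": [
            {
                "path": path,
                "content_sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
            }
            for path, content in sorted(files.items())  # fixed file order
        ],
    }
    # Sorted keys and fixed separators give one byte sequence per logical value.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    envelope_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "envelope_hash": envelope_hash}
```

Two runs over the same file set then produce the same hash, and any change to a file's content changes both its `content_sha256` and the envelope hash, which is what makes the audit record checkable after the fact.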

P0

Aligned

Represent delegated work as a system, not merely as a conversation

Delegated validation is represented as a structured system rather than a conversation. The design has separate primitives for repository fetching (`fetch_public_repo()` / `RepoSnapshot`), surface selection (`select_agentic_surface()`), job lifecycle (`CohortValidationJob` plus state helpers), typed request construction (`ValidationRequest` / `ValidationContext`), and persisted audit/result storage (`UserValidationRun.result_json`).

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/8364019d-268c-4b46-aacb-e929fadd0c36/card.svg)](https://aidesignblueprint.com/en/readiness-review/8364019d-268c-4b46-aacb-e929fadd0c36)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/8364019d-268c-4b46-aacb-e929fadd0c36.svg)](https://aidesignblueprint.com/en/readiness-review/8364019d-268c-4b46-aacb-e929fadd0c36)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear
Rubric: 2026-05-04 · Grade capped by 0 high-risk findings

Run ID: 8364019d-268c-4b46-aacb-e929fadd0c36 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.