Evaluation complete; awaiting evidence review.
Evaluated on 12 May 2026 against the AI Design Blueprint doctrine
Emerging
Status: High risk
74/100
Grade C
The submission is an autonomous validation workflow with durable job rows, typed lifecycle states, explicit source/audit metadata, and a strong untrusted-code envelope. The prior silent `validation_queued`/no-job path is substantially improved by the new `OPENAI_API_KEY` guard and no-job/non-zero fallback, but one production blocker remains: the advertised recovery/manual validation path can complete a job without moving `CohortApplication.onboarding_state` out of `validation_failed`. Steering also remains limited, but that is a hardening gap rather than a production blocker.
Iteration history
5 previous runs on this artifact. Each run_id opens its own readiness review.
Findings by principle
10 principles evaluated. Verdict, severity, evidence, and recommendation for each.
P0
Requires changes · Production blocker · 45/100 · Make hand-offs, approvals, and blockers explicit
The explicit approval and blocker model is mostly strong: `approve()` prompts unless `--yes`, job blockers are persisted with `mark_blocked()`/`mark_failed()`, and the new `OPENAI_API_KEY` guard prevents the prior deterministic case where `validation_queued` was set before `run_cohort_validate()` returned `2` without creating a job. Delta: this improves the prior P8 finding, and the `latest_job is None and exit_code != 0` branch addresses the earlier no-job/non-zero-return path. However, the new recovery path is still not closed: on missing `OPENAI_API_KEY`, `approve()` writes `onboarding_state = "validation_failed"` and tells the operator to rerun `app.scripts.run_cohort_validate`, but `run…
Recommendation
Move application onboarding-state mirroring into the durable validation completion boundary, not only the `approve()` caller. When `_execute_with_job()` marks a job `completed`, `blocked`, `failed`, or `aborted`, update the corresponding `CohortApplication.onboarding_state` and failure reason in the same transaction, so direct recovery/retry invocations cannot complete a job while leaving the application in a stale blocker state.
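A minimal sketch of this recommendation, assuming a SQL-backed store. The table layout, the `finish_job()` helper, and the status-to-state mapping are hypothetical stand-ins, not the submission's actual `_execute_with_job()` code; only the state names come from the review. The point it illustrates is that the job row and the application row change in one transaction, so a direct retry cannot complete a job while the application stays in `validation_failed`.

```python
import sqlite3

# Assumed mapping from terminal job status to application onboarding state.
JOB_TO_ONBOARDING = {
    "completed": "validation_complete",
    "blocked": "validation_failed",
    "failed": "validation_failed",
    "aborted": "validation_failed",
}


def make_demo_db():
    """Build an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cohort_application (
            id INTEGER PRIMARY KEY,
            onboarding_state TEXT NOT NULL,
            onboarding_failure_reason TEXT
        );
        CREATE TABLE cohort_validation_job (
            id INTEGER PRIMARY KEY,
            application_id INTEGER NOT NULL REFERENCES cohort_application(id),
            status TEXT NOT NULL
        );
        INSERT INTO cohort_application
            VALUES (1, 'validation_failed', 'missing OPENAI_API_KEY');
        INSERT INTO cohort_validation_job VALUES (10, 1, 'validating');
        """
    )
    return conn


def finish_job(conn, job_id, status, reason=None):
    """Write the terminal job status and mirror it to the application atomically."""
    if status not in JOB_TO_ONBOARDING:
        raise ValueError(f"not a terminal status: {status}")
    with conn:  # commits on success, rolls back if anything below raises
        cur = conn.execute(
            "UPDATE cohort_validation_job SET status = ? WHERE id = ?",
            (status, job_id),
        )
        if cur.rowcount != 1:
            raise LookupError(f"unknown job: {job_id}")
        conn.execute(
            "UPDATE cohort_application "
            "SET onboarding_state = ?, onboarding_failure_reason = ? "
            "WHERE id = (SELECT application_id FROM cohort_validation_job "
            "WHERE id = ?)",
            (JOB_TO_ONBOARDING[status], reason, job_id),
        )
```

Because the mirroring lives inside the completion helper, both the `approve()` path and a manual rerun would reach it without any caller-side bookkeeping.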
P0
Requires changes · Hardening recommended · 35/100 · Optimise for steering, not only initiating
The workflow has some steering hooks but not a full steering surface. `CohortValidationJob.abort_requested` is checked in `mark_step_started()` and once before `validating`, and `retry_count` is tracked, but there is no service/API/CLI command in this submission to request abort, retry a failed job, requeue/resume work, or adjust constraints while the job is running. Cancellation is also not observed during the long `git clone` subprocess or the `validate_code_against_principles()` call. Delta: this maintains the prior P10 hardening gap; because the workflow has bounded steps, terminal states, and retryable failure records, this is not the current production blocker.
Recommendation
Expose a small steering boundary outside the execution loop: commands such as `request_abort(job_id)`, `retry_failed_job(job_id)`, and `requeue_blocked_job(job_id)` should update persisted job state, and clone/validation boundaries should observe cancellation before launching external work and after returning from it.
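One way such a steering boundary could look, as a rough sketch. The in-memory `JOBS` dict stands in for persisted job rows, and the `Job` dataclass, `AbortRequested` exception, and `run_step()` wrapper are hypothetical; the three command names are the ones suggested above. The wrapper checks for cancellation both before launching external work and after it returns.

```python
from dataclasses import dataclass

TERMINAL_STATUSES = frozenset({"completed", "blocked", "failed", "aborted"})


@dataclass
class Job:
    """Hypothetical in-memory stand-in for a persisted job row."""
    id: int
    status: str = "queued"
    abort_requested: bool = False
    retry_count: int = 0


class AbortRequested(Exception):
    """Raised when a step observes a pending abort request."""


JOBS = {}  # job_id -> Job; a real system would read and write the database


def request_abort(job_id):
    """Flag a running job for cancellation; terminal jobs are left alone."""
    job = JOBS[job_id]
    if job.status in TERMINAL_STATUSES:
        return False
    job.abort_requested = True
    return True


def _requeue(job_id, expected_status):
    job = JOBS[job_id]
    if job.status != expected_status:
        raise ValueError(f"job {job_id} is {job.status}, not {expected_status}")
    job.status = "queued"
    job.abort_requested = False
    job.retry_count += 1
    return job


def retry_failed_job(job_id):
    return _requeue(job_id, "failed")


def requeue_blocked_job(job_id):
    return _requeue(job_id, "blocked")


def run_step(job, step, work):
    """Run one external step, observing cancellation before and after it."""
    if job.abort_requested:
        job.status = "aborted"
        raise AbortRequested(step)
    job.status = step
    result = work()  # e.g. the clone subprocess or the validator call
    if job.abort_requested:
        job.status = "aborted"
        raise AbortRequested(step)
    return result
```

This sketch does not interrupt work that is already in flight; it only bounds how long an abort request can go unnoticed to the duration of a single step.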
P0
Aligned · Design for delegation rather than direct manipulation
The workflow is structured around delegation of a cohort validation task rather than manual step execution: `approve()` captures the operator’s approval decision, then delegates repo scanning to `run_cohort_validate(app.id)`, which creates a durable `CohortValidationJob` via `create_job()`. Scope and constraints are explicit through `repo_url`, public GitHub URL validation, language detection, file selection, and the `ValidationRequest` built with repository/file context.
P0
Aligned · Ensure that background work remains perceptible
Background work is made perceptible through persistent job state: `CohortValidationJob.status` records `queued`, `cloning`, `selecting`, `bundling`, `validating`, `completed`, `blocked`, `failed`, and `aborted`, while `mark_step_started()`, `mark_completed()`, `mark_blocked()`, `mark_failed()`, and `mark_aborted()` commit timestamps and terminal state to the database. The normal approval path mirrors terminal validation results back to `CohortApplication.onboarding_state`.
P0
Aligned · Align feedback with the user’s level of attention
Feedback is calibrated by layer: routine progress is coarse-grained through job statuses and step timestamps, while intervention-worthy outcomes use typed `failure_kind`, `safe_display_message`, and `onboarding_failure_reason`. Deeper diagnostic detail is kept in the persisted audit object (`log_signals`, `usage_present`, `latency_ms`) rather than being forced into the primary operational state.
P0
Aligned · Apply progressive disclosure to system agency
The code separates summary state from detailed inspection. Primary state is represented by simple fields such as `onboarding_state`, `CohortValidationJob.status`, `failure_kind`, and `safe_display_message`; detailed evidence is available through the `audit_object` containing `source`, `selection`, `validate`, and `job` sections, including selected file hashes, bundle hash, commit SHA, skipped reads, and validator log summary.
P0
Aligned · Replace implied magic with clear mental models
The workflow exposes a clear mental model: `Step` is a literal state sequence (`cloning`, `selecting`, `bundling`, `validating`), terminal statuses are explicit in `TERMINAL_STATUSES`, and failure causes are enumerated in `FAILURE_KINDS`. The untrusted-code boundary is also made explicit through `BOUNDARY_HEADER`, `ENVELOPE_SCHEMA`, `BOUNDARY_CONTRACT_VERSION`, and `ENVELOPE_ADVISORY`, which tell the validator that bundled file contents are inert user-supplied code.
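To illustrate the kind of mental model described here, the following sketch declares the step sequence as a `Literal` type and derives the ordering from it. The step and terminal status names are taken from the review; the `next_status()` helper and the way the submission actually defines `Step` and `TERMINAL_STATUSES` are assumptions.

```python
from typing import Literal, get_args

# Step names as listed in the review; the declaration form is assumed.
Step = Literal["cloning", "selecting", "bundling", "validating"]
STEP_ORDER = get_args(Step)  # tuple of the literal values, in declared order

TERMINAL_STATUSES = frozenset({"completed", "blocked", "failed", "aborted"})


def next_status(current):
    """Return the status that follows `current` on the success path."""
    if current in TERMINAL_STATUSES:
        raise ValueError(f"{current} is terminal; nothing follows it")
    if current == "queued":
        return STEP_ORDER[0]
    index = STEP_ORDER.index(current)  # raises ValueError for unknown names
    if index + 1 < len(STEP_ORDER):
        return STEP_ORDER[index + 1]
    return "completed"
```

Keeping the sequence in one declaration means the type checker and the runtime ordering cannot drift apart.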
P0
Aligned · Expose meaningful operational state, not internal complexity
Operational state is expressed in user/operator-relevant terms rather than raw execution internals: `queued`, `blocked`, `failed`, `aborted`, `validation_complete`, and `validation_failed` are meaningful lifecycle states. Job failures use typed `failure_kind` values such as `invalid_repo_url`, `repo_too_large`, `selector_rejected`, `validate_error`, and `persist_error`, with concise `safe_display_message` values instead of stack traces.
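The separation between a typed failure and its raw cause might be sketched as follows. The `failure_kind` values are the ones named above; the message wording and the `describe_failure()` helper are hypothetical and only show how exception detail can be kept out of the operator-facing record.

```python
# Failure kinds named in the review; the message wording is hypothetical.
SAFE_MESSAGES = {
    "invalid_repo_url": "The repository URL is not a valid public GitHub URL.",
    "repo_too_large": "The repository exceeds the size limit for validation.",
    "selector_rejected": "No eligible files were selected for validation.",
    "validate_error": "The validator could not complete the run.",
    "persist_error": "The validation result could not be saved.",
}


def describe_failure(failure_kind, exc):
    """Split a failure into an operator-facing view and audit-only detail."""
    if failure_kind not in SAFE_MESSAGES:
        raise ValueError(f"unknown failure_kind: {failure_kind}")
    operator_view = {
        "failure_kind": failure_kind,
        "safe_display_message": SAFE_MESSAGES[failure_kind],
    }
    # Raw exception text stays out of the operator view and goes to the audit log.
    audit_detail = {
        "exception_type": type(exc).__name__,
        "exception_text": str(exc),
    }
    return operator_view, audit_detail
```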
P0
Aligned · Establish trust through inspectability
The result is inspectable through multiple load-bearing audit primitives. `build_file_envelope()` creates canonical JSON with `envelope_schema`, `boundary_contract`, file metadata, and a SHA-256 `envelope_hash`; `_build_implementation_context()` records per-file `content_sha256`; and `_execute_with_job()` persists an `audit_object` with `commit_sha`, selected files, skipped files, `bundle_sha256`, `envelope_hash`, latency, and validator log summary inside `UserValidationRun.result_json`.
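A canonical-JSON envelope with a SHA-256 hash could be built along these lines. This is a sketch under stated assumptions, not the submission's `build_file_envelope()`: the function names, the schema and contract strings, and the exact field layout are hypothetical, while the field names `envelope_schema`, `boundary_contract`, `content_sha256`, and `envelope_hash` follow the review.

```python
import hashlib
import json


def _canonical(body):
    """Serialize with sorted keys and fixed separators so equal data hashes equally."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sketch_file_envelope(files):
    """Build an envelope from a mapping of path -> file bytes (hypothetical layout)."""
    entries = [
        {
            "path": path,
            "size_bytes": len(data),
            "content_sha256": hashlib.sha256(data).hexdigest(),
        }
        for path, data in sorted(files.items())  # stable file order
    ]
    body = {
        "envelope_schema": "example-envelope/v1",    # placeholder value
        "boundary_contract": "example-contract/v1",  # placeholder value
        "files": entries,
    }
    envelope = dict(body)
    envelope["envelope_hash"] = hashlib.sha256(_canonical(body)).hexdigest()
    return envelope


def verify_envelope(envelope):
    """Recompute the hash over everything except `envelope_hash` and compare."""
    body = {key: value for key, value in envelope.items() if key != "envelope_hash"}
    return hashlib.sha256(_canonical(body)).hexdigest() == envelope["envelope_hash"]
```

Sorting both the files and the JSON keys makes the hash independent of traversal order, which is what lets a later reviewer recompute it from the stored audit object.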
P0
Aligned · Represent delegated work as a system, not merely as a conversation
Delegated validation is represented as a structured system rather than a conversation. The design has separate primitives for repository fetching (`fetch_public_repo()` / `RepoSnapshot`), surface selection (`select_agentic_surface()`), job lifecycle (`CohortValidationJob` plus state helpers), typed request construction (`ValidationRequest` / `ValidationContext`), and persisted audit/result storage (`UserValidationRun.result_json`).
Add to your README
Two embeddable variants: a small one and a richer card.
Score card (recommended)
[](https://aidesignblueprint.com/en/readiness-review/8364019d-268c-4b46-aacb-e929fadd0c36)
Flat badge
[](https://aidesignblueprint.com/en/readiness-review/8364019d-268c-4b46-aacb-e929fadd0c36)
Run ID: 8364019d-268c-4b46-aacb-e929fadd0c36 · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.