Assessment complete; awaiting evidence revision.
Evaluated 12 May 2026 against the AI Design Blueprint doctrine
Emerging
Status: High Risk
74/100
Grade C
The submission is an autonomous validation workflow with durable job rows, typed lifecycle states, explicit source/audit metadata, and a strong untrusted-code envelope. The prior silent `validation_queued`/no-job path is substantially improved by the new `OPENAI_API_KEY` guard and no-job/non-zero fallback, but one production blocker remains: the advertised recovery/manual validation path can complete a job without moving `CohortApplication.onboarding_state` out of `validation_failed`. Steering also remains limited, but that is a hardening gap rather than a production blocker.
Iteration history
5 prior runs on this artifact. Each run_id opens its own readiness review.
Per-principle findings
10 principles evaluated. Verdict, severity, evidence and recommendation for each.
P0
needs changes · production blocker · 45/100 · Make hand-offs, approvals, and blockers explicit
The explicit approval and blocker model is mostly strong: `approve()` prompts unless `--yes`, job blockers are persisted with `mark_blocked()`/`mark_failed()`, and the new `OPENAI_API_KEY` guard prevents the prior deterministic case where `validation_queued` was set before `run_cohort_validate()` returned `2` without creating a job. Delta: this improves the prior P8 finding, and the `latest_job is None and exit_code != 0` branch addresses the earlier no-job/non-zero-return path. However, the new recovery path is still not closed: on missing `OPENAI_API_KEY`, `approve()` writes `onboarding_state = "validation_failed"` and tells the operator to rerun `app.scripts.run_cohort_validate`, but `run…
Recommendation
Move application onboarding-state mirroring into the durable validation completion boundary, not only the `approve()` caller. When `_execute_with_job()` marks a job `completed`, `blocked`, `failed`, or `aborted`, update the corresponding `CohortApplication.onboarding_state` and failure reason in the same transaction, so direct recovery/retry invocations cannot complete a job while leaving the application in a stale blocker state.
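The recommendation above can be sketched as follows. This is a minimal illustration, not the submission's code: the `jobs` and `applications` tables, the `finish_job()` helper, and the status-to-state mapping are all assumptions, and SQLite stands in for whatever database the project actually uses. The point it shows is that the terminal job status and the mirrored `onboarding_state` are written in one transaction, so any caller (the approval path or a direct rerun) leaves both rows consistent.

```python
import sqlite3

# Assumed mapping from terminal job status to application onboarding state.
# The review names `validation_complete` and `validation_failed`; how each
# terminal status maps onto them is a guess made for this sketch.
JOB_TO_ONBOARDING = {
    "completed": "validation_complete",
    "blocked": "validation_failed",
    "failed": "validation_failed",
    "aborted": "validation_failed",
}


def finish_job(conn, job_id, status, failure_reason=None):
    """Write the terminal job status and mirror it onto the application atomically."""
    if status not in JOB_TO_ONBOARDING:
        raise ValueError(f"not a terminal status: {status}")
    with conn:  # one transaction: commit on success, roll back on exception
        row = conn.execute(
            "SELECT application_id FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"unknown job: {job_id}")
        conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
        conn.execute(
            "UPDATE applications SET onboarding_state = ?, "
            "onboarding_failure_reason = ? WHERE id = ?",
            (JOB_TO_ONBOARDING[status], failure_reason, row[0]),
        )


def demo():
    """Simulate a direct recovery run against an application stuck in validation_failed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE applications (id INTEGER PRIMARY KEY, "
        "onboarding_state TEXT, onboarding_failure_reason TEXT)"
    )
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, application_id INTEGER, status TEXT)"
    )
    conn.execute(
        "INSERT INTO applications VALUES (1, 'validation_failed', 'missing OPENAI_API_KEY')"
    )
    conn.execute("INSERT INTO jobs VALUES (10, 1, 'validating')")
    conn.commit()
    finish_job(conn, 10, "completed")
    return conn.execute(
        "SELECT onboarding_state, onboarding_failure_reason FROM applications WHERE id = 1"
    ).fetchone()
```

Because the mirroring lives inside the completion helper rather than in the caller, a rerun that bypasses `approve()` would still clear the stale blocker state.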
P0
needs changes · hardening recommended · 35/100 · Optimise for steering, not only initiating
The workflow has some steering hooks but not a full steering surface. `CohortValidationJob.abort_requested` is checked in `mark_step_started()` and once before `validating`, and `retry_count` is tracked, but this submission contains no service, API, or CLI command to request an abort, retry a failed job, requeue or resume work, or adjust constraints while the job is running. Cancellation is also not observed during the long `git clone` subprocess or the `validate_code_against_principles()` call. Delta: this maintains the prior P10 hardening gap; because the workflow has bounded steps, terminal states, and retryable failure records, it is not the current production blocker.
Recommendation
Expose a small steering boundary outside the execution loop: commands such as `request_abort(job_id)`, `retry_failed_job(job_id)`, and `requeue_blocked_job(job_id)` should update persisted job state, and clone/validation boundaries should observe cancellation before launching external work and after returning from it.
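One possible shape for that steering boundary is sketched below. The function names `request_abort`, `retry_failed_job`, and `requeue_blocked_job` come from the recommendation. Everything else is hypothetical: the in-memory `JOBS` dict stands in for persisted `CohortValidationJob` rows, and `run_step()` and `JobAborted` are illustrative helpers. The sketch checks the abort flag both before launching external work and after it returns.

```python
from dataclasses import dataclass


class JobAborted(Exception):
    """Raised when a step observes a pending abort request."""


@dataclass
class Job:
    id: int
    status: str = "queued"
    abort_requested: bool = False
    retry_count: int = 0


# Stand-in for persisted job rows; a real implementation would read and
# write the database so the flag is visible across processes.
JOBS: dict[int, Job] = {}


def request_abort(job_id):
    """Ask a running job to stop at its next cancellation checkpoint."""
    JOBS[job_id].abort_requested = True


def _requeue(job_id, expected_status):
    job = JOBS[job_id]
    if job.status != expected_status:
        raise ValueError(f"job {job_id} is {job.status}, expected {expected_status}")
    job.status = "queued"
    job.abort_requested = False
    job.retry_count += 1


def retry_failed_job(job_id):
    """Put a failed job back in the queue and count the retry."""
    _requeue(job_id, "failed")


def requeue_blocked_job(job_id):
    """Put a blocked job back in the queue once the blocker is resolved."""
    _requeue(job_id, "blocked")


def _check_abort(job):
    if job.abort_requested:
        job.status = "aborted"
        raise JobAborted(job.id)


def run_step(job_id, step, work):
    """Run one step, observing cancellation before and after the external work."""
    job = JOBS[job_id]
    _check_abort(job)   # before launching clone/validation
    job.status = step
    result = work()     # e.g. the clone subprocess or the validator call
    _check_abort(job)   # after returning from it
    return result
```

This does not interrupt work that is already in flight; cancelling a running `git clone` would additionally need a subprocess timeout or termination, which is outside this sketch.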
P0
aligned · Design for delegation rather than direct manipulation
The workflow is structured around delegation of a cohort validation task rather than manual step execution: `approve()` captures the operator’s approval decision, then delegates repo scanning to `run_cohort_validate(app.id)`, which creates a durable `CohortValidationJob` via `create_job()`. Scope and constraints are explicit through `repo_url`, public GitHub URL validation, language detection, file selection, and the `ValidationRequest` built with repository/file context.
P0
aligned · Ensure that background work remains perceptible
Background work is made perceptible through persistent job state: `CohortValidationJob.status` records `queued`, `cloning`, `selecting`, `bundling`, `validating`, `completed`, `blocked`, `failed`, and `aborted`, while `mark_step_started()`, `mark_completed()`, `mark_blocked()`, `mark_failed()`, and `mark_aborted()` commit timestamps and terminal state to the database. The normal approval path mirrors terminal validation results back to `CohortApplication.onboarding_state`.
P0
aligned · Align feedback with the user’s level of attention
Feedback is calibrated by layer: routine progress is coarse-grained through job statuses and step timestamps, while intervention-worthy outcomes use typed `failure_kind`, `safe_display_message`, and `onboarding_failure_reason`. Deeper diagnostic detail is kept in the persisted audit object (`log_signals`, `usage_present`, `latency_ms`) rather than being forced into the primary operational state.
P0
aligned · Apply progressive disclosure to system agency
The code separates summary state from detailed inspection. Primary state is represented by simple fields such as `onboarding_state`, `CohortValidationJob.status`, `failure_kind`, and `safe_display_message`; detailed evidence is available through the `audit_object` containing `source`, `selection`, `validate`, and `job` sections, including selected file hashes, bundle hash, commit SHA, skipped reads, and validator log summary.
P0
aligned · Replace implied magic with clear mental models
The workflow exposes a clear mental model: `Step` is a literal state sequence (`cloning`, `selecting`, `bundling`, `validating`), terminal statuses are explicit in `TERMINAL_STATUSES`, and failure causes are enumerated in `FAILURE_KINDS`. The untrusted-code boundary is also made explicit through `BOUNDARY_HEADER`, `ENVELOPE_SCHEMA`, `BOUNDARY_CONTRACT_VERSION`, and `ENVELOPE_ADVISORY`, which tell the validator that bundled file contents are inert user-supplied code.
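For readers who have not seen the submission, the lifecycle model described above might look roughly like this. The step names and the four terminal statuses are taken from this review. The exact definitions in the submission are not shown here, so the `Literal` alias, the contents of `TERMINAL_STATUSES`, and the `next_step()` and `is_terminal()` helpers are assumptions for illustration.

```python
from typing import Literal, Optional, get_args

# Step names as listed in the review; the Literal form is an assumption.
Step = Literal["cloning", "selecting", "bundling", "validating"]
STEP_ORDER = get_args(Step)  # preserves declaration order

# Terminal statuses named elsewhere in the review.
TERMINAL_STATUSES = frozenset({"completed", "blocked", "failed", "aborted"})


def next_step(current: str) -> Optional[str]:
    """Return the step that follows `current`, or None after the last step."""
    if current == "queued":
        return STEP_ORDER[0]
    index = STEP_ORDER.index(current)  # raises ValueError for unknown steps
    return STEP_ORDER[index + 1] if index + 1 < len(STEP_ORDER) else None


def is_terminal(status: str) -> bool:
    """True when no further step may run for this job."""
    return status in TERMINAL_STATUSES
```

Keeping the sequence in one literal type means an operator, a type checker, and the runtime all read the same list of states.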
P0
aligned · Expose meaningful operational state, not internal complexity
Operational state is expressed in user/operator-relevant terms rather than raw execution internals: `queued`, `blocked`, `failed`, `aborted`, `validation_complete`, and `validation_failed` are meaningful lifecycle states. Job failures use typed `failure_kind` values such as `invalid_repo_url`, `repo_too_large`, `selector_rejected`, `validate_error`, and `persist_error`, with concise `safe_display_message` values instead of stack traces.
P0
aligned · Establish trust through inspectability
The result is inspectable through multiple load-bearing audit primitives. `build_file_envelope()` creates canonical JSON with `envelope_schema`, `boundary_contract`, file metadata, and a SHA-256 `envelope_hash`; `_build_implementation_context()` records per-file `content_sha256`; and `_execute_with_job()` persists an `audit_object` with `commit_sha`, selected files, skipped files, `bundle_sha256`, `envelope_hash`, latency, and validator log summary inside `UserValidationRun.result_json`.
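The envelope hashing described above can be illustrated with a short sketch. It is not the submission's `build_file_envelope()`: the function name `build_envelope_sketch`, the placeholder schema and contract strings, and the exact per-file fields are assumptions. What it demonstrates is the general technique of serialising to canonical JSON (sorted keys, fixed separators) and hashing the bytes, so the same files always produce the same `envelope_hash`.

```python
import hashlib
import json


def build_envelope_sketch(files):
    """Build a hashed envelope from a mapping of path -> file content (bytes)."""
    entries = [
        {
            "path": path,
            "size_bytes": len(content),
            "content_sha256": hashlib.sha256(content).hexdigest(),
        }
        for path, content in sorted(files.items())  # stable file order
    ]
    body = {
        "envelope_schema": "example-envelope/v1",  # placeholder value
        "boundary_contract": "example-contract-1",  # placeholder value
        "files": entries,
    }
    # Canonical form: sorted keys and compact separators so equal content
    # always serialises to identical bytes.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    envelope_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "envelope_hash": envelope_hash}
```

A reviewer can recompute the hash from the stored envelope body and compare it with the recorded `envelope_hash` to confirm that the audited files are the ones that were validated.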
P0
aligned · Represent delegated work as a system, not merely as a conversation
Delegated validation is represented as a structured system rather than a conversation. The design has separate primitives for repository fetching (`fetch_public_repo()` / `RepoSnapshot`), surface selection (`select_agentic_surface()`), job lifecycle (`CohortValidationJob` plus state helpers), typed request construction (`ValidationRequest` / `ValidationContext`), and persisted audit/result storage (`UserValidationRun.result_json`).
Run ID: 8364019d-268c-4b46-aacb-e929fadd0c36 · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.