Reviewed

Assessment complete; awaiting evidence revision.

Agent Architecture Review, Validation snapshot

Evaluated 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High Risk

74/100

Grade C

8 aligned · 1 production blocker · 1 hardening

Per-principle verdicts

The submission is an autonomous validation workflow with durable job rows, typed lifecycle states, explicit source/audit metadata, and a strong untrusted-code envelope. The prior silent `validation_queued`/no-job path is substantially improved by the new `OPENAI_API_KEY` guard and no-job/non-zero fallback, but one production blocker remains: the advertised recovery/manual validation path can complete a job without moving `CohortApplication.onboarding_state` out of `validation_failed`. Steering also remains limited, but that is a hardening gap rather than a production blocker.

Iteration history

5 prior runs on this artifact. Each run_id opens its own readiness review.

When	Score	Status	Run ID
12 May 2026 (this run)	74 / C	High Risk	8364019d…
12 May 2026	74 / C	High Risk	14a3456f…
12 May 2026	74 / C	High Risk	3f3bb587…
12 May 2026	74 / C	High Risk	659a695a…
12 May 2026	67 / C	High Risk	e476247c…
12 May 2026	30 / F	High Risk	86b2c59d…

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

needs changes · production blocker · 45/100

Make hand-offs, approvals, and blockers explicit

The explicit approval and blocker model is mostly strong: `approve()` prompts unless `--yes`, job blockers are persisted with `mark_blocked()`/`mark_failed()`, and the new `OPENAI_API_KEY` guard prevents the prior deterministic case where `validation_queued` was set before `run_cohort_validate()` returned `2` without creating a job. Delta: this improves the prior P8 finding, and the `latest_job is None and exit_code != 0` branch addresses the earlier no-job/non-zero-return path. However, the new recovery path is still not closed: on missing `OPENAI_API_KEY`, `approve()` writes `onboarding_state = "validation_failed"` and tells the operator to rerun `app.scripts.run_cohort_validate`, but `run…

Recommendation

Move application onboarding-state mirroring into the durable validation completion boundary, not only the `approve()` caller. When `_execute_with_job()` marks a job `completed`, `blocked`, `failed`, or `aborted`, update the corresponding `CohortApplication.onboarding_state` and failure reason in the same transaction, so direct recovery/retry invocations cannot complete a job while leaving the application in a stale blocker state.
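
The recommendation above can be sketched as a single completion helper that writes the terminal job status and the mirrored application state in one transaction. This is a minimal sketch using the standard-library `sqlite3` module, not the submission's actual persistence layer. The table names, the `application_id` column, `finish_job()`, and the mapping of `blocked` and `aborted` to `validation_failed` are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical mapping from terminal job status to application onboarding state.
# The review names only "validation_complete" and "validation_failed"; mapping
# blocked/aborted to "validation_failed" is an assumption.
JOB_TO_ONBOARDING = {
    "completed": "validation_complete",
    "blocked": "validation_failed",
    "failed": "validation_failed",
    "aborted": "validation_failed",
}


def finish_job(conn, job_id, status, failure_reason=None):
    """Write the terminal job status and mirror it to the application atomically."""
    if status not in JOB_TO_ONBOARDING:
        raise ValueError(f"not a terminal status: {status}")
    # `with conn` commits on success and rolls back if either UPDATE raises,
    # so the job and the application cannot drift apart.
    with conn:
        conn.execute(
            "UPDATE cohort_validation_job SET status = ? WHERE id = ?",
            (status, job_id),
        )
        conn.execute(
            "UPDATE cohort_application "
            "SET onboarding_state = ?, onboarding_failure_reason = ? "
            "WHERE id = (SELECT application_id FROM cohort_validation_job WHERE id = ?)",
            (JOB_TO_ONBOARDING[status], failure_reason, job_id),
        )


def demo():
    """Simulate a direct recovery run against an application stuck in validation_failed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cohort_application (
            id INTEGER PRIMARY KEY,
            onboarding_state TEXT NOT NULL,
            onboarding_failure_reason TEXT
        );
        CREATE TABLE cohort_validation_job (
            id INTEGER PRIMARY KEY,
            application_id INTEGER NOT NULL REFERENCES cohort_application(id),
            status TEXT NOT NULL
        );
        INSERT INTO cohort_application VALUES (1, 'validation_failed', 'missing api key');
        INSERT INTO cohort_validation_job VALUES (10, 1, 'validating');
        """
    )
    finish_job(conn, 10, "completed")
    return conn.execute(
        "SELECT onboarding_state, onboarding_failure_reason "
        "FROM cohort_application WHERE id = 1"
    ).fetchone()
```

Because the mirroring lives inside the completion helper rather than in the `approve()` caller, a rerun through any entry point clears the stale `validation_failed` state as part of finishing the job.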

needs changes · hardening recommended · 35/100

Optimise for steering, not only initiating

The workflow has some steering hooks but not a full steering surface. `CohortValidationJob.abort_requested` is checked in `mark_step_started()` and once before `validating`, and `retry_count` is tracked, but there is no service/API/CLI command in this submission to request abort, retry a failed job, requeue/resume work, or adjust constraints while the job is running. Cancellation is also not observed during the long `git clone` subprocess or the `validate_code_against_principles()` call. Delta: this maintains the prior P10 hardening gap; because the workflow has bounded steps, terminal states, and retryable failure records, this is not the current production blocker.

Recommendation

Expose a small steering boundary outside the execution loop: commands such as `request_abort(job_id)`, `retry_failed_job(job_id)`, and `requeue_blocked_job(job_id)` should update persisted job state, and clone/validation boundaries should observe cancellation before launching external work and after returning from it.
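
One possible shape for that steering boundary is sketched below. The command names come from the recommendation; everything else is an assumption: the in-memory `JOBS` dict stands in for persisted job rows, the `Job` dataclass is a simplified stand-in for `CohortValidationJob`, and `run_external_step()` is a hypothetical wrapper for the clone and validation calls.

```python
from dataclasses import dataclass

TERMINAL = {"completed", "blocked", "failed", "aborted"}


@dataclass
class Job:
    status: str = "queued"
    abort_requested: bool = False
    retry_count: int = 0


# Stand-in for the job table; a real implementation would persist these rows.
JOBS: dict[int, Job] = {}


def request_abort(job_id: int) -> bool:
    """Flag a running job for cancellation. Returns False if it already finished."""
    job = JOBS[job_id]
    if job.status in TERMINAL:
        return False
    job.abort_requested = True
    return True


def _requeue(job_id: int, expected_status: str) -> None:
    job = JOBS[job_id]
    if job.status != expected_status:
        raise ValueError(f"job {job_id} is {job.status}, not {expected_status}")
    job.status = "queued"
    job.abort_requested = False
    job.retry_count += 1


def retry_failed_job(job_id: int) -> None:
    _requeue(job_id, "failed")


def requeue_blocked_job(job_id: int) -> None:
    _requeue(job_id, "blocked")


def run_external_step(job_id: int, step: str, work):
    """Observe cancellation before launching external work and after it returns."""
    job = JOBS[job_id]
    if job.abort_requested:
        job.status = "aborted"
        return None
    job.status = step
    result = work()  # e.g. the git clone subprocess or the validator call
    if job.abort_requested:
        job.status = "aborted"
        return None
    return result
```

This sketch only checks the flag at step boundaries. Interrupting a clone that is already in flight would additionally require terminating the subprocess, which is outside what is shown here.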

aligned

Design for delegation rather than direct manipulation

The workflow is structured around delegation of a cohort validation task rather than manual step execution: `approve()` captures the operator’s approval decision, then delegates repo scanning to `run_cohort_validate(app.id)`, which creates a durable `CohortValidationJob` via `create_job()`. Scope and constraints are explicit through `repo_url`, public GitHub URL validation, language detection, file selection, and the `ValidationRequest` built with repository/file context.

aligned

Ensure that background work remains perceptible

Background work is made perceptible through persistent job state: `CohortValidationJob.status` records `queued`, `cloning`, `selecting`, `bundling`, `validating`, `completed`, `blocked`, `failed`, and `aborted`, while `mark_step_started()`, `mark_completed()`, `mark_blocked()`, `mark_failed()`, and `mark_aborted()` commit timestamps and terminal state to the database. The normal approval path mirrors terminal validation results back to `CohortApplication.onboarding_state`.

aligned

Align feedback with the user’s level of attention

Feedback is calibrated by layer: routine progress is coarse-grained through job statuses and step timestamps, while intervention-worthy outcomes use typed `failure_kind`, `safe_display_message`, and `onboarding_failure_reason`. Deeper diagnostic detail is kept in the persisted audit object (`log_signals`, `usage_present`, `latency_ms`) rather than being forced into the primary operational state.

aligned

Apply progressive disclosure to system agency

The code separates summary state from detailed inspection. Primary state is represented by simple fields such as `onboarding_state`, `CohortValidationJob.status`, `failure_kind`, and `safe_display_message`; detailed evidence is available through the `audit_object` containing `source`, `selection`, `validate`, and `job` sections, including selected file hashes, bundle hash, commit SHA, skipped reads, and validator log summary.

aligned

Replace implied magic with clear mental models

The workflow exposes a clear mental model: `Step` is a literal state sequence (`cloning`, `selecting`, `bundling`, `validating`), terminal statuses are explicit in `TERMINAL_STATUSES`, and failure causes are enumerated in `FAILURE_KINDS`. The untrusted-code boundary is also made explicit through `BOUNDARY_HEADER`, `ENVELOPE_SCHEMA`, `BOUNDARY_CONTRACT_VERSION`, and `ENVELOPE_ADVISORY`, which tell the validator that bundled file contents are inert user-supplied code.
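
A literal step sequence with explicit terminal statuses might look like the following sketch. The step and status names are taken from this review; `STEP_ORDER` and `next_status()` are hypothetical helpers added for illustration and may not match the submission's code.

```python
from typing import Literal, get_args

# Step names and terminal statuses as listed in the review.
Step = Literal["cloning", "selecting", "bundling", "validating"]
STEP_ORDER = get_args(Step)  # ("cloning", "selecting", "bundling", "validating")
TERMINAL_STATUSES = frozenset({"completed", "blocked", "failed", "aborted"})


def next_status(current: str) -> str:
    """Return the status that follows `current` on the success path."""
    if current in TERMINAL_STATUSES:
        raise ValueError(f"{current} is terminal; no further transition")
    if current == "queued":
        return STEP_ORDER[0]
    idx = STEP_ORDER.index(current)  # raises ValueError for unknown statuses
    return STEP_ORDER[idx + 1] if idx + 1 < len(STEP_ORDER) else "completed"
```

Deriving the order from the `Literal` keeps the type checker's view of valid steps and the runtime sequence in one place.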

aligned

Expose meaningful operational state, not internal complexity

Operational state is expressed in user/operator-relevant terms rather than raw execution internals: `queued`, `blocked`, `failed`, `aborted`, `validation_complete`, and `validation_failed` are meaningful lifecycle states. Job failures use typed `failure_kind` values such as `invalid_repo_url`, `repo_too_large`, `selector_rejected`, `validate_error`, and `persist_error`, with concise `safe_display_message` values instead of stack traces.
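
The separation between a typed `failure_kind`, a concise `safe_display_message`, and raw diagnostic detail can be illustrated roughly as follows. The failure kinds are the ones named above, and the real `FAILURE_KINDS` set may be larger. The message texts, the `FailureRecord` class, and `to_failure_record()` are hypothetical.

```python
from dataclasses import dataclass

# Message texts are placeholders, not the submission's actual wording.
SAFE_MESSAGES = {
    "invalid_repo_url": "The repository URL is not a supported public GitHub URL.",
    "repo_too_large": "The repository exceeds the size limit for validation.",
    "selector_rejected": "No suitable files could be selected for validation.",
    "validate_error": "The validator could not complete the run.",
    "persist_error": "The validation result could not be saved.",
}


@dataclass(frozen=True)
class FailureRecord:
    failure_kind: str
    safe_display_message: str
    internal_detail: str  # kept for audit, never surfaced as primary state


def to_failure_record(failure_kind: str, exc: BaseException) -> FailureRecord:
    """Map an exception to a typed failure without leaking its text to operators."""
    if failure_kind not in SAFE_MESSAGES:
        raise ValueError(f"unknown failure_kind: {failure_kind}")
    return FailureRecord(
        failure_kind=failure_kind,
        safe_display_message=SAFE_MESSAGES[failure_kind],
        internal_detail=f"{type(exc).__name__}: {exc}",
    )
```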

aligned

Establish trust through inspectability

The result is inspectable through multiple load-bearing audit primitives. `build_file_envelope()` creates canonical JSON with `envelope_schema`, `boundary_contract`, file metadata, and a SHA-256 `envelope_hash`; `_build_implementation_context()` records per-file `content_sha256`; and `_execute_with_job()` persists an `audit_object` with `commit_sha`, selected files, skipped files, `bundle_sha256`, `envelope_hash`, latency, and validator log summary inside `UserValidationRun.result_json`.
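
As a rough illustration of how a canonical-JSON envelope with a SHA-256 hash can be made reproducible, consider the sketch below. It is not the submission's `build_file_envelope()`: the function name `build_file_envelope_sketch()`, the `"sketch-v0"` placeholder values, and the `size_bytes` field are assumptions. Only the `envelope_schema`, `boundary_contract`, `content_sha256`, and `envelope_hash` field names are taken from the review.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_file_envelope_sketch(files: dict[str, str]) -> dict:
    """Build an envelope whose hash depends only on file paths and contents."""
    entries = []
    for path, content in sorted(files.items()):  # sorted so input order is irrelevant
        raw = content.encode("utf-8")
        entries.append(
            {"path": path, "content_sha256": sha256_hex(raw), "size_bytes": len(raw)}
        )
    body = {
        "envelope_schema": "sketch-v0",    # placeholder value
        "boundary_contract": "sketch-v0",  # placeholder value
        "files": entries,
    }
    # Canonical form: sorted keys and fixed separators give byte-stable JSON.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return {**body, "envelope_hash": sha256_hex(canonical.encode("utf-8"))}
```

An auditor can recompute `envelope_hash` from the stored file entries and compare it with the persisted value. A mismatch indicates that the recorded envelope differs from what was hashed at validation time.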

aligned

Represent delegated work as a system, not merely as a conversation

Delegated validation is represented as a structured system rather than a conversation. The design has separate primitives for repository fetching (`fetch_public_repo()` / `RepoSnapshot`), surface selection (`select_agentic_surface()`), job lifecycle (`CohortValidationJob` plus state helpers), typed request construction (`ValidationRequest` / `ValidationContext`), and persisted audit/result storage (`UserValidationRun.result_json`).

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/8364019d-268c-4b46-aacb-e929fadd0c36/card.svg)](https://aidesignblueprint.com/en/readiness-review/8364019d-268c-4b46-aacb-e929fadd0c36)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/8364019d-268c-4b46-aacb-e929fadd0c36.svg)](https://aidesignblueprint.com/en/readiness-review/8364019d-268c-4b46-aacb-e929fadd0c36)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear

Rubric: 2026-05-04 · Grade limited by 0 high-risk findings


Run ID: 8364019d-268c-4b46-aacb-e929fadd0c36 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.