Reviewed

Assessment complete; awaiting evidence revision.

Agent Architecture Review, Validation snapshot

Evaluated 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High Risk

74/100

Grade C

8 aligned · 1 production blocker · 1 hardening recommendation
Per-principle verdicts

The submission is an autonomous validation workflow with durable job rows, typed lifecycle states, explicit source/audit metadata, and a strong untrusted-code envelope. The prior silent `validation_queued`/no-job path is substantially improved by the new `OPENAI_API_KEY` guard and no-job/non-zero fallback, but one production blocker remains: the advertised recovery/manual validation path can complete a job without moving `CohortApplication.onboarding_state` out of `validation_failed`. Steering also remains limited, but that is a hardening gap rather than a production blocker.

Iteration history

5 prior runs on this artifact. Each run ID opens its own readiness review.

| When | Score | Status | Run ID |
|---|---|---|---|
| 12 May 2026 (this run) | 74 / C | High Risk | 8364019d |
| 12 May 2026 | 74 / C | High Risk | 14a3456f |
| 12 May 2026 | 74 / C | High Risk | 3f3bb587 |
| 12 May 2026 | 74 / C | High Risk | 659a695a |
| 12 May 2026 | 67 / C | High Risk | e476247c |
| 12 May 2026 | 30 / F | High Risk | 86b2c59d |

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

P0

needs changes · production blocker · 45/100

Make hand-offs, approvals, and blockers explicit

The explicit approval and blocker model is mostly strong: `approve()` prompts unless `--yes`, job blockers are persisted with `mark_blocked()`/`mark_failed()`, and the new `OPENAI_API_KEY` guard prevents the prior deterministic case where `validation_queued` was set before `run_cohort_validate()` returned `2` without creating a job. Delta: this improves the prior P8 finding, and the `latest_job is None and exit_code != 0` branch addresses the earlier no-job/non-zero-return path. However, the new recovery path is still not closed: on missing `OPENAI_API_KEY`, `approve()` writes `onboarding_state = "validation_failed"` and tells the operator to rerun `app.scripts.run_cohort_validate`, but `run…

Recommendation

Move application onboarding-state mirroring into the durable validation completion boundary, not only the `approve()` caller. When `_execute_with_job()` marks a job `completed`, `blocked`, `failed`, or `aborted`, update the corresponding `CohortApplication.onboarding_state` and failure reason in the same transaction, so direct recovery/retry invocations cannot complete a job while leaving the application in a stale blocker state.
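A minimal sketch of that recommendation, assuming a SQL store: the terminal job status and the application's `onboarding_state` are written inside one transaction, so a direct recovery run that completes a job also clears a stale `validation_failed`. The table names, columns, the `finish_job()` helper, and the status-to-state mapping are hypothetical stand-ins for the real `CohortValidationJob` / `CohortApplication` models and `_execute_with_job()` boundary.

```python
import sqlite3

# Hypothetical mapping from terminal job status to application onboarding state.
# The state names appear in the review; the exact mapping is an assumption.
ONBOARDING_STATE_FOR = {
    "completed": "validation_complete",
    "blocked": "validation_failed",
    "failed": "validation_failed",
    "aborted": "validation_failed",
}


def finish_job(conn, job_id, status, failure_reason=None):
    """Mark a job terminal and mirror the result onto its application atomically."""
    if status not in ONBOARDING_STATE_FOR:
        raise ValueError(f"not a terminal status: {status}")
    with conn:  # one transaction: both updates commit together or roll back together
        row = conn.execute(
            "SELECT application_id FROM cohort_validation_job WHERE id = ?", (job_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"unknown job {job_id}")
        conn.execute(
            "UPDATE cohort_validation_job SET status = ? WHERE id = ?", (status, job_id)
        )
        conn.execute(
            "UPDATE cohort_application SET onboarding_state = ?, "
            "onboarding_failure_reason = ? WHERE id = ?",
            (
                ONBOARDING_STATE_FOR[status],
                None if status == "completed" else failure_reason,
                row[0],
            ),
        )


# Usage: an application left in validation_failed by the missing-key path,
# then a recovery run that completes its job.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cohort_application (
    id INTEGER PRIMARY KEY, onboarding_state TEXT, onboarding_failure_reason TEXT);
CREATE TABLE cohort_validation_job (
    id INTEGER PRIMARY KEY, application_id INTEGER, status TEXT);
INSERT INTO cohort_application VALUES (1, 'validation_failed', 'missing OPENAI_API_KEY');
INSERT INTO cohort_validation_job VALUES (10, 1, 'validating');
""")
finish_job(conn, 10, "completed")
print(conn.execute(
    "SELECT onboarding_state, onboarding_failure_reason FROM cohort_application"
).fetchone())
# ('validation_complete', None)
```

Because the mirroring lives at the completion boundary rather than in `approve()`, any caller that drives a job to a terminal state, including a manual rerun of `app.scripts.run_cohort_validate`, leaves the application consistent with its job.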

P0

needs changes · hardening recommended · 35/100

Optimise for steering, not only initiating

The workflow has some steering hooks but not a full steering surface. `CohortValidationJob.abort_requested` is checked in `mark_step_started()` and once before `validating`, and `retry_count` is tracked, but there is no service/API/CLI command in this submission to request abort, retry a failed job, requeue/resume work, or adjust constraints while the job is running. Cancellation is also not observed during the long `git clone` subprocess or the `validate_code_against_principles()` call. Delta: this maintains the prior P10 hardening gap; because the workflow has bounded steps, terminal states, and retryable failure records, this is not the current production blocker.

Recommendation

Expose a small steering boundary outside the execution loop: commands such as `request_abort(job_id)`, `retry_failed_job(job_id)`, and `requeue_blocked_job(job_id)` should update persisted job state, and clone/validation boundaries should observe cancellation before launching external work and after returning from it.
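One way to shape that boundary, sketched under assumptions: the command names come from the recommendation, while the in-memory `JOBS` dict, the `Job` dataclass, the `Aborted` exception, and `run_guarded()` are hypothetical stand-ins for persisted `CohortValidationJob` rows and the real clone/validation calls. The point it illustrates is that steering commands only write job state, and the execution loop checks that state immediately before and after each external call.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass
class Job:
    id: str
    status: str = "queued"
    abort_requested: bool = False
    retry_count: int = 0


class Aborted(Exception):
    """Raised when a running job observes a persisted abort request."""


JOBS: dict[str, Job] = {}  # hypothetical stand-in for CohortValidationJob rows


def request_abort(job_id: str) -> None:
    JOBS[job_id].abort_requested = True


def retry_failed_job(job_id: str) -> None:
    job = JOBS[job_id]
    if job.status != "failed":
        raise ValueError(f"job {job_id} is {job.status}, not failed")
    job.status, job.abort_requested = "queued", False
    job.retry_count += 1


def requeue_blocked_job(job_id: str) -> None:
    job = JOBS[job_id]
    if job.status != "blocked":
        raise ValueError(f"job {job_id} is {job.status}, not blocked")
    job.status = "queued"


def run_guarded(job: Job, step: str, work: Callable[[], T]) -> T:
    """Observe cancellation before launching external work and after it returns."""
    if job.abort_requested:
        job.status = "aborted"
        raise Aborted(job.id)
    job.status = step
    result = work()
    if job.abort_requested:  # the abort arrived while the external call was running
        job.status = "aborted"
        raise Aborted(job.id)
    return result


# Usage: an operator aborts while a (simulated) clone is in flight.
JOBS["j1"] = Job("j1")


def slow_clone() -> str:
    request_abort("j1")  # simulates the operator command arriving mid-clone
    return "snapshot"


try:
    run_guarded(JOBS["j1"], "cloning", slow_clone)
except Aborted:
    pass
print(JOBS["j1"].status)  # aborted
```

Checking after the external call returns does not interrupt a running `git clone`; it only guarantees the result is discarded instead of being carried into the next step. Interrupting the subprocess itself would need a separate mechanism.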

P0

aligned

Design for delegation rather than direct manipulation

The workflow is structured around delegation of a cohort validation task rather than manual step execution: `approve()` captures the operator’s approval decision, then delegates repo scanning to `run_cohort_validate(app.id)`, which creates a durable `CohortValidationJob` via `create_job()`. Scope and constraints are explicit through `repo_url`, public GitHub URL validation, language detection, file selection, and the `ValidationRequest` built with repository/file context.
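The "public GitHub URL validation" scope constraint can be illustrated with a small parser. This is a sketch, not the submission's validator: the accepted shape (`https://github.com/<owner>/<repo>` with an optional `.git` suffix, no credentials, port, query, or fragment), the `parse_public_github_url()` name, and the `InvalidRepoUrl` exception are assumptions; only the `invalid_repo_url` failure kind comes from the review.

```python
import re
from urllib.parse import urlparse

_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")


class InvalidRepoUrl(ValueError):
    """Rejection carrying the typed failure kind named in the review."""

    failure_kind = "invalid_repo_url"


def parse_public_github_url(repo_url: str) -> tuple[str, str]:
    """Return (owner, repo) for an https://github.com/<owner>/<repo> URL, else raise."""
    parsed = urlparse(repo_url.strip())
    if parsed.scheme != "https" or parsed.hostname != "github.com":
        raise InvalidRepoUrl("only https://github.com URLs are accepted")
    if parsed.username or parsed.password or parsed.port or parsed.query or parsed.fragment:
        raise InvalidRepoUrl("credentials, ports, query strings and fragments are not allowed")
    parts = parsed.path.strip("/").split("/")
    if len(parts) != 2:
        raise InvalidRepoUrl("expected exactly <owner>/<repo>")
    owner, repo = parts
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    # Reject empty names and path-traversal segments such as "." or "..".
    if not (_NAME.match(owner) and _NAME.match(repo)) or {owner, repo} & {".", ".."}:
        raise InvalidRepoUrl("owner or repository name contains unexpected characters")
    return owner, repo


print(parse_public_github_url("https://github.com/octocat/Hello-World.git"))
# ('octocat', 'Hello-World')
```

Validating the URL before `create_job()` runs keeps the delegated task's scope explicit: a rejected URL fails fast with a typed reason instead of reaching `git clone`.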

P0

aligned

Ensure that background work remains perceptible

Background work is made perceptible through persistent job state: `CohortValidationJob.status` records `queued`, `cloning`, `selecting`, `bundling`, `validating`, `completed`, `blocked`, `failed`, and `aborted`, while `mark_step_started()`, `mark_completed()`, `mark_blocked()`, `mark_failed()`, and `mark_aborted()` commit timestamps and terminal state to the database. The normal approval path mirrors terminal validation results back to `CohortApplication.onboarding_state`.
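The lifecycle described above can be modelled as a small state machine in which every transition is timestamped and terminal states are final. This in-memory `JobRecord` is a hypothetical analogue of `CohortValidationJob`; the status names and step order come from the review, but the method signatures and the rule that `completed` may only follow `validating` are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

STEPS = ("cloning", "selecting", "bundling", "validating")
TERMINAL_STATUSES = frozenset({"completed", "blocked", "failed", "aborted"})


@dataclass
class JobRecord:
    status: str = "queued"
    history: list[tuple[str, datetime]] = field(default_factory=list)

    def _set(self, status: str) -> None:
        if self.status in TERMINAL_STATUSES:
            raise RuntimeError(f"job already terminal ({self.status})")
        self.status = status
        self.history.append((status, datetime.now(timezone.utc)))

    def mark_step_started(self, step: str) -> None:
        # Steps must start in the declared order, beginning from "queued".
        if self.status == "queued":
            expected = STEPS[0]
        elif self.status in STEPS and self.status != STEPS[-1]:
            expected = STEPS[STEPS.index(self.status) + 1]
        else:
            expected = None
        if step != expected:
            raise ValueError(f"cannot start {step!r} from {self.status!r}")
        self._set(step)

    def mark_terminal(self, status: str) -> None:
        if status not in TERMINAL_STATUSES:
            raise ValueError(f"{status!r} is not a terminal status")
        if status == "completed" and self.status != "validating":
            raise ValueError("a job can only complete after the validating step")
        self._set(status)


job = JobRecord()
for step in STEPS:
    job.mark_step_started(step)
job.mark_terminal("completed")
print([s for s, _ in job.history])
# ['cloning', 'selecting', 'bundling', 'validating', 'completed']
```

Recording each transition with a timestamp is what lets an operator see where a background job is, and how long it has been there, without inspecting the worker.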

P0

aligned

Align feedback with the user’s level of attention

Feedback is calibrated by layer: routine progress is coarse-grained through job statuses and step timestamps, while intervention-worthy outcomes use typed `failure_kind`, `safe_display_message`, and `onboarding_failure_reason`. Deeper diagnostic detail is kept in the persisted audit object (`log_signals`, `usage_present`, `latency_ms`) rather than being forced into the primary operational state.

P0

aligned

Apply progressive disclosure to system agency

The code separates summary state from detailed inspection. Primary state is represented by simple fields such as `onboarding_state`, `CohortValidationJob.status`, `failure_kind`, and `safe_display_message`; detailed evidence is available through the `audit_object` containing `source`, `selection`, `validate`, and `job` sections, including selected file hashes, bundle hash, commit SHA, skipped reads, and validator log summary.

P0

aligned

Replace implied magic with clear mental models

The workflow exposes a clear mental model: `Step` is a literal state sequence (`cloning`, `selecting`, `bundling`, `validating`), terminal statuses are explicit in `TERMINAL_STATUSES`, and failure causes are enumerated in `FAILURE_KINDS`. The untrusted-code boundary is also made explicit through `BOUNDARY_HEADER`, `ENVELOPE_SCHEMA`, `BOUNDARY_CONTRACT_VERSION`, and `ENVELOPE_ADVISORY`, which tell the validator that bundled file contents are inert user-supplied code.

P0

aligned

Expose meaningful operational state, not internal complexity

Operational state is expressed in user/operator-relevant terms rather than raw execution internals: `queued`, `blocked`, `failed`, `aborted`, `validation_complete`, and `validation_failed` are meaningful lifecycle states. Job failures use typed `failure_kind` values such as `invalid_repo_url`, `repo_too_large`, `selector_rejected`, `validate_error`, and `persist_error`, with concise `safe_display_message` values instead of stack traces.
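A sketch of how typed failure kinds can keep the primary state readable while preserving diagnostics for the audit layer. The five kinds are the ones listed in the review; the `FailureKind` alias, the message wording in `SAFE_DISPLAY_MESSAGES`, and the `describe_failure()` helper are hypothetical.

```python
from typing import Literal, get_args

FailureKind = Literal[
    "invalid_repo_url", "repo_too_large", "selector_rejected", "validate_error", "persist_error"
]
FAILURE_KINDS: tuple[str, ...] = get_args(FailureKind)

# Hypothetical operator-facing copy; the wording is illustrative only.
SAFE_DISPLAY_MESSAGES: dict[str, str] = {
    "invalid_repo_url": "The repository URL is not a public GitHub URL.",
    "repo_too_large": "The repository exceeds the size limit for validation.",
    "selector_rejected": "No suitable agentic code surface was found to validate.",
    "validate_error": "The validator could not complete this run.",
    "persist_error": "The result could not be saved. Please retry.",
}


def describe_failure(kind: str, exc: BaseException) -> tuple[dict, dict]:
    """Split a failure into a primary operator view and a detailed audit record."""
    if kind not in FAILURE_KINDS:
        raise ValueError(f"unknown failure kind {kind!r}")
    primary = {"failure_kind": kind, "safe_display_message": SAFE_DISPLAY_MESSAGES[kind]}
    # Raw exception text stays out of the primary view and goes to the audit layer.
    audit = {"failure_kind": kind, "exception_type": type(exc).__name__, "detail": str(exc)}
    return primary, audit


primary, audit = describe_failure(
    "repo_too_large", RuntimeError("clone exceeded size limit at /tmp/work/repo")
)
print(primary["safe_display_message"])
# The repository exceeds the size limit for validation.
```

Deriving `FAILURE_KINDS` from the `Literal` keeps the enumeration and the type in one place, so an unknown kind is rejected rather than shown to an operator as an opaque string.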

P0

aligned

Establish trust through inspectability

The result is inspectable through multiple load-bearing audit primitives. `build_file_envelope()` creates canonical JSON with `envelope_schema`, `boundary_contract`, file metadata, and a SHA-256 `envelope_hash`; `_build_implementation_context()` records per-file `content_sha256`; and `_execute_with_job()` persists an `audit_object` with `commit_sha`, selected files, skipped files, `bundle_sha256`, `envelope_hash`, latency, and validator log summary inside `UserValidationRun.result_json`.
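The envelope-hash idea can be sketched as canonical JSON (sorted keys, fixed separators) hashed with SHA-256, with a per-file `content_sha256` inside. `build_file_envelope_sketch()`, `verify_envelope()`, the field layout, and the two constant values are assumptions for illustration, not the submission's `build_file_envelope()`.

```python
import hashlib
import json

ENVELOPE_SCHEMA = "file-envelope/example-v1"  # hypothetical value
BOUNDARY_CONTRACT_VERSION = "example-v1"      # hypothetical value


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_json(obj) -> bytes:
    # Sorted keys and fixed separators make the byte encoding deterministic.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def build_file_envelope_sketch(files: dict[str, str]) -> dict:
    body = {
        "envelope_schema": ENVELOPE_SCHEMA,
        "boundary_contract": BOUNDARY_CONTRACT_VERSION,
        "files": [
            {
                "path": path,
                "content_sha256": sha256_hex(text.encode("utf-8")),
                "bytes": len(text.encode("utf-8")),
                "content": text,
            }
            for path, text in sorted(files.items())  # stable file order
        ],
    }
    # The hash covers everything except the hash field itself.
    return {**body, "envelope_hash": sha256_hex(canonical_json(body))}


def verify_envelope(envelope: dict) -> bool:
    body = {k: v for k, v in envelope.items() if k != "envelope_hash"}
    return sha256_hex(canonical_json(body)) == envelope.get("envelope_hash")


a = build_file_envelope_sketch({"b.py": "print(2)\n", "a.py": "print(1)\n"})
b = build_file_envelope_sketch({"a.py": "print(1)\n", "b.py": "print(2)\n"})
print(a["envelope_hash"] == b["envelope_hash"], verify_envelope(a))
# True True
```

Because the encoding is canonical, the same selected files always yield the same `envelope_hash`, so an auditor can recompute it from the stored envelope and detect any change to what the validator saw.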

P0

aligned

Represent delegated work as a system, not merely as a conversation

Delegated validation is represented as a structured system rather than a conversation. The design has separate primitives for repository fetching (`fetch_public_repo()` / `RepoSnapshot`), surface selection (`select_agentic_surface()`), job lifecycle (`CohortValidationJob` plus state helpers), typed request construction (`ValidationRequest` / `ValidationContext`), and persisted audit/result storage (`UserValidationRun.result_json`).

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/8364019d-268c-4b46-aacb-e929fadd0c36/card.svg)](https://aidesignblueprint.com/en/readiness-review/8364019d-268c-4b46-aacb-e929fadd0c36)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/8364019d-268c-4b46-aacb-e929fadd0c36.svg)](https://aidesignblueprint.com/en/readiness-review/8364019d-268c-4b46-aacb-e929fadd0c36)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear
Rubric: 2026-05-04 · Grade limited by 0 high-risk findings

Run ID: 8364019d-268c-4b46-aacb-e929fadd0c36 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.