Reviewed

Assessment complete; awaiting evidence revision.

Agent Architecture Review, Validation snapshot

Evaluated 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High Risk

74/100

Grade C

7 aligned · 2 production blockers · 1 hardening

Per-principle verdicts

This is a real delegated validation workflow with durable job state, typed transitions, audit metadata, and improved rollback/failure handling. The remaining production blockers are at the trust-boundary edges: arbitrary repository text is still bundled into an instruction-bearing validation payload with only an in-band warning header, and the approval/email handoff can fail after approval is committed without a durable blocker state.

Iteration history

2 prior runs on this artifact. Each run_id opens its own readiness review.

When	Score	Status	Run ID
12 May 2026 (this run)	74 / C	High Risk	659a695a…
12 May 2026	67 / C	High Risk	e476247c…
12 May 2026	30 / F	High Risk	86b2c59d…

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

needs changes · production blocker · 65/100

Establish trust through inspectability

The audit trail is strong in several places (`commit_sha`, per-file `content_sha256`, `bundle_sha256`, selected/skipped files, latency, job transitions), but the key trust boundary for arbitrary public repository content is still in-band. `_build_implementation_context()` concatenates raw file contents into strings like `# === FILE: {path} ===\n{content}`, `wrap_bundle_with_boundary(raw_bundle)` prepends a warning header, and `ValidationRequest(code=bundle, ...)` sends that monolithic text to the validator. A repository can still contain instruction-shaped text inside comments/docstrings/strings; the boundary is a prompt convention rather than a typed artifact boundary enforced by `validatio…

Recommendation

Move the inertness boundary into the validation service as a typed file-envelope primitive: pass files as structured data such as `{path, byte_size, sha256, content}` with an envelope version/hash, render source only in a non-instruction-bearing section controlled by the service, and persist that envelope hash alongside the result. Do not rely on an in-band `BOUNDARY_HEADER` inside the same raw prompt text as the untrusted code.
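
The recommendation above can be sketched as a small typed envelope. This is a minimal illustration, not the validation service's actual API: `FileEntry`, `build_envelope`, `ENVELOPE_VERSION`, and the `envelope_sha256` field are hypothetical names, and the hashing scheme (a SHA-256 over a canonical JSON manifest) is one assumed option among several.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

ENVELOPE_VERSION = "1"  # hypothetical version tag, not from the reviewed code


@dataclass(frozen=True)
class FileEntry:
    """One untrusted file, carried as data rather than as prompt text."""
    path: str
    byte_size: int
    sha256: str
    content: str


def make_entry(path: str, content: str) -> FileEntry:
    raw = content.encode("utf-8")
    return FileEntry(path, len(raw), hashlib.sha256(raw).hexdigest(), content)


def build_envelope(files: list[tuple[str, str]]) -> dict:
    """Build a structured envelope from (path, content) pairs.

    The envelope hash covers the version plus each file's path, size, and
    content hash, in sorted order, so the same inputs always hash the same.
    """
    entries = [make_entry(path, content) for path, content in sorted(files)]
    manifest = [
        {"path": e.path, "byte_size": e.byte_size, "sha256": e.sha256}
        for e in entries
    ]
    canonical = json.dumps(
        {"version": ENVELOPE_VERSION, "files": manifest}, sort_keys=True
    )
    return {
        "version": ENVELOPE_VERSION,
        "envelope_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        "files": [asdict(e) for e in entries],
    }
```

With this shape, instruction-shaped text inside a file stays in a `content` field that the service decides how to render, and `envelope_sha256` is what would be persisted next to the validation result.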

needs changes · production blocker · 55/100

Make hand-offs, approvals, and blockers explicit

The validation job handoff is much better: the validate and persistence exception handlers now call `db.rollback()` before `mark_failed()`, the outer handler rolls back and refreshes `job`, and terminal blockers use typed `failure_kind` values. However, the approval/onboarding handoff can still strand a real applicant outside that durable job model: `approve()` commits `app.approved_at` and `app.user_id`, then `generate_sign_in_link_for_email()` catches `Exception` and simply returns `3` without persisting an application/job blocker; `send_cohort_approval_email()` is not wrapped in a durable failure transition either. Delta: the prior rollback subfinding is improved, but a production handoff gap remains…

Recommendation

Move approval-side external handoffs into a durable service-owned state machine, or extend `CohortApplication`/a dedicated onboarding job with typed states such as `approved`, `sign_in_link_failed`, `approval_email_failed`, and `validation_queued`. Before returning from sign-in-link or email failures, persist a safe blocker message and retry eligibility so the operator is not left with only an exit code and an already-approved applicant.
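
One way to picture the typed onboarding states is the sketch below. It reuses the state names from the recommendation, but everything else is assumed for illustration: `OnboardingRecord`, `run_handoff`, and the injected `make_link`/`send_email` callables are hypothetical, and the in-memory `transition()` stands in for what would be a committed database write in the real workflow.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional, Tuple


class OnboardingState(str, Enum):
    APPROVED = "approved"
    SIGN_IN_LINK_FAILED = "sign_in_link_failed"
    APPROVAL_EMAIL_FAILED = "approval_email_failed"
    VALIDATION_QUEUED = "validation_queued"


@dataclass
class OnboardingRecord:
    """Hypothetical durable record; a real one would be a database row."""
    application_id: int
    state: OnboardingState = OnboardingState.APPROVED
    safe_display_message: Optional[str] = None
    retry_eligible: bool = False
    history: List[Tuple[OnboardingState, OnboardingState]] = field(default_factory=list)

    def transition(self, new_state: OnboardingState,
                   message: Optional[str] = None,
                   retry_eligible: bool = False) -> None:
        # In a real system this would be persisted before returning.
        self.history.append((self.state, new_state))
        self.state = new_state
        self.safe_display_message = message
        self.retry_eligible = retry_eligible


def run_handoff(record: OnboardingRecord,
                make_link: Callable[[], str],
                send_email: Callable[[str], None]) -> OnboardingRecord:
    """Run the post-approval handoffs, recording a typed blocker on failure."""
    try:
        link = make_link()
    except Exception:
        record.transition(OnboardingState.SIGN_IN_LINK_FAILED,
                          "Could not generate a sign-in link.", retry_eligible=True)
        return record
    try:
        send_email(link)
    except Exception:
        record.transition(OnboardingState.APPROVAL_EMAIL_FAILED,
                          "Could not send the approval email.", retry_eligible=True)
        return record
    record.transition(OnboardingState.VALIDATION_QUEUED)
    return record
```

The point of the design is that a failed link or email leaves behind a state, a safe message, and a retry flag, rather than only an exit code next to an already-approved applicant.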

needs changes · hardening recommended · 35/100

Optimise for steering, not only initiating

The code has a real steering primitive in `CohortValidationJob.abort_requested`, `mark_step_started()` checks it before advancing, the validate step now calls `db.refresh(job)` before checking abort, and `run()` calculates a durable `retry_count` from prior jobs. That improves the prior P10 refresh/retry-count gap. The submitted code still does not expose a service-owned steering surface that sets `abort_requested`, retries a failed job, or marks an interrupted job; comments say a separate surface flips the flag, but no `request_abort`/`retry_failed_job` API or CLI is present in the reviewed code. This is a remaining steering gap, though the underlying DB flag/check primitive means the core…

Recommendation

Add a small service-owned steering command/API outside the execution loop for `request_abort(job_id)`, `retry_failed_job(job_id)`, and `mark_interrupted(job_id)`, and have the orchestrator refresh before each external or costly step. Keep it as a thin state-transition primitive rather than adding a large orchestration wrapper.
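
A thin steering surface of this kind might look like the sketch below. The three function names come from the recommendation; the `Job` dataclass, the `InMemoryJobStore`, the terminal status values, and the `"interrupted"` failure kind are assumptions made so the example runs on its own, and a real version would operate on `CohortValidationJob` rows and commit each change.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional

# Assumed terminal status names; the reviewed code defines its own TERMINAL_STATUSES.
TERMINAL_STATUSES = {"completed", "blocked", "failed", "aborted"}


@dataclass
class Job:
    id: int
    status: str = "queued"
    abort_requested: bool = False
    retry_eligible: bool = False
    retry_count: int = 0
    failure_kind: Optional[str] = None


class InMemoryJobStore:
    """Stand-in for the database session used by the real service."""

    def __init__(self) -> None:
        self._jobs: Dict[int, Job] = {}
        self._ids = count(1)

    def create(self, **fields) -> Job:
        job = Job(id=next(self._ids), **fields)
        self._jobs[job.id] = job
        return job

    def get(self, job_id: int) -> Job:
        return self._jobs[job_id]


def request_abort(store: InMemoryJobStore, job_id: int) -> bool:
    """Set the abort flag; the running loop is expected to notice it on refresh."""
    job = store.get(job_id)
    if job.status in TERMINAL_STATUSES:
        return False  # nothing left to abort
    job.abort_requested = True
    return True


def retry_failed_job(store: InMemoryJobStore, job_id: int) -> Job:
    """Queue a fresh job for a failed, retry-eligible one."""
    job = store.get(job_id)
    if job.status != "failed" or not job.retry_eligible:
        raise ValueError(f"job {job_id} is not eligible for retry")
    return store.create(retry_count=job.retry_count + 1)


def mark_interrupted(store: InMemoryJobStore, job_id: int) -> bool:
    """Close out a job whose worker died mid-run so it can be retried."""
    job = store.get(job_id)
    if job.status in TERMINAL_STATUSES:
        return False
    job.status = "failed"
    job.failure_kind = "interrupted"  # hypothetical failure kind
    job.retry_eligible = True
    return True
```

Each command is a single guarded state transition, which keeps the surface small enough to expose through either a CLI or an API without wrapping the orchestrator.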

aligned

Design for delegation rather than direct manipulation

The user/operator delegates the scan at the application level: `approve()` records approval, then `run_cohort_validate(app.id)` creates a `CohortValidationJob`; `_execute_with_job()` owns clone, language detection, file selection, bundling, validation, persistence, and terminal state. Scope and constraints are represented through `repo_url`, `namespace = f"cohort/{application_id}/{owner}/{repo_name}"`, `ValidationContext(task=..., repository=..., files=...)`, and the selected file list rather than requiring the operator to execute each step manually.

aligned

Ensure that background work remains perceptible

Background work is made perceptible through a persistent `CohortValidationJob` row created before work begins in `create_job()`, with durable statuses and timestamps for `queued`, `cloning`, `selecting`, `bundling`, `validating`, and terminal states. `mark_step_started()`, `mark_completed()`, `mark_blocked()`, `mark_failed()`, and `mark_aborted()` commit state transitions, and the approval CLI prints the latest terminal job status/failure fields after validation returns.

aligned

Align feedback with the user’s level of attention

The code separates low-noise routine progress from attention-demanding conditions: normal progress is persisted as status/timestamps, while blocked/failed states carry `failure_kind`, `safe_display_message`, and `retry_eligible`. `approve()` prints terminal status and failure details only after the run, while deeper validation telemetry is summarized via `_summarize_validate_log()` inside the audit object rather than streamed noisily to the operator.

aligned

Apply progressive disclosure to system agency

Primary state is simple (`status`, `failure_kind`, `safe_display_message`, `retry_eligible`), while detailed inspection is available in the persisted `audit_object` containing `source`, `selection`, `validate`, and `job` sections. The code records file hashes, selected paths, skipped reads, latency, usage presence, and `_collect_job_transitions(job)` without forcing those internals into the main terminal message.
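
The split between primary state and detailed inspection described above can be illustrated with two small accessors. The field and section names are the ones listed in this finding; `primary_view`, `detail_view`, and the dictionary-shaped job row are hypothetical and only approximate how the persisted data might be read.

```python
from typing import Optional

PRIMARY_FIELDS = ("status", "failure_kind", "safe_display_message", "retry_eligible")
AUDIT_SECTIONS = ("source", "selection", "validate", "job")


def primary_view(job_row: dict) -> dict:
    """Return only the fields an operator needs at a glance."""
    return {name: job_row.get(name) for name in PRIMARY_FIELDS}


def detail_view(job_row: dict, section: Optional[str] = None) -> dict:
    """Return the full audit object, or one named section of it, on request."""
    audit = job_row.get("audit_object") or {}
    if section is None:
        return audit
    if section not in AUDIT_SECTIONS:
        raise KeyError(f"unknown audit section: {section}")
    return audit.get(section, {})
```

For example, a terminal message would be built from `primary_view(row)` alone, while `detail_view(row, "selection")` is only called when someone asks which files were selected or skipped.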

aligned

Replace implied magic with clear mental models

The workflow exposes a clear mental model through explicit state-machine comments in `cohort_validation_job.py` and `CohortValidationJob`, bounded repository support in `_looks_like_public_github_https()`, language detection in `_detect_language()`, selector rejection via `SelectionRejected`, and a visible untrusted-input boundary constant `BOUNDARY_HEADER`. Failure messages distinguish unsupported language, invalid repository URL, clone timeout/failure, repository size, read failure, validator failure, persistence failure, and abort.

aligned

Expose meaningful operational state, not internal complexity

Operational state is represented in user/action-relevant terms: `TERMINAL_STATUSES`, `FAILURE_KINDS`, `mark_blocked()`, `mark_failed()`, and `safe_display_message` avoid exposing raw stack traces as the main status surface. The current code also addresses the prior P6 gap: `FetchError.kind == "invalid_url"` is explicitly mapped to `"invalid_repo_url"` in `kind_map`, and an all-read-failed bundle now blocks with `failure_kind="read_failed"` instead of validating an empty/materially incomplete bundle.
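
The two P6 fixes mentioned above could be expressed roughly as follows. Only the `"invalid_url"` to `"invalid_repo_url"` mapping and the `"read_failed"` kind are taken from this review; the other `KIND_MAP` entries, the default kind, and the helper names `to_failure_kind` and `bundle_blocker` are hypothetical placeholders.

```python
from typing import Optional, Sequence

KIND_MAP = {
    "invalid_url": "invalid_repo_url",  # mapping named in the review
    "timeout": "clone_timeout",         # hypothetical entry
    "too_large": "repo_too_large",      # hypothetical entry
}
DEFAULT_FAILURE_KIND = "clone_failed"   # hypothetical fallback


def to_failure_kind(fetch_kind: str) -> str:
    """Translate an internal fetch error kind into an operator-facing one."""
    return KIND_MAP.get(fetch_kind, DEFAULT_FAILURE_KIND)


def bundle_blocker(selected_paths: Sequence[str],
                   read_paths: Sequence[str]) -> Optional[str]:
    """Return "read_failed" when files were selected but none could be read."""
    if selected_paths and not read_paths:
        return "read_failed"
    return None
```

An explicit map with a fallback means an unmapped internal kind still lands on a known operator-facing value instead of leaking through unchanged.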

aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system rather than a conversation: `CohortValidationJob` stores lifecycle state, timestamps, abort/retry fields, terminal failure fields, validation-run linkage, commit/bundle hashes, and selected/skipped counts. `_execute_with_job()` follows explicit steps, while the persisted `audit_object` captures source, selection, validation, and job-transition substructures.

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/659a695a-2bf6-4b28-a52f-b6098c20ee0b/card.svg)](https://aidesignblueprint.com/en/readiness-review/659a695a-2bf6-4b28-a52f-b6098c20ee0b)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/659a695a-2bf6-4b28-a52f-b6098c20ee0b.svg)](https://aidesignblueprint.com/en/readiness-review/659a695a-2bf6-4b28-a52f-b6098c20ee0b)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Improvements (3)

P6 · Expose meaningful operational state, not internal complexity · needs_changes → aligned

P8 · Make hand-offs, approvals, and blockers explicit · needs_changes → needs_changes

P10 · Optimise for steering, not only initiating · needs_changes → needs_changes

Rubric: 2026-05-04 · Grade limited by 0 high-risk findings


Run ID: 659a695a-2bf6-4b28-a52f-b6098c20ee0b · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.