Reviewed

Assessment complete; awaiting evidence revision.

Agent Architecture Review: validation snapshot

Evaluated 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High Risk

74/100

Grade C

9 aligned · 1 production blocker
Per-principle verdicts

The submission implements a credible delegated validation workflow: persistent job records, explicit phase states, typed failure kinds, source/audit hashes, and an untrusted-code envelope give the background LLM validation flow inspectable boundaries. The remaining gap is steering/recovery: abort/retry controls are not exposed as durable command primitives, validation cannot be cancelled once the external call starts, and partial onboarding failures after external side effects can strand an approved applicant without an in-code retry path.

Iteration history

5 prior runs on this artifact. Each run ID links to its own readiness review.

| When | Score | Status | Run ID |
| --- | --- | --- | --- |
| 12 May 2026 (this run) | 74 / C | High Risk | 9caf9385 |
| 12 May 2026 | 74 / C | Aligned | 093809b5 |
| 12 May 2026 | 98 / A | Aligned | 4128f700 |
| 12 May 2026 | 98 / A | Aligned | 270e7ca6 |
| 12 May 2026 | 74 / C | High Risk | 8364019d |
| 12 May 2026 | 74 / C | High Risk | 14a3456f |

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

P0

needs changes · production blocker · 60/100

Optimise for steering, not only initiating

Steering is only partially implemented. `CohortValidationJob` has `abort_requested`, `retry_count`, and `retry_eligible`, and `mark_step_started()` checks aborts between phases, but there is no durable command boundary such as `request_abort(job_id)` or `retry_failed_job(job_id)` exposed outside the runner. Once `asyncio.run(validate_code_against_principles(request, api_key))` starts, the job has no visible timeout/cancel path in this code. The approval flow can also strand partial external side effects: after `approved_at` is set, failures in `generate_sign_in_link_for_email()` or `send_cohort_approval_email()` set failure states, but `approve()` cannot be rerun because it exits when `app.a…

Recommendation

Move steering and recovery into a small durable service boundary outside the execution loop. Provide command functions such as `request_abort(job_id)`, `retry_failed_validation_job(job_id)`, and `retry_onboarding_handoff(application_id)` that operate on persisted states, are safe to call after partial Firebase/email/validation failures, and are polled or enforced with bounded timeouts around external calls.
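The command boundary recommended above can be sketched as follows. This is a hypothetical, minimal illustration assuming a dict-backed store; the names `JobStore`, `JobRecord`, and `retry_failed_job` echo the recommendation and are not taken from the reviewed code.

```python
from dataclasses import dataclass

TERMINAL = {"completed", "blocked", "failed", "aborted"}

@dataclass
class JobRecord:
    job_id: str
    status: str = "queued"
    abort_requested: bool = False
    retry_count: int = 0

class JobStore:
    """Durable command boundary: callers issue commands against persisted
    state; the runner polls abort_requested between phases."""

    def __init__(self) -> None:
        self._jobs: dict[str, JobRecord] = {}

    def create(self, job_id: str) -> JobRecord:
        job = JobRecord(job_id)
        self._jobs[job_id] = job
        return job

    def request_abort(self, job_id: str) -> bool:
        # Safe to call at any time; a no-op once the job is terminal.
        job = self._jobs[job_id]
        if job.status in TERMINAL:
            return False
        job.abort_requested = True
        return True

    def retry_failed_job(self, job_id: str) -> bool:
        # Requeue only from a failed terminal state, resetting abort intent.
        job = self._jobs[job_id]
        if job.status != "failed":
            return False
        job.status = "queued"
        job.abort_requested = False
        job.retry_count += 1
        return True
```

Because both commands operate purely on persisted state, they remain safe to invoke after partial external failures; bounded timeouts around the external validation call would complete the picture.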

P0

aligned

Design for delegation rather than direct manipulation

The workflow is designed around delegation of a cohort validation task rather than manual file-by-file operation: `approve()` records approval intent, then `run_cohort_validate()` creates a `CohortValidationJob`, clones the applicant repo, selects an agentic surface via `select_agentic_surface()`, builds a `ValidationRequest`, and calls `validate_code_against_principles()`. Scope constraints are encoded through selector limits such as `MAX_FILES`, `MAX_BUNDLE_BYTES`, `MAX_FILE_BYTES`, language detection in `_detect_language()`, and repository namespace construction in `_execute_with_job()`.
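The scope constraints named above can be illustrated with a small selector sketch. The limit values and the `select_files` helper are assumptions for illustration, not the actual constants or signature in the submission.

```python
# Illustrative budgets; the real MAX_FILES / MAX_FILE_BYTES / MAX_BUNDLE_BYTES
# values in the reviewed code are not known here.
MAX_FILES = 40
MAX_FILE_BYTES = 64_000
MAX_BUNDLE_BYTES = 512_000

def select_files(candidates: list[tuple[str, int]]) -> list[str]:
    """candidates: (path, size_bytes) pairs; returns paths within budget."""
    selected: list[str] = []
    bundle_bytes = 0
    for path, size in candidates:
        if len(selected) >= MAX_FILES:
            break
        if size > MAX_FILE_BYTES:
            continue  # skip oversized files rather than truncating them
        if bundle_bytes + size > MAX_BUNDLE_BYTES:
            break
        selected.append(path)
        bundle_bytes += size
    return selected
```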

P0

aligned

Ensure that background work remains perceptible

Background work is made perceptible through durable job state in `CohortValidationJob`: `status`, phase timestamps such as `cloning_started_at`, `selecting_started_at`, `bundling_started_at`, `validating_started_at`, `terminal_at`, and typed terminal fields like `failure_kind`, `safe_display_message`, and `retry_eligible`. The runner updates these via `mark_step_started()`, `mark_completed()`, `mark_blocked()`, `mark_failed()`, and mirrors terminal state back to the application through `_mirror_terminal_to_app()`.
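The phase-timestamp pattern described above can be sketched as below. This is a simplified stand-in, assuming in-memory state; the real `mark_step_started()` persists to a job table.

```python
from datetime import datetime, timezone

class Job:
    """Minimal sketch: each phase start is recorded with a timestamp,
    mirroring fields like cloning_started_at, and aborts are checked
    between phases."""

    def __init__(self) -> None:
        self.status = "queued"
        self.abort_requested = False
        self.timestamps: dict[str, datetime] = {}

    def mark_step_started(self, phase: str) -> None:
        if self.abort_requested:
            raise RuntimeError("abort requested between phases")
        self.status = phase
        self.timestamps[f"{phase}_started_at"] = datetime.now(timezone.utc)
```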

P0

aligned

Align feedback with the user’s level of attention

Feedback is tiered by attention level: the applicant/application surface receives coarse states such as `validation_queued`, `validation_complete`, and `validation_failed` through `onboarding_state`, while operators can inspect `failure_kind`, `safe_display_message`, `retry_eligible`, and the persisted audit object. Routine progress is represented by status transitions; higher-attention failures are escalated with typed messages such as `invalid_repo_url`, `selector_rejected`, `validate_error`, and `persist_error`.
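The tiering can be sketched as two views over the same job record. The helper names and the dict-based job shape are hypothetical; the state and field names follow the ones cited above.

```python
# Coarse applicant-facing mapping; anything non-terminal reads as queued.
COARSE = {
    "completed": "validation_complete",
    "failed": "validation_failed",
    "blocked": "validation_failed",
}

def applicant_view(job: dict) -> str:
    return COARSE.get(job["status"], "validation_queued")

def operator_view(job: dict) -> dict:
    # Higher-attention detail: typed failure kind, safe message, retryability.
    keys = ("failure_kind", "safe_display_message", "retry_eligible")
    return {k: job[k] for k in keys if k in job}
```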

P0

aligned

Apply progressive disclosure to system agency

The code supports progressive disclosure by separating simple operational state from detailed inspection data. Primary state is stored as concise values like `queued`, `cloning`, `validating`, `completed`, `blocked`, `failed`, and `aborted`; deeper evidence is stored in `audit_object`, including `source`, `selection`, `validate`, and `job` sections. `_summarize_validate_log()` deliberately reduces raw validator internals to safe metadata such as top-level keys and entry counts rather than dumping the full log by default.
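A summarizer in the spirit of `_summarize_validate_log()` might look like the sketch below; the function name and output shape here are assumptions, chosen to show the reduction to safe metadata.

```python
def summarize_log(log: dict) -> dict:
    """Keep safe metadata (top-level keys, entry counts) and drop raw
    validator content, rather than dumping the full log by default."""
    return {
        "top_level_keys": sorted(log.keys()),
        "entry_counts": {
            k: len(v) if isinstance(v, (list, dict, str)) else 1
            for k, v in log.items()
        },
    }
```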

P0

aligned

Replace implied magic with clear mental models

The workflow makes its operating model explicit: `fetch_public_repo()` only accepts public HTTPS GitHub URLs via `_GITHUB_URL_RE`; unsupported repos become `invalid_repo_url`, `clone_failed`, `clone_timeout`, or `repo_too_large`; language support is limited by `_detect_language()` and `LANGUAGE_EXTENSIONS`; and the validation bundle is wrapped with `BOUNDARY_HEADER` plus `ENVELOPE_ADVISORY` stating that repo code is untrusted inert input. These are concrete capability and boundary signals rather than implied magic.
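The URL gate can be illustrated with a sketch like the following; the pattern is an assumption standing in for the actual `_GITHUB_URL_RE`, shown only to make the capability boundary concrete.

```python
import re

# Hypothetical stand-in for _GITHUB_URL_RE: only public HTTPS GitHub
# repository URLs pass; everything else maps to invalid_repo_url.
_GITHUB_URL_RE = re.compile(r"^https://github\.com/[\w.-]+/[\w.-]+/?$")

def classify_repo_url(url: str) -> str:
    return "ok" if _GITHUB_URL_RE.match(url) else "invalid_repo_url"
```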

P0

aligned

Expose meaningful operational state, not internal complexity

The state model exposes user/operator-meaningful states instead of raw implementation chatter. `CohortValidationJob.status` uses terms like `queued`, `cloning`, `selecting`, `bundling`, `validating`, `completed`, `blocked`, `failed`, and `aborted`; `_mirror_terminal_to_app()` maps those into application-level `validation_complete` or `validation_failed`; and user-safe explanations are kept in `safe_display_message` rather than exposing subprocess output or raw validator logs.

P0

aligned

Establish trust through inspectability

Inspectability is backed by concrete trace artifacts. `build_file_envelope()` creates a typed `bridge.files.v2` envelope with per-file content, paths, byte sizes, and SHA-256 hashes, then computes an `envelope_hash`; `_execute_with_job()` records `commit_sha`, selected file hashes, skipped reads, bundle size, `bundle_sha256`, `latency_ms`, validator log signals, `BOUNDARY_CONTRACT_VERSION`, `ENVELOPE_SCHEMA`, and `envelope_hash` in `audit_object`, which is merged into `UserValidationRun.result_json`.
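The envelope-hashing pattern can be sketched as below. This is modeled on the `bridge.files.v2` shape the review describes, but the field names and canonicalization are assumptions, not the actual schema.

```python
import hashlib
import json

def build_file_envelope(files: dict[str, bytes]) -> dict:
    """Per-file path, byte size, and SHA-256, plus an envelope_hash over
    the canonicalized entry list so the whole bundle is tamper-evident."""
    entries = [
        {
            "path": path,
            "bytes": len(content),
            "sha256": hashlib.sha256(content).hexdigest(),
        }
        for path, content in sorted(files.items())
    ]
    canonical = json.dumps(entries, sort_keys=True).encode()
    return {
        "schema": "bridge.files.v2",
        "files": entries,
        "envelope_hash": hashlib.sha256(canonical).hexdigest(),
    }
```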

P0

aligned

Make hand-offs, approvals, and blockers explicit

Approvals, hand-offs, and blockers are explicit. `approve()` requires an operator confirmation unless `--yes` is supplied, then separately records Firebase creation, sign-in link generation, email delivery, and validation queueing states. Validation blockers are typed through `FAILURE_KINDS` and set with `mark_blocked()` or `mark_failed()`, with distinct outcomes for policy/user constraints such as `invalid_repo_url`, `no_supported_language`, and `selector_rejected` versus tool failures such as `clone_timeout`, `validate_error`, and `persist_error`.
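The blocked-versus-failed split can be sketched directly from the kinds listed above; the `terminal_status` helper is hypothetical, but the failure kinds are the ones the review cites.

```python
# Policy/user constraints: the job is blocked and not retryable as-is.
BLOCKED_KINDS = {"invalid_repo_url", "no_supported_language", "selector_rejected"}
# Tool failures: the job failed and is typically retry_eligible.
FAILED_KINDS = {"clone_timeout", "validate_error", "persist_error"}

def terminal_status(failure_kind: str) -> str:
    if failure_kind in BLOCKED_KINDS:
        return "blocked"
    if failure_kind in FAILED_KINDS:
        return "failed"
    raise ValueError(f"unknown failure_kind: {failure_kind}")
```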

P0

aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system rather than a conversation. The code has persistent system records (`CohortApplication`, `CohortValidationJob`, `UserValidationRun`), explicit phase transitions in `mark_step_started()`, terminal-state functions, retry metadata, abort metadata, timestamps for each stage, and a structured audit object. Execution state is separate from the LLM response and is persisted independently in the job table.

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

Blueprint Readiness Score card
[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/9caf9385-cd3b-4936-b455-87a916577e3a/card.svg)](https://aidesignblueprint.com/en/readiness-review/9caf9385-cd3b-4936-b455-87a916577e3a)

Flat badge

Blueprint Readiness Score badge
[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/9caf9385-cd3b-4936-b455-87a916577e3a.svg)](https://aidesignblueprint.com/en/readiness-review/9caf9385-cd3b-4936-b455-87a916577e3a)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Regressions (1)

P10 · Optimise for steering, not only initiating · needs_changes → needs_changes
Rubric: 2026-05-04 · Grade limited by 0 high-risk findings

Run ID: 9caf9385-cd3b-4936-b455-87a916577e3a · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.