Reviewed

Assessment complete; awaiting evidence revision.

Agent Architecture Review, Validation snapshot

Evaluated 12 May 2026 against the AI Design Blueprint doctrine

Status: High Risk

30/100

Grade F

3 aligned · 7 production blockers

Per-principle verdicts

This is a real multi-step agentic workflow: it approves an applicant, clones an external repo, selects files, calls an LLM validator, persists results, and sends externally visible email. The submitted code adds meaningful audit metadata, but production trust boundaries still fail around durable run lifecycle, explicit failure/blocker state, steering/recovery, and the raw handoff of untrusted repo content into validation.

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

needs changes · production blocker · 80/100

Make hand-offs, approvals, and blockers explicit

Some blockers are explicit now, including malformed owner/repo parsing routed to `_mark_no_agentic_surface(reason="invalid_repo_url")`, `SelectionRejected`, and typed `FetchError.kind`. But important handoff failures still stall silently: `run()` only catches `FetchError`, so validation, JSON serialization, DB flush/commit, and audit construction exceptions propagate; then `approve()` catches all exceptions around `run_cohort_validate(app.id)` and silently `pass`es. After Firebase user creation and email delivery, that leaves no durable blocker reason or required action.

Recommendation

Replace the broad silent catch in `approve()` with a typed failure transition on the validation job/application, and wrap the validation/persistence phase in `run()` so every non-success path records `blocked` or `failed` with a safe reason, retry instructions, and operator visibility.
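
A minimal sketch of this recommendation, assuming a hypothetical `ValidationJob` record and a hypothetical `run_validation_guarded` wrapper (neither name appears in the reviewed code; a real implementation would persist the record to the database). It shows the broad catch recording a typed failure and a required action instead of executing `pass`:

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("cohort_validate")


@dataclass
class ValidationJob:
    """Hypothetical durable job record; in practice this would be a DB row."""
    application_id: int
    status: str = "queued"
    failure_kind: Optional[str] = None
    safe_reason: Optional[str] = None
    required_action: Optional[str] = None


def run_validation_guarded(
    job: ValidationJob, validate: Callable[[int], None]
) -> ValidationJob:
    """Run validation; record a typed failure instead of swallowing it."""
    try:
        validate(job.application_id)
    except Exception as exc:  # broad on purpose: every failure gets recorded
        logger.exception("validation failed for application %s", job.application_id)
        job.status = "failed"
        job.failure_kind = type(exc).__name__
        job.safe_reason = "Validation could not be completed."
        job.required_action = "Operator: inspect logs, then rerun validation."
    else:
        job.status = "complete"
    return job
```

The catch stays broad, but the failure is logged and written onto the job, so an operator can see why validation stopped after the account and email side effects.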

needs changes · production blocker · 70/100

Ensure that background work remains perceptible

Long-running validation is perceptible only through transient CLI `print()` calls and final DB fields. `approve()` sends the approval email and then runs `run_cohort_validate()` inline, while `run()` creates no durable `queued`/`active` run row before clone/selection/LLM validation; `UserValidationRun` is only added after the validator succeeds, and graceful failures only overload `app.rejection_reason`. If the process crashes or the LLM call hangs after the email is sent, the applicant can be left in a silent pending state.

Recommendation

Create a persistent validation job/run record before starting clone or validation, with durable states such as `queued`, `cloning`, `selecting_files`, `validating`, `blocked`, `failed`, and `complete`; have the approval path enqueue or hand off to that lifecycle rather than relying on inline CLI output.
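
One way to sketch the durable states named above is a small state machine. The state names come from the recommendation; the transition table (`ALLOWED`) and the `transition` helper are assumptions for illustration, not part of the reviewed code:

```python
from enum import Enum


class RunState(str, Enum):
    QUEUED = "queued"
    CLONING = "cloning"
    SELECTING_FILES = "selecting_files"
    VALIDATING = "validating"
    BLOCKED = "blocked"
    FAILED = "failed"
    COMPLETE = "complete"


# Assumed happy-path order; any non-terminal state may also move to
# BLOCKED or FAILED. Terminal states have no outgoing transitions here.
_HAPPY_PATH = [
    RunState.QUEUED,
    RunState.CLONING,
    RunState.SELECTING_FILES,
    RunState.VALIDATING,
    RunState.COMPLETE,
]

ALLOWED = {
    current: {nxt, RunState.BLOCKED, RunState.FAILED}
    for current, nxt in zip(_HAPPY_PATH, _HAPPY_PATH[1:])
}


def transition(current: RunState, new: RunState) -> RunState:
    """Return the new state, or raise if the move is not allowed."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

In a real service each accepted transition would be committed before the corresponding work starts, so a crash leaves the last known state on record.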

needs changes · production blocker · 70/100

Establish trust through inspectability

The audit envelope is a substantial improvement: `audit_object` records `schema_version`, `captured_at`, source `repo_url`/`commit_sha`, selected file hashes, skipped reads, `bundle_sha256`, validator latency, and shape-only log signals. However, the bridge still constructs the validator payload by concatenating raw untrusted repo content in `_build_implementation_context()` via `chunks.append(f"# === FILE: {path} ===\n{content}\n")` and then sends it as `ValidationRequest(code=bundle, ...)`. This file does not establish or record an inert untrusted-input boundary, so repo comments or filenames containing instruction-like text could steer the validator unless a hidden downstream service comp…

Recommendation

Make the untrusted-code boundary a validation-service primitive: pass files as structured quoted/JSON content with explicit inert semantics, sanitize or encode path headers, pin and record the prompt/boundary version in the audit object, and keep the audit ledger separate from mutable execution flow where possible.
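
The structured-payload idea can be sketched as follows. The function name `build_untrusted_payload`, the `BOUNDARY_VERSION` value, and the payload field names are all hypothetical. JSON encoding does not by itself prevent prompt injection; it only keeps paths and content from being interpolated into a header line and makes the boundary version recordable:

```python
import hashlib
import json
from typing import Dict

# Hypothetical version tag, intended to be copied into the audit object.
BOUNDARY_VERSION = "untrusted-input/v1"


def build_untrusted_payload(files: Dict[str, str]) -> str:
    """Serialize repo files as JSON data rather than concatenated text."""
    entries = []
    for path, content in sorted(files.items()):
        entries.append(
            {
                "path": path,  # encoded as a JSON string, never a raw header
                "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
                "content": content,
            }
        )
    return json.dumps(
        {
            "boundary_version": BOUNDARY_VERSION,
            "semantics": "inert_data_not_instructions",
            "files": entries,
        },
        ensure_ascii=True,
    )
```

A filename containing a newline followed by a fake `# === FILE: ... ===` marker stays inside one escaped JSON string, so it cannot forge a file boundary the way it could in the concatenated bundle.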

needs changes · production blocker · 65/100

Expose meaningful operational state, not internal complexity

Operational state is not represented in user-relevant durable terms. The code stores final success as `app.validation_run_id`, but blocked/failure states are encoded into `app.rejection_reason` strings like `bridge_exit:selector_rejected` and `fetch_failed:{kind}: ...`. There is no first-class status model for `active`, `blocked`, `failed`, `awaiting retry`, or `complete`, and no persisted state at all while clone/validate is running.

Recommendation

Move validation state out of overloaded `rejection_reason` strings into a structured job/run table or service-owned status object with user-relevant states, timestamps, failure kind, safe display message, and retry eligibility.
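
As an illustration of the difference, the sketch below lifts the two string shapes quoted in this finding into a structured status. `ValidationStatus` and `parse_legacy_reason` are hypothetical names, and the mapping of `bridge_exit:` to `blocked` and `fetch_failed:` to `failed` is an assumption, not something the reviewed code defines:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValidationStatus:
    state: str                   # e.g. "blocked" or "failed"
    failure_kind: Optional[str]  # machine-readable kind
    detail: Optional[str]        # operator-facing detail, not for applicants


def parse_legacy_reason(reason: str) -> ValidationStatus:
    """Map an overloaded rejection_reason string to a structured status."""
    if reason.startswith("bridge_exit:"):
        return ValidationStatus("blocked", reason[len("bridge_exit:"):], None)
    if reason.startswith("fetch_failed:"):
        rest = reason[len("fetch_failed:"):]
        kind, _, detail = rest.partition(":")
        return ValidationStatus("failed", kind.strip(), detail.strip() or None)
    # Unknown shape: keep the raw text so nothing is lost.
    return ValidationStatus("failed", "unknown", reason)
```

A parser like this is only a migration aid; the recommendation is for new runs to write the structured fields directly.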

needs changes · production blocker · 65/100

Represent delegated work as a system, not merely as a conversation

The delegated work is implemented as sequential script control flow rather than a durable system of tasks. The docstring lists eight phases, but the database only receives a final `UserValidationRun` after success or a coarse `rejection_reason` on selected graceful exits. Substeps such as clone, language detection, selection, file-read skipping, LLM validation, and persistence are not represented as resumable or inspectable task states while they are happening.

Recommendation

Represent the bridge as a service-owned validation job with explicit step records or status transitions for clone, inventory, selection, bundle build, validation, audit persist, and completion; the CLI should start or observe that system rather than being the system of record.
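
Explicit step records could look roughly like this. `StepRecord` and `JobLedger` are hypothetical, and the ledger is held in memory here only to keep the sketch self-contained; a service-owned version would write each record to durable storage as it changes:

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class StepRecord:
    name: str
    status: str = "running"
    started_at: float = field(default_factory=time.time)
    finished_at: Optional[float] = None
    error: Optional[str] = None


@dataclass
class JobLedger:
    steps: List[StepRecord] = field(default_factory=list)

    @contextmanager
    def step(self, name: str) -> Iterator[StepRecord]:
        """Record a step as running, then as complete or failed."""
        record = StepRecord(name)
        self.steps.append(record)
        try:
            yield record
        except Exception as exc:
            record.status = "failed"
            record.error = type(exc).__name__
            raise  # the caller still decides how to handle the failure
        else:
            record.status = "complete"
        finally:
            record.finished_at = time.time()
```

Usage would wrap each phase, for example `with ledger.step("clone"): ...`, so the ledger shows which step was in flight when a run stopped.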

needs changes · production blocker · 60/100

Align feedback with the user’s level of attention

Feedback is not calibrated to attention level. The active operator sees console messages such as `✓ Cloned`, `✗ Selector rejected`, and skipped-read warnings, but absent users receive only eventual dashboard effects. More seriously, `approve()` catches any exception from `run_cohort_validate(app.id)` and executes `pass`, so an intervention-worthy validation failure after account creation/email is neither logged nor persisted for the applicant or founder.

Recommendation

Route validation progress and failures through the durable run lifecycle: keep concise foreground CLI output, but persist intervention-worthy states and unexpected exceptions with an operator-visible failure reason and applicant-safe status.
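
A rough sketch of attention-aware routing, assuming a hypothetical `FeedbackSink` with in-memory lists standing in for the console and the durable store. Every message reaches the foreground; only intervention-worthy ones are persisted, with separate operator and applicant wording:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed default wording; a real product would choose its own copy.
DEFAULT_APPLICANT_STATUS = "Validation is delayed; the team has been notified."


@dataclass
class FeedbackSink:
    cli_lines: List[str] = field(default_factory=list)
    persisted: List[Dict[str, str]] = field(default_factory=list)

    def emit(
        self,
        message: str,
        *,
        needs_intervention: bool = False,
        applicant_status: Optional[str] = None,
    ) -> None:
        """Send concise output to the CLI; persist what needs a human."""
        self.cli_lines.append(message)
        if needs_intervention:
            self.persisted.append(
                {
                    "operator_reason": message,
                    "applicant_status": applicant_status or DEFAULT_APPLICANT_STATUS,
                }
            )
```

The split keeps internal failure text out of what the applicant sees while still leaving a durable trace for the founder or operator.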

needs changes · production blocker · 60/100

Optimise for steering, not only initiating

The workflow supports initiation and manual rerun, but not steering while underway. Once `run_cohort_validate()` starts, there is no durable pause, abort, resume, retry-from-step, or constraint-update primitive; the only practical controls are Ctrl-C, waiting, or rerunning if `validation_run_id` was never set. The approval path also commits user creation before downstream email/validation phases, without a rollback or recovery state beyond comments about manual rerun.

Recommendation

Introduce explicit lifecycle controls on the validation job: cancellable queued/running states, retry from failed step, safe abort semantics after external actions, and a way to update selection/validation constraints before restarting the LLM call.
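
Cancellation and retry-from-step can be sketched with a small step runner. `RunOutcome`, `run_steps`, and the `should_cancel` callback are hypothetical; the sketch checks for cancellation only between steps and does not address undoing external actions such as sent email:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RunOutcome:
    status: str
    completed: List[str] = field(default_factory=list)
    stopped_at: Optional[str] = None
    error: Optional[str] = None


def run_steps(
    handlers: Dict[str, Callable[[], None]],
    start_from: Optional[str] = None,
    should_cancel: Callable[[], bool] = lambda: False,
) -> RunOutcome:
    """Run steps in order; stop on cancel or failure and report where."""
    names = list(handlers)  # dicts preserve insertion order
    start = names.index(start_from) if start_from is not None else 0
    outcome = RunOutcome(status="running")
    for name in names[start:]:
        if should_cancel():  # checked between steps, not mid-step
            outcome.status, outcome.stopped_at = "cancelled", name
            return outcome
        try:
            handlers[name]()
        except Exception as exc:
            outcome.status, outcome.stopped_at = "failed", name
            outcome.error = type(exc).__name__
            return outcome
        outcome.completed.append(name)
    outcome.status = "complete"
    return outcome
```

A failed run reports `stopped_at`, which can be passed back as `start_from` to retry from the failed step instead of re-cloning and re-selecting.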

aligned

Design for delegation rather than direct manipulation

The workflow supports delegation rather than manual step execution: after founder confirmation in `approve()`, the system creates/links the user, sends the approval email, invokes `run_cohort_validate(app.id)`, clones the repo, selects the agentic surface with `select_agentic_surface`, calls `validate_code_against_principles`, and persists a `UserValidationRun`. Scope is bounded by checks in `run()` for `approved_at`, `user_id`, `repo_url`, and existing `validation_run_id`.
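
The scope checks described here might be sketched as a single precondition function. The four field names come from the finding; the `Application` dataclass, the `precondition_failure` helper, and the returned reason strings are assumptions, since the actual body of `run()` is not shown in this review:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Application:
    """Hypothetical stand-in for the application row."""
    approved_at: Optional[str]
    user_id: Optional[int]
    repo_url: Optional[str]
    validation_run_id: Optional[int] = None


def precondition_failure(app: Application) -> Optional[str]:
    """Return a reason the run must not start, or None if it may proceed."""
    if app.approved_at is None:
        return "not_approved"
    if app.user_id is None:
        return "no_user"
    if not app.repo_url:
        return "no_repo_url"
    if app.validation_run_id is not None:
        return "already_validated"
    return None
```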

aligned

Apply progressive disclosure to system agency

The code separates summary from detail well. Runtime output prints compact status lines, while deeper inspection is stored in the schema-versioned `audit_object`. `_summarize_validate_log()` deliberately persists shape-only metadata such as `top_level_keys`, `entry_count`, and `type` rather than dumping prompt snippets or reasoning traces into `result_json`.
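
For readers unfamiliar with shape-only logging, the sketch below shows the general idea using the three field names mentioned above. `summarize_shape` is a hypothetical function written for this example and is not the body of `_summarize_validate_log()`:

```python
from typing import Any, Dict


def summarize_shape(log: Any) -> Dict[str, Any]:
    """Describe the structure of a log object without copying its values."""
    summary: Dict[str, Any] = {"type": type(log).__name__}
    if isinstance(log, dict):
        summary["top_level_keys"] = sorted(str(key) for key in log)
        summary["entry_count"] = len(log)
    elif isinstance(log, (list, tuple)):
        summary["entry_count"] = len(log)
    return summary
```

Only key names, counts, and the container type are kept, so prompt text or reasoning traces held in the values never reach the persisted summary.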

aligned

Replace implied magic with clear mental models

The implementation makes key capabilities and limits explicit in code paths and user-facing messages: `_detect_language()` only supports Python/TypeScript, `SelectionRejected` explains that include patterns are sorting signals rather than gates, `fetch_public_repo()` rejects non-public HTTPS GitHub URLs, and `run()` distinguishes `no_supported_language`, `selector_rejected`, `invalid_repo_url`, and typed `FetchError.kind` values. The selector constants also make bundle limits visible via `MAX_FILES`, `MAX_BUNDLE_BYTES`, `MAX_FILE_BYTES`, and `MIN_BUNDLE_BYTES`.

Baseline and iteration details

Rubric: 2026-05-04

Run ID: 86b2c59d-0f04-4b30-a891-796c4276fead · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.