Assessment complete; awaiting evidence revision.
Evaluated 12 May 2026 against the AI Design Blueprint doctrine
High Risk
Status: High Risk
30/100
Grade F
This is a real multi-step agentic workflow: it approves an applicant, clones an external repo, selects files, calls an LLM validator, persists results, and sends externally visible email. The submitted code adds meaningful audit metadata, but production trust boundaries still fail around durable run lifecycle, explicit failure/blocker state, steering/recovery, and the raw handoff of untrusted repo content into validation.
Per-principle findings
10 principles evaluated. Verdict, severity, evidence and recommendation for each.
P0
needs changesproduction blocker80/100Make hand-offs, approvals, and blockers explicit
Some blockers are explicit now, including malformed owner/repo parsing routed to `_mark_no_agentic_surface(reason="invalid_repo_url")`, `SelectionRejected`, and typed `FetchError.kind`. But important handoff failures still stall silently: `run()` only catches `FetchError`, so validation, JSON serialization, DB flush/commit, and audit construction exceptions propagate; then `approve()` catches all exceptions around `run_cohort_validate(app.id)` and silently `pass`es. After Firebase user creation and email delivery, that leaves no durable blocker reason or required action.
Recommendation
Replace the broad silent catch in `approve()` with a typed failure transition on the validation job/application, and wrap the validation/persistence phase in `run()` so every non-success path records `blocked` or `failed` with a safe reason, retry instructions, and operator visibility.
P0
needs changesproduction blocker70/100Ensure that background work remains perceptible
Long-running validation is perceptible only through transient CLI `print()` calls and final DB fields. `approve()` sends the approval email and then runs `run_cohort_validate()` inline, while `run()` creates no durable `queued`/`active` run row before clone/selection/LLM validation; `UserValidationRun` is only added after the validator succeeds, and graceful failures only overload `app.rejection_reason`. If the process crashes or the LLM call hangs after the email is sent, the applicant can be left in a silent pending state.
Recommendation
Create a persistent validation job/run record before starting clone or validation, with durable states such as `queued`, `cloning`, `selecting_files`, `validating`, `blocked`, `failed`, and `complete`; have the approval path enqueue or hand off to that lifecycle rather than relying on inline CLI output.
P0
needs changesproduction blocker70/100Establish trust through inspectability
The audit envelope is a substantial improvement: `audit_object` records `schema_version`, `captured_at`, source `repo_url`/`commit_sha`, selected file hashes, skipped reads, `bundle_sha256`, validator latency, and shape-only log signals. However, the bridge still constructs the validator payload by concatenating raw untrusted repo content in `_build_implementation_context()` via `chunks.append(f"# === FILE: {path} ===\n{content}\n")` and then sends it as `ValidationRequest(code=bundle, ...)`. This file does not establish or record an inert untrusted-input boundary, so repo comments or filenames containing instruction-like text could steer the validator unless a hidden downstream service comp…
Recommendation
Make the untrusted-code boundary a validation-service primitive: pass files as structured quoted/JSON content with explicit inert semantics, sanitize or encode path headers, pin and record the prompt/boundary version in the audit object, and keep the audit ledger separate from mutable execution flow where possible.
P0
needs changesproduction blocker65/100Expose meaningful operational state, not internal complexity
Operational state is not represented in user-relevant durable terms. The code stores final success as `app.validation_run_id`, but blocked/failure states are encoded into `app.rejection_reason` strings like `bridge_exit:selector_rejected` and `fetch_failed:{kind}: ...`. There is no first-class status model for `active`, `blocked`, `failed`, `awaiting retry`, or `complete`, and no persisted state at all while clone/validate is running.
Recommendation
Move validation state out of overloaded `rejection_reason` strings into a structured job/run table or service-owned status object with user-relevant states, timestamps, failure kind, safe display message, and retry eligibility.
P0
needs changesproduction blocker65/100Represent delegated work as a system, not merely as a conversation
The delegated work is implemented as sequential script control flow rather than a durable system of tasks. The docstring lists eight phases, but the database only receives a final `UserValidationRun` after success or a coarse `rejection_reason` on selected graceful exits. Substeps such as clone, language detection, selection, file-read skipping, LLM validation, and persistence are not represented as resumable or inspectable task states while they are happening.
Recommendation
Represent the bridge as a service-owned validation job with explicit step records or status transitions for clone, inventory, selection, bundle build, validation, audit persist, and completion; the CLI should start or observe that system rather than being the system of record.
P0
needs changesproduction blocker60/100Align feedback with the user’s level of attention
Feedback is not calibrated to attention level. The active operator sees console messages such as `✓ Cloned`, `✗ Selector rejected`, and skipped-read warnings, but absent users receive only eventual dashboard effects. More seriously, `approve()` catches any exception from `run_cohort_validate(app.id)` and executes `pass`, so an intervention-worthy validation failure after account creation/email is neither logged nor persisted for the applicant or founder.
Recommendation
Route validation progress and failures through the durable run lifecycle: keep concise foreground CLI output, but persist intervention-worthy states and unexpected exceptions with an operator-visible failure reason and applicant-safe status.
P0
needs changesproduction blocker60/100Optimise for steering, not only initiating
The workflow supports initiation and manual rerun, but not steering while underway. Once `run_cohort_validate()` starts, there is no durable pause, abort, resume, retry-from-step, or constraint-update primitive; the only practical controls are Ctrl-C, waiting, or rerunning if `validation_run_id` was never set. The approval path also commits user creation before downstream email/validation phases, without a rollback or recovery state beyond comments about manual rerun.
Recommendation
Introduce explicit lifecycle controls on the validation job: cancellable queued/running states, retry from failed step, safe abort semantics after external actions, and a way to update selection/validation constraints before restarting the LLM call.
P0
alignedDesign for delegation rather than direct manipulation
The workflow supports delegation rather than manual step execution: after founder confirmation in `approve()`, the system creates/links the user, sends the approval email, invokes `run_cohort_validate(app.id)`, clones the repo, selects the agentic surface with `select_agentic_surface`, calls `validate_code_against_principles`, and persists a `UserValidationRun`. Scope is bounded by checks in `run()` for `approved_at`, `user_id`, `repo_url`, and existing `validation_run_id`.
P0
alignedApply progressive disclosure to system agency
The code separates summary from detail well. Runtime output prints compact status lines, while deeper inspection is stored in the schema-versioned `audit_object`. `_summarize_validate_log()` deliberately persists shape-only metadata such as `top_level_keys`, `entry_count`, and `type` rather than dumping prompt snippets or reasoning traces into `result_json`.
P0
alignedReplace implied magic with clear mental models
The implementation makes key capabilities and limits explicit in code paths and user-facing messages: `_detect_language()` only supports Python/TypeScript, `SelectionRejected` explains that include patterns are sorting signals rather than gates, `fetch_public_repo()` rejects non-public HTTPS GitHub URLs, and `run()` distinguishes `no_supported_language`, `selector_rejected`, `invalid_repo_url`, and typed `FetchError.kind` values. The selector constants also make bundle limits visible via `MAX_FILES`, `MAX_BUNDLE_BYTES`, `MAX_FILE_BYTES`, and `MIN_BUNDLE_BYTES`.
Embed in your README
Two embeddable variants: a small flat shield and a richer score card.
Score card (recommended)
[](https://aidesignblueprint.com/en/readiness-review/86b2c59d-0f04-4b30-a891-796c4276fead)
Flat badge
[](https://aidesignblueprint.com/en/readiness-review/86b2c59d-0f04-4b30-a891-796c4276fead)
Run ID: 86b2c59d-0f04-4b30-a891-796c4276fead · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.