Assessment complete; awaiting evidence revision.
Evaluated 12 May 2026 against the AI Design Blueprint doctrine
Emerging
Status: High Risk
74/100
Grade C
The submission implements a credible delegated validation workflow: persistent job records, explicit phase states, typed failure kinds, source/audit hashes, and an untrusted-code envelope give the background LLM validation flow inspectable boundaries. The remaining gap is steering/recovery: abort/retry controls are not exposed as durable command primitives, validation cannot be cancelled once the external call starts, and partial onboarding failures after external side effects can strand an approved applicant without an in-code retry path.
Iteration history
5 prior runs on this artifact. Each run_id opens its own readiness review.
Per-principle findings
10 principles evaluated. Verdict, severity, evidence and recommendation for each.
P0
needs changes · production blocker · 60/100 · Optimise for steering, not only initiating
Steering is only partially implemented. `CohortValidationJob` has `abort_requested`, `retry_count`, and `retry_eligible`, and `mark_step_started()` checks aborts between phases, but there is no durable command boundary such as `request_abort(job_id)` or `retry_failed_job(job_id)` exposed outside the runner. Once `asyncio.run(validate_code_against_principles(request, api_key))` starts, the job has no visible timeout/cancel path in this code. The approval flow can also strand partial external side effects: after `approved_at` is set, failures in `generate_sign_in_link_for_email()` or `send_cohort_approval_email()` set failure states, but `approve()` cannot be rerun because it exits when `app.a…
Recommendation
Move steering and recovery into a small durable service boundary outside the execution loop. Provide command functions such as `request_abort(job_id)`, `retry_failed_validation_job(job_id)`, and `retry_onboarding_handoff(application_id)` that operate on persisted states, are safe to call after partial Firebase/email/validation failures, and are polled or enforced with bounded timeouts around external calls.
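A minimal sketch of what such a command boundary might look like, assuming an in-memory dictionary in place of the real job table. `JobRecord`, `JOBS`, `MAX_RETRIES`, and `run_bounded` are hypothetical names introduced for illustration; only `request_abort`, `retry_failed_validation_job`, and the field names `abort_requested`, `retry_count`, and `retry_eligible` come from the review text.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, Tuple

# Hypothetical stand-in for the persisted job table; the real
# CohortValidationJob model and its storage are not shown in the review.
@dataclass
class JobRecord:
    job_id: str
    status: str = "queued"
    abort_requested: bool = False
    retry_count: int = 0
    retry_eligible: bool = False

JOBS: Dict[str, JobRecord] = {}
MAX_RETRIES = 3  # assumed limit, not taken from the submission
TERMINAL = {"completed", "blocked", "failed", "aborted"}

def request_abort(job_id: str) -> bool:
    """Record abort intent on the stored row; the runner is expected to poll it.

    Idempotent: calling it twice leaves the same state. Returns False when
    the job is already terminal and there is nothing left to abort.
    """
    job = JOBS[job_id]
    if job.status in TERMINAL:
        return False
    job.abort_requested = True
    return True

def retry_failed_validation_job(job_id: str) -> bool:
    """Re-queue a failed job only if it is marked retryable and under the cap."""
    job = JOBS[job_id]
    if job.status != "failed" or not job.retry_eligible:
        return False
    if job.retry_count >= MAX_RETRIES:
        return False
    job.retry_count += 1
    job.status = "queued"
    job.abort_requested = False
    job.retry_eligible = False  # must be re-earned by the next failure
    return True

def run_bounded(make_call: Callable[[], Awaitable[Any]],
                timeout_s: float) -> Tuple[bool, Any]:
    """Run one external call under a hard timeout; returns (ok, result)."""
    async def _inner() -> Any:
        return await asyncio.wait_for(make_call(), timeout=timeout_s)
    try:
        return True, asyncio.run(_inner())
    except asyncio.TimeoutError:
        return False, None
```

Because each command reads and writes stored state only, it can be called from a CLI, an admin view, or a scheduler without touching the execution loop. A `retry_onboarding_handoff(application_id)` command could follow the same shape, keyed on whichever onboarding step last failed.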
P0
aligned · Design for delegation rather than direct manipulation
The workflow is designed around delegation of a cohort validation task rather than manual file-by-file operation: `approve()` records approval intent, then `run_cohort_validate()` creates a `CohortValidationJob`, clones the applicant repo, selects an agentic surface via `select_agentic_surface()`, builds a `ValidationRequest`, and calls `validate_code_against_principles()`. Scope constraints are encoded through selector limits such as `MAX_FILES`, `MAX_BUNDLE_BYTES`, `MAX_FILE_BYTES`, language detection in `_detect_language()`, and repository namespace construction in `_execute_with_job()`.
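The scope limits named above can be illustrated with a short sketch. The constant names `MAX_FILES`, `MAX_FILE_BYTES`, and `MAX_BUNDLE_BYTES` come from the review; their values here, the function `select_within_limits`, and the greedy first-fit order are assumptions chosen to keep the example small, not the submission's actual selector.

```python
from typing import List, Tuple

# Illustrative values only; the submission's real limits are not stated.
MAX_FILES = 3
MAX_FILE_BYTES = 100
MAX_BUNDLE_BYTES = 150

def select_within_limits(
    candidates: List[Tuple[str, int]]
) -> Tuple[List[str], List[str]]:
    """Greedily pick (path, size) candidates in order, honouring all three caps.

    Returns (selected_paths, skipped_paths) so skipped files stay inspectable.
    """
    selected: List[str] = []
    skipped: List[str] = []
    total = 0
    for path, size in candidates:
        over_count = len(selected) >= MAX_FILES
        over_file = size > MAX_FILE_BYTES
        over_bundle = total + size > MAX_BUNDLE_BYTES
        if over_count or over_file or over_bundle:
            skipped.append(path)
            continue
        selected.append(path)
        total += size
    return selected, skipped
```

Returning the skipped list alongside the selection keeps the delegation boundary visible: an operator can see what the agent was not given, and why the bundle stopped where it did.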
P0
aligned · Ensure that background work remains perceptible
Background work is made perceptible through durable job state in `CohortValidationJob`: `status`, phase timestamps such as `cloning_started_at`, `selecting_started_at`, `bundling_started_at`, `validating_started_at`, `terminal_at`, and typed terminal fields like `failure_kind`, `safe_display_message`, and `retry_eligible`. The runner updates these via `mark_step_started()`, `mark_completed()`, `mark_blocked()`, `mark_failed()`, and mirrors terminal state back to the application through `_mirror_terminal_to_app()`.
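The phase-tracking pattern described here might be sketched as follows. This is a simplified illustration, not the submission's code: `JobState` is a hypothetical dataclass, and it stores phase timestamps in one dictionary rather than in separate columns such as `cloning_started_at`. The function names `mark_step_started` and `mark_failed` mirror those named in the review, but their signatures are assumed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional

PHASES = ("cloning", "selecting", "bundling", "validating")

@dataclass
class JobState:
    status: str = "queued"
    abort_requested: bool = False
    started_at: Dict[str, datetime] = field(default_factory=dict)
    terminal_at: Optional[datetime] = None
    failure_kind: Optional[str] = None
    safe_display_message: Optional[str] = None
    retry_eligible: bool = False

def _now() -> datetime:
    return datetime.now(timezone.utc)

def mark_step_started(job: JobState, phase: str) -> bool:
    """Enter a phase, or stop if an abort was requested since the last phase.

    Returns True when the phase was entered, False when the job was aborted.
    """
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    if job.abort_requested:
        job.status = "aborted"
        job.terminal_at = _now()
        return False
    job.status = phase
    job.started_at[phase] = _now()
    return True

def mark_failed(job: JobState, failure_kind: str,
                safe_display_message: str, retry_eligible: bool = False) -> None:
    """Record a typed terminal failure with a message safe to show users."""
    job.status = "failed"
    job.failure_kind = failure_kind
    job.safe_display_message = safe_display_message
    job.retry_eligible = retry_eligible
    job.terminal_at = _now()
```

Checking the abort flag only at phase boundaries keeps the runner simple, at the cost that a long-running phase cannot be interrupted mid-flight.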
P0
aligned · Align feedback with the user’s level of attention
Feedback is tiered by attention level: the applicant/application surface receives coarse states such as `validation_queued`, `validation_complete`, and `validation_failed` through `onboarding_state`, while operators can inspect `failure_kind`, `safe_display_message`, `retry_eligible`, and the persisted audit object. Routine progress is represented by status transitions; higher-attention failures are escalated with typed messages such as `invalid_repo_url`, `selector_rejected`, `validate_error`, and `persist_error`.
P0
aligned · Apply progressive disclosure to system agency
The code supports progressive disclosure by separating simple operational state from detailed inspection data. Primary state is stored as concise values like `queued`, `cloning`, `validating`, `completed`, `blocked`, `failed`, and `aborted`; deeper evidence is stored in `audit_object`, including `source`, `selection`, `validate`, and `job` sections. `_summarize_validate_log()` deliberately reduces raw validator internals to safe metadata such as top-level keys and entry counts rather than dumping the full log by default.
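The summarising step can be pictured with a small sketch. The review does not show the body of `_summarize_validate_log()`, so the function below, `summarize_validate_log`, and its output keys `top_level_keys` and `entry_counts` are assumptions that only reproduce the described behaviour: report structure, not content.

```python
from typing import Any, Dict

def summarize_validate_log(log: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a raw validator log to safe metadata.

    Keeps only the sorted top-level key names and, for list or dict values,
    how many entries each holds. No values from the log are copied out.
    """
    entry_counts = {
        key: len(value)
        for key, value in log.items()
        if isinstance(value, (list, dict))
    }
    return {"top_level_keys": sorted(log), "entry_counts": entry_counts}
```

For example, a log with a three-item `steps` list and a scalar `model` field would be summarised as two key names plus `{"steps": 3}`, which is enough for an operator to decide whether opening the full log is worthwhile.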
P0
aligned · Replace implied magic with clear mental models
The workflow makes its operating model explicit: `fetch_public_repo()` only accepts public HTTPS GitHub URLs via `_GITHUB_URL_RE`; unsupported repos become `invalid_repo_url`, `clone_failed`, `clone_timeout`, or `repo_too_large`; language support is limited by `_detect_language()` and `LANGUAGE_EXTENSIONS`; and the validation bundle is wrapped with `BOUNDARY_HEADER` plus `ENVELOPE_ADVISORY` stating that repo code is untrusted inert input. These are concrete capability and boundary signals rather than implied magic.
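As an illustration of the URL gate, here is one way a public-HTTPS-GitHub check could be written. The actual pattern behind `_GITHUB_URL_RE` is not shown in the review, so `GITHUB_URL_RE` and `classify_repo_url` below are hypothetical, and the character limits are assumptions; only the `invalid_repo_url` failure kind comes from the source.

```python
import re
from typing import Optional

# Assumed pattern: https only, github.com only, exactly owner/repo,
# with an optional ".git" suffix and optional trailing slash.
GITHUB_URL_RE = re.compile(
    r"^https://github\.com/"
    r"(?P<owner>[A-Za-z0-9][A-Za-z0-9-]{0,38})/"
    r"(?P<repo>[A-Za-z0-9._-]{1,100}?)"
    r"(?:\.git)?/?$"
)

def classify_repo_url(url: str) -> Optional[str]:
    """Return a failure kind for an unsupported URL, or None if it is accepted."""
    if GITHUB_URL_RE.match(url) is None:
        return "invalid_repo_url"
    return None
```

Rejecting anything outside a narrow pattern, rather than trying to normalise it, keeps the boundary easy to explain: SSH remotes, other hosts, and deep links to branches or files all fail the same way.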
P0
aligned · Expose meaningful operational state, not internal complexity
The state model exposes user/operator-meaningful states instead of raw implementation chatter. `CohortValidationJob.status` uses terms like `queued`, `cloning`, `selecting`, `bundling`, `validating`, `completed`, `blocked`, `failed`, and `aborted`; `_mirror_terminal_to_app()` maps those into application-level `validation_complete` or `validation_failed`; and user-safe explanations are kept in `safe_display_message` rather than exposing subprocess output or raw validator logs.
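The mirroring step can be sketched as a small mapping. The state names are taken from the review; the function `mirror_terminal_state` is hypothetical, and treating `blocked` and `aborted` as `validation_failed` is an assumption inferred from the review naming only two application-level outcomes.

```python
from typing import Optional

# Job-level terminal states mapped to application-level onboarding states.
# Mapping blocked/aborted to "validation_failed" is assumed, not confirmed.
_TERMINAL_TO_APP = {
    "completed": "validation_complete",
    "blocked": "validation_failed",
    "failed": "validation_failed",
    "aborted": "validation_failed",
}

def mirror_terminal_state(job_status: str) -> Optional[str]:
    """Return the application-level state for a terminal job status.

    Non-terminal statuses (queued, cloning, ...) return None so the
    application surface is only updated when the job has finished.
    """
    return _TERMINAL_TO_APP.get(job_status)
```

Keeping the mapping in one table makes it easy to audit which internal states an applicant can ever see.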
P0
aligned · Establish trust through inspectability
Inspectability is backed by concrete trace artifacts. `build_file_envelope()` creates a typed `bridge.files.v2` envelope with per-file content, paths, byte sizes, and SHA-256 hashes, then computes an `envelope_hash`; `_execute_with_job()` records `commit_sha`, selected file hashes, skipped reads, bundle size, `bundle_sha256`, `latency_ms`, validator log signals, `BOUNDARY_CONTRACT_VERSION`, `ENVELOPE_SCHEMA`, and `envelope_hash` in `audit_object`, which is merged into `UserValidationRun.result_json`.
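The hashing scheme can be illustrated with a sketch under stated assumptions. The schema label `bridge.files.v2` and the idea of per-file SHA-256 hashes plus an overall `envelope_hash` come from the review; the function `sketch_file_envelope`, the field names inside each entry, and the canonical-JSON approach to the envelope hash are hypothetical and may differ from the submission's `build_file_envelope()`.

```python
import hashlib
import json
from typing import Any, Dict

ENVELOPE_SCHEMA = "bridge.files.v2"  # label quoted in the review

def sketch_file_envelope(files: Dict[str, bytes]) -> Dict[str, Any]:
    """Build a typed envelope with per-file hashes and one envelope hash.

    Files are sorted by path and the body is serialised as canonical JSON
    (sorted keys, fixed separators) so the same inputs always hash the same.
    """
    entries = []
    for path in sorted(files):
        data = files[path]
        entries.append({
            "path": path,
            "bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
            "content": data.decode("utf-8", errors="replace"),
        })
    body = {"schema": ENVELOPE_SCHEMA, "files": entries}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    envelope_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "envelope_hash": envelope_hash}
```

Storing the envelope hash in the audit object lets a reviewer later confirm that the bundle sent to the validator is byte-for-byte the one that was recorded, without keeping the full bundle in the audit row.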
P0
aligned · Make hand-offs, approvals, and blockers explicit
Approvals, hand-offs, and blockers are explicit. `approve()` requires an operator confirmation unless `--yes` is supplied, then separately records Firebase creation, sign-in link generation, email delivery, and validation queueing states. Validation blockers are typed through `FAILURE_KINDS` and set with `mark_blocked()` or `mark_failed()`, with distinct outcomes for policy/user constraints such as `invalid_repo_url`, `no_supported_language`, and `selector_rejected` versus tool failures such as `clone_timeout`, `validate_error`, and `persist_error`.
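The split between policy blockers and tool failures could look like the sketch below. The six kind names are quoted from the review, and `FAILURE_KINDS` is named there, but its real contents are not listed in full. The grouping constants `POLICY_KINDS` and `TOOL_KINDS`, the function `terminal_status_for`, and the rule that policy kinds end as `blocked` while tool kinds end as `failed` are assumptions made for illustration.

```python
# Kinds caused by the applicant's input or by policy: retrying unchanged
# input would not help, so these are assumed to end in "blocked".
POLICY_KINDS = frozenset({
    "invalid_repo_url",
    "no_supported_language",
    "selector_rejected",
})

# Kinds caused by tooling or infrastructure: assumed to end in "failed".
TOOL_KINDS = frozenset({
    "clone_timeout",
    "validate_error",
    "persist_error",
})

# Partial set for illustration; the submission's full list is not shown.
FAILURE_KINDS = POLICY_KINDS | TOOL_KINDS

def terminal_status_for(failure_kind: str) -> str:
    """Map a typed failure kind to the terminal status it should produce."""
    if failure_kind not in FAILURE_KINDS:
        raise ValueError(f"unknown failure kind: {failure_kind}")
    return "blocked" if failure_kind in POLICY_KINDS else "failed"
```

Rejecting unknown kinds with an error, rather than defaulting to `failed`, forces every new failure path to be classified before it can reach an operator.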
P0
aligned · Represent delegated work as a system, not merely as a conversation
Delegated work is represented as a structured system rather than a conversation. The code has persistent system records (`CohortApplication`, `CohortValidationJob`, `UserValidationRun`), explicit phase transitions in `mark_step_started()`, terminal-state functions, retry metadata, abort metadata, timestamps for each stage, and a structured audit object. Execution state is separate from the LLM response and is persisted independently in the job table.
Iteration delta
Regressions (1)
Run ID: 9caf9385-cd3b-4936-b455-87a916577e3a · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.