Assessment complete; awaiting evidence revision.
Evaluated 12 May 2026 against the AI Design Blueprint doctrine
Emerging
Status: High Risk
74/100
Grade C
The submission is clearly an autonomous cohort-validation workflow and has strong primitives: a durable CohortValidationJob lifecycle, explicit terminal states, typed failure kinds, source/audit metadata, and a JSON file envelope with hashes. It materially improves the prior P7/P8 blockers, but it is not fully aligned yet: the selector can still omit high-signal agentic files because of a priority inversion, the approval/onboarding state machine does not durably capture Firebase-user-creation failure, and steering remains limited to partial between-step abort checks.
Iteration history
3 prior runs on this artifact. Each run_id opens its own readiness review.
Per-principle findings
10 principles evaluated. Verdict, severity, evidence and recommendation for each.
P0
needs changes · production blocker · 50/100 · Make hand-offs, approvals, and blockers explicit
The validation job handoffs are explicit through mark_blocked(), mark_failed(), mark_aborted(), and typed failure_kind values, and approve_cohort_application.approve() now records sign_in_link_failed, approval_email_failed, validation_queued, validation_complete, and validation_failed. But the first external approval handoff is still not durably protected: after committing onboarding_state='pending', approve() calls create_firebase_user_for_email(app.email, ...) without a try/except or typed failure transition. If Firebase creation fails or times out, the application can remain in pending with no onboarding_failure_reason or firebase_user_failed state. Delta vs prior baseline: the sign-in-li…
Recommendation
Move approval side effects behind a durable onboarding transition service/job, or at minimum wrap create_firebase_user_for_email() with a committed firebase_user_create_started/firebase_user_failed transition and reconciliation for the case where Firebase succeeds but the following database commit fails.
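The minimum fix described above can be sketched as follows. This is only an illustration under stated assumptions: `Application`, `create_user_with_durable_state`, the injected `create_user` and `commit` callables, and the `firebase_user_created` state are hypothetical stand-ins, not the submission's actual model or the Firebase SDK. Only the state names `firebase_user_create_started` and `firebase_user_failed` come from the recommendation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Application:
    # Hypothetical stand-in for the cohort application row.
    email: str
    onboarding_state: str = "pending"
    onboarding_failure_reason: Optional[str] = None
    firebase_uid: Optional[str] = None


def create_user_with_durable_state(
    app: Application,
    create_user: Callable[[str], str],
    commit: Callable[[Application], None],
) -> bool:
    """Wrap the external call in committed start/failure transitions."""
    app.onboarding_state = "firebase_user_create_started"
    commit(app)  # persisted before the side effect, so a crash stays visible
    try:
        uid = create_user(app.email)
    except Exception as exc:  # broad on purpose: every failure is recorded
        app.onboarding_state = "firebase_user_failed"
        app.onboarding_failure_reason = f"{type(exc).__name__}: {exc}"
        commit(app)
        return False
    app.firebase_uid = uid
    app.onboarding_state = "firebase_user_created"  # assumed state name
    commit(app)
    return True
```

A row left in `firebase_user_create_started` then marks exactly the case a reconciliation pass would need to inspect: the external call may have succeeded while the following commit did not.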
P0
needs changes · production blocker · 45/100 · Establish trust through inspectability
The prior raw-concatenation blocker is materially improved by build_file_envelope(), which emits envelope_schema, boundary_contract, file path/byte_size/sha256/content records, and envelope_hash, and the audit stores commit_sha, bundle_sha256, envelope_schema, and envelope_hash. However, inspectability is still undermined by a concrete selector priority inversion: select_agentic_surface() sets priority = 50 for files without positive signal but priority = 99 for positive-signal files that do not match SORT_PRIORITY. INCLUDED_NAME_PATTERNS contains coordinator, planner, router, and evaluator, but SORT_PRIORITY omits them, so high-signal files with those names can sort behind generic fallback…
Recommendation
Simplify the selector boundary so every positive-signal file sorts before fallback files: either add coordinator/planner/router/evaluator to SORT_PRIORITY, or give unmatched positive-signal files a default priority numerically lower than the fallback priority of 50 so that they sort first. Persist enough inventory or omitted-positive metadata in the audit so a reviewer can verify that the validation saw the relevant agentic surface.
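One way to picture the second option is the sketch below. The contents of `SORT_PRIORITY`, the extra name patterns, the value 40, and the helper names `file_priority` and `select_with_inventory` are assumptions for illustration. Positive signal is approximated here by a name match, which may differ from how select_agentic_surface() actually detects it.

```python
from typing import Iterable, List, Tuple

# Illustrative values only; the real tables are not shown in the review.
SORT_PRIORITY = {"agent": 0, "orchestrator": 1, "workflow": 2}
INCLUDED_NAME_PATTERNS = (
    "agent", "orchestrator", "workflow",
    "coordinator", "planner", "router", "evaluator",
)
UNMATCHED_POSITIVE_PRIORITY = 40  # assumed; must stay below the fallback
FALLBACK_PRIORITY = 50


def file_priority(path: str) -> int:
    """Lower numbers sort first; positive-signal files always beat fallback."""
    name = path.lower()
    for key, rank in SORT_PRIORITY.items():
        if key in name:
            return rank
    if any(pattern in name for pattern in INCLUDED_NAME_PATTERNS):
        return UNMATCHED_POSITIVE_PRIORITY
    return FALLBACK_PRIORITY


def select_with_inventory(
    paths: Iterable[str], limit: int
) -> Tuple[List[str], List[str]]:
    """Return the selected files plus positive-signal files that were cut."""
    ranked = sorted(paths, key=lambda p: (file_priority(p), p))
    selected = ranked[:limit]
    omitted_positive = [
        p for p in ranked[limit:] if file_priority(p) < FALLBACK_PRIORITY
    ]
    return selected, omitted_positive
```

Storing the `omitted_positive` list in the audit would let a reviewer see which high-signal files fell outside the selection limit.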
P0
needs changes · hardening recommended · 35/100 · Optimise for steering, not only initiating
The code has partial steering primitives: CohortValidationJob.abort_requested exists, mark_step_started() checks it, _execute_with_job() refreshes the job before validation, and mark_aborted() records an aborted terminal state. But the submitted code does not expose a service-owned request_abort(job_id), retry_failed_job(job_id), or resume/interrupt command; retry_count is incremented only on a manual CLI re-run, and long operations such as fetch_public_repo() and asyncio.run(validate_code_against_principles()) cannot be interrupted once started. Delta vs prior baseline: this maintains the prior partial between-step abort support but does not address the prior recommendation for external ste…
Recommendation
Move steering authority into a small service/API outside the execution loop: provide request_abort, retry_failed_job, and resume/interruption commands, and make long-running clone/validate boundaries poll or honour cancellation checkpoints instead of requiring direct DB edits or full restarts.
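A minimal in-memory sketch of such a steering surface is shown below. Everything here is hypothetical: `SteeringService`, `Job`, `run_step`, and the idea of splitting a long step into chunks are assumptions used to illustrate cancellation checkpoints, not the submission's code. A real version would persist these transitions in the database rather than in a dict.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

TERMINAL_STATUSES = frozenset({"completed", "blocked", "failed", "aborted"})


@dataclass
class Job:
    job_id: str
    status: str = "queued"
    abort_requested: bool = False
    retry_count: int = 0


class SteeringService:
    """In-memory stand-in for a service that owns steering commands."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}
        self._lock = threading.Lock()

    def add(self, job: Job) -> None:
        with self._lock:
            self._jobs[job.job_id] = job

    def get(self, job_id: str) -> Job:
        with self._lock:
            return self._jobs[job_id]

    def start_step(self, job_id: str, step: str) -> None:
        with self._lock:
            self._jobs[job_id].status = step

    def request_abort(self, job_id: str) -> bool:
        with self._lock:
            job = self._jobs[job_id]
            if job.status in TERMINAL_STATUSES:
                return False  # nothing left to abort
            job.abort_requested = True
            return True

    def retry_failed_job(self, job_id: str) -> bool:
        with self._lock:
            job = self._jobs[job_id]
            if job.status != "failed":
                return False  # only failed jobs are re-queued
            job.status = "queued"
            job.abort_requested = False
            job.retry_count += 1
            return True

    def should_continue(self, job_id: str) -> bool:
        """Cancellation checkpoint: records the abort if one was requested."""
        with self._lock:
            job = self._jobs[job_id]
            if job.abort_requested:
                job.status = "aborted"
                return False
            return True


def run_step(
    service: SteeringService,
    job_id: str,
    step: str,
    chunks: Iterable[Callable[[], None]],
) -> str:
    """Run a step as small chunks, checking for abort before each one."""
    service.start_step(job_id, step)
    for chunk in chunks:
        if not service.should_continue(job_id):
            return "aborted"
        chunk()
    return step
```

The checkpoint runs before each chunk, so an abort takes effect at the next chunk boundary instead of waiting for the whole step to finish.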
P0
aligned · Design for delegation rather than direct manipulation
The workflow is designed around delegated validation rather than manual step execution: approve_cohort_application.approve() gates approval, then run_cohort_validate.run() creates a CohortValidationJob and _execute_with_job() performs clone, language detection, file selection, bundling, LLM validation, and persistence. ValidationRequest.context records the delegated task, repository namespace, and selected files.
P0
aligned · Ensure that background work remains perceptible
Background work is made durable and perceptible through CohortValidationJob.status, queued_at, cloning_started_at, selecting_started_at, bundling_started_at, validating_started_at, terminal_at, failure_kind, safe_display_message, retry_eligible, and abort_requested. create_job() commits a queued row before bridge work starts, and mark_step_started(), mark_completed(), mark_blocked(), mark_failed(), and mark_aborted() persist state transitions.
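The timestamped lifecycle described above might look roughly like this. `JobRecord` and `mark_terminal` are hypothetical names, the timestamps live in a plain dict instead of database columns, and the terminal status values are assumed from the statuses listed elsewhere in this review.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict

STEPS = ("cloning", "selecting", "bundling", "validating")
# Assumed membership; the review names these as terminal outcomes.
TERMINAL_STATUSES = ("completed", "blocked", "failed", "aborted")


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class JobRecord:
    status: str = "queued"
    timestamps: Dict[str, datetime] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The queued row exists before any work starts.
        self.timestamps.setdefault("queued_at", _now())

    def mark_step_started(self, step: str) -> None:
        if step not in STEPS:
            raise ValueError(f"unknown step: {step!r}")
        if self.status in TERMINAL_STATUSES:
            raise RuntimeError("job already reached a terminal state")
        self.status = step
        self.timestamps[f"{step}_started_at"] = _now()

    def mark_terminal(self, status: str) -> None:
        if status not in TERMINAL_STATUSES:
            raise ValueError(f"not a terminal status: {status!r}")
        self.status = status
        self.timestamps["terminal_at"] = _now()
```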
P0
aligned · Align feedback with the user’s level of attention
Feedback is calibrated by separating user/input blockers from internal/transient failures: mark_blocked() sets status='blocked' and retry_eligible=False for invalid_repo_url, no_supported_language, selector_rejected, and read_failed, while mark_failed() marks retryable validate_error/unexpected_error paths. _summarize_validate_log() captures only log shape rather than leaking noisy content, and onboarding_failure_reason is set only on approval-side failures requiring attention.
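The blocked-versus-failed split can be illustrated with a small classifier. The failure kinds are the ones named in the finding; `JobOutcome`, `classify_failure`, and the assumption that the retryable kinds set retry_eligible to True are illustrative, not taken from the submission.

```python
from dataclasses import dataclass

# Kinds named in the finding; the real FAILURE_KINDS set may be larger.
BLOCKED_KINDS = frozenset(
    {"invalid_repo_url", "no_supported_language", "selector_rejected", "read_failed"}
)
RETRYABLE_KINDS = frozenset({"validate_error", "unexpected_error"})


@dataclass(frozen=True)
class JobOutcome:
    status: str
    failure_kind: str
    retry_eligible: bool


def classify_failure(failure_kind: str) -> JobOutcome:
    """Map a typed failure kind to a user-facing terminal outcome."""
    if failure_kind in BLOCKED_KINDS:
        # The user must fix their input; retrying unchanged will not help.
        return JobOutcome("blocked", failure_kind, False)
    if failure_kind in RETRYABLE_KINDS:
        # Internal or transient problem; a retry may succeed.
        return JobOutcome("failed", failure_kind, True)
    raise ValueError(f"unknown failure_kind: {failure_kind!r}")
```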
P0
aligned · Apply progressive disclosure to system agency
The primary operational surface is concise status/failure state, while deeper inspection is available in the persisted audit object. audit_object separates source, selection, validate, and job sections; _collect_job_transitions() provides timeline detail; _summarize_validate_log() exposes top-level log shape without dumping raw validator internals.
P0
aligned · Replace implied magic with clear mental models
The code exposes a clear mental model through explicit constants and state vocabularies: STEPS=('cloning','selecting','bundling','validating'), TERMINAL_STATUSES, FAILURE_KINDS, and onboarding_state values such as pending, approved, validation_queued, validation_complete, and validation_failed. The file boundary is also explicit via envelope_schema='bridge.files.v2', boundary_contract='untrusted_user_code', and BOUNDARY_CONTRACT_VERSION.
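To make the file boundary concrete, here is a hedged sketch of what a hashed envelope with these fields could look like. The function name `build_envelope_sketch`, the `files` key, and the canonical-JSON hashing scheme are assumptions. The submission's build_file_envelope() may compute envelope_hash differently.

```python
import hashlib
import json
from typing import Any, Dict

ENVELOPE_SCHEMA = "bridge.files.v2"
BOUNDARY_CONTRACT = "untrusted_user_code"


def build_envelope_sketch(files: Dict[str, bytes]) -> Dict[str, Any]:
    """Wrap untrusted file contents in a structured, hashed envelope."""
    records = []
    for path in sorted(files):  # stable order keeps the hash reproducible
        data = files[path]
        records.append(
            {
                "path": path,
                "byte_size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
                "content": data.decode("utf-8", errors="replace"),
            }
        )
    body = {
        "envelope_schema": ENVELOPE_SCHEMA,
        "boundary_contract": BOUNDARY_CONTRACT,
        "files": records,
    }
    # Assumed scheme: hash the canonical JSON form of everything above.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    envelope_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "envelope_hash": envelope_hash}
```

Because records are sorted by path and the JSON is serialised canonically, the same set of files yields the same envelope_hash regardless of input order.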
P0
aligned · Expose meaningful operational state, not internal complexity
The workflow exposes meaningful operational states instead of raw internals: job.status uses queued/cloning/selecting/bundling/validating/completed/blocked/failed/aborted, while safe_display_message gives user-relevant explanations such as invalid repo URL, no supported language, clone failure, or validation error. Technical details like selected file hashes, latency_ms, and log_signals are reserved for the audit payload.
P0
aligned · Represent delegated work as a system, not merely as a conversation
Delegated work is represented as a structured system rather than a conversation: CohortValidationJob stores lifecycle state and timestamps, _execute_with_job() advances explicit steps, and the persisted audit object separates source, selection, validate, and job transition data. The validator result is stored in UserValidationRun with repository, latency, score/grade/tier, and audit metadata rather than only a free-form message stream.
Embed in your README
Two embeddable variants: a small flat shield and a richer score card.
Score card (recommended)
[](https://aidesignblueprint.com/en/readiness-review/3f3bb587-5ad1-4417-b624-23441b39831e)
Flat badge
[](https://aidesignblueprint.com/en/readiness-review/3f3bb587-5ad1-4417-b624-23441b39831e)
Iteration delta
Improvements (2)
Run ID: 3f3bb587-5ad1-4417-b624-23441b39831e · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.