Assessment complete; awaiting evidence revision.
Evaluated 12 May 2026 against the AI Design Blueprint doctrine
Emerging
Status: High Risk
74/100
Grade C
This is a real delegated validation workflow with durable job state, typed transitions, audit metadata, and improved rollback/failure handling. The remaining production blockers are at the trust-boundary edges: arbitrary repository text is still bundled into an instruction-bearing validation payload with only an in-band warning header, and the approval/email handoff can fail after approval is committed without a durable blocker state.
Iteration history
2 prior runs on this artifact. Each run_id opens its own readiness review.
Per-principle findings
10 principles evaluated. Verdict, severity, evidence and recommendation for each.
P0
needs changes · production blocker · 65/100 · Establish trust through inspectability
The audit trail is strong in several places (`commit_sha`, per-file `content_sha256`, `bundle_sha256`, selected/skipped files, latency, job transitions), but the key trust boundary for arbitrary public repository content is still in-band. `_build_implementation_context()` concatenates raw file contents into strings like `# === FILE: {path} ===\n{content}`, `wrap_bundle_with_boundary(raw_bundle)` prepends a warning header, and `ValidationRequest(code=bundle, ...)` sends that monolithic text to the validator. A repository can still contain instruction-shaped text inside comments/docstrings/strings; the boundary is a prompt convention rather than a typed artifact boundary enforced by `validatio…
Recommendation
Move the inertness boundary into the validation service as a typed file-envelope primitive: pass files as structured data such as `{path, byte_size, sha256, content}` with an envelope version/hash, render source only in a non-instruction-bearing section controlled by the service, and persist that envelope hash alongside the result. Do not rely on an in-band `BOUNDARY_HEADER` inside the same raw prompt text as the untrusted code.
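The typed file-envelope idea above can be sketched roughly as follows. This is a minimal illustration, not the reviewed service's API: `FileEntry`, `make_entry`, `verify_entry`, `envelope_hash`, and `ENVELOPE_VERSION` are hypothetical names, and the hash covers only the version and per-file metadata so it can be persisted alongside the result.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Sequence

ENVELOPE_VERSION = 1  # hypothetical version tag, not taken from the reviewed code


@dataclass(frozen=True)
class FileEntry:
    """One untrusted source file, carried as data rather than prompt text."""
    path: str
    byte_size: int
    sha256: str
    content: str


def make_entry(path: str, content: str) -> FileEntry:
    """Build an entry whose size and hash are derived from the content itself."""
    raw = content.encode("utf-8")
    return FileEntry(path, len(raw), hashlib.sha256(raw).hexdigest(), content)


def verify_entry(entry: FileEntry) -> bool:
    """Check on the service side that the metadata still matches the content."""
    raw = entry.content.encode("utf-8")
    return (
        entry.byte_size == len(raw)
        and entry.sha256 == hashlib.sha256(raw).hexdigest()
    )


def envelope_hash(entries: Sequence[FileEntry]) -> str:
    """Hash the version plus per-file metadata, independent of input order."""
    manifest = {
        "version": ENVELOPE_VERSION,
        "files": sorted(
            [
                {"path": e.path, "byte_size": e.byte_size, "sha256": e.sha256}
                for e in entries
            ],
            key=lambda f: f["path"],
        ),
    }
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under these assumptions, instruction-shaped text inside a file stays in the `content` field of a structured record; the service decides where and how it is rendered, and the envelope hash gives the audit trail a stable identifier for exactly what was reviewed.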
P0
needs changes · production blocker · 55/100 · Make hand-offs, approvals, and blockers explicit
The validation job handoff is much better: validate and persistence exceptions now call `db.rollback()` before `mark_failed()`, the outer handler rolls back and refreshes `job`, and terminal blockers use typed `failure_kind` values. However, the approval/onboarding handoff can still strand a real applicant outside that durable job model: `approve()` commits `app.approved_at` and `app.user_id`, then `generate_sign_in_link_for_email()` catches `Exception` and simply `return 3` without persisting an application/job blocker; `send_cohort_approval_email()` is not wrapped in a durable failure transition either. Delta: the prior rollback subfinding is improved, but a production handoff gap remains…
Recommendation
Move approval-side external handoffs into a durable service-owned state machine, or extend `CohortApplication`/a dedicated onboarding job with typed states such as `approved`, `sign_in_link_failed`, `approval_email_failed`, and `validation_queued`. Before returning from sign-in-link or email failures, persist a safe blocker message and retry eligibility so the operator is not left with only an exit code and an already-approved applicant.
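One way to picture the recommended typed onboarding states is the sketch below. It uses an in-memory record in place of a database row; `OnboardingState`, `OnboardingRecord`, and `run_approval_handoff` are hypothetical names, the two callables stand in for `generate_sign_in_link_for_email()` and `send_cohort_approval_email()`, and the display messages are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class OnboardingState(str, Enum):
    # State names follow the recommendation; the enum itself is hypothetical.
    APPROVED = "approved"
    SIGN_IN_LINK_FAILED = "sign_in_link_failed"
    APPROVAL_EMAIL_FAILED = "approval_email_failed"
    VALIDATION_QUEUED = "validation_queued"


@dataclass
class OnboardingRecord:
    """In-memory stand-in for a persisted application/onboarding row."""
    state: OnboardingState = OnboardingState.APPROVED
    safe_display_message: Optional[str] = None
    retry_eligible: bool = False


def _block(record: OnboardingRecord, state: OnboardingState, message: str) -> OnboardingRecord:
    # In a real service this is where the blocker would be committed
    # before control returns to the caller.
    record.state = state
    record.safe_display_message = message
    record.retry_eligible = True
    return record


def run_approval_handoff(
    record: OnboardingRecord,
    make_link: Callable[[], str],
    send_email: Callable[[str], None],
) -> OnboardingRecord:
    """Run both external handoffs, recording a typed blocker on any failure."""
    try:
        link = make_link()
    except Exception:
        return _block(record, OnboardingState.SIGN_IN_LINK_FAILED,
                      "Could not generate a sign-in link.")
    try:
        send_email(link)
    except Exception:
        return _block(record, OnboardingState.APPROVAL_EMAIL_FAILED,
                      "Could not send the approval email.")
    record.state = OnboardingState.VALIDATION_QUEUED
    record.safe_display_message = None
    record.retry_eligible = False
    return record
```

The point of the sketch is that every exit path leaves the record in a named state, so an operator sees `sign_in_link_failed` with a retry flag rather than only an exit code.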
P0
needs changes · hardening recommended · 35/100 · Optimise for steering, not only initiating
The code has a real steering primitive in `CohortValidationJob.abort_requested`, `mark_step_started()` checks it before advancing, the validate step now calls `db.refresh(job)` before checking abort, and `run()` calculates a durable `retry_count` from prior jobs. That improves the prior P10 refresh/retry-count gap. The submitted code still does not expose a service-owned steering surface that sets `abort_requested`, retries a failed job, or marks an interrupted job; comments say a separate surface flips the flag, but no `request_abort`/`retry_failed_job` API or CLI is present in the reviewed code. This is a remaining steering gap, though the underlying DB flag/check primitive means the core…
Recommendation
Add a small service-owned steering command/API outside the execution loop for `request_abort(job_id)`, `retry_failed_job(job_id)`, and `mark_interrupted(job_id)`, and have the orchestrator refresh before each external or costly step. Keep it as a thin state-transition primitive rather than adding a large orchestration wrapper.
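A thin steering surface of the kind recommended above might look like the following sketch. It is an assumption-laden illustration: `JobStore` is an in-memory stand-in for the jobs table, `Job` mirrors only a few of the fields the review attributes to `CohortValidationJob`, and the status names in `TERMINAL_STATUSES` (including `interrupted`) are guesses rather than the reviewed code's actual values.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed status names; the reviewed code defines its own TERMINAL_STATUSES.
TERMINAL_STATUSES = {"completed", "blocked", "failed", "aborted", "interrupted"}


class SteeringError(Exception):
    """Raised when a steering command does not apply to the job's state."""


@dataclass
class Job:
    job_id: int
    status: str = "queued"
    abort_requested: bool = False
    retry_eligible: bool = False
    retry_count: int = 0


class JobStore:
    """In-memory stand-in for the jobs table (hypothetical)."""

    def __init__(self) -> None:
        self._jobs: Dict[int, Job] = {}

    def create(self, retry_count: int = 0) -> Job:
        job = Job(job_id=len(self._jobs) + 1, retry_count=retry_count)
        self._jobs[job.job_id] = job
        return job

    def get(self, job_id: int) -> Job:
        return self._jobs[job_id]


def request_abort(store: JobStore, job_id: int) -> Job:
    """Set the flag that the execution loop checks before each step."""
    job = store.get(job_id)
    if job.status in TERMINAL_STATUSES:
        raise SteeringError(f"job {job_id} is already {job.status}")
    job.abort_requested = True
    return job


def retry_failed_job(store: JobStore, job_id: int) -> Job:
    """Queue a new job for a failed, retry-eligible one; the old row is kept."""
    job = store.get(job_id)
    if job.status != "failed" or not job.retry_eligible:
        raise SteeringError(f"job {job_id} is not eligible for retry")
    return store.create(retry_count=job.retry_count + 1)


def mark_interrupted(store: JobStore, job_id: int) -> Job:
    """Close out a job whose worker died without reaching a terminal state."""
    job = store.get(job_id)
    if job.status in TERMINAL_STATUSES:
        raise SteeringError(f"job {job_id} is already {job.status}")
    job.status = "interrupted"
    job.retry_eligible = True
    return job
```

Each command is a single guarded state transition, which keeps the surface small and leaves the orchestrator responsible only for refreshing the job and honouring `abort_requested`.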
P0
aligned · Design for delegation rather than direct manipulation
The user/operator delegates the scan at the application level: `approve()` records approval, then `run_cohort_validate(app.id)` creates a `CohortValidationJob`; `_execute_with_job()` owns clone, language detection, file selection, bundling, validation, persistence, and terminal state. Scope and constraints are represented through `repo_url`, `namespace = f"cohort/{application_id}/{owner}/{repo_name}"`, `ValidationContext(task=..., repository=..., files=...)`, and the selected file list rather than requiring the operator to execute each step manually.
P0
aligned · Ensure that background work remains perceptible
Background work is made perceptible through a persistent `CohortValidationJob` row created before work begins in `create_job()`, with durable statuses and timestamps for `queued`, `cloning`, `selecting`, `bundling`, `validating`, and terminal states. `mark_step_started()`, `mark_completed()`, `mark_blocked()`, `mark_failed()`, and `mark_aborted()` commit state transitions, and the approval CLI prints the latest terminal job status/failure fields after validation returns.
P0
aligned · Align feedback with the user’s level of attention
The code separates low-noise routine progress from attention-demanding conditions: normal progress is persisted as status/timestamps, while blocked/failed states carry `failure_kind`, `safe_display_message`, and `retry_eligible`. `approve()` prints terminal status and failure details only after the run, while deeper validation telemetry is summarized via `_summarize_validate_log()` inside the audit object rather than streamed noisily to the operator.
P0
aligned · Apply progressive disclosure to system agency
Primary state is simple (`status`, `failure_kind`, `safe_display_message`, `retry_eligible`), while detailed inspection is available in the persisted `audit_object` containing `source`, `selection`, `validate`, and `job` sections. The code records file hashes, selected paths, skipped reads, latency, usage presence, and `_collect_job_transitions(job)` without forcing those internals into the main terminal message.
P0
aligned · Replace implied magic with clear mental models
The workflow exposes a clear mental model through explicit state-machine comments in `cohort_validation_job.py` and `CohortValidationJob`, bounded repository support in `_looks_like_public_github_https()`, language detection in `_detect_language()`, selector rejection via `SelectionRejected`, and a visible untrusted-input boundary constant `BOUNDARY_HEADER`. Failure messages distinguish unsupported language, invalid repository URL, clone timeout/failure, repository size, read failure, validator failure, persistence failure, and abort.
P0
aligned · Expose meaningful operational state, not internal complexity
Operational state is represented in user/action-relevant terms: `TERMINAL_STATUSES`, `FAILURE_KINDS`, `mark_blocked()`, `mark_failed()`, and `safe_display_message` avoid exposing raw stack traces as the main status surface. The current code also addresses the prior P6 gap: `FetchError.kind == "invalid_url"` is explicitly mapped to `"invalid_repo_url"` in `kind_map`, and an all-read-failed bundle now blocks with `failure_kind="read_failed"` instead of validating an empty/materially incomplete bundle.
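The two fixes described above, mapping fetch error kinds to operator-facing failure kinds and blocking an all-read-failed bundle, can be illustrated with a small sketch. Only the `"invalid_url"` to `"invalid_repo_url"` mapping and the `"read_failed"` kind come from the review; the `"timeout"` entry, the fallback value, and the function names are assumptions made for the example.

```python
from typing import Optional

# Only the "invalid_url" entry is confirmed by the review; the rest is assumed.
KIND_MAP = {
    "invalid_url": "invalid_repo_url",
    "timeout": "clone_timeout",  # assumed entry, for illustration only
}
DEFAULT_FAILURE_KIND = "clone_failed"  # assumed fallback


def to_failure_kind(fetch_error_kind: str) -> str:
    """Translate an internal fetch error kind into an operator-facing kind."""
    return KIND_MAP.get(fetch_error_kind, DEFAULT_FAILURE_KIND)


def bundle_blocker(selected_count: int, read_failure_count: int) -> Optional[str]:
    """Return "read_failed" when no selected file could be read, else None.

    An empty selection returns None here because it is treated as a
    separate selection problem rather than a read failure.
    """
    if selected_count > 0 and read_failure_count >= selected_count:
        return "read_failed"
    return None
```

For example, `bundle_blocker(3, 3)` yields `"read_failed"`, so the job can be blocked before an empty bundle reaches the validator, while `bundle_blocker(3, 1)` yields `None` and the partial bundle proceeds with its skipped reads recorded.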
P0
aligned · Represent delegated work as a system, not merely as a conversation
Delegated work is represented as a structured system rather than a conversation: `CohortValidationJob` stores lifecycle state, timestamps, abort/retry fields, terminal failure fields, validation-run linkage, commit/bundle hashes, and selected/skipped counts. `_execute_with_job()` follows explicit steps, while the persisted `audit_object` captures source, selection, validation, and job-transition substructures.
Iteration delta
Improvements (3)
Run ID: 659a695a-2bf6-4b28-a52f-b6098c20ee0b · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.