Assessment complete; awaiting evidence revision.
Evaluated 12 May 2026 against the AI Design Blueprint doctrine
Emerging
Status: High Risk
67/100
Grade C
The submission materially improves the cohort bridge with a durable `CohortValidationJob` state machine, committed step transitions, typed failure fields, and an audit envelope. However, production trust boundaries still fail in two places: arbitrary repo content is still passed to the validator as a raw same-channel text bundle with only a natural-language header, and several blocker/error paths can be misclassified or lose their durable failure transition.
Iteration history
1 prior run on this artifact. Each run_id opens its own readiness review.
| When | Score | Status | Run ID |
|---|---|---|---|
| 12 May 2026 (this run) | 67 / C | High Risk | e476247c… |
| 12 May 2026 | 30 / F | High Risk | 86b2c59d… |
Per-principle findings
10 principles evaluated. Verdict, severity, evidence and recommendation for each.
P0
needs changes · production blocker · 75/100 · Make hand-offs, approvals, and blockers explicit
The workflow now has explicit handoff helpers (`mark_blocked`, `mark_failed`, `mark_completed`, `mark_aborted`), but several blockers can still be mishandled. The `invalid_url`/`invalid_repo_url` mismatch can turn an applicant-fixable URL issue into `unexpected_error`. In the persistence block, `except Exception` calls `mark_failed()` on the same SQLAlchemy session after a possible `db.flush()` or `db.commit()` failure without first rolling back, so the failure transition itself can fail and leave the job stuck. Similarly, if `mark_completed()` mutates `job.status='completed'` and its commit fails, the outer handler may see the in-memory terminal status and skip `mark_failed()`. Read failure…
Recommendation
Separate result persistence from failure-state persistence with rollback-safe transaction boundaries: call `db.rollback()` before marking `persist_error`, make terminal transition commits atomic and verifiable, normalize all external blocker kinds before `mark_blocked()`, and convert unrecoverable read/bundle failures into `blocked` states before validator invocation.
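The ordering recommended above (roll back first, then record the failure, then verify the terminal state from storage) can be sketched as follows. This is a minimal illustration, not the submission's code: it uses the standard-library `sqlite3` module in place of the SQLAlchemy session, and the table layout, `make_db`, and `persist_result` are hypothetical names. Only the `persist_error` kind and the `completed`/`failed` statuses come from the review.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical jobs/results layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL, failure_kind TEXT)"
    )
    conn.execute(
        "CREATE TABLE results (job_id INTEGER NOT NULL, payload TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO jobs (id, status) VALUES (1, 'validating')")
    conn.commit()
    return conn


def persist_result(conn, job_id, payload):
    """Persist the result; on failure, roll back BEFORE writing the failure state.

    Returns the job status as read back from the database, so the caller
    never trusts an in-memory value that may not have been committed.
    """
    try:
        conn.execute(
            "INSERT INTO results (job_id, payload) VALUES (?, ?)", (job_id, payload)
        )
        conn.execute("UPDATE jobs SET status = 'completed' WHERE id = ?", (job_id,))
        conn.commit()
    except sqlite3.Error:
        # Discard the broken transaction first; only then is it safe to
        # write the failure transition in a fresh transaction.
        conn.rollback()
        conn.execute(
            "UPDATE jobs SET status = 'failed', failure_kind = 'persist_error' WHERE id = ?",
            (job_id,),
        )
        conn.commit()
    # Verify the terminal state from storage rather than from local variables.
    row = conn.execute("SELECT status FROM jobs WHERE id = ?", (job_id,)).fetchone()
    return row[0]
```

With a real SQLAlchemy session the same shape would apply: call `db.rollback()` inside the `except` branch before any further writes, and re-read the job row after the terminal commit.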
P0
needs changes · production blocker · 65/100 · Establish trust through inspectability
The audit chain is strong: `commit_sha`, per-file `content_sha256`, `bundle_sha256`, `schema_version='bridge.audit.v1.1'`, and `boundary_contract_version` make the result traceable. The remaining trust-boundary issue is that `wrap_bundle_with_boundary(raw_bundle)` prepends `BOUNDARY_HEADER` into the same `code` string that contains arbitrary repo content, and `_build_implementation_context()` concatenates raw `# === FILE: {path} ===` headers plus unescaped file contents. A natural-language header inside the payload is helpful, but it is not the same as a trusted validation-service envelope; malicious file paths or contents still occupy the same instruction channel. This partially addresses t…
Recommendation
Move the inertness contract to the validation service’s trusted prompt/schema boundary: pass files as a typed JSON/file array with encoded paths and content, persist the envelope version/hash, and treat the rendered bundle only as an inspectable artifact rather than the authority boundary.
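One way to picture the typed file array described above is the sketch below. It is an assumption-laden illustration: `build_file_envelope`, the field names, and the `example.envelope.v1` label are hypothetical, and the actual validation service's schema is not shown in the review. Paths and contents are base64-encoded so that neither can inject text into an instruction channel, and the envelope hash is computed over a canonical JSON serialization so it can be persisted alongside the audit record.

```python
import base64
import hashlib
import json

# Hypothetical version label; the real contract version is defined elsewhere.
ENVELOPE_VERSION = "example.envelope.v1"


def build_file_envelope(files):
    """Build a typed envelope from a list of (path, content_bytes) pairs.

    Returns (envelope, envelope_sha256). Paths and contents are carried as
    base64 data fields, never as free text concatenated into a prompt.
    """
    entries = []
    for path, content in files:
        entries.append(
            {
                "path_b64": base64.b64encode(path.encode("utf-8")).decode("ascii"),
                "content_b64": base64.b64encode(content).decode("ascii"),
                "content_sha256": hashlib.sha256(content).hexdigest(),
            }
        )
    envelope = {"envelope_version": ENVELOPE_VERSION, "files": entries}
    # Canonical serialization: sorted keys and fixed separators make the
    # hash reproducible for the same input.
    canonical = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return envelope, digest
```

Under this shape, a rendered `# === FILE: ... ===` bundle can still be produced for human inspection, but it is derived from the envelope rather than being the thing the validator trusts.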
P0
needs changes · production blocker · 60/100 · Expose meaningful operational state, not internal complexity
The structured status model is a major improvement, but some current paths can still expose misleading operational state. `fetch_public_repo()` raises `FetchError(..., kind='invalid_url')`, while `FAILURE_KINDS` contains `invalid_repo_url` rather than `invalid_url`; `_execute_with_job()` may pass `invalid_url` into `mark_blocked()`, causing a `ValueError` and eventual `unexpected_error` instead of a user-actionable URL blocker. `_build_implementation_context()` also records `skipped_during_read` but the workflow can still proceed to validation and `mark_completed()` even if selected files failed to read, so the job may say completed when the validation input was materially incomplete. This i…
Recommendation
Make failure taxonomy a single shared typed primitive across fetcher and job state, map `invalid_url` to `invalid_repo_url`, and block the job when the post-read bundle has no successfully read source files or falls below the minimum useful source threshold.
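A shared taxonomy with an alias map and a post-read threshold check might look like the sketch below. The `invalid_url` to `invalid_repo_url` mapping and the `unexpected_error` kind come from the findings; `FailureKind`, `normalize_failure_kind`, `bundle_blocker`, the `no_readable_source` kind, and the threshold of one file are hypothetical choices made for illustration.

```python
from enum import Enum


class FailureKind(str, Enum):
    """Single source of truth for failure kinds (subset shown)."""

    INVALID_REPO_URL = "invalid_repo_url"
    NO_READABLE_SOURCE = "no_readable_source"  # hypothetical kind for this sketch
    UNEXPECTED_ERROR = "unexpected_error"


# Legacy or fetcher-side spellings mapped onto the shared taxonomy.
_ALIASES = {"invalid_url": FailureKind.INVALID_REPO_URL}

# Assumed minimum number of successfully read files for a useful bundle.
MIN_READ_FILES = 1


def normalize_failure_kind(raw):
    """Map any external kind string to a FailureKind without raising."""
    if raw in _ALIASES:
        return _ALIASES[raw]
    try:
        return FailureKind(raw)
    except ValueError:
        # Unknown kinds degrade to unexpected_error explicitly, instead of
        # surfacing as a ValueError somewhere inside the blocking helper.
        return FailureKind.UNEXPECTED_ERROR


def bundle_blocker(selected_paths, skipped_during_read):
    """Return a blocking FailureKind if too few selected files were read, else None."""
    skipped = set(skipped_during_read)
    read_ok = [p for p in selected_paths if p not in skipped]
    if len(read_ok) < MIN_READ_FILES:
        return FailureKind.NO_READABLE_SOURCE
    return None
```

The workflow would call `normalize_failure_kind` before `mark_blocked()` and check `bundle_blocker` after bundling, so that a job with no usable input never reaches `mark_completed()`.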
P0
needs changes · hardening recommended · 45/100 · Optimise for steering, not only initiating
`abort_requested` and `mark_aborted()` are the right primitive direction, and `mark_step_started()` checks for aborts between steps. Three gaps remain. The submitted code does not show an operator-safe mutation surface such as `request_abort(job_id)` or `retry_job(job_id)`. `retry_count` is defined but never incremented. There is no row refresh or lock before checking `job.abort_requested`, so an external abort may depend on SQLAlchemy session-expiration behavior. The current implementation improves on the prior lack of steering, but it is not yet a complete steering surface.
Recommendation
Add a small service-owned steering API/CLI outside the execution loop for `request_abort`, `retry_failed_job`, and `mark_interrupted`; refresh or lock the job row before each irreversible/external step, and increment/link retries rather than relying on manual reruns.
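A steering surface of the kind recommended above could be sketched as below, again using the standard-library `sqlite3` module rather than the submission's SQLAlchemy models. The schema and the `start_step` helper are hypothetical; `request_abort` and `retry_failed_job` follow the names in the recommendation, and the status values are the ones listed in the findings. The key points are that steering writes happen outside the execution loop, each step re-reads the row before proceeding, and retries increment a counter.

```python
import sqlite3

# Hypothetical minimal schema for the sketch.
SCHEMA = (
    "CREATE TABLE jobs ("
    "id INTEGER PRIMARY KEY, status TEXT NOT NULL, "
    "abort_requested INTEGER NOT NULL DEFAULT 0, "
    "retry_count INTEGER NOT NULL DEFAULT 0)"
)


def request_abort(conn, job_id):
    """Flag a non-terminal job for abort. Returns True if the flag was set."""
    cur = conn.execute(
        "UPDATE jobs SET abort_requested = 1 WHERE id = ? "
        "AND status NOT IN ('completed', 'blocked', 'failed', 'aborted')",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def retry_failed_job(conn, job_id):
    """Re-queue a failed job and increment its retry counter."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'queued', abort_requested = 0, "
        "retry_count = retry_count + 1 WHERE id = ? AND status = 'failed'",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def start_step(conn, job_id, step):
    """Re-read the row, honour a pending abort, otherwise enter the step.

    Returns True if the step may proceed, False if the job was aborted.
    """
    row = conn.execute(
        "SELECT abort_requested FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"job {job_id} not found")
    if row[0]:
        conn.execute("UPDATE jobs SET status = 'aborted' WHERE id = ?", (job_id,))
        conn.commit()
        return False
    conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (step, job_id))
    conn.commit()
    return True
```

In a SQLAlchemy setting, the fresh `SELECT` in `start_step` would correspond to refreshing or locking the job row before each external step, rather than reading a possibly stale attribute from the session.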
P0
aligned · Design for delegation rather than direct manipulation
`approve()` lets the founder approve an application once, then delegates the repo scan to `run_cohort_validate(app.id)`. `_execute_with_job` owns the operational sequence (parse repo, clone, select, bundle, validate, persist), and `ValidationRequest` carries the task, repository namespace, and selected files rather than requiring the operator to perform each step manually. This maintains the prior aligned result.
P0
aligned · Ensure that background work remains perceptible
`create_job()` inserts and commits a `CohortValidationJob` at `status='queued'` before clone or validation begins. `mark_step_started()` records `<step>_started_at`, updates `status`, and commits for `cloning`, `selecting`, `bundling`, and `validating`; terminal helpers persist `completed`, `blocked`, `failed`, or `aborted`. The audit object also records `job_id` and `_collect_job_transitions(job)`. This addresses the prior blocker around no durable run row before clone/validation.
P0
aligned · Align feedback with the user’s level of attention
Foreground output is concise (`Created CohortValidationJob#...`, terminal status, failure kind), while durable detail lives in `safe_display_message`, `failure_kind`, timestamps, and the audit object. `_summarize_validate_log()` deliberately stores structural log shape rather than dumping full content, which keeps routine operation quiet while preserving escalation detail. This improves the prior feedback/lifecycle gap.
P0
aligned · Apply progressive disclosure to system agency
The primary operational layer is the small job state surface (`status`, `failure_kind`, `safe_display_message`, `retry_eligible`), while deeper inspection is available through `result_json['audit']`, including source commit, selected file hashes, bundle hash, validator latency, log-signal summary, and job transitions. This preserves progressive disclosure rather than exposing raw logs as the default view.
P0
aligned · Replace implied magic with clear mental models
The code gives operators a clear model of what the workflow does and cannot do: `cohort_validation_job.py` documents the state machine and the distinction between `blocked`, `failed`, and `aborted`; `agentic_surface_selector.py` documents that include patterns are sort preferences rather than gates; `FetchError.kind`, `FAILURE_KINDS`, and `retry_eligible` make dependencies and recovery expectations explicit.
P0
aligned · Represent delegated work as a system, not merely as a conversation
`CohortValidationJob` represents the delegated work as a system with durable state, timestamps, terminal statuses, retry metadata, abort metadata, linkage to `UserValidationRun`, and audit quick-access fields. `_collect_job_transitions()` mirrors the lifecycle into the persisted audit object, separating execution state from the founder CLI’s conversational/console output. This addresses the prior recommendation to represent the bridge as a service-owned validation job.
Iteration delta
Improvements (7)
Run ID: e476247c-d9c0-44f4-a25f-dbbdb7eb7b15 · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.