Reviewed

Assessment complete; awaiting evidence revision.

Agent Architecture Review, Validation snapshot

Evaluated 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High Risk

67/100

Grade C

6 aligned · 3 production blockers · 1 hardening

Per-principle verdicts

The submission materially improves the cohort bridge with a durable `CohortValidationJob` state machine, committed step transitions, typed failure fields, and an audit envelope. However, production trust boundaries still fail in two places: arbitrary repo content is still passed to the validator as a raw same-channel text bundle with only a natural-language header, and several blocker/error paths can be misclassified or lose their durable failure transition.

Iteration history

1 prior run on this artifact. Each run_id opens its own readiness review.

When	Score	Status	Run ID
12 May 2026 (this run)	67 / C	High Risk	e476247c…
12 May 2026	30 / F	High Risk	86b2c59d…

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

needs changes · production blocker · 75/100

Make hand-offs, approvals, and blockers explicit

The workflow now has explicit handoff helpers (`mark_blocked`, `mark_failed`, `mark_completed`, `mark_aborted`), but several blockers can still be mishandled. The `invalid_url`/`invalid_repo_url` mismatch can turn an applicant-fixable URL issue into `unexpected_error`. In the persistence block, `except Exception` calls `mark_failed()` on the same SQLAlchemy session after a possible `db.flush()` or `db.commit()` failure without first rolling back, so the failure transition itself can fail and leave the job stuck. Similarly, if `mark_completed()` mutates `job.status='completed'` and its commit fails, the outer handler may see the in-memory terminal status and skip `mark_failed()`. Read failure…

Recommendation

Separate result persistence from failure-state persistence with rollback-safe transaction boundaries: call `db.rollback()` before marking `persist_error`, make terminal transition commits atomic and verifiable, normalize all external blocker kinds before `mark_blocked()`, and convert unrecoverable read/bundle failures into `blocked` states before validator invocation.
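
The rollback-first ordering recommended above can be sketched as follows. This is a minimal illustration, not the submitted code: it uses the standard-library `sqlite3` module as a stand-in for the SQLAlchemy session, and the `jobs` and `results` tables plus the `open_db`, `read_status`, and `persist_result` helpers are hypothetical. The terminal status and the result row share one transaction, the failure transition is written only after `rollback()`, and the returned status is read back from the database rather than from in-memory state.

```python
import sqlite3
from typing import Optional, Tuple


def open_db() -> sqlite3.Connection:
    # In-memory stand-in for the real database; the schema is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL, failure_kind TEXT)"
    )
    conn.execute(
        "CREATE TABLE results (job_id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO jobs (id, status) VALUES (1, 'validating')")
    conn.commit()
    return conn


def read_status(conn: sqlite3.Connection, job_id: int) -> Tuple[str, Optional[str]]:
    # Always read the durable row; never trust a status mutated in memory.
    return conn.execute(
        "SELECT status, failure_kind FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()


def persist_result(conn: sqlite3.Connection, job_id: int, payload: Optional[str]) -> str:
    try:
        # Terminal status and result row are committed together or not at all.
        conn.execute("UPDATE jobs SET status = 'completed' WHERE id = ?", (job_id,))
        conn.execute(
            "INSERT INTO results (job_id, payload) VALUES (?, ?)", (job_id, payload)
        )
        conn.commit()
    except sqlite3.Error:
        # Roll back first so the half-applied 'completed' status is discarded,
        # then record the failure in its own fresh transaction.
        conn.rollback()
        conn.execute(
            "UPDATE jobs SET status = 'failed', failure_kind = 'persist_error' WHERE id = ?",
            (job_id,),
        )
        conn.commit()
    # Verify the terminal transition against the database.
    return read_status(conn, job_id)[0]
```

In the submission's terms, the equivalent step is calling `db.rollback()` before `mark_failed()`, and having the outer handler decide from the persisted row rather than from `job.status` in memory.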

needs changes · production blocker · 65/100

Establish trust through inspectability

The audit chain is strong: `commit_sha`, per-file `content_sha256`, `bundle_sha256`, `schema_version='bridge.audit.v1.1'`, and `boundary_contract_version` make the result traceable. The remaining trust-boundary issue is that `wrap_bundle_with_boundary(raw_bundle)` prepends `BOUNDARY_HEADER` into the same `code` string that contains arbitrary repo content, and `_build_implementation_context()` concatenates raw `# === FILE: {path} ===` headers plus unescaped file contents. A natural-language header inside the payload is helpful, but it is not the same as a trusted validation-service envelope; malicious file paths or contents still occupy the same instruction channel. This partially addresses t…

Recommendation

Move the inertness contract to the validation service’s trusted prompt/schema boundary: pass files as a typed JSON/file array with encoded paths and content, persist the envelope version/hash, and treat the rendered bundle only as an inspectable artifact rather than the authority boundary.
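
One possible shape for the typed file array is sketched below, assuming the validation service can accept JSON. The field names (`path_b64`, `content_b64`, `envelope_sha256`), the `ENVELOPE_VERSION` label, and the `build_envelope` and `decode_file` helpers are hypothetical and do not come from the submission. Paths and contents are base64-encoded so that repo-controlled text never appears verbatim in the payload, and the envelope hash is computed over a canonical serialization so it can be persisted alongside the audit object.

```python
import base64
import hashlib
import json
from typing import Dict, Tuple

ENVELOPE_VERSION = "bridge.envelope.v0"  # hypothetical version label


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def build_envelope(files: Dict[str, bytes]) -> dict:
    # Sort by path so the same file set always yields the same envelope hash.
    entries = []
    for path in sorted(files):
        content = files[path]
        entries.append(
            {
                "path_b64": _b64(path.encode("utf-8")),
                "content_b64": _b64(content),
                "content_sha256": hashlib.sha256(content).hexdigest(),
            }
        )
    body = {"envelope_version": ENVELOPE_VERSION, "files": entries}
    # Canonical form: sorted keys, no whitespace.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {**body, "envelope_sha256": hashlib.sha256(canonical).hexdigest()}


def decode_file(entry: dict) -> Tuple[str, bytes]:
    # The trusted side decodes each entry and checks the per-file hash.
    path = base64.b64decode(entry["path_b64"]).decode("utf-8")
    content = base64.b64decode(entry["content_b64"])
    if hashlib.sha256(content).hexdigest() != entry["content_sha256"]:
        raise ValueError("content hash mismatch for " + repr(path))
    return path, content
```

Encoding alone does not make content inert; the point of the sketch is that the instruction boundary lives in the service's schema, while a rendered text bundle can still be produced from the same entries for human inspection.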

needs changes · production blocker · 60/100

Expose meaningful operational state, not internal complexity

The structured status model is a major improvement, but some current paths can still expose misleading operational state. `fetch_public_repo()` raises `FetchError(..., kind='invalid_url')`, while `FAILURE_KINDS` contains `invalid_repo_url` rather than `invalid_url`; `_execute_with_job()` may pass `invalid_url` into `mark_blocked()`, causing a `ValueError` and eventual `unexpected_error` instead of a user-actionable URL blocker. `_build_implementation_context()` also records `skipped_during_read` but the workflow can still proceed to validation and `mark_completed()` even if selected files failed to read, so the job may say completed when the validation input was materially incomplete. This i…

Recommendation

Make failure taxonomy a single shared typed primitive across fetcher and job state, map `invalid_url` to `invalid_repo_url`, and block the job when the post-read bundle has no successfully read source files or falls below the minimum useful source threshold.
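
A shared taxonomy along these lines could be sketched as a single enum with an alias table. The kinds `invalid_url`, `invalid_repo_url`, and `unexpected_error` appear in the review; the `FailureKind` class, the `insufficient_source` kind, the `MIN_READ_FILES` threshold, and both helper functions are assumptions made for illustration.

```python
from enum import Enum
from typing import List, Optional


class FailureKind(str, Enum):
    INVALID_REPO_URL = "invalid_repo_url"
    INSUFFICIENT_SOURCE = "insufficient_source"  # hypothetical kind
    UNEXPECTED_ERROR = "unexpected_error"


# Names emitted by the fetcher that differ from the job-state vocabulary.
_ALIASES = {"invalid_url": FailureKind.INVALID_REPO_URL}

MIN_READ_FILES = 1  # hypothetical minimum useful source threshold


def normalize_kind(raw: str) -> FailureKind:
    # Map fetcher-side names onto the shared taxonomy before marking a blocker.
    if raw in _ALIASES:
        return _ALIASES[raw]
    try:
        return FailureKind(raw)
    except ValueError:
        # Unknown kinds degrade explicitly instead of raising mid-transition.
        return FailureKind.UNEXPECTED_ERROR


def bundle_blocker(
    selected: List[str],
    skipped_during_read: List[str],
    minimum: int = MIN_READ_FILES,
) -> Optional[FailureKind]:
    # Count files that were selected and actually read.
    skipped = set(skipped_during_read)
    read_ok = [path for path in selected if path not in skipped]
    if len(read_ok) < minimum:
        return FailureKind.INSUFFICIENT_SOURCE
    return None
```

With this shape, the workflow would call `normalize_kind()` on every `FetchError.kind` before `mark_blocked()`, and check `bundle_blocker()` after the read step and before the validator is invoked.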

needs changes · hardening recommended · 45/100

Optimise for steering, not only initiating

`abort_requested` and `mark_aborted()` are the right primitives, and `mark_step_started()` checks for aborts between steps. Three gaps remain. The submitted code does not show an operator-safe mutation surface such as `request_abort(job_id)` or `retry_job(job_id)`. `retry_count` is defined but never incremented. There is no row refresh or lock before checking `job.abort_requested`, so whether an external abort is observed may depend on SQLAlchemy session-expiration behavior. The current implementation improves on the prior lack of steering, but it is not yet a complete steering surface.

Recommendation

Add a small service-owned steering API/CLI outside the execution loop for `request_abort`, `retry_failed_job`, and `mark_interrupted`; refresh or lock the job row before each irreversible/external step, and increment/link retries rather than relying on manual reruns.
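
A steering surface of this kind might look like the sketch below. It again uses the standard-library `sqlite3` module as a stand-in for the real persistence layer; the table layout, the `retry_of` column, and the `should_abort` helper are hypothetical, `create_job` here is a simplified stand-in for the submission's helper of the same name, and `mark_interrupted` is omitted for brevity. The abort check re-reads the row from the database on every call, and a retry creates a new linked row with an incremented `retry_count`.

```python
import sqlite3

TERMINAL = ("completed", "blocked", "failed", "aborted")


def open_db() -> sqlite3.Connection:
    # In-memory stand-in; the schema is hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE jobs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               status TEXT NOT NULL,
               abort_requested INTEGER NOT NULL DEFAULT 0,
               retry_count INTEGER NOT NULL DEFAULT 0,
               retry_of INTEGER REFERENCES jobs(id)
           )"""
    )
    conn.commit()
    return conn


def create_job(conn: sqlite3.Connection, status: str = "queued") -> int:
    cur = conn.execute("INSERT INTO jobs (status) VALUES (?)", (status,))
    conn.commit()
    return cur.lastrowid


def request_abort(conn: sqlite3.Connection, job_id: int) -> bool:
    # Operator-facing call; refuses jobs that already reached a terminal state.
    cur = conn.execute(
        "UPDATE jobs SET abort_requested = 1 "
        "WHERE id = ? AND status NOT IN (?, ?, ?, ?)",
        (job_id, *TERMINAL),
    )
    conn.commit()
    return cur.rowcount == 1


def should_abort(conn: sqlite3.Connection, job_id: int) -> bool:
    # Re-read the durable row before each external or irreversible step.
    row = conn.execute(
        "SELECT abort_requested FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    return bool(row and row[0])


def retry_failed_job(conn: sqlite3.Connection, job_id: int) -> int:
    row = conn.execute(
        "SELECT status, retry_count FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    if row is None or row[0] != "failed":
        raise ValueError("only failed jobs can be retried")
    # New row linked to the original, with the retry counter carried forward.
    cur = conn.execute(
        "INSERT INTO jobs (status, retry_count, retry_of) VALUES ('queued', ?, ?)",
        (row[1] + 1, job_id),
    )
    conn.commit()
    return cur.lastrowid
```

With a SQLAlchemy session, the counterpart of `should_abort` would be refreshing or locking the job row before reading `job.abort_requested`, rather than relying on when the session happens to expire its cached attributes.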

aligned

Design for delegation rather than direct manipulation

`approve()` lets the founder approve an application once, then delegates the repo scan to `run_cohort_validate(app.id)`. `_execute_with_job` owns the operational sequence (parse repo, clone, select, bundle, validate, persist), and `ValidationRequest` carries the task, repository namespace, and selected files rather than requiring the operator to perform each step manually. This maintains the prior aligned result.

aligned

Ensure that background work remains perceptible

`create_job()` inserts and commits a `CohortValidationJob` at `status='queued'` before clone or validation begins. `mark_step_started()` records `<step>_started_at`, updates `status`, and commits for `cloning`, `selecting`, `bundling`, and `validating`; terminal helpers persist `completed`, `blocked`, `failed`, or `aborted`. The audit object also records `job_id` and `_collect_job_transitions(job)`. This addresses the prior blocker around no durable run row before clone/validation.

aligned

Align feedback with the user’s level of attention

Foreground output is concise (`Created CohortValidationJob#...`, terminal status, failure kind), while durable detail lives in `safe_display_message`, `failure_kind`, timestamps, and the audit object. `_summarize_validate_log()` deliberately stores structural log shape rather than dumping full content, which keeps routine operation quiet while preserving escalation detail. This improves the prior feedback/lifecycle gap.

aligned

Apply progressive disclosure to system agency

The primary operational layer is the small job state surface (`status`, `failure_kind`, `safe_display_message`, `retry_eligible`), while deeper inspection is available through `result_json['audit']`, including source commit, selected file hashes, bundle hash, validator latency, log-signal summary, and job transitions. This preserves progressive disclosure rather than exposing raw logs as the default view.

aligned

Replace implied magic with clear mental models

The code gives operators a clear model of what the workflow does and cannot do: `cohort_validation_job.py` documents the state machine and the distinction between `blocked`, `failed`, and `aborted`; `agentic_surface_selector.py` documents that include patterns are sort preferences rather than gates; `FetchError.kind`, `FAILURE_KINDS`, and `retry_eligible` make dependencies and recovery expectations explicit.

aligned

Represent delegated work as a system, not merely as a conversation

`CohortValidationJob` represents the delegated work as a system with durable state, timestamps, terminal statuses, retry metadata, abort metadata, linkage to `UserValidationRun`, and audit quick-access fields. `_collect_job_transitions()` mirrors the lifecycle into the persisted audit object, separating execution state from the founder CLI’s conversational/console output. This addresses the prior recommendation to represent the bridge as a service-owned validation job.

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/e476247c-d9c0-44f4-a25f-dbbdb7eb7b15/card.svg)](https://aidesignblueprint.com/en/readiness-review/e476247c-d9c0-44f4-a25f-dbbdb7eb7b15)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/e476247c-d9c0-44f4-a25f-dbbdb7eb7b15.svg)](https://aidesignblueprint.com/en/readiness-review/e476247c-d9c0-44f4-a25f-dbbdb7eb7b15)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Improvements (7)

P2 · Ensure that background work remains perceptible · needs_changes → aligned

P3 · Align feedback with the user’s level of attention · needs_changes → aligned

P6 · Expose meaningful operational state, not internal complexity · needs_changes → needs_changes

P7 · Establish trust through inspectability · needs_changes → needs_changes

P8 · Make hand-offs, approvals, and blockers explicit · needs_changes → needs_changes

P9 · Represent delegated work as a system, not merely as a conversation · needs_changes → aligned

P10 · Optimise for steering, not only initiating · needs_changes → needs_changes

Rubric: 2026-05-04


Run ID: e476247c-d9c0-44f4-a25f-dbbdb7eb7b15 · Results expire after 90 days
