Reviewed

Assessment complete; awaiting evidence revision.

Agent Architecture Review, Validation snapshot

Evaluated 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High Risk

74/100

Grade C

9 aligned · 1 production blocker
Per-principle verdicts

This is an autonomous cohort-validation workflow: it persists jobs, fetches repositories, selects code, calls an external validator/LLM service, sends external onboarding emails, and now exposes operator steering commands. Most delegation, perceptibility, audit, and blocker primitives are structurally sound. Iter12 materially improves steering, but P10 still needs changes because onboarding retry is not truly step-aware/idempotent and validation retry can reset state while a job may still be active, creating duplicate-run/race risk.

Iteration history

5 prior runs on this artifact. Each run_id opens its own readiness review.

| When | Score | Status | Run ID |
| --- | --- | --- | --- |
| 12 May 2026 (this run) | 74 / C | High Risk | 4760459e |
| 12 May 2026 | 74 / C | High Risk | 9caf9385 |
| 12 May 2026 | 74 / C | Aligned | 093809b5 |
| 12 May 2026 | 98 / A | Aligned | 4128f700 |
| 12 May 2026 | 98 / A | Aligned | 270e7ca6 |
| 12 May 2026 | 74 / C | High Risk | 8364019d |

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

P0

needs changes · production blocker · 60/100

Optimise for steering, not only initiating

Iter12 adds a real durable steering primitive in `steer_cohort_validation.py`: `request_abort()` persists `abort_requested=True`, `retry_failed_validation_job()` resets application validation state, and `retry_onboarding_handoff()` clears typed onboarding failures. However, onboarding recovery is not actually step-aware/idempotent: `retry_onboarding_handoff()` sets `onboarding_state='pending'`, and `approve()` then unconditionally calls `create_firebase_user_for_email(app.email, ...)` before checking `existing_user` or branching on the prior failed step. A retry from `sign_in_link_failed` or `approval_email_failed` can therefore re-run Firebase user creation after the user was already create…

Recommendation

Move retry authority into a small transactional state-transition boundary that dispatches only the next missing side effect. For onboarding, persist or derive the last successful external step and use an idempotent get-or-create Firebase/user operation before sign-in-link/email retries; do not reset every recoverable failure back to `pending`. For validation retry, require the latest job to be terminal and `retry_eligible` before clearing `validation_run_id`, and record the old/new job relationship atomically.
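
The recommendation above can be sketched roughly as follows. This is a minimal illustration, not the reviewed code: the three failure-state names come from the finding, but the step names, the `Job` dataclass, the `retry_of` field, the terminal status strings, and the in-memory `users` dict standing in for Firebase are all assumptions made for the example. A real implementation would presumably wrap each transition in a database transaction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed step order; only the *_failed state names appear in the review.
ONBOARDING_STEPS = ["firebase_user", "sign_in_link", "approval_email"]
FAILED_STATE_TO_STEP = {
    "firebase_user_failed": "firebase_user",
    "sign_in_link_failed": "sign_in_link",
    "approval_email_failed": "approval_email",
}
# Assumed spelling of the terminal job statuses.
TERMINAL_STATUSES = {"completed", "failed", "blocked", "aborted"}


@dataclass
class Job:
    """Hypothetical stand-in for a persisted validation job."""
    job_id: str
    status: str
    retry_eligible: bool
    retry_of: Optional[str] = None  # hypothetical link to the job being retried


def steps_to_retry(onboarding_state: str) -> List[str]:
    """Return only the failed step and the steps after it."""
    step = FAILED_STATE_TO_STEP.get(onboarding_state)
    if step is None:
        raise ValueError(f"{onboarding_state!r} is not a recoverable onboarding failure")
    return ONBOARDING_STEPS[ONBOARDING_STEPS.index(step):]


def get_or_create_user(users: Dict[str, str], email: str) -> str:
    """Idempotent lookup: a second call for the same email creates nothing."""
    if email not in users:
        users[email] = f"uid-{len(users) + 1}"
    return users[email]


def retry_validation(latest: Job, new_job_id: str) -> Job:
    """Refuse to retry unless the latest job is terminal and retry-eligible."""
    if latest.status not in TERMINAL_STATUSES:
        raise RuntimeError("latest job is still active; refusing to reset state")
    if not latest.retry_eligible:
        raise RuntimeError("latest job is not marked retry_eligible")
    # Record the old/new relationship on the new job itself.
    return Job(job_id=new_job_id, status="queued", retry_eligible=False,
               retry_of=latest.job_id)
```

With this shape, a retry from `sign_in_link_failed` dispatches only the sign-in link and approval email steps, and a retry against a job that is still `validating` is rejected instead of clearing state underneath it.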

P0

aligned

Design for delegation rather than direct manipulation

Delegated work is represented as an assigned cohort-validation task rather than manual step execution: `approve()` transitions the application into `validation_queued`, `run_cohort_validate.run()` creates a persistent `CohortValidationJob`, and `_execute_with_job()` carries out clone/select/bundle/validate steps from `repo_url`. The scope is explicit through `ValidationContext(repository=namespace, files=list(selection.selected_paths))`, and operator controls exist via `request_abort()`, `retry_failed_validation_job()`, and `retry_onboarding_handoff()`.

P0

aligned

Ensure that background work remains perceptible

Background work remains perceptible through persisted state on `CohortValidationJob`: `status`, `queued_at`, per-step timestamps such as `cloning_started_at` and `validating_started_at`, `terminal_at`, `failure_kind`, `safe_display_message`, `retry_eligible`, and `abort_requested`. `_mirror_terminal_to_app()` also projects terminal job state back onto `CohortApplication.onboarding_state`, allowing the application record to preserve continuity after the runner exits.

P0

aligned

Align feedback with the user’s level of attention

The workflow separates routine status from higher-attention failure detail. Normal progress is encoded in concise states such as `queued`, `cloning`, `selecting`, `bundling`, and `validating`; failures carry `failure_kind`, `safe_display_message`, and `retry_eligible`; and `_summarize_validate_log()` deliberately reduces validator internals to a summary of presence/type/key counts instead of flooding the operator with raw logs.
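
As an illustration of what a presence/type/key-count summary might look like, here is a small sketch. It is not the body of `_summarize_validate_log()`, which the review does not show; the function name `summarize_log` and the output field names are assumptions.

```python
from typing import Any, Dict


def summarize_log(log: Any) -> Dict[str, Any]:
    """Reduce an arbitrary validator log to presence, type, and size only."""
    summary: Dict[str, Any] = {
        "present": log is not None,
        "type": type(log).__name__,
    }
    if isinstance(log, dict):
        # Count keys without copying any values into the summary.
        summary["key_count"] = len(log)
    elif isinstance(log, (list, tuple, str)):
        summary["length"] = len(log)
    return summary
```

Because no values are copied, the summary stays small and cannot leak raw validator output into operator-facing state.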

P0

aligned

Apply progressive disclosure to system agency

The code uses progressive disclosure between primary operational state and deeper audit detail. The primary records expose compact states on `CohortApplication.onboarding_state` and `CohortValidationJob.status`, while `_execute_with_job()` builds a detailed `audit_object` containing source commit, selected file hashes, skipped files, bundle hash, validator latency, usage presence, envelope schema, and envelope hash for deeper inspection when needed.

P0

aligned

Replace implied magic with clear mental models

The workflow gives operators a concrete mental model through typed lifecycle states and explicit constraints: `STEPS`, `TERMINAL_STATUSES`, and `FAILURE_KINDS` define the runner’s state space; `BOUNDARY_HEADER` and `ENVELOPE_ADVISORY` state that repository code is untrusted inert input; and the steering CLI documents the known limitation that an in-flight `validate_code_against_principles()` call is not cancellable until it returns.
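
The cancellation limitation described above follows from cooperative abort checking: the flag is only read between steps. The sketch below shows that pattern under stated assumptions; the step names mirror the statuses named in this review, while `run_steps`, the `handlers` mapping, and the `abort_requested` callable are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Step names taken from the statuses listed in the review.
STEPS = ("cloning", "selecting", "bundling", "validating")


def run_steps(
    handlers: Dict[str, Callable[[], None]],
    abort_requested: Callable[[], bool],
) -> Tuple[str, List[str]]:
    """Run steps in order, checking the abort flag only between steps."""
    completed: List[str] = []
    for step in STEPS:
        if abort_requested():
            return "aborted", completed
        # Once a handler starts, it runs to completion; a long validator
        # call therefore cannot be interrupted by the flag.
        handlers[step]()
        completed.append(step)
    return "completed", completed
```

An abort requested while a step is running takes effect only at the next boundary, which matches the documented behaviour that an in-flight validator call finishes before the job stops.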

P0

aligned

Expose meaningful operational state, not internal complexity

Meaningful operational states are exposed without requiring users to interpret raw stack traces. `mark_blocked()`, `mark_failed()`, and `mark_aborted()` store user-relevant terminal statuses plus bounded `safe_display_message` values, while `_mirror_terminal_to_app()` converts job outcomes into application-level states such as `validation_complete` and `validation_failed`. Deeper mechanics remain available in audit JSON rather than being the primary state model.
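
A terminal marker with a bounded display message could look roughly like the sketch below. The field names follow those cited in this review, but `JobState`, `mark_terminal`, the 200-character bound, and the choice to keep only the first line of the message are assumptions rather than the reviewed implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

MAX_MESSAGE_LEN = 200  # assumed bound; the real limit is not stated


@dataclass
class JobState:
    """Hypothetical in-memory stand-in for the persisted job record."""
    status: str = "queued"
    failure_kind: Optional[str] = None
    safe_display_message: Optional[str] = None
    retry_eligible: bool = False
    terminal_at: Optional[datetime] = None


def bounded(message: str) -> str:
    """Keep the first line only and cap its length, dropping any traceback."""
    stripped = message.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return first_line[:MAX_MESSAGE_LEN]


def mark_terminal(job: JobState, status: str, failure_kind: str,
                  message: str, retry_eligible: bool) -> None:
    """Record a terminal outcome exactly once."""
    if job.terminal_at is not None:
        raise RuntimeError("job is already terminal")
    job.status = status
    job.failure_kind = failure_kind
    job.safe_display_message = bounded(message)
    job.retry_eligible = retry_eligible
    job.terminal_at = datetime.now(timezone.utc)
```

Refusing a second terminal write keeps the first recorded outcome authoritative, which is one way to avoid a late failure overwriting an earlier abort.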

P0

aligned

Establish trust through inspectability

The implementation has strong inspectability primitives: `build_file_envelope()` creates a typed `bridge.files.v2` envelope with an `envelope_hash`; `_build_implementation_context()` records per-file SHA-256 hashes; `fetch_public_repo()` captures `commit_sha`; and `_execute_with_job()` persists an `audit_object` with source, selection, bundle hash, validator log summary, and boundary contract metadata inside `UserValidationRun.result_json`. This makes the validator result traceable to a specific repository snapshot and selected file set.
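
To make the traceability claim concrete, the sketch below shows one way a hashed file envelope can be built. Only the schema string `bridge.files.v2` and the `envelope_hash` and `commit_sha` field names come from the review; the function `build_envelope`, the remaining layout, and the canonical JSON serialization are assumptions, and the real `build_file_envelope()` may differ.

```python
import hashlib
import json
from typing import Any, Dict


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_envelope(files: Dict[str, bytes], commit_sha: str) -> Dict[str, Any]:
    """Hash each file, then hash a canonical serialization of the whole body."""
    entries = [
        {"path": path, "sha256": sha256_hex(files[path]), "size": len(files[path])}
        for path in sorted(files)  # sorted so insertion order cannot change the hash
    ]
    body = {"schema": "bridge.files.v2", "commit_sha": commit_sha, "files": entries}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {**body, "envelope_hash": sha256_hex(canonical)}
```

Any change to a selected file, the file set, or the commit changes `envelope_hash`, so a stored validator result can be checked against the exact snapshot it was produced from.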

P0

aligned

Make hand-offs, approvals, and blockers explicit

Approvals, handoffs, and blockers are explicit. `approve()` requires an interactive confirmation unless `--yes` is supplied, external onboarding steps have typed failure states (`firebase_user_failed`, `sign_in_link_failed`, `approval_email_failed`), repository/selector failures are marked through `mark_blocked()` with concrete `failure_kind` values, and validation/tool failures are marked through `mark_failed()` with retry eligibility. The code distinguishes blocked, failed, aborted, and completed terminal states instead of stalling silently.

P0

aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system rather than a message stream. `CohortValidationJob` is the durable task object, `CohortApplication` carries onboarding state, `UserValidationRun` stores the validator result, and `_execute_with_job()` advances through explicit clone/select/bundle/validate phases. Dependencies and artifacts are captured as structured fields such as `commit_sha`, `bundle_sha256`, `selected_count`, `skipped_count`, and `validation_run_local_id`.

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/4760459e-5af0-4927-834c-9fac4c5c3bd2/card.svg)](https://aidesignblueprint.com/en/readiness-review/4760459e-5af0-4927-834c-9fac4c5c3bd2)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/4760459e-5af0-4927-834c-9fac4c5c3bd2.svg)](https://aidesignblueprint.com/en/readiness-review/4760459e-5af0-4927-834c-9fac4c5c3bd2)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear
Rubric: 2026-05-04 · Grade limited by 0 high-risk findings

Run ID: 4760459e-5af0-4927-834c-9fac4c5c3bd2 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.