Reviewed

Assessment complete; awaiting evidence revision.

Agent Architecture Review, Validation snapshot

Evaluated 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High Risk

74/100

Grade C

7 aligned · 2 production blockers · 1 hardening
Per-principle verdicts

The submission is clearly an autonomous cohort-validation workflow and has strong primitives: a durable CohortValidationJob lifecycle, explicit terminal states, typed failure kinds, source/audit metadata, and a JSON file envelope with hashes. It materially improves the prior P7/P8 blockers, but it is not fully aligned yet: the selector can still omit high-signal agentic files because of a priority inversion, the approval/onboarding state machine does not durably capture Firebase-user-creation failure, and steering remains limited to partial between-step abort checks.

Iteration history

3 prior runs on this artifact. Each run_id opens its own readiness review.

| When | Score | Status | Run ID |
| --- | --- | --- | --- |
| 12 May 2026 (this run) | 74 / C | High Risk | 3f3bb587 |
| 12 May 2026 | 74 / C | High Risk | 659a695a |
| 12 May 2026 | 67 / C | High Risk | e476247c |
| 12 May 2026 | 30 / F | High Risk | 86b2c59d |

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

P0

needs changes · production blocker · 50/100

Make hand-offs, approvals, and blockers explicit

The validation job handoffs are explicit through mark_blocked(), mark_failed(), mark_aborted(), and typed failure_kind values, and approve_cohort_application.approve() now records sign_in_link_failed, approval_email_failed, validation_queued, validation_complete, and validation_failed. But the first external approval handoff is still not durably protected: after committing onboarding_state='pending', approve() calls create_firebase_user_for_email(app.email, ...) without a try/except or typed failure transition. If Firebase creation fails or times out, the application can remain in pending with no onboarding_failure_reason or firebase_user_failed state. Delta vs prior baseline: the sign-in-li…

Recommendation

Move approval side effects behind a durable onboarding transition service/job, or at minimum wrap create_firebase_user_for_email() with a committed firebase_user_create_started/firebase_user_failed transition and reconciliation for the case where Firebase succeeds but the following database commit fails.
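
The minimum variant of this recommendation can be sketched as follows. This is an illustrative sketch, not the submission's code: the `Application` class, its `commit()` method, the `create_user_step()` and `needs_reconciliation()` helpers, and the `firebase_user_created` state are hypothetical stand-ins, and the Firebase call is passed in as a plain callable so the example runs on its own. Only `firebase_user_create_started` and `firebase_user_failed` come from the recommendation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Application:
    """Hypothetical stand-in for the cohort application row."""
    email: str
    onboarding_state: str = "pending"
    onboarding_failure_reason: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def commit(self, state: str, reason: Optional[str] = None) -> None:
        # Stand-in for a real database commit: each call is one durable transition.
        self.onboarding_state = state
        self.onboarding_failure_reason = reason
        self.history.append(state)


def create_user_step(app: Application, create_user: Callable[[str], str]) -> Optional[str]:
    """Bracket the external call with committed transitions."""
    app.commit("firebase_user_create_started")
    try:
        uid = create_user(app.email)
    except Exception as exc:
        # Broad on purpose: any failure or timeout must end in a typed state.
        app.commit("firebase_user_failed", reason=type(exc).__name__)
        return None
    # Assumed success state; the real state name may differ.
    app.commit("firebase_user_created")
    return uid


def needs_reconciliation(app: Application) -> bool:
    """A row left in the 'started' state means the outcome is unknown:
    the external call may have succeeded while the follow-up commit did not."""
    return app.onboarding_state == "firebase_user_create_started"
```

With this shape, a failed or timed-out call leaves `firebase_user_failed` plus a reason on the row, and a periodic reconciliation pass can look for rows where `needs_reconciliation()` is true.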

P0

needs changes · production blocker · 45/100

Establish trust through inspectability

The prior raw-concatenation blocker is materially improved by build_file_envelope(), which emits envelope_schema, boundary_contract, file path/byte_size/sha256/content records, and envelope_hash, and the audit stores commit_sha, bundle_sha256, envelope_schema, and envelope_hash. However, inspectability is still undermined by a concrete selector priority inversion: select_agentic_surface() sets priority = 50 for files without positive signal but priority = 99 for positive-signal files that do not match SORT_PRIORITY. INCLUDED_NAME_PATTERNS contains coordinator, planner, router, and evaluator, but SORT_PRIORITY omits them, so high-signal files with those names can sort behind generic fallback…

Recommendation

Simplify the selector boundary so every positive-signal file sorts before fallback files: either add coordinator/planner/router/evaluator to SORT_PRIORITY or make the unmatched-positive default priority lower than the fallback priority. Persist enough inventory or omitted-positive metadata in the audit so a reviewer can verify that the validation saw the relevant agentic surface.
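
One way to read the second option (an unmatched-positive default below the fallback priority) is sketched here. The pattern list, the `SORT_PRIORITY` contents, the value 40, and the function names are assumptions for illustration; the actual `select_agentic_surface()` is not reproduced in this review. Only the fallback value 50 and the four pattern names come from the finding.

```python
from typing import Dict, List, Tuple

# Illustrative values; the real lists in the submission may differ.
INCLUDED_NAME_PATTERNS = ("agent", "coordinator", "planner", "router", "evaluator")
SORT_PRIORITY: Dict[str, int] = {"agent": 0}
FALLBACK_PRIORITY = 50
# Assumed fix: any positive-signal file sorts ahead of every fallback file.
UNMATCHED_POSITIVE_PRIORITY = 40


def has_positive_signal(path: str) -> bool:
    name = path.lower()
    return any(pattern in name for pattern in INCLUDED_NAME_PATTERNS)


def priority(path: str) -> int:
    if not has_positive_signal(path):
        return FALLBACK_PRIORITY
    name = path.lower()
    ranks = [rank for pattern, rank in SORT_PRIORITY.items() if pattern in name]
    return min(ranks) if ranks else UNMATCHED_POSITIVE_PRIORITY


def select(paths: List[str], limit: int) -> Tuple[List[str], List[str]]:
    """Return (selected, omitted_positive) so the audit can record what was cut."""
    ranked = sorted(paths, key=lambda p: (priority(p), p))
    selected = ranked[:limit]
    omitted_positive = [p for p in ranked[limit:] if has_positive_signal(p)]
    return selected, omitted_positive
```

Returning `omitted_positive` alongside the selection is one possible form of the "omitted-positive metadata" mentioned above: if it is non-empty, a reviewer knows the bundle did not cover the whole agentic surface.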

P0

needs changes · hardening recommended · 35/100

Optimise for steering, not only initiating

The code has partial steering primitives: CohortValidationJob.abort_requested exists, mark_step_started() checks it, _execute_with_job() refreshes the job before validation, and mark_aborted() records an aborted terminal state. But the submitted code does not expose a service-owned request_abort(job_id), retry_failed_job(job_id), or resume/interrupt command; retry_count is incremented only on a manual CLI re-run, and long operations such as fetch_public_repo() and asyncio.run(validate_code_against_principles()) cannot be interrupted once started. Delta vs prior baseline: this maintains the prior partial between-step abort support but does not address the prior recommendation for external ste…

Recommendation

Move steering authority into a small service/API outside the execution loop: provide request_abort, retry_failed_job, and resume/interruption commands, and make long-running clone/validate boundaries poll or honour cancellation checkpoints instead of requiring direct DB edits or full restarts.
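
A rough sketch of such a service follows, under several assumptions: jobs live in memory rather than in a database, `Job`, `SteeringService`, and `run_job()` are hypothetical names, and a long operation is modelled as a list of small chunks so that a cancellation checkpoint can sit between them. Only `request_abort` and `retry_failed_job` are names taken from the recommendation.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Job:
    job_id: str
    status: str = "queued"
    retry_count: int = 0
    abort_requested: threading.Event = field(default_factory=threading.Event)


class SteeringService:
    """Hypothetical command surface that lives outside the execution loop."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def register(self, job: Job) -> None:
        self._jobs[job.job_id] = job

    def request_abort(self, job_id: str) -> None:
        self._jobs[job_id].abort_requested.set()

    def retry_failed_job(self, job_id: str) -> Job:
        job = self._jobs[job_id]
        if job.status != "failed":
            raise ValueError(f"job {job_id} is {job.status!r}, not 'failed'")
        job.retry_count += 1
        job.status = "queued"
        job.abort_requested.clear()
        return job


def run_job(job: Job, steps: List[Tuple[str, List[Callable[[], None]]]]) -> str:
    """Run each step chunk by chunk, checking for an abort before every chunk."""
    for name, chunks in steps:
        job.status = name
        for chunk in chunks:
            if job.abort_requested.is_set():  # cancellation checkpoint
                job.status = "aborted"
                return job.status
            chunk()
    job.status = "completed"
    return job.status
```

The design point is that the abort flag is honoured inside a step, not only between steps, so an operator command takes effect at the next chunk boundary rather than after the whole clone or validation finishes.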

P0

aligned

Design for delegation rather than direct manipulation

The workflow is designed around delegated validation rather than manual step execution: approve_cohort_application.approve() gates approval, then run_cohort_validate.run() creates a CohortValidationJob and _execute_with_job() performs clone, language detection, file selection, bundling, LLM validation, and persistence. ValidationRequest.context records the delegated task, repository namespace, and selected files.

P0

aligned

Ensure that background work remains perceptible

Background work is made durable and perceptible through CohortValidationJob.status, queued_at, cloning_started_at, selecting_started_at, bundling_started_at, validating_started_at, terminal_at, failure_kind, safe_display_message, retry_eligible, and abort_requested. create_job() commits a queued row before bridge work starts, and mark_step_started(), mark_completed(), mark_blocked(), mark_failed(), and mark_aborted() persist state transitions.
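
The lifecycle described above can be pictured with a small in-memory sketch. It is not the submission's model: `JobRecord`, `mark_terminal()`, and the `timestamps` dictionary are hypothetical, persistence is omitted, and the terminal status list is inferred from the `mark_*` methods named in this finding. The `STEPS` tuple is the one quoted elsewhere in this review.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict

STEPS = ("cloning", "selecting", "bundling", "validating")
# Inferred from mark_completed/mark_blocked/mark_failed/mark_aborted.
TERMINAL_STATUSES = ("completed", "blocked", "failed", "aborted")


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass
class JobRecord:
    status: str = "queued"
    timestamps: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # The queued row exists before any work starts.
        self.timestamps["queued_at"] = _now()

    def mark_step_started(self, step: str) -> None:
        if self.status in TERMINAL_STATUSES:
            raise ValueError(f"job already terminal: {self.status}")
        order = ("queued",) + STEPS
        position = order.index(self.status)
        if position + 1 >= len(order) or order[position + 1] != step:
            raise ValueError(f"cannot start {step!r} from {self.status!r}")
        self.status = step
        self.timestamps[f"{step}_started_at"] = _now()

    def mark_terminal(self, status: str) -> None:
        if status not in TERMINAL_STATUSES:
            raise ValueError(f"not a terminal status: {status!r}")
        self.status = status
        self.timestamps["terminal_at"] = _now()
```

Because every transition writes a timestamp, an observer can tell which step a job is in and how long it has been there without reading logs.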

P0

aligned

Align feedback with the user’s level of attention

Feedback is calibrated by separating user/input blockers from internal/transient failures: mark_blocked() sets status='blocked' and retry_eligible=False for invalid_repo_url, no_supported_language, selector_rejected, and read_failed, while mark_failed() marks retryable validate_error/unexpected_error paths. _summarize_validate_log() captures only log shape rather than leaking noisy content, and onboarding_failure_reason is set only on approval-side failures requiring attention.
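
The blocked-versus-failed split can be expressed as a single classification function. The sketch below is an assumption-laden illustration: `JobOutcome` and `classify_failure()` are hypothetical names, the two sets contain only the failure kinds listed in this finding, and `retry_eligible=True` for the failed path is inferred from the word "retryable" above.

```python
from dataclasses import dataclass

# Failure kinds as listed in this finding; the submission may define more.
BLOCKING_KINDS = frozenset(
    {"invalid_repo_url", "no_supported_language", "selector_rejected", "read_failed"}
)
RETRYABLE_KINDS = frozenset({"validate_error", "unexpected_error"})


@dataclass(frozen=True)
class JobOutcome:
    status: str
    failure_kind: str
    retry_eligible: bool


def classify_failure(kind: str) -> JobOutcome:
    """Map a typed failure kind to the status and retry flag shown to the user."""
    if kind in BLOCKING_KINDS:
        # The user or input must change before a retry can help.
        return JobOutcome("blocked", kind, False)
    if kind in RETRYABLE_KINDS:
        # Internal or transient: a retry may succeed without user action.
        return JobOutcome("failed", kind, True)
    raise ValueError(f"unknown failure kind: {kind!r}")
```

Rejecting unknown kinds outright keeps the vocabulary closed, so a new failure path cannot silently fall into the wrong attention level.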

P0

aligned

Apply progressive disclosure to system agency

The primary operational surface is concise status/failure state, while deeper inspection is available in the persisted audit object. audit_object separates source, selection, validate, and job sections; _collect_job_transitions() provides timeline detail; _summarize_validate_log() exposes top-level log shape without dumping raw validator internals.

P0

aligned

Replace implied magic with clear mental models

The code exposes a clear mental model through explicit constants and state vocabularies: STEPS=('cloning','selecting','bundling','validating'), TERMINAL_STATUSES, FAILURE_KINDS, and onboarding_state values such as pending, approved, validation_queued, validation_complete, and validation_failed. The file boundary is also explicit via envelope_schema='bridge.files.v2', boundary_contract='untrusted_user_code', and BOUNDARY_CONTRACT_VERSION.
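
To make the file boundary concrete, here is a hedged sketch of an envelope with per-file and whole-envelope hashes. It is not the submission's `build_file_envelope()`: the function names are hypothetical, and computing `envelope_hash` as SHA-256 over canonical JSON of the other fields is an assumption. The field names and the two constant values are the ones quoted in this review.

```python
import hashlib
import json
from typing import Dict


def _canonical_hash(body: dict) -> str:
    # Sorted keys and fixed separators give a stable byte representation.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sketch_file_envelope(files: Dict[str, str]) -> dict:
    """Wrap untrusted file contents in a structured, hashed envelope."""
    records = []
    for path in sorted(files):
        data = files[path].encode("utf-8")
        records.append(
            {
                "path": path,
                "byte_size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
                "content": files[path],
            }
        )
    body = {
        "envelope_schema": "bridge.files.v2",
        "boundary_contract": "untrusted_user_code",
        "files": records,
    }
    return {**body, "envelope_hash": _canonical_hash(body)}


def verify_envelope(envelope: dict) -> bool:
    """Recompute the hash over everything except envelope_hash itself."""
    body = {k: v for k, v in envelope.items() if k != "envelope_hash"}
    return _canonical_hash(body) == envelope["envelope_hash"]
```

Storing `envelope_hash` in the audit lets a reviewer later confirm that the bundle the validator saw is the bundle on record.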

P0

aligned

Expose meaningful operational state, not internal complexity

The workflow exposes meaningful operational states instead of raw internals: job.status uses queued/cloning/selecting/bundling/validating/completed/blocked/failed/aborted, while safe_display_message gives user-relevant explanations such as invalid repo URL, no supported language, clone failure, or validation error. Technical details like selected file hashes, latency_ms, and log_signals are reserved for the audit payload.

P0

aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system rather than a conversation: CohortValidationJob stores lifecycle state and timestamps, _execute_with_job() advances explicit steps, and the persisted audit object separates source, selection, validate, and job transition data. The validator result is stored in UserValidationRun with repository, latency, score/grade/tier, and audit metadata rather than only a free-form message stream.

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/3f3bb587-5ad1-4417-b624-23441b39831e/card.svg)](https://aidesignblueprint.com/en/readiness-review/3f3bb587-5ad1-4417-b624-23441b39831e)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/3f3bb587-5ad1-4417-b624-23441b39831e.svg)](https://aidesignblueprint.com/en/readiness-review/3f3bb587-5ad1-4417-b624-23441b39831e)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Improvements (2)

P7 · Establish trust through inspectability · needs_changes · needs_changes
P8 · Make hand-offs, approvals, and blockers explicit · needs_changes · needs_changes
Rubric: 2026-05-04 · Grade limited by 0 high-risk findings

Run ID: 3f3bb587-5ad1-4417-b624-23441b39831e · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.