Reviewed

Assessment complete; awaiting evidence revision.

Agent Architecture Review, Validation snapshot

Evaluated 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High Risk

74/100

Grade C

8 aligned · 1 production blocker · 1 hardening
Per-principle verdicts

The submission is an autonomous multi-step validation workflow with strong primitives for durable job state, typed untrusted-code envelopes, file/content hashing, and auditability. Iter6 materially fixes the prior selector inspectability bug and the Firebase-create failure transition, but one production-relevant handoff can still stall silently: approval sets validation_queued before calling run_cohort_validate, which may return without creating a job when OPENAI_API_KEY is missing. Steering also remains a next-iteration hardening gap rather than a production blocker.

Iteration history

4 prior runs on this artifact. Each run_id opens its own readiness review.

When                     Score    Status      Run ID
12 May 2026 (this run)   74 / C   High Risk   14a3456f
12 May 2026              74 / C   High Risk   3f3bb587
12 May 2026              74 / C   High Risk   659a695a
12 May 2026              67 / C   High Risk   e476247c
12 May 2026              30 / F   High Risk   86b2c59d

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

P0

needs changes · production blocker · 45/100

Make hand-offs, approvals, and blockers explicit

Most handoffs and blockers are explicit: approve() prompts unless --yes is passed, mark_blocked()/mark_failed() persist failure_kind and safe_display_message, and the prior bare create_firebase_user_for_email() call is now wrapped with onboarding_state='firebase_user_failed'. Delta: this addresses the prior Firebase-create failure path. However, approve() sets app.onboarding_state='validation_queued' before calling run_cohort_validate(app.id), while run() returns 2 before create_job() when OPENAI_API_KEY is missing and not dry_run. approve() ignores that exit_code; if latest_job is None, the application remains validation_queued with no onboarding_failure_reason or job-level blocker.

Recommendation

Move the validation kickoff behind a durable transition boundary: validate required runtime configuration before setting validation_queued, or create a CohortValidationJob and mark it blocked/failed for missing validator credentials. Also handle nonzero run_cohort_validate exit_code and the no-latest_job case by setting onboarding_state='validation_failed' with a safe failure reason.
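The recommendation above can be sketched roughly as follows. This is a minimal illustration, not the submission's code: `App`, `missing_runtime_config`, and `kick_off_validation` are hypothetical names, the application is an in-memory stand-in for CohortApplication, and `run_validate` / `latest_job` are injected callables standing in for run_cohort_validate and the latest-job lookup.

```python
import os
from dataclasses import dataclass
from typing import Callable, List, Mapping, Optional


@dataclass
class App:
    """Hypothetical in-memory stand-in for CohortApplication."""
    id: int
    onboarding_state: str = "approved"
    onboarding_failure_reason: Optional[str] = None


def missing_runtime_config(env: Mapping[str, str], dry_run: bool = False) -> List[str]:
    """Return the names of required settings that are absent (none in dry-run)."""
    if dry_run:
        return []
    return [name for name in ("OPENAI_API_KEY",) if not env.get(name)]


def kick_off_validation(
    app: App,
    run_validate: Callable[[int], int],
    latest_job: Callable[[int], Optional[object]],
    env: Optional[Mapping[str, str]] = None,
    dry_run: bool = False,
) -> App:
    env = os.environ if env is None else env

    # Preflight: refuse to enter validation_queued if the run cannot start.
    if missing_runtime_config(env, dry_run):
        app.onboarding_state = "validation_failed"
        app.onboarding_failure_reason = "Validator credentials are not configured."
        return app

    app.onboarding_state = "validation_queued"
    exit_code = run_validate(app.id)

    # Do not ignore the exit code, and do not trust a queue state with no job.
    if exit_code != 0 or latest_job(app.id) is None:
        app.onboarding_state = "validation_failed"
        app.onboarding_failure_reason = (
            f"Validation did not start (exit code {exit_code})."
        )
    return app
```

In a real implementation each state change would be committed to the database; the point of the sketch is only the ordering, so that validation_queued is never left behind without either a job or a recorded failure reason.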

P0

needs changes · hardening recommended · 35/100

Optimise for steering, not only initiating

The code has some steering primitives: CohortValidationJob.abort_requested exists, mark_step_started() aborts before advancing if the flag is set, _execute_with_job() refreshes the job and checks abort_requested before validating, and retry_count is tracked. But the submitted code does not expose a request_abort/retry/resume service or command, and cancellation is not observed inside long boundaries such as fetch_public_repo() or validate_code_against_principles(); steering is effectively limited to manual DB mutation and step boundaries. Delta: this maintains the prior P10 hardening finding.

Recommendation

Expose a small steering service/API outside the execution loop with request_abort, retry_failed_job, and resume/requeue commands, and have clone/validate boundaries observe persisted cancellation at safe checkpoints. Keep it as a state primitive rather than adding wrapper orchestration around the current CLI.
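One possible shape for such a steering primitive is sketched below. It assumes an in-memory `Job` stand-in for CohortValidationJob and a `reload` callable that re-reads persisted state; `checkpoint`, `validate_in_chunks`, and `JobAborted` are hypothetical names, and only request_abort and retry_failed_job come from the recommendation itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

TERMINAL_STATUSES = {"completed", "blocked", "failed", "aborted"}


class JobAborted(Exception):
    """Raised at a safe checkpoint when an abort has been requested."""


@dataclass
class Job:
    """Hypothetical in-memory stand-in for CohortValidationJob."""
    id: int
    status: str = "queued"
    abort_requested: bool = False
    retry_count: int = 0
    retry_eligible: bool = False


def request_abort(job: Job) -> bool:
    """Flag a running job for cancellation; terminal jobs are left alone."""
    if job.status in TERMINAL_STATUSES:
        return False
    job.abort_requested = True
    return True


def retry_failed_job(job: Job) -> bool:
    """Requeue a failed job only if it was marked retry-eligible."""
    if job.status != "failed" or not job.retry_eligible:
        return False
    job.status = "queued"
    job.abort_requested = False
    job.retry_count += 1
    return True


def checkpoint(job: Job, reload: Callable[[int], Job]) -> Job:
    """Re-read persisted state and stop if cancellation was requested."""
    fresh = reload(job.id)
    if fresh.abort_requested:
        fresh.status = "aborted"
        raise JobAborted(f"job {fresh.id} aborted at checkpoint")
    return fresh


def validate_in_chunks(job: Job, chunks: Iterable[str],
                       reload: Callable[[int], Job]) -> List[str]:
    """Long boundary that observes cancellation between units of work."""
    done = []
    for chunk in chunks:
        job = checkpoint(job, reload)
        done.append(chunk)  # placeholder for the real per-chunk work
    return done
```

The commands only mutate job state, so they can be called from a CLI, an admin endpoint, or a test without wrapping the execution loop.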

P0

aligned

Design for delegation rather than direct manipulation

The workflow is designed around delegated work rather than manual execution: approve() delegates repository validation after an application approval, create_job() persists application_id/user_id/repo_url, and _execute_with_job() carries out cloning → selecting → bundling → validating. Scope and constraints are explicit through repo_url, ValidationRequest(context=ValidationContext(...)), MAX_FILES, MAX_BUNDLE_BYTES, MAX_FILE_BYTES, and the selected file list.

P0

aligned

Ensure that background work remains perceptible

Background work is made perceptible through CohortValidationJob.status plus queued_at, cloning_started_at, selecting_started_at, bundling_started_at, validating_started_at, terminal_at, failure_kind, retry_eligible, and abort_requested. mark_step_started(), mark_completed(), mark_blocked(), mark_failed(), and mark_aborted() commit durable transitions so the workflow can be inspected after the initiating CLI/process moves on.
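To illustrate how per-step timestamps make progress inspectable after the fact, here is a minimal sketch. It uses a plain dict in place of the persisted CohortValidationJob row; the field names follow the ones listed above, while the `timeline` helper and the dict representation are assumptions made for the example.

```python
from datetime import datetime, timezone

# Step names as described in the review; the order is assumed to match execution.
STEPS = ("cloning", "selecting", "bundling", "validating")


def mark_step_started(job: dict, step: str, now: datetime = None) -> dict:
    """Record a step transition, or abort before advancing if requested."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    now = now or datetime.now(timezone.utc)
    if job.get("abort_requested"):
        job["status"] = "aborted"
        job["terminal_at"] = now
        return job
    job["status"] = step
    job[f"{step}_started_at"] = now
    return job


def timeline(job: dict) -> list:
    """List the recorded transitions in lifecycle order, skipping unset ones."""
    fields = ["queued_at"] + [f"{s}_started_at" for s in STEPS] + ["terminal_at"]
    return [(name, job[name]) for name in fields if job.get(name) is not None]
```

Because every transition leaves a timestamp behind, an operator can reconstruct how far a job got without access to the process that ran it.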

P0

aligned

Align feedback with the user’s level of attention

Feedback is calibrated by audience and attention level: CohortApplication.onboarding_state gives a coarse applicant/founder-facing state, CohortValidationJob.failure_kind and safe_display_message expose actionable blocker/failure information, and audit_object plus _summarize_validate_log() preserve deeper diagnostic detail without forcing it into the primary status path.

P0

aligned

Apply progressive disclosure to system agency

The code separates primary status from inspection detail. Primary fields include onboarding_state, job.status, safe_display_message, selected_count, skipped_count, commit_sha, and bundle_sha256; detailed evidence is available in audit_object with source, selection, validate, and job sections, and in the typed file envelope built by build_file_envelope().

P0

aligned

Replace implied magic with clear mental models

The workflow provides clear mental models through explicit state-machine comments, STEPS, TERMINAL_STATUSES, FAILURE_KINDS, onboarding_state values, BOUNDARY_CONTRACT_VERSION, ENVELOPE_SCHEMA, BOUNDARY_HEADER, and ENVELOPE_ADVISORY. The code distinguishes approval, validation queueing, blocked validation, failed validation, and completed validation rather than presenting the validator as opaque magic.

P0

aligned

Expose meaningful operational state, not internal complexity

Operational state is expressed in user-relevant terms such as queued, cloning, selecting, bundling, validating, completed, blocked, failed, aborted, validation_queued, validation_complete, and validation_failed. Low-level details like log_context are summarized by _summarize_validate_log(), while safe_display_message gives a cleaner operator-facing explanation.

P0

aligned

Establish trust through inspectability

Inspectability is supported by load-bearing primitives: RepoSnapshot.commit_sha captures git rev-parse HEAD, _build_implementation_context() records per-file sha256 values, build_file_envelope() creates an envelope_hash over canonical JSON, and audit_object records source, selected files, skipped reads, bundle_sha256, envelope_schema, envelope_hash, latency, and job transitions. The selector is now deterministic and fixes the prior priority inversion: SORT_PRIORITY includes coordinator/planner/router/evaluator and unmatched positive-signal files default to priority 12, ahead of fallback priority 50. Delta: this improves the prior P7 needs_changes finding by addressing the cited selector bug.
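The two inspectability mechanisms described above, a hash over canonical JSON and a deterministic priority-ordered selector, can be sketched as follows. The envelope layout, the function names, and the ranks assigned to coordinator/planner/router/evaluator are assumptions for illustration; only the default priority 12 and fallback priority 50 come from the finding.

```python
import hashlib
import json
from typing import Dict, List, Tuple

# Ranks for named roles are hypothetical; 12 and 50 are taken from the finding.
SORT_PRIORITY = {"coordinator": 1, "planner": 2, "router": 3, "evaluator": 4}
POSITIVE_SIGNAL_DEFAULT = 12
FALLBACK_PRIORITY = 50


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_json(obj) -> bytes:
    """Serialise with sorted keys and fixed separators so equal data hashes equally."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def build_envelope_sketch(files: Dict[str, bytes], commit_sha: str) -> dict:
    """Record per-file hashes, then hash the canonical form of the whole body."""
    entries = [
        {"path": path, "sha256": sha256_hex(files[path]), "bytes": len(files[path])}
        for path in sorted(files)
    ]
    body = {"commit_sha": commit_sha, "files": entries}
    return {**body, "envelope_hash": sha256_hex(canonical_json(body))}


def priority(path: str, positive_signal: bool) -> int:
    """Named roles first, then other positive-signal files, then fallback."""
    lowered = path.lower()
    for keyword, rank in SORT_PRIORITY.items():
        if keyword in lowered:
            return rank
    return POSITIVE_SIGNAL_DEFAULT if positive_signal else FALLBACK_PRIORITY


def select_files(candidates: List[Tuple[str, bool]], max_files: int) -> List[str]:
    """Sort by (priority, path) so ties break the same way on every run."""
    ranked = sorted(candidates, key=lambda c: (priority(c[0], c[1]), c[0]))
    return [path for path, _ in ranked[:max_files]]
```

Sorting on the path as a secondary key is what makes the selection reproducible: two runs over the same commit pick the same files and therefore produce the same envelope hash.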

P0

aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system, not a message stream. CohortValidationJob models the job lifecycle, cohort_validation_job.py defines explicit state transitions, _collect_job_transitions() records the timeline, and audit_object preserves source, selection, validation, and job substructures for later review.

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/14a3456f-44db-448a-a47b-5e9637944ea6/card.svg)](https://aidesignblueprint.com/en/readiness-review/14a3456f-44db-448a-a47b-5e9637944ea6)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/14a3456f-44db-448a-a47b-5e9637944ea6.svg)](https://aidesignblueprint.com/en/readiness-review/14a3456f-44db-448a-a47b-5e9637944ea6)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Improvements (2)

P7 · Establish trust through inspectability · needs_changes → aligned
P8 · Make hand-offs, approvals, and blockers explicit · needs_changes → needs_changes
Rubric: 2026-05-04 · Grade limited by 0 high-risk findings

Run ID: 14a3456f-44db-448a-a47b-5e9637944ea6 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.