Governed

Alignment confirmed with the doctrine.

Agent Architecture Review, Validation snapshot

Evaluated 12 May 2026 against the AI Design Blueprint doctrine

Production-ready

Status: Aligned

100/100

Grade A

10 aligned
Per-principle verdicts

The submission is an autonomous background validation/onboarding workflow with durable job state, explicit approval and blocker states, typed untrusted-code envelopes, audit metadata, retry paths, and abort steering. The current Iter14 changes address the prior P10 gap by adding a post-validation abort re-check before persistence/completion and preserving failed onboarding state for step-aware recovery.

Iteration history

5 prior runs on this artifact. Each run_id opens its own readiness review.

| When | Score | Status | Run ID |
| --- | --- | --- | --- |
| 12 May 2026 (this run) | 100 / A | Aligned | 3ac16b20 |
| 12 May 2026 | 74 / C | High Risk | 0e49f888 |
| 12 May 2026 | 74 / C | High Risk | 4760459e |
| 12 May 2026 | 74 / C | High Risk | 9caf9385 |
| 12 May 2026 | 74 / C | Aligned | 093809b5 |
| 12 May 2026 | 98 / A | Aligned | 4128f700 |

Certified production-ready · Attempt 1/3

The first-pass production_ready verdict is confirmed: the code shows durable job states, explicit terminal/blocked/aborted handling, and retry paths, and the review found no specific missed defect that would currently expose a real user to silent wrong results, a crash, or a trust-boundary bypass.

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

P0

aligned

Design for delegation rather than direct manipulation

Delegation is represented as assignment of work rather than manual execution: `approve()` records approval and hands off to `run_cohort_validate`, while `_execute_with_job()` autonomously performs cloning, selection, bundling, validation, audit construction, and persistence. Scope and constraints are explicit through `repo_url`, `ValidationContext(repository=namespace, files=...)`, `select_agentic_surface()`, byte/file limits, and job lifecycle fields. This maintains the prior aligned verdict.

P0

aligned

Ensure that background work remains perceptible

Background work remains perceptible through persistent `CohortValidationJob` records with `status`, `queued_at`, per-step timestamps (`cloning_started_at`, `selecting_started_at`, `bundling_started_at`, `validating_started_at`), `terminal_at`, `failure_kind`, `safe_display_message`, `retry_eligible`, and `abort_requested`. `_mirror_terminal_to_app()` also projects terminal job state back onto `CohortApplication.onboarding_state`. This maintains the prior aligned verdict.
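
To make the finding above concrete, here is a minimal sketch of a job record that stamps a timestamp each time it enters a step. The field names are taken from the finding; `JobSketch` and `enter_step` are hypothetical stand-ins, and the submission's actual `CohortValidationJob` model and persistence layer are not shown in this report.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Step names as listed in the findings; the real STEPS constant may differ.
STEPS = ("cloning", "selecting", "bundling", "validating")


@dataclass
class JobSketch:
    """Hypothetical stand-in for CohortValidationJob (not the real model)."""
    status: str = "queued"
    queued_at: Optional[datetime] = None
    cloning_started_at: Optional[datetime] = None
    selecting_started_at: Optional[datetime] = None
    bundling_started_at: Optional[datetime] = None
    validating_started_at: Optional[datetime] = None
    terminal_at: Optional[datetime] = None
    failure_kind: Optional[str] = None
    safe_display_message: Optional[str] = None
    retry_eligible: bool = False
    abort_requested: bool = False


def enter_step(job: JobSketch, step: str) -> JobSketch:
    """Record that the job entered `step`, so observers can see progress."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step!r}")
    job.status = step
    setattr(job, f"{step}_started_at", datetime.now(timezone.utc))
    return job
```

Because every transition writes both a status and a timestamp, an observer can tell not only where the job is but how long it has been there.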

P0

aligned

Align feedback with the user’s level of attention

Feedback is calibrated by separating routine progress (`queued`, `cloning`, `selecting`, `bundling`, `validating`) from intervention-worthy outcomes (`blocked`, `failed`, `aborted`) and by recording concise user/operator messages in `safe_display_message` and `onboarding_failure_reason`. Diagnostic detail is kept in the audit object and `_summarize_validate_log()` rather than pushed into primary state. This maintains the prior aligned verdict.
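
The split between routine and intervention-worthy states could be expressed roughly as follows. The status names come from the findings; `attention_level` and its return labels (`ambient`, `interrupt`, `summary`) are illustrative assumptions, not part of the reviewed code.

```python
# Status groupings as described in the finding above.
ROUTINE = frozenset({"queued", "cloning", "selecting", "bundling", "validating"})
NEEDS_ATTENTION = frozenset({"blocked", "failed", "aborted"})


def attention_level(status: str) -> str:
    """Map a job status to a hypothetical notification tier."""
    if status in NEEDS_ATTENTION:
        return "interrupt"  # surface immediately; someone has to act
    if status in ROUTINE:
        return "ambient"    # visible on request, never pushed
    if status == "completed":
        return "summary"    # one concise notice at the end
    raise ValueError(f"unknown status: {status!r}")
```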

P0

aligned

Apply progressive disclosure to system agency

The workflow uses progressive disclosure: primary state is compact (`CohortApplication.onboarding_state`, `CohortValidationJob.status`, `failure_kind`), while deeper inspection is available in the persisted `audit_object` with `source`, `selection`, `validate`, and `job` sections. `selected_files`, `skipped_during_read`, `bundle_sha256`, `envelope_hash`, and summarized validation logs are available without overwhelming the main lifecycle state. This maintains the prior aligned verdict.

P0

aligned

Replace implied magic with clear mental models

The system exposes a clear mental model with named lifecycle states and failure categories: `STEPS`, `TERMINAL_STATUSES`, `FAILURE_KINDS`, onboarding states such as `firebase_user_failed`, `sign_in_link_failed`, `approval_email_failed`, `validation_queued`, and `validation_complete`, plus explicit comments documenting the uncancellable single LLM-call limitation in `_execute_with_job()`. The untrusted-code boundary is also made explicit through `BOUNDARY_HEADER`, `BOUNDARY_CONTRACT_VERSION`, and `ENVELOPE_ADVISORY`. This maintains the prior aligned verdict.

P0

aligned

Expose meaningful operational state, not internal complexity

Operational state is user/action relevant rather than raw internals: application-level state is mirrored to `validation_complete` or `validation_failed`, while job-level state uses meaningful labels like `queued`, `blocked`, `failed`, `aborted`, and `completed`. Low-level diagnostic data such as file hashes, commit SHA, selected/skipped files, and usage/log summaries is reserved for the audit payload instead of being the primary status surface. This maintains the prior aligned verdict.

P0

aligned

Establish trust through inspectability

Inspectability is supported by a typed file envelope and audit trail: `build_file_envelope()` canonicalizes files and computes `envelope_hash`; `wrap_bundle_with_boundary()` marks the bundle as untrusted input; `_execute_with_job()` records `commit_sha`, `selected_files` with content hashes, `bundle_sha256`, `skipped_during_read`, `latency_ms`, `log_signals`, `boundary_contract_version`, and `envelope_schema` into `result_dict['audit']`. This provides traceability from validation result back to source material and selection decisions. This maintains the prior aligned verdict.
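
One plausible way to canonicalize files and derive a stable `envelope_hash` is sketched below. This is an assumption about the general technique, not the submission's `build_file_envelope()`: the function name `build_file_envelope_sketch`, the entry layout, and the choice of SHA-256 over sorted, compact JSON are all illustrative.

```python
import hashlib
import json


def build_file_envelope_sketch(files: dict[str, str]) -> dict:
    """Build an order-independent envelope with per-file and overall hashes."""
    # Sort by path so the same set of files always yields the same envelope.
    entries = [
        {
            "path": path,
            "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
            "content": content,
        }
        for path, content in sorted(files.items())
    ]
    # Compact, key-sorted JSON gives a deterministic byte string to hash.
    canonical = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    envelope_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"files": entries, "envelope_hash": envelope_hash}
```

With a scheme like this, an auditor can recompute the hash from the recorded files and confirm that the validator saw exactly that material.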

P0

aligned

Make hand-offs, approvals, and blockers explicit

Approvals, handoffs, and blockers are explicit. `approve()` requires operator confirmation unless `--yes` is supplied, then performs discrete external handoffs to Firebase, sign-in-link generation, email, and validation, each with separate failure states (`firebase_user_failed`, `sign_in_link_failed`, `approval_email_failed`, `validation_failed`). Validation blockers are classified via `mark_blocked()` with concrete `failure_kind` values such as `invalid_repo_url`, `no_supported_language`, `selector_rejected`, and `read_failed`. This maintains the prior aligned verdict.
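
The blocker classification might look something like this in outline. Only the four `failure_kind` values named in the finding are included, so the set is likely incomplete; `mark_blocked_sketch` and the dictionary-based job are hypothetical simplifications of the real `mark_blocked()`.

```python
# Partial list: only the failure kinds named in the finding above.
FAILURE_KINDS = frozenset(
    {"invalid_repo_url", "no_supported_language", "selector_rejected", "read_failed"}
)


def mark_blocked_sketch(job: dict, failure_kind: str, message: str) -> dict:
    """Move a job to `blocked` with a classified, displayable reason."""
    if failure_kind not in FAILURE_KINDS:
        # Rejecting unknown kinds keeps blockers from degrading into free text.
        raise ValueError(f"unknown failure_kind: {failure_kind!r}")
    job["status"] = "blocked"
    job["failure_kind"] = failure_kind
    job["safe_display_message"] = message
    return job
```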

P0

aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system rather than a conversation: `CohortValidationJob` models the workflow, dependencies, step timestamps, terminal status, retry eligibility, abort intent, selected counts, bundle hash, commit SHA, and validation run linkage. `_execute_with_job()` advances this state machine across cloning, selecting, bundling, validating, persistence, completion, and failure handling. This maintains the prior aligned verdict.

P0

aligned

Optimise for steering, not only initiating

Steering primitives are now present for both validation and onboarding recovery. `request_abort()` persists `abort_requested=True`; `_execute_with_job()` checks for abort before validation and, in the Iter14 change, refreshes the job and re-checks `job.abort_requested` immediately after `asyncio.run(validate_code_against_principles(...))` returns and before creating `UserValidationRun` or calling `mark_completed()`, discarding validator output on abort. `retry_failed_validation_job()` requires the latest job to be terminal and `retry_eligible`, while `retry_onboarding_handoff()` preserves the failed state (`firebase_user_failed`, `sign_in_link_failed`, or `approval_email_failed`) and clears…
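
The abort re-check described above can be illustrated with a small, self-contained sketch. Everything here is hypothetical scaffolding: `InMemoryJobStore`, `fake_validator`, and `execute_sketch` stand in for the real persistence layer, `validate_code_against_principles(...)`, and `_execute_with_job()`, and the validator result is a placeholder.

```python
import asyncio


class InMemoryJobStore:
    """Hypothetical stand-in for the persistent job table."""

    def __init__(self):
        self.rows = {}

    def load(self, job_id):
        return dict(self.rows[job_id])  # copy, as a fresh read would be

    def save(self, job):
        self.rows[job["id"]] = dict(job)


async def fake_validator(store, job_id, abort_midway):
    """Placeholder for the single, uncancellable validation call."""
    await asyncio.sleep(0)
    if abort_midway:
        # Simulate an operator requesting abort while validation is running.
        row = store.load(job_id)
        row["abort_requested"] = True
        store.save(row)
    return {"verdict": "placeholder"}


def execute_sketch(store, job_id, abort_midway=False):
    job = store.load(job_id)
    if job["abort_requested"]:  # check before validation starts
        job["status"] = "aborted"
        store.save(job)
        return None
    job["status"] = "validating"
    store.save(job)

    result = asyncio.run(fake_validator(store, job_id, abort_midway))

    job = store.load(job_id)  # refresh: the in-memory copy may be stale
    if job["abort_requested"]:  # re-check before persisting anything
        job["status"] = "aborted"
        store.save(job)
        return None  # validator output is discarded

    job["status"] = "completed"
    job["result"] = result
    store.save(job)
    return result
```

The refresh step matters because the abort flag is written by a different actor; re-reading the stored job is what lets a request made during validation take effect.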

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/3ac16b20-88b8-4448-a4f6-5aa738b2919b/card.svg)](https://aidesignblueprint.com/en/readiness-review/3ac16b20-88b8-4448-a4f6-5aa738b2919b)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/3ac16b20-88b8-4448-a4f6-5aa738b2919b.svg)](https://aidesignblueprint.com/en/readiness-review/3ac16b20-88b8-4448-a4f6-5aa738b2919b)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Improvements (1)

P10 · Optimise for steering, not only initiating · needs_changes → aligned
Rubric: 2026-05-04

Run ID: 3ac16b20-88b8-4448-a4f6-5aa738b2919b · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.