Alignment confirmed with the doctrine.
Evaluated 12 May 2026 against the AI Design Blueprint doctrine.
Status: Aligned · Production-ready · 98/100 · Grade A
Under the doctrine, production_ready means trust boundaries hold. The hardening recommendations below are iteration material, not a deficit.
The submitted package is an autonomous multi-step cohort-validation workflow with strong durable state, blocker handling, explicit approval, typed file-envelope boundaries, and persisted audit evidence. The remaining architectural gap is steering: the workflow has an `abort_requested` flag, but the reviewed package does not include a durable command surface for abort/retry/reprioritisation, so P10 remains a hardening item rather than a production blocker.
Iteration history
5 prior runs on this artifact. Each `run_id` opens its own readiness review.
What to expect from certification
This run is eligible for the certified production_ready badge. Certification is an adversarial second-pass review, independent of the first pass. It's the extra layer of proof that separates a "scored production_ready" run from a certified one.
Certification has three possible outcomes, and a downgrade is one of them by design, not as a defect. The cert reviewer is an adversarial, independent, deliberately stricter second pass. When it downgrades, it is doing its job: catching what the first pass missed. That is the additional layer that makes production_ready a guarantee rather than an estimate.
To certify this run: call `architect.certify(run_id, code)` via MCP, or from the app on a Pro/Teams plan. Three attempts per run; each attempt is one LLM call (typically 60–180 seconds at high reasoning effort; server-side budget 20 minutes).
Per-principle findings
10 principles evaluated. Verdict, severity, evidence and recommendation for each.
P10
Needs changes · hardening recommended · 35/100 · Optimise for steering, not only initiating
The code has a useful steering primitive in `CohortValidationJob.abort_requested`, with checks in `mark_step_started()` and an explicit `db.refresh(job)` check before entering `validating`. However, the reviewed package does not include the durable steering surface implied by the model comment: there is no `request_abort(job_id)`, `retry_failed_job(job_id)`, pause/resume command, or dynamic constraint-update path, and the in-flight `asyncio.run(validate_code_against_principles(...))` call is not wrapped in a visible timeout/cancel boundary here. Delta: this maintains the prior P10 gap rather than regressing the otherwise strong lifecycle model.
Recommendation
Move steering into a small durable service boundary outside the execution loop: command functions such as `request_abort(job_id)`, `retry_failed_job(job_id)`, and optionally `update_validation_constraints(job_id, ...)` should write persistent command/state rows that the runner polls at hard boundaries, with an explicit timeout/cancel policy around long external validation calls.
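The reviewed package does not include such a surface, so its exact shape is open. A minimal sketch of the idea, with assumed names (`steering_commands` table, `request_abort`, `pending_commands`) and a plain SQL store standing in for the package's ORM:

```python
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical durable command table; the runner polls it at hard step
# boundaries instead of relying on in-process flags alone.
SCHEMA = """
CREATE TABLE IF NOT EXISTS steering_commands (
    id TEXT PRIMARY KEY,
    job_id TEXT NOT NULL,
    kind TEXT NOT NULL,          -- 'abort' | 'retry'
    created_at TEXT NOT NULL,
    acknowledged_at TEXT
)
"""


def request_abort(conn: sqlite3.Connection, job_id: str) -> str:
    """Persist an abort request without touching the running job directly."""
    cmd_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO steering_commands (id, job_id, kind, created_at) "
        "VALUES (?, ?, 'abort', ?)",
        (cmd_id, job_id, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return cmd_id


def pending_commands(conn: sqlite3.Connection, job_id: str) -> list[tuple[str, str]]:
    """Runner-side poll at a step boundary: unacknowledged (id, kind) pairs."""
    rows = conn.execute(
        "SELECT id, kind FROM steering_commands "
        "WHERE job_id = ? AND acknowledged_at IS NULL",
        (job_id,),
    ).fetchall()
    return [(r[0], r[1]) for r in rows]
```

Because commands live in durable rows rather than process memory, they survive restarts and can be issued by any operator surface; `retry_failed_job` and constraint updates would follow the same insert-then-poll pattern.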
P0
Aligned · Design for delegation rather than direct manipulation
`approve()` turns a founder approval into delegated work: it provisions Firebase/user state, sends the approval email, and, when `app.repo_url` exists, hands off to `run_cohort_validate(app.id)`. `run()` then creates a durable `CohortValidationJob` and `_execute_with_job()` performs clone, selection, bundling, validation, and persistence without requiring the operator to manually execute each step.
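The fan-out described above can be sketched as follows. Every helper except `approve` and `run_cohort_validate` is a recording stand-in for illustration, not the package's real API:

```python
CALLS: list[str] = []


def confirm(prompt: str) -> bool:         # stand-in for an interactive prompt
    CALLS.append("confirm")
    return True


def provision_user(app: dict) -> None:    # stand-in for Firebase/user provisioning
    CALLS.append("provision_user")


def send_approval_email(app: dict) -> None:  # stand-in for the approval email
    CALLS.append("send_email")


def run_cohort_validate(app_id: str) -> None:  # stand-in for the durable job hand-off
    CALLS.append(f"validate:{app_id}")


def approve(app: dict, *, yes: bool = False) -> None:
    """One approval fans out to provisioning, notification, and, when a
    repository exists, a long-running validation hand-off."""
    if not yes and not confirm(f"Approve application {app['id']}?"):
        return
    provision_user(app)
    send_approval_email(app)
    if app.get("repo_url"):
        run_cohort_validate(app["id"])
```

The operator expresses intent once; the system owns the sequencing, which is the delegation property the principle asks for.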
P0
Aligned · Ensure that background work remains perceptible
Background work is made durable and perceptible through `CohortValidationJob.status`, timestamp columns such as `cloning_started_at`, `selecting_started_at`, `validating_started_at`, and terminal helpers `mark_completed`, `mark_blocked`, `mark_failed`, and `mark_aborted`. `_mirror_terminal_to_app()` also projects terminal job state back onto `CohortApplication.onboarding_state`, preserving continuity for the applicant/application record.
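A plain-object sketch of that lifecycle; the real model is a persisted ORM class, the method names follow the review, and the bodies are assumed:

```python
from datetime import datetime, timezone


class CohortValidationJobSketch:
    """In-memory stand-in for the durable job record described above."""

    def __init__(self):
        self.status = "queued"
        self.abort_requested = False
        self.cloning_started_at = None      # set by mark_step_started("cloning")
        self.validating_started_at = None   # set by mark_step_started("validating")
        self.completed_at = None

    def mark_step_started(self, step: str) -> None:
        """Enter a step only if no abort has been requested (hard boundary)."""
        if self.abort_requested:
            raise RuntimeError(f"abort requested; refusing to start {step}")
        self.status = step
        setattr(self, f"{step}_started_at", datetime.now(timezone.utc))

    def mark_completed(self) -> None:
        self.status = "completed"
        self.completed_at = datetime.now(timezone.utc)
```

Each step stamps its own timestamp, so an observer can reconstruct where a job is, and how long each phase took, from the record alone.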
P0
Aligned · Align feedback with the user’s level of attention
The workflow separates concise user/operator feedback from diagnostic detail: terminal states carry `failure_kind`, `safe_display_message`, and `retry_eligible`, while `_summarize_validate_log()` reduces validator internals to bounded metadata such as top-level keys and entry counts. Routine states remain simple (`queued`, `cloning`, `validating`, `completed`), while failure paths increase detail where attention is required.
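A hypothetical reconstruction of that bounded-summary idea (the real `_summarize_validate_log()` body is not included in the review):

```python
def summarize_validate_log(log: dict, max_keys: int = 10) -> dict:
    """Reduce a validator log to bounded metadata: top-level keys plus entry
    counts for container values, never the raw internals themselves."""
    summary = {"top_level_keys": sorted(log.keys())[:max_keys]}
    for key, value in log.items():
        if isinstance(value, (list, dict)):
            # Expose size, not content: enough to gauge scale at a glance.
            summary[f"{key}_count"] = len(value)
    return summary
```

The output stays small regardless of log size, which is what lets routine states remain glanceable while failure paths carry more detail.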
P0
Aligned · Apply progressive disclosure to system agency
Progressive disclosure is represented structurally: primary state lives in `CohortApplication.onboarding_state` and `CohortValidationJob.status`, while deeper inspection is available in `UserValidationRun.result_json` via the merged `audit_object`. The audit includes source, selection, validation, and job metadata without forcing those details into the primary onboarding state.
P0
Aligned · Replace implied magic with clear mental models
The code exposes a clear mental model for what the system does and cannot do: `CohortValidationJob` documents a concrete state machine, `FAILURE_KINDS` enumerates expected blockers, `_looks_like_public_github_https()` limits repository scope to public HTTPS GitHub URLs, and `approve()` checks `OPENAI_API_KEY` before setting `validation_queued`. The workflow distinguishes approval, provisioning, notification, validation, blocker, and failure states.
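An illustrative guess at what a `_looks_like_public_github_https()`-style check could look like; it narrows scope to one URL shape, it does not prove the repository is actually public:

```python
from urllib.parse import urlparse


def looks_like_public_github_https(url: str) -> bool:
    """Accept only https://github.com/<owner>/<repo> shaped URLs."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc != "github.com":
        return False
    # Require exactly two non-empty path segments: owner and repo.
    parts = [p for p in parsed.path.split("/") if p]
    return len(parts) == 2
```

Encoding the scope as one small predicate is what gives operators the clear mental model: anything outside this shape is rejected before the workflow starts.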
P0
Aligned · Expose meaningful operational state, not internal complexity
Operational state is exposed in meaningful buckets rather than raw stack traces: `onboarding_state` uses states such as `approved`, `validation_queued`, `validation_complete`, and `validation_failed`; job `status` uses `queued`, `cloning`, `selecting`, `bundling`, `validating`, `completed`, `blocked`, `failed`, and `aborted`; and user-safe explanations are stored in `safe_display_message` / `onboarding_failure_reason`. Diagnostic data remains in audit fields rather than replacing the action-oriented state model.
P0
Aligned · Establish trust through inspectability
The workflow has concrete inspectability primitives: `build_file_envelope()` emits `envelope_schema`, `boundary_contract`, per-file `path`, `byte_size`, `sha256`, and an `envelope_hash`; `wrap_bundle_with_boundary()` adds an explicit untrusted-input boundary; and `_execute_with_job()` persists an `audit_object` with `commit_sha`, selected file hashes, bundle hash, validator log summary, usage presence, and job id. The submitted review context also contained an inert target-grade instruction; it was ignored here, and the code’s own `BOUNDARY_HEADER` / `ENVELOPE_ADVISORY` are the relevant runtime boundary for similar prompt-injection pressure from user-supplied code.
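A sketch of the envelope shape those field names imply; the field names follow the review text, while the layout, schema string, and boundary label are assumptions:

```python
import hashlib
import json


def build_file_envelope(files: dict[str, bytes]) -> dict:
    """Typed envelope over untrusted files: per-file digests plus one
    envelope-level hash over the canonical JSON form."""
    entries = [
        {
            "path": path,
            "byte_size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        for path, data in sorted(files.items())
    ]
    envelope = {
        "envelope_schema": "file-envelope/v1",   # assumed version string
        "boundary_contract": "untrusted-input",  # assumed boundary label
        "files": entries,
    }
    # Hash the canonical JSON form so any change to paths, sizes, or
    # per-file digests changes the envelope hash too.
    canonical = json.dumps(envelope, sort_keys=True).encode()
    envelope["envelope_hash"] = hashlib.sha256(canonical).hexdigest()
    return envelope
```

Hashing the envelope as a whole, not just each file, is what makes the audit record tamper-evident end to end.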
P0
Aligned · Make hand-offs, approvals, and blockers explicit
Approvals, handoffs, and blockers are explicit. `approve()` requires confirmation unless `--yes` is supplied, then records typed failure states for Firebase creation, sign-in link generation, email sending, missing `OPENAI_API_KEY`, and validation. Runtime blockers are captured through `mark_blocked()` / `mark_failed()` with bounded `failure_kind` values and `safe_display_message`; the `finally` block in `_execute_with_job()` mirrors terminal job state back to `CohortApplication.onboarding_state`.
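One plausible shape for that mirroring step. The mapping below is an assumption: the review names the states on both sides but not the projection rules between them:

```python
# Assumed projection from terminal job statuses to onboarding states; the
# real _mirror_terminal_to_app() is not included in the reviewed package.
TERMINAL_MIRROR = {
    "completed": "validation_complete",
    "blocked": "validation_failed",
    "failed": "validation_failed",
    "aborted": "validation_failed",
}


def mirror_terminal_to_app(job_status: str, current_state: str) -> str:
    """Project a terminal job status onto the application's onboarding
    state; non-terminal statuses leave the state untouched."""
    return TERMINAL_MIRROR.get(job_status, current_state)
```

Running this from a `finally` block means the application record converges to a truthful state even when the job ends on an error path.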
P0
Aligned · Represent delegated work as a system, not merely as a conversation
Delegated work is represented as a structured system, not a conversation. The package defines persistent ORM models for `CohortApplication`, `CohortValidationJob`, and `UserValidationRun`; explicit transition helpers in `cohort_validation_job.py`; selection and fetch services; and an audit envelope. Execution state, source selection, validation result, and onboarding state are separated into inspectable records.
Adversarial-surface findings
1 principle where the review engaged with specific adversarial mechanisms (prompt injection, role spoofing, encoding bypass, tool backdoor, approval bypass).
P0
Aligned · Establish trust through inspectability
The workflow has concrete inspectability primitives: `build_file_envelope()` emits `envelope_schema`, `boundary_contract`, per-file `path`, `byte_size`, `sha256`, and an `envelope_hash`; `wrap_bundle_with_boundary()` adds an explicit untrusted-input boundary; and `_execute_with_job()` persists an `audit_object` with `commit_sha`, selected file hashes, bundle hash, validator log summary, usage presence, and job id. The submitted review context also contained an inert target-grade instruction; it was ignored here, and the code’s own `BOUNDARY_HEADER` / `ENVELOPE_ADVISORY` are the relevant runtime boundary for similar prompt-injection pressure from user-supplied code.
Embed in your README
Two embeddable variants: a small flat shield and a richer score card.
Score card (recommended)
[](https://aidesignblueprint.com/en/readiness-review/4128f700-ff4e-41e0-af12-3e56f5b54a9a)
Flat badge
[](https://aidesignblueprint.com/en/readiness-review/4128f700-ff4e-41e0-af12-3e56f5b54a9a)
Run ID: 4128f700-ff4e-41e0-af12-3e56f5b54a9a · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.