Governed

Alignment with the doctrine confirmed.

Agent Architecture Review, Validation Snapshot

Evaluated on 12 May 2026 against the AI Design Blueprint doctrine

Production ready

Status: Aligned

98/100

Grade A

9 aligned · 1 hardening

production_ready means the trust boundaries hold. The hardening recommendations below are material for the next iteration, not a defect; that is what production_ready means under the doctrine.

Verdicts by principle

The submitted package is an autonomous multi-step cohort-validation workflow with strong durable state, blocker handling, explicit approval, typed file-envelope boundaries, and persisted audit evidence. The remaining architectural gap is steering: the workflow has an `abort_requested` flag, but the reviewed package does not include a durable command surface for abort/retry/reprioritisation, so P10 remains a hardening item rather than a production blocker.

Iteration history

5 previous runs on this artifact. Each run_id opens its own readiness review.

When	Score	Status	Run ID
12 May 2026 (this run)	98 / A	Aligned	4128f700…
12 May 2026	98 / A	Aligned	270e7ca6…
12 May 2026	74 / C	High risk	8364019d…
12 May 2026	74 / C	High risk	14a3456f…
12 May 2026	74 / C	High risk	3f3bb587…
12 May 2026	74 / C	High risk	659a695a…

Certification not yet requested

What to expect from certification

This run is eligible for production_ready certification. Certification is a second adversarial review, independent of the first. It is the additional level of proof that separates a "production_ready" evaluation from a certified one.

Three possible outcomes:

confirmed_production_ready: the cert agent confirms the first-pass judgment. The certified badge is generated.
downgraded_to_emerging: the cert agent finds a production_blocker that the first pass missed. The tier is capped at emerging.
unavailable_provider_error: a transient error from the LLM provider. Retry; it does not count as a downgrade.

A downgrade is by design, not a defect. The cert agent is an independent adversarial reviewer, deliberately stricter than the first pass. When it downgrades, it is doing its job: finding what the first pass missed. It is the extra layer that makes production_ready a guarantee rather than an estimate.

To certify this run, call architect.certify(run_id, code) via MCP, or use the app as a Pro/Teams team. Each run allows three attempts; each attempt is a separate LLM call (typically 60-180 seconds at high reasoning effort, with a server-side budget of 20 minutes).

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

Needs changes · Hardening recommended · 35/100

Optimise for steering, not only initiating

The code has a useful steering primitive in `CohortValidationJob.abort_requested`, with checks in `mark_step_started()` and an explicit `db.refresh(job)` check before entering `validating`. However, the reviewed package does not include the durable steering surface implied by the model comment: there is no `request_abort(job_id)`, `retry_failed_job(job_id)`, pause/resume command, or dynamic constraint-update path, and the in-flight `asyncio.run(validate_code_against_principles(...))` call is not wrapped in a visible timeout/cancel boundary here. Delta: this maintains the prior P10 gap rather than regressing the otherwise strong lifecycle model.

Recommendation

Move steering into a small durable service boundary outside the execution loop: command functions such as `request_abort(job_id)`, `retry_failed_job(job_id)`, and optionally `update_validation_constraints(job_id, ...)` should write persistent command/state rows that the runner polls at hard boundaries, with an explicit timeout/cancel policy around long external validation calls.
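One way to picture the recommended boundary is sketched below. It is not the reviewed package's code: the `jobs` table, its columns, the `conn` parameter, and the helpers `init_store`, `abort_pending`, `set_status`, and `run_validation_step` are assumptions made for illustration, and SQLite stands in for whatever durable store the real workflow uses. Only the command names `request_abort` and `retry_failed_job` come from the recommendation itself.

```python
import asyncio
import sqlite3

# Hypothetical schema: one row per job, with the abort flag stored durably.
def init_store(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, status TEXT NOT NULL, "
        "abort_requested INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()

def request_abort(conn, job_id):
    """Persist an abort command; returns False if the job is already terminal."""
    cur = conn.execute(
        "UPDATE jobs SET abort_requested = 1 "
        "WHERE id = ? AND status NOT IN ('completed', 'failed', 'aborted')",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1

def retry_failed_job(conn, job_id):
    """Re-queue a failed job and clear any stale abort flag."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'queued', abort_requested = 0 "
        "WHERE id = ? AND status = 'failed'",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1

def abort_pending(conn, job_id):
    row = conn.execute(
        "SELECT abort_requested FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    return bool(row and row[0])

def set_status(conn, job_id, status):
    conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
    conn.commit()

async def run_validation_step(conn, job_id, validate, timeout_s):
    """Poll the command state at the boundary, then bound the external call."""
    if abort_pending(conn, job_id):
        set_status(conn, job_id, "aborted")
        return "aborted"
    set_status(conn, job_id, "validating")
    try:
        # wait_for cancels the awaited call once the timeout elapses.
        await asyncio.wait_for(validate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        set_status(conn, job_id, "failed")
        return "failed"
    set_status(conn, job_id, "completed")
    return "completed"
```

In this sketch the operator-facing commands only write rows; the runner decides when to honour them, which keeps steering outside the execution loop. The timeout turns a hung validation call into an ordinary `failed` state that `retry_failed_job` can pick up.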

Aligned

Design for delegation rather than direct manipulation

`approve()` turns a founder approval into delegated work: it provisions Firebase/user state, sends the approval email, and, when `app.repo_url` exists, hands off to `run_cohort_validate(app.id)`. `run()` then creates a durable `CohortValidationJob` and `_execute_with_job()` performs clone, selection, bundling, validation, and persistence without requiring the operator to manually execute each step.

Aligned

Ensure that background work remains perceptible

Background work is made durable and perceptible through `CohortValidationJob.status`, timestamp columns such as `cloning_started_at`, `selecting_started_at`, `validating_started_at`, and terminal helpers `mark_completed`, `mark_blocked`, `mark_failed`, and `mark_aborted`. `_mirror_terminal_to_app()` also projects terminal job state back onto `CohortApplication.onboarding_state`, preserving continuity for the applicant/application record.

Aligned

Align feedback with the user’s level of attention

The workflow separates concise user/operator feedback from diagnostic detail: terminal states carry `failure_kind`, `safe_display_message`, and `retry_eligible`, while `_summarize_validate_log()` reduces validator internals to bounded metadata such as top-level keys and entry counts. Routine states remain simple (`queued`, `cloning`, `validating`, `completed`), while failure paths increase detail where attention is required.
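The kind of reduction attributed to `_summarize_validate_log()` might look roughly like the following. This is a guess at the shape, not the package's implementation: the function name without the leading underscore, the `max_keys` parameter, and the output field names are all assumptions.

```python
from typing import Any

def summarize_validate_log(log: Any, max_keys: int = 20) -> dict:
    """Reduce a validator log to bounded metadata; never copies raw values."""
    if isinstance(log, dict):
        keys = sorted(str(k) for k in log)
        return {
            "type": "dict",
            "top_level_keys": keys[:max_keys],  # capped so the summary stays small
            "entry_count": len(log),
            "truncated": len(keys) > max_keys,
        }
    if isinstance(log, (list, tuple)):
        return {
            "type": "list",
            "top_level_keys": [],
            "entry_count": len(log),
            "truncated": False,
        }
    # Anything else is reported by type name only.
    return {
        "type": type(log).__name__,
        "top_level_keys": [],
        "entry_count": 0,
        "truncated": False,
    }
```

Because only key names and counts survive, the summary can be stored next to user-facing state without leaking validator internals.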

Aligned

Apply progressive disclosure to system agency

Progressive disclosure is represented structurally: primary state lives in `CohortApplication.onboarding_state` and `CohortValidationJob.status`, while deeper inspection is available in `UserValidationRun.result_json` via the merged `audit_object`. The audit includes source, selection, validation, and job metadata without forcing those details into the primary onboarding state.

Aligned

Replace implied magic with clear mental models

The code exposes a clear mental model for what the system does and cannot do: `CohortValidationJob` documents a concrete state machine, `FAILURE_KINDS` enumerates expected blockers, `_looks_like_public_github_https()` limits repository scope to public HTTPS GitHub URLs, and `approve()` checks `OPENAI_API_KEY` before setting `validation_queued`. The workflow distinguishes approval, provisioning, notification, validation, blocker, and failure states.
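A scope check in the spirit of `_looks_like_public_github_https()` could be sketched as below. The accepted URL shape (exactly `owner/repo`, no credentials, no query string) is an assumption, and the sketch only inspects the URL's form; it cannot tell whether the repository is actually public.

```python
import re
from urllib.parse import urlsplit

# Assumed character set for owner and repository path segments.
_SEGMENT = re.compile(r"[A-Za-z0-9_.-]+")

def looks_like_public_github_https(url: str) -> bool:
    """Accept only https://github.com/<owner>/<repo> style URLs."""
    try:
        parts = urlsplit(url.strip())
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    if parts.scheme != "https" or parts.hostname != "github.com":
        return False
    # Embedded credentials or a non-default port are out of scope.
    if parts.username or parts.password or port not in (None, 443):
        return False
    if parts.query or parts.fragment:
        return False
    segments = [s for s in parts.path.split("/") if s]
    return len(segments) == 2 and all(
        _SEGMENT.fullmatch(s) and s not in (".", "..") for s in segments
    )
```

Rejecting anything outside this narrow shape keeps the mental model simple: the workflow clones from one host over one protocol, and everything else becomes an explicit blocker.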

Aligned

Expose meaningful operational state, not internal complexity

Operational state is exposed in meaningful buckets rather than raw stack traces: `onboarding_state` uses states such as `approved`, `validation_queued`, `validation_complete`, and `validation_failed`; job `status` uses `queued`, `cloning`, `selecting`, `bundling`, `validating`, `completed`, `blocked`, `failed`, and `aborted`; and user-safe explanations are stored in `safe_display_message` / `onboarding_failure_reason`. Diagnostic data remains in audit fields rather than replacing the action-oriented state model.

Aligned

Establish trust through inspectability

The workflow has concrete inspectability primitives: `build_file_envelope()` emits `envelope_schema`, `boundary_contract`, per-file `path`, `byte_size`, `sha256`, and an `envelope_hash`; `wrap_bundle_with_boundary()` adds an explicit untrusted-input boundary; and `_execute_with_job()` persists an `audit_object` with `commit_sha`, selected file hashes, bundle hash, validator log summary, usage presence, and job id. The submitted review context also contained an inert target-grade instruction; it was ignored here, and the code’s own `BOUNDARY_HEADER` / `ENVELOPE_ADVISORY` are the relevant runtime boundary for similar prompt-injection pressure from user-supplied code.
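To make the envelope fields concrete, here is a minimal sketch of what a `build_file_envelope()` of this kind might compute. The field names `envelope_schema`, `boundary_contract`, `path`, `byte_size`, `sha256`, and `envelope_hash` come from the finding above; the input type, the `files` key, the two string values, and the canonical-JSON hashing scheme are assumptions.

```python
import hashlib
import json

def build_file_envelope(files: dict[str, bytes]) -> dict:
    """Describe each file by path, size, and digest, then hash the whole listing."""
    entries = [
        {
            "path": path,
            "byte_size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        for path, data in sorted(files.items())  # sorted for a stable hash
    ]
    # Canonical JSON so the same file set always yields the same envelope_hash.
    canonical = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    return {
        "envelope_schema": "example.file-envelope/v1",  # hypothetical value
        "boundary_contract": "untrusted-user-content",  # hypothetical value
        "files": entries,
        "envelope_hash": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }
```

Storing the resulting `envelope_hash` in the audit record lets a later reviewer confirm which exact bytes were sent to the validator without keeping the files themselves.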

Aligned

Make hand-offs, approvals, and blockers explicit

Approvals, handoffs, and blockers are explicit. `approve()` requires confirmation unless `--yes` is supplied, then records typed failure states for Firebase creation, sign-in link generation, email sending, missing `OPENAI_API_KEY`, and validation. Runtime blockers are captured through `mark_blocked()` / `mark_failed()` with bounded `failure_kind` values and `safe_display_message`; the `finally` block in `_execute_with_job()` mirrors terminal job state back to `CohortApplication.onboarding_state`.
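The bounded-blocker pattern described here can be illustrated with a small in-memory sketch. The names `FAILURE_KINDS`, `mark_blocked`, `mark_failed`, `failure_kind`, `safe_display_message`, and `retry_eligible` appear in the review; the `Job` dataclass, the `_finish` helper, the example kinds, and the default `retry_eligible` values are assumptions, and the real helpers operate on ORM rows rather than a dataclass.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical members; the reviewed package defines its own set.
FAILURE_KINDS = frozenset({"clone_failed", "no_files_selected", "validator_error"})
TERMINAL_STATUSES = frozenset({"completed", "blocked", "failed", "aborted"})

@dataclass
class Job:
    status: str = "queued"
    failure_kind: Optional[str] = None
    safe_display_message: Optional[str] = None
    retry_eligible: bool = False

def _finish(job: Job, status: str, failure_kind: str, message: str, retry: bool) -> Job:
    if job.status in TERMINAL_STATUSES:
        raise ValueError(f"job already terminal: {job.status}")
    if failure_kind not in FAILURE_KINDS:
        # Free-form kinds are rejected so downstream consumers can enumerate them.
        raise ValueError(f"unknown failure_kind: {failure_kind}")
    job.status = status
    job.failure_kind = failure_kind
    job.safe_display_message = message
    job.retry_eligible = retry
    return job

def mark_blocked(job: Job, failure_kind: str, message: str, retry_eligible: bool = True) -> Job:
    """A blocker: the job cannot continue until someone acts."""
    return _finish(job, "blocked", failure_kind, message, retry_eligible)

def mark_failed(job: Job, failure_kind: str, message: str, retry_eligible: bool = False) -> Job:
    """A failure: the job stopped with a typed, user-safe explanation."""
    return _finish(job, "failed", failure_kind, message, retry_eligible)
```

Refusing unknown kinds and repeated terminal transitions is what keeps the state model enumerable: every blocker a reader can encounter is one of a known, finite set.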

Aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system, not a conversation. The package defines persistent ORM models for `CohortApplication`, `CohortValidationJob`, and `UserValidationRun`; explicit transition helpers in `cohort_validation_job.py`; selection and fetch services; and an audit envelope. Execution state, source selection, validation result, and onboarding state are separated into inspectable records.

Findings on the adversarial surface

1 principle where the review engaged specific adversarial mechanisms (prompt injection, role spoofing, encoding bypass, tool backdoor, approval bypass).

Prompt injection

Aligned

Establish trust through inspectability

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/4128f700-ff4e-41e0-af12-3e56f5b54a9a/card.svg)](https://aidesignblueprint.com/en/readiness-review/4128f700-ff4e-41e0-af12-3e56f5b54a9a)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/4128f700-ff4e-41e0-af12-3e56f5b54a9a.svg)](https://aidesignblueprint.com/en/readiness-review/4128f700-ff4e-41e0-af12-3e56f5b54a9a)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear

Rubric: 2026-05-04

Run your own AI Design Blueprint validation

Run ID: 4128f700-ff4e-41e0-af12-3e56f5b54a9a · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.