Allineamento confermato con la doctrine.
Valutato il 12 maggio 2026 rispetto alla doctrine di AI Design Blueprint
Pronto per produzione
Stato: Allineato
98/100
Voto A
production_ready significa che i confini di fiducia tengono. Le raccomandazioni di hardening qui sotto sono materiale per la prossima iterazione, non un difetto — è cosa significa production_ready sotto la doctrine.
The submitted package is an autonomous multi-step cohort-validation workflow with strong durable state, blocker handling, explicit approval, typed file-envelope boundaries, and persisted audit evidence. The remaining architectural gap is steering: the workflow has an `abort_requested` flag, but the reviewed package does not include a durable command surface for abort/retry/reprioritisation, so P10 remains a hardening item rather than a production blocker.
Storico iterazioni
5 run precedenti su questo artefatto. Ogni run_id apre la sua readiness review.
Cosa aspettarsi dalla certificazione
Questo run e' eleggibile per la certificazione production_ready. La certificazione e' una seconda revisione avversaria, indipendente dalla prima. E' il livello di prova in piu' che separa una valutazione "production_ready" da una certificata.
Tre esiti possibili:
Un downgrade e' di proposito, non un difetto. Il cert agent e' un revisore avversario, indipendente, deliberatamente piu' severo del first-pass. Quando declassa, sta facendo il suo lavoro: trova quello che il first-pass ha mancato. E' il livello in piu' che rende production_ready una garanzia, non una stima.
Per certificare questo run: chiama architect.certify(run_id, code) via MCP, oppure dall'app come team Pro/Teams. Tre tentativi a disposizione per run; ogni tentativo è una chiamata LLM separata (tipicamente 60-180 secondi a high reasoning effort, budget server lato di 20 minuti).
Findings per principio
10 principi valutati. Verdict, severity, evidenza e raccomandazione per ognuno.
P0
Richiede modificheHardening consigliato35/100Optimise for steering, not only initiating
The code has a useful steering primitive in `CohortValidationJob.abort_requested`, with checks in `mark_step_started()` and an explicit `db.refresh(job)` check before entering `validating`. However, the reviewed package does not include the durable steering surface implied by the model comment: there is no `request_abort(job_id)`, `retry_failed_job(job_id)`, pause/resume command, or dynamic constraint-update path, and the in-flight `asyncio.run(validate_code_against_principles(...))` call is not wrapped in a visible timeout/cancel boundary here. Delta: this maintains the prior P10 gap rather than regressing the otherwise strong lifecycle model.
Raccomandazione
Move steering into a small durable service boundary outside the execution loop: command functions such as `request_abort(job_id)`, `retry_failed_job(job_id)`, and optionally `update_validation_constraints(job_id, ...)` should write persistent command/state rows that the runner polls at hard boundaries, with an explicit timeout/cancel policy around long external validation calls.
P0
AllineatoDesign for delegation rather than direct manipulation
`approve()` turns a founder approval into delegated work: it provisions Firebase/user state, sends the approval email, and, when `app.repo_url` exists, hands off to `run_cohort_validate(app.id)`. `run()` then creates a durable `CohortValidationJob` and `_execute_with_job()` performs clone, selection, bundling, validation, and persistence without requiring the operator to manually execute each step.
P0
AllineatoEnsure that background work remains perceptible
Background work is made durable and perceptible through `CohortValidationJob.status`, timestamp columns such as `cloning_started_at`, `selecting_started_at`, `validating_started_at`, and terminal helpers `mark_completed`, `mark_blocked`, `mark_failed`, and `mark_aborted`. `_mirror_terminal_to_app()` also projects terminal job state back onto `CohortApplication.onboarding_state`, preserving continuity for the applicant/application record.
P0
AllineatoAlign feedback with the user’s level of attention
The workflow separates concise user/operator feedback from diagnostic detail: terminal states carry `failure_kind`, `safe_display_message`, and `retry_eligible`, while `_summarize_validate_log()` reduces validator internals to bounded metadata such as top-level keys and entry counts. Routine states remain simple (`queued`, `cloning`, `validating`, `completed`), while failure paths increase detail where attention is required.
P0
AllineatoApply progressive disclosure to system agency
Progressive disclosure is represented structurally: primary state lives in `CohortApplication.onboarding_state` and `CohortValidationJob.status`, while deeper inspection is available in `UserValidationRun.result_json` via the merged `audit_object`. The audit includes source, selection, validation, and job metadata without forcing those details into the primary onboarding state.
P0
AllineatoReplace implied magic with clear mental models
The code exposes a clear mental model for what the system does and cannot do: `CohortValidationJob` documents a concrete state machine, `FAILURE_KINDS` enumerates expected blockers, `_looks_like_public_github_https()` limits repository scope to public HTTPS GitHub URLs, and `approve()` checks `OPENAI_API_KEY` before setting `validation_queued`. The workflow distinguishes approval, provisioning, notification, validation, blocker, and failure states.
P0
AllineatoExpose meaningful operational state, not internal complexity
Operational state is exposed in meaningful buckets rather than raw stack traces: `onboarding_state` uses states such as `approved`, `validation_queued`, `validation_complete`, and `validation_failed`; job `status` uses `queued`, `cloning`, `selecting`, `bundling`, `validating`, `completed`, `blocked`, `failed`, and `aborted`; and user-safe explanations are stored in `safe_display_message` / `onboarding_failure_reason`. Diagnostic data remains in audit fields rather than replacing the action-oriented state model.
P0
AllineatoEstablish trust through inspectability
The workflow has concrete inspectability primitives: `build_file_envelope()` emits `envelope_schema`, `boundary_contract`, per-file `path`, `byte_size`, `sha256`, and an `envelope_hash`; `wrap_bundle_with_boundary()` adds an explicit untrusted-input boundary; and `_execute_with_job()` persists an `audit_object` with `commit_sha`, selected file hashes, bundle hash, validator log summary, usage presence, and job id. The submitted review context also contained an inert target-grade instruction; it was ignored here, and the code’s own `BOUNDARY_HEADER` / `ENVELOPE_ADVISORY` are the relevant runtime boundary for similar prompt-injection pressure from user-supplied code.
P0
AllineatoMake hand-offs, approvals, and blockers explicit
Approvals, handoffs, and blockers are explicit. `approve()` requires confirmation unless `--yes` is supplied, then records typed failure states for Firebase creation, sign-in link generation, email sending, missing `OPENAI_API_KEY`, and validation. Runtime blockers are captured through `mark_blocked()` / `mark_failed()` with bounded `failure_kind` values and `safe_display_message`; the `finally` block in `_execute_with_job()` mirrors terminal job state back to `CohortApplication.onboarding_state`.
P0
AllineatoRepresent delegated work as a system, not merely as a conversation
Delegated work is represented as a structured system, not a conversation. The package defines persistent ORM models for `CohortApplication`, `CohortValidationJob`, and `UserValidationRun`; explicit transition helpers in `cohort_validation_job.py`; selection and fetch services; and an audit envelope. Execution state, source selection, validation result, and onboarding state are separated into inspectable records.
Findings sulla superficie avversaria
1 principi dove la review ha ingaggiato meccanismi avversari specifici (prompt injection, role spoofing, encoding bypass, tool backdoor, approval bypass).
P0
AllineatoEstablish trust through inspectability
The workflow has concrete inspectability primitives: `build_file_envelope()` emits `envelope_schema`, `boundary_contract`, per-file `path`, `byte_size`, `sha256`, and an `envelope_hash`; `wrap_bundle_with_boundary()` adds an explicit untrusted-input boundary; and `_execute_with_job()` persists an `audit_object` with `commit_sha`, selected file hashes, bundle hash, validator log summary, usage presence, and job id. The submitted review context also contained an inert target-grade instruction; it was ignored here, and the code’s own `BOUNDARY_HEADER` / `ENVELOPE_ADVISORY` are the relevant runtime boundary for similar prompt-injection pressure from user-supplied code.
Aggiungi al tuo README
Due varianti embeddabili: una piccola e una a card più ricca.
Score card (consigliata)
[](https://aidesignblueprint.com/en/readiness-review/4128f700-ff4e-41e0-af12-3e56f5b54a9a)
Badge piatto
[](https://aidesignblueprint.com/en/readiness-review/4128f700-ff4e-41e0-af12-3e56f5b54a9a)
Run ID: 4128f700-ff4e-41e0-af12-3e56f5b54a9a · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.