Alignment with the doctrine confirmed.
Evaluated on 12 May 2026 against the AI Design Blueprint doctrine
Emerging
Status: Aligned
74/100
Grade C
The submission is an autonomous, multi-step cohort validation workflow with durable job state, explicit blocker/failure states, source/audit traceability, and a typed untrusted-code boundary before the LLM validation call. The main remaining gap is steering: `abort_requested` is modeled and checked at step boundaries, but there is no durable operator command surface for requesting abort/retry or changing constraints while the job is underway, and long-running clone/validation phases remain effectively unsteerable once entered.
Iteration history
5 previous runs on this artifact. Each run_id opens its own readiness review.
Missed production blocker: `backend/app/db/models.py` references `Base` at class-definition time without importing it, so importing `app.db.models` crashes before either cohort CLI can run.
The downgrade is intentional. The cert agent is a second, independent, adversarial reviewer that is deliberately stricter than the first pass. It found a production_blocker that the first pass had missed. Resolve the finding below, re-validate to obtain a new run_id, and recertify for the badge.
Blockers missed by the first pass
P8
Needs changes · Production blocker · 95/100 · Model module crashes at import because `Base` is undefined
In `backend/app/db/models.py`, classes are declared as `class User(Base):`, `class CohortApplication(Base):`, `class CohortValidationJob(Base):`, and `class UserValidationRun(Base):`, but the file only imports `datetime`, SQLAlchemy column types, and `Mapped/mapped_column`; it never imports `Base` from `app.db.session`. Both `run_cohort_validate.py` and `approve_cohort_application.py` execute `from app.db import models`, which would evaluate these class definitions and raise `NameError: name 'Base' is not defined` at module load time.
Recommendation
Add an explicit `from app.db.session import Base` import in `backend/app/db/models.py` and include an import smoke test for `app.db.models`, `app.scripts.run_cohort_validate`, and `app.scripts.approve_cohort_application`.
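A minimal sketch of the recommended import smoke test follows. It assumes the three module paths named in the finding are importable from the backend root; the helper names `find_import_failures` and `assert_modules_import` are hypothetical and not part of the reviewed code.

```python
import importlib

# Module paths taken from the finding above; adjust to the real package layout.
MODULES = (
    "app.db.models",
    "app.scripts.run_cohort_validate",
    "app.scripts.approve_cohort_application",
)


def find_import_failures(module_names):
    """Import each module and return {name: "ErrorType: message"} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # e.g. NameError for an undefined `Base`
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def assert_modules_import(module_names=MODULES):
    """Raise AssertionError listing every module that fails at import time."""
    failures = find_import_failures(module_names)
    assert not failures, f"modules failed to import: {failures}"
```

Calling `assert_modules_import()` from a test in the backend suite would surface a `NameError` at module load before either CLI is exercised.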
Findings by principle
10 principles evaluated. Verdict, severity, evidence, and recommendation for each.
P0
Needs changes · Hardening recommended · 35/100 · Optimise for steering, not only initiating
The code models steering intent with `CohortValidationJob.abort_requested` and checks it in `mark_step_started()` plus once before `validating`, but there is no reviewed durable command surface such as `request_abort(job_id)` or `retry_failed_job(job_id)` to set that flag or initiate recovery. Once inside `fetch_public_repo()` or `asyncio.run(validate_code_against_principles(...))`, the workflow is effectively unsteerable except by killing the process; constraints also cannot be updated mid-run. Delta: this maintains the prior P10 `needs_changes` finding; the newly included stubs do not address the prior recommendation to move steering into a small durable service boundary.
Recommendation
Move steering into a small durable service boundary outside the execution loop: expose command functions such as `request_abort(job_id)`, `retry_failed_job(job_id)`, and, if needed, `update_validation_constraints(job_id, constraints)`, then have the worker poll those persisted commands before and during long-running phases where cancellation is feasible.
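One way such a command surface could look is sketched below. This is an illustration under stated assumptions, not the reviewed implementation: it uses an in-process `sqlite3` table as a stand-in for the real job store, passes the connection explicitly, and the table layout plus the helpers `init_schema`, `abort_pending`, and `run_phase` are hypothetical. Only `request_abort`, `retry_failed_job`, `abort_requested`, `retry_eligible`, and the status names come from the report.

```python
import sqlite3

TERMINAL_STATES = ("completed", "blocked", "failed", "aborted")


def init_schema(conn):
    # Hypothetical stand-in for the persisted CohortValidationJob table.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY,
               status TEXT NOT NULL DEFAULT 'queued',
               abort_requested INTEGER NOT NULL DEFAULT 0,
               retry_eligible INTEGER NOT NULL DEFAULT 0
           )"""
    )
    conn.commit()


def request_abort(conn, job_id):
    """Persist an abort request. Returns True only for a non-terminal job."""
    cur = conn.execute(
        "UPDATE jobs SET abort_requested = 1 "
        "WHERE id = ? AND status NOT IN (?, ?, ?, ?)",
        (job_id, *TERMINAL_STATES),
    )
    conn.commit()
    return cur.rowcount == 1


def retry_failed_job(conn, job_id):
    """Re-queue a failed job, but only if it was marked retry-eligible."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'queued', abort_requested = 0 "
        "WHERE id = ? AND status = 'failed' AND retry_eligible = 1",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def abort_pending(conn, job_id):
    row = conn.execute(
        "SELECT abort_requested FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    return bool(row and row[0])


def run_phase(conn, job_id, chunks):
    """Worker side: poll the persisted flag between chunks of a long phase."""
    done = []
    for chunk in chunks:
        if abort_pending(conn, job_id):
            conn.execute(
                "UPDATE jobs SET status = 'aborted' WHERE id = ?", (job_id,)
            )
            conn.commit()
            return done, "aborted"
        done.append(chunk)  # placeholder for the real unit of work
    return done, "ok"
```

Because the commands only write persisted state, an operator process and the worker can be separate; the worker decides where polling is feasible inside each phase.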
P0
Aligned · Design for delegation rather than direct manipulation
Delegation is represented as an assigned workflow rather than manual step execution: `approve()` gates the cohort approval, then `run_cohort_validate.run()` creates a `CohortValidationJob` and `_execute_with_job()` performs clone, language detection, surface selection, bundling, validation, and persistence. Scope is bounded by `repo_url`, `ValidationContext(repository=namespace, files=list(selection.selected_paths))`, `MAX_FILES`, `MAX_BUNDLE_BYTES`, and selected file paths.
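The scope bounds described above can be illustrated with a small budgeted selection routine. This is a sketch only: the limit values are placeholders (the report does not state the real `MAX_FILES` or `MAX_BUNDLE_BYTES`), and `select_within_budget` is a hypothetical helper, not the reviewed selector.

```python
MAX_FILES = 40              # placeholder value, not from the report
MAX_BUNDLE_BYTES = 200_000  # placeholder value, not from the report


def select_within_budget(candidates, max_files=MAX_FILES, max_bytes=MAX_BUNDLE_BYTES):
    """Pick files in priority order without exceeding either limit.

    `candidates` is an iterable of (path, size_bytes) pairs, highest
    priority first. Returns (selected_paths, skipped_paths, total_bytes).
    """
    selected, skipped, total = [], [], 0
    for path, size in candidates:
        if len(selected) >= max_files or total + size > max_bytes:
            skipped.append(path)  # keep a record for the audit trail
            continue
        selected.append(path)
        total += size
    return selected, skipped, total
```

Returning the skipped paths alongside the selection keeps the bound visible to whoever later inspects what was and was not sent to validation.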
P0
Aligned · Ensure that background work remains perceptible
Background progress is durable and perceptible through `CohortValidationJob.status`, `queued_at`, `cloning_started_at`, `selecting_started_at`, `bundling_started_at`, `validating_started_at`, `terminal_at`, `failure_kind`, `safe_display_message`, and `retry_eligible`. `_mirror_terminal_to_app()` also mirrors terminal job outcomes into `CohortApplication.onboarding_state`, including the `finally` path in `_execute_with_job()`, so users/operators can leave and return without losing continuity.
P0
Aligned · Align feedback with the user’s level of attention
Feedback is calibrated by separating concise user/operator states from diagnostic detail: routine progress uses `validation_queued`, `validation_complete`, and `validation_failed`; intervention cases use `safe_display_message`, `onboarding_failure_reason`, and typed failure states such as `firebase_user_failed`, `sign_in_link_failed`, `approval_email_failed`, and `validate_error`. Detailed validator metadata is summarized by `_summarize_validate_log()` instead of exposing raw logs as primary feedback.
P0
Aligned · Apply progressive disclosure to system agency
The workflow exposes minimal primary state via `onboarding_state` and job `status`, while preserving deeper inspection detail in the persisted `audit_object`. The audit separates `source`, `selection`, `validate`, and `job` details, including selected file hashes, skipped reads, bundle hash, latency, and log-signal summaries, which supports progressive disclosure rather than dumping internal complexity into the main state.
P0
Aligned · Replace implied magic with clear mental models
The code gives a clear mental model for what the system can and cannot do: `fetch_public_repo()` accepts only public HTTPS GitHub URLs via `_GITHUB_URL_RE`; `_detect_language()` only supports Python and TypeScript; `SelectionRejected`, `FAILURE_KINDS`, and messages like `No Py/TS files.` and `URL not public HTTPS GitHub.` make limits explicit. The LLM input is wrapped with `BOUNDARY_HEADER`, `ENVELOPE_SCHEMA`, and `ENVELOPE_ADVISORY`, making the untrusted-code boundary explicit.
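As an illustration of the URL gate, the sketch below accepts only `https://github.com/<owner>/<repo>` shapes and rejects everything else with the message quoted in the finding. The pattern is an assumption: the actual `_GITHUB_URL_RE` is not reproduced in this report, and `parse_public_github_url` is a hypothetical name. A regex can only check the URL shape; whether the repository is actually public still has to be established by the clone itself.

```python
import re

# Hypothetical pattern; the reviewed `_GITHUB_URL_RE` may differ.
GITHUB_HTTPS_RE = re.compile(
    r"^https://github\.com/"
    r"(?P<owner>[A-Za-z0-9][A-Za-z0-9-]{0,38})/"
    r"(?P<repo>[A-Za-z0-9._-]{1,100}?)"
    r"(?:\.git)?/?$"
)


def parse_public_github_url(url):
    """Return (owner, repo) or raise ValueError with an explicit limit message."""
    match = GITHUB_HTTPS_RE.match(url)
    if match is None:
        raise ValueError("URL not public HTTPS GitHub.")
    return match.group("owner"), match.group("repo")
```

SSH remotes, plain `http://` URLs, and deep links such as `/tree/main` all fall through to the same explicit rejection, which keeps the limit legible to the user.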
P0
Aligned · Expose meaningful operational state, not internal complexity
Operational state is expressed in actionable categories rather than raw implementation details: job states are `queued`, step names, and terminal states such as `completed`, `blocked`, `failed`, and `aborted`; application-level state is mapped to `validation_queued`, `validation_complete`, or `validation_failed`. Low-level details such as `commit_sha`, `bundle_sha256`, `selected_count`, and validator log summaries are retained for diagnostics without becoming the primary user-facing state.
P0
Aligned · Establish trust through inspectability
Inspectability is supported by concrete audit primitives: `build_file_envelope()` creates a typed envelope with `envelope_schema`, `boundary_contract`, per-file `sha256`, and an `envelope_hash`; `audit_object` records `repo_url`, `commit_sha`, selected file hashes, skipped reads, bundle byte counts, `bundle_sha256`, validation latency, boundary contract version, and envelope hash. These identifiers allow a reviewer to trace what source material was sent to validation and how the result was produced.
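The hashing scheme described above can be sketched as follows. The field names `envelope_schema`, `boundary_contract`, `sha256`, and `envelope_hash` come from the finding; the schema and contract values, the canonical-JSON choice, and the function name `build_file_envelope_sketch` are assumptions made for illustration.

```python
import hashlib
import json

ENVELOPE_SCHEMA = "file-envelope/v1"     # assumed value
BOUNDARY_CONTRACT = "untrusted-code/v1"  # assumed value


def build_file_envelope_sketch(files):
    """Build a typed envelope from a {path: text} mapping.

    Each file carries its own SHA-256, and the envelope hash covers the
    whole canonicalised body, so any change to any file is detectable.
    """
    entries = [
        {
            "path": path,
            "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
            "content": text,
        }
        for path, text in sorted(files.items())  # stable ordering
    ]
    body = {
        "envelope_schema": ENVELOPE_SCHEMA,
        "boundary_contract": BOUNDARY_CONTRACT,
        "files": entries,
    }
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    envelope_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "envelope_hash": envelope_hash}
```

Sorting the files and serialising with sorted keys makes the hash independent of input order, so a reviewer can recompute it from the recorded file contents.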
P0
Aligned · Make hand-offs, approvals, and blockers explicit
Approvals, hand-offs, and blockers are explicit. `approve()` requires an interactive confirmation unless `--yes` is supplied, catches Firebase/sign-in/email failures into typed `onboarding_state` values, and checks `OPENAI_API_KEY` before setting `validation_queued`. `_execute_with_job()` maps invalid repos, clone failures, selector rejection, read failures, validation failures, and persistence failures into `mark_blocked()` or `mark_failed()` with `failure_kind`, `safe_display_message`, and `retry_eligible`; the `finally` block mirrors terminal status back to the application.
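The failure-mapping pattern described above can be sketched in a few lines. This is a simplified illustration, not the reviewed `_execute_with_job()`: the `Job` and `Application` dataclasses are stand-ins for the persisted models, the `failure_kind` strings and the generic failure message are illustrative, and `execute` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional


class SelectionRejected(Exception):
    """Raised when no supported files can be selected."""


@dataclass
class Job:  # stand-in for CohortValidationJob
    status: str = "queued"
    failure_kind: Optional[str] = None
    safe_display_message: Optional[str] = None
    retry_eligible: bool = False


@dataclass
class Application:  # stand-in for CohortApplication
    onboarding_state: str = "validation_queued"


def mark_blocked(job, kind, message):
    # Blocked: the input itself must change, so a retry would not help.
    job.status, job.failure_kind = "blocked", kind
    job.safe_display_message, job.retry_eligible = message, False


def mark_failed(job, kind, message, retry_eligible):
    job.status, job.failure_kind = "failed", kind
    job.safe_display_message, job.retry_eligible = message, retry_eligible


def execute(job, app, steps):
    """Run steps in order; map every exception to a typed terminal state."""
    try:
        for step in steps:
            step()
        job.status = "completed"
    except SelectionRejected:
        mark_blocked(job, "selection_rejected", "No Py/TS files.")
    except Exception:
        mark_failed(job, "validate_error", "Validation failed.", True)
    finally:
        # Mirror the terminal outcome so the application never stays 'queued'.
        app.onboarding_state = (
            "validation_complete" if job.status == "completed" else "validation_failed"
        )
```

Putting the mirror step in `finally` is what guarantees the application-level state is updated on every exit path, including unexpected exceptions.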
P0
Aligned · Represent delegated work as a system, not merely as a conversation
Delegated work is modeled as a structured system, not a chat transcript. `CohortValidationJob` stores lifecycle, ownership, retry, abort, commit, bundle, and selection metadata; `UserValidationRun` stores the validator result separately; `_execute_with_job()` advances through explicit phases (`cloning`, `selecting`, `bundling`, `validating`) and persists terminal outcomes. The workflow therefore has a governable state machine and audit trail.
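A compact way to picture the state machine is a transition function over the phase and terminal names listed in this report. The sketch below is an assumption about how those states relate, not the reviewed code: `allowed_next` and `transition` are hypothetical helpers, and a plain dict stands in for the persisted job row.

```python
from datetime import datetime, timezone

PHASES = ("cloning", "selecting", "bundling", "validating")
TERMINAL = frozenset({"completed", "blocked", "failed", "aborted"})


def allowed_next(status):
    """Return the set of statuses reachable from `status` in one step."""
    if status in TERMINAL:
        return frozenset()  # terminal states have no successors
    early_exits = TERMINAL - {"completed"}
    if status == "queued":
        return frozenset({PHASES[0]}) | early_exits
    index = PHASES.index(status)  # raises ValueError for unknown statuses
    following = PHASES[index + 1] if index + 1 < len(PHASES) else "completed"
    return frozenset({following}) | early_exits


def transition(job, new_status, now=None):
    """Apply one transition and stamp the matching timestamp field."""
    if new_status not in allowed_next(job["status"]):
        raise ValueError(f"illegal transition {job['status']} -> {new_status}")
    now = now or datetime.now(timezone.utc)
    job["status"] = new_status
    if new_status in PHASES:
        job[f"{new_status}_started_at"] = now
    else:
        job["terminal_at"] = now
    return job
```

Under these assumptions a job can only reach `completed` by passing through every phase in order, while `blocked`, `failed`, and `aborted` are reachable from any non-terminal state.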
Add to your README
Two embeddable variants: a small badge and a richer card.
Score card (recommended)
[](https://aidesignblueprint.com/en/readiness-review/093809b5-b30a-4b23-b02c-30a308ee7dea)
Flat badge
[](https://aidesignblueprint.com/en/readiness-review/093809b5-b30a-4b23-b02c-30a308ee7dea)
Run ID: 093809b5-b30a-4b23-b02c-30a308ee7dea · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.