Governed

Alignment with the doctrine confirmed.

Agent Architecture Review, Validation Snapshot

Evaluated on 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: Aligned

74/100

Grade C

9 aligned · 1 hardening

Verdicts by principle

The submission is an autonomous, multi-step cohort validation workflow with durable job state, explicit blocker/failure states, source/audit traceability, and a typed untrusted-code boundary before the LLM validation call. The main remaining gap is steering: `abort_requested` is modeled and checked at step boundaries, but there is no durable operator command surface for requesting abort/retry or changing constraints while the job is underway, and long-running clone/validation phases remain effectively unsteerable once entered.

Iteration history

5 previous runs on this artifact. Each run_id opens its own readiness review.

When	Score	Status	Run ID
12 May 2026 (this run)	74 / C	Aligned	093809b5…
12 May 2026	98 / A	Aligned	4128f700…
12 May 2026	98 / A	Aligned	270e7ca6…
12 May 2026	74 / C	High risk	8364019d…
12 May 2026	74 / C	High risk	14a3456f…
12 May 2026	74 / C	High risk	3f3bb587…

Cert downgraded to emerging · Attempt 1/3

Missed production blocker: `backend/app/db/models.py` references `Base` at class-definition time without importing it, so importing `app.db.models` crashes before either cohort CLI can run.

The downgrade is intentional. The cert agent is a second, independent adversarial reviewer, deliberately stricter than the first pass. It found a production_blocker that the first pass had missed. Resolve the finding below, re-validate to get a new run_id, and recertify for the badge.

Blockers missed by the first pass

Needs changes · Production blocker · 95/100

Model module crashes at import because `Base` is undefined

In `backend/app/db/models.py`, classes are declared as `class User(Base):`, `class CohortApplication(Base):`, `class CohortValidationJob(Base):`, and `class UserValidationRun(Base):`, but the file only imports `datetime`, SQLAlchemy column types, and `Mapped/mapped_column`; it never imports `Base` from `app.db.session`. Both `run_cohort_validate.py` and `approve_cohort_application.py` execute `from app.db import models`, which would evaluate these class definitions and raise `NameError: name 'Base' is not defined` at module load time.

Recommendation

Add an explicit `from app.db.session import Base` import in `backend/app/db/models.py` and include an import smoke test for `app.db.models`, `app.scripts.run_cohort_validate`, and `app.scripts.approve_cohort_application`.
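
The import smoke test recommended above could look roughly like the following. This is a minimal sketch, assuming the three module paths named in the recommendation are importable from the directory where the check runs; the helper names `find_import_failures` and `check_cohort_modules_import_cleanly` are hypothetical and not part of the reviewed code.

```python
import importlib

# Module paths taken from the recommendation; adjust to the real package layout.
MODULES_UNDER_TEST = [
    "app.db.models",
    "app.scripts.run_cohort_validate",
    "app.scripts.approve_cohort_application",
]


def find_import_failures(module_names):
    """Try to import each module; return {module_name: error_text} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # NameError, ImportError, etc. all count as a crash
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def check_cohort_modules_import_cleanly():
    """Hypothetical entry point, meant to be called from a test or a CI step."""
    failures = find_import_failures(MODULES_UNDER_TEST)
    assert not failures, f"modules failed to import: {failures}"
```

With the missing `Base` import, a check like this would be expected to report `app.db.models` with a `NameError`, and the two CLI modules would likely fail as well because they import it.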

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

Needs changes · Hardening recommended · 35/100

Optimise for steering, not only initiating

The code models steering intent with `CohortValidationJob.abort_requested` and checks it in `mark_step_started()` plus once before `validating`, but there is no reviewed durable command surface such as `request_abort(job_id)` or `retry_failed_job(job_id)` to set that flag or initiate recovery. Once inside `fetch_public_repo()` or `asyncio.run(validate_code_against_principles(...))`, the workflow is effectively unsteerable except by killing the process; constraints also cannot be updated mid-run. Delta: this maintains the prior P10 `needs_changes` finding; the newly included stubs do not address the prior recommendation to move steering into a small durable service boundary.

Recommendation

Move steering into a small durable service boundary outside the execution loop: expose command functions such as `request_abort(job_id)`, `retry_failed_job(job_id)`, and, if needed, `update_validation_constraints(job_id, constraints)`, then have the worker poll those persisted commands before and during long-running phases where cancellation is feasible.
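
One way the recommended command surface might be shaped is sketched below. It is an illustration under stated assumptions, not the submission's code: the `jobs` table layout, the extra `conn` parameter, and the helpers `open_store`, `create_job`, `is_abort_requested`, and `run_phase` are all hypothetical, and SQLite stands in for whatever durable store the project actually uses.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the job store; a file path instead of ':memory:' would make it durable."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, status TEXT NOT NULL, "
        "abort_requested INTEGER NOT NULL DEFAULT 0, "
        "retry_eligible INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def create_job(conn, job_id):
    conn.execute("INSERT INTO jobs (job_id, status) VALUES (?, 'queued')", (job_id,))
    conn.commit()


def request_abort(conn, job_id):
    """Operator command: persist the abort request so any worker can observe it."""
    conn.execute("UPDATE jobs SET abort_requested = 1 WHERE job_id = ?", (job_id,))
    conn.commit()


def retry_failed_job(conn, job_id):
    """Operator command: requeue a failed job only if it is marked retry-eligible."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'queued', abort_requested = 0 "
        "WHERE job_id = ? AND status = 'failed' AND retry_eligible = 1",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def is_abort_requested(conn, job_id):
    row = conn.execute(
        "SELECT abort_requested FROM jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    return bool(row and row[0])


def run_phase(conn, job_id, phase, work_units):
    """Worker side: poll the persisted flag before each unit of a long phase."""
    conn.execute("UPDATE jobs SET status = ? WHERE job_id = ?", (phase, job_id))
    conn.commit()
    done = 0
    for unit in work_units:
        if is_abort_requested(conn, job_id):
            conn.execute(
                "UPDATE jobs SET status = 'aborted' WHERE job_id = ?", (job_id,)
            )
            conn.commit()
            return done
        unit()  # one cancellable slice of the phase
        done += 1
    return done
```

The point of the design is that the operator writes to the store rather than to the worker process, so an abort or retry request survives a worker restart and is picked up at the next poll. How finely a clone or an LLM call can be sliced into cancellable units depends on the real implementation.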

Aligned

Design for delegation rather than direct manipulation

Delegation is represented as an assigned workflow rather than manual step execution: `approve()` gates the cohort approval, then `run_cohort_validate.run()` creates a `CohortValidationJob` and `_execute_with_job()` performs clone, language detection, surface selection, bundling, validation, and persistence. Scope is bounded by `repo_url`, `ValidationContext(repository=namespace, files=list(selection.selected_paths))`, `MAX_FILES`, `MAX_BUNDLE_BYTES`, and selected file paths.

Aligned

Ensure that background work remains perceptible

Background progress is durable and perceptible through `CohortValidationJob.status`, `queued_at`, `cloning_started_at`, `selecting_started_at`, `bundling_started_at`, `validating_started_at`, `terminal_at`, `failure_kind`, `safe_display_message`, and `retry_eligible`. `_mirror_terminal_to_app()` also mirrors terminal job outcomes into `CohortApplication.onboarding_state`, including the `finally` path in `_execute_with_job()`, so users/operators can leave and return without losing continuity.

Aligned

Align feedback with the user’s level of attention

Feedback is calibrated by separating concise user/operator states from diagnostic detail: routine progress uses `validation_queued`, `validation_complete`, and `validation_failed`; intervention cases use `safe_display_message`, `onboarding_failure_reason`, and typed failure states such as `firebase_user_failed`, `sign_in_link_failed`, `approval_email_failed`, and `validate_error`. Detailed validator metadata is summarized by `_summarize_validate_log()` instead of exposing raw logs as primary feedback.

Aligned

Apply progressive disclosure to system agency

The workflow exposes minimal primary state via `onboarding_state` and job `status`, while preserving deeper inspection detail in the persisted `audit_object`. The audit separates `source`, `selection`, `validate`, and `job` details, including selected file hashes, skipped reads, bundle hash, latency, and log-signal summaries, which supports progressive disclosure rather than dumping internal complexity into the main state.

Aligned

Replace implied magic with clear mental models

The code gives a clear mental model for what the system can and cannot do: `fetch_public_repo()` accepts only public HTTPS GitHub URLs via `_GITHUB_URL_RE`; `_detect_language()` only supports Python and TypeScript; `SelectionRejected`, `FAILURE_KINDS`, and messages like `No Py/TS files.` and `URL not public HTTPS GitHub.` make limits explicit. The LLM input is wrapped with `BOUNDARY_HEADER`, `ENVELOPE_SCHEMA`, and `ENVELOPE_ADVISORY`, making the untrusted-code boundary explicit.
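
To illustrate the kind of URL gate described above, here is a minimal sketch. The pattern and the function `parse_public_github_url` are hypothetical; the actual `_GITHUB_URL_RE` is not shown in this review and may differ. Note that a regex can only check the URL's shape, not whether the repository is actually public.

```python
import re

# Hypothetical pattern; the real `_GITHUB_URL_RE` is not shown in the review.
GITHUB_HTTPS_URL_RE = re.compile(
    r"^https://github\.com/"
    r"(?P<owner>[A-Za-z0-9-]+)/"
    r"(?P<repo>[A-Za-z0-9._-]+?)(?:\.git)?/?$"
)


def parse_public_github_url(url):
    """Return (owner, repo) for an HTTPS GitHub repo URL, or None if rejected."""
    match = GITHUB_HTTPS_URL_RE.match(url.strip())
    if match is None:
        return None
    return match.group("owner"), match.group("repo")
```

Rejecting everything that does not match, including SSH remotes, plain HTTP, and deep links into a repository, keeps the supported surface small enough to describe in a short message such as `URL not public HTTPS GitHub.`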

Aligned

Expose meaningful operational state, not internal complexity

Operational state is expressed in actionable categories rather than raw implementation details: job states are `queued`, step names, and terminal states such as `completed`, `blocked`, `failed`, and `aborted`; application-level state is mapped to `validation_queued`, `validation_complete`, or `validation_failed`. Low-level details such as `commit_sha`, `bundle_sha256`, `selected_count`, and validator log summaries are retained for diagnostics without becoming the primary user-facing state.

Aligned

Establish trust through inspectability

Inspectability is supported by concrete audit primitives: `build_file_envelope()` creates a typed envelope with `envelope_schema`, `boundary_contract`, per-file `sha256`, and an `envelope_hash`; `audit_object` records `repo_url`, `commit_sha`, selected file hashes, skipped reads, bundle byte counts, `bundle_sha256`, validation latency, boundary contract version, and envelope hash. These identifiers allow a reviewer to trace what source material was sent to validation and how the result was produced.
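
The hashing scheme described above can be sketched as follows. This is an assumption-laden illustration rather than the reviewed `build_file_envelope()`: the function name `sketch_file_envelope`, the two constant values, and the exact field layout are hypothetical; only the field names `envelope_schema`, `boundary_contract`, `sha256`, and `envelope_hash` come from the finding.

```python
import hashlib
import json

# Hypothetical identifiers; the real schema and contract values are not shown.
SKETCH_ENVELOPE_SCHEMA = "file-envelope/v1"
SKETCH_BOUNDARY_CONTRACT = "untrusted-code/v1"


def sketch_file_envelope(files):
    """Build a typed envelope from {path: text} with per-file and overall hashes."""
    entries = [
        {
            "path": path,
            "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
            "content": text,
        }
        for path, text in sorted(files.items())  # sorted so the hash is stable
    ]
    body = {
        "envelope_schema": SKETCH_ENVELOPE_SCHEMA,
        "boundary_contract": SKETCH_BOUNDARY_CONTRACT,
        "files": entries,
    }
    # Canonical JSON (sorted keys, fixed separators) so equal content hashes equally.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {**body, "envelope_hash": hashlib.sha256(canonical).hexdigest()}
```

Because the envelope hash covers the schema, the contract version, and every file entry, a reviewer holding only the audit record could recompute it from the same commit and confirm that the validator saw exactly those files.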

Aligned

Make hand-offs, approvals, and blockers explicit

Approvals, hand-offs, and blockers are explicit. `approve()` requires an interactive confirmation unless `--yes` is supplied, catches Firebase/sign-in/email failures into typed `onboarding_state` values, and checks `OPENAI_API_KEY` before setting `validation_queued`. `_execute_with_job()` maps invalid repos, clone failures, selector rejection, read failures, validation failures, and persistence failures into `mark_blocked()` or `mark_failed()` with `failure_kind`, `safe_display_message`, and `retry_eligible`; the `finally` block mirrors terminal status back to the application.

Aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is modeled as a structured system, not a chat transcript. `CohortValidationJob` stores lifecycle, ownership, retry, abort, commit, bundle, and selection metadata; `UserValidationRun` stores the validator result separately; `_execute_with_job()` advances through explicit phases (`cloning`, `selecting`, `bundling`, `validating`) and persists terminal outcomes. The workflow therefore has a governable state machine and audit trail.
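
The state machine implied by these phases can be sketched as below. The phase and terminal-state names come from the findings in this review, but the transition rules are an assumption: the sketch supposes that each phase may only advance to the next one or end in `blocked`, `failed`, or `aborted`, and that `completed` is reachable only from `validating`. The helpers `allowed_next` and `advance` are hypothetical.

```python
PHASES = ("queued", "cloning", "selecting", "bundling", "validating")
TERMINAL_STATES = frozenset({"completed", "blocked", "failed", "aborted"})


def allowed_next(status):
    """Return the statuses reachable from `status` under the assumed rules."""
    if status in TERMINAL_STATES:
        return frozenset()  # terminal states have no successors
    index = PHASES.index(status)  # raises ValueError for unknown statuses
    reachable = {"blocked", "failed", "aborted"}
    if index + 1 < len(PHASES):
        reachable.add(PHASES[index + 1])
    else:
        reachable.add("completed")  # only the last phase can complete
    return frozenset(reachable)


def advance(status, new_status):
    """Validate a transition and return the new status, or raise ValueError."""
    if new_status not in allowed_next(status):
        raise ValueError(f"illegal transition: {status} -> {new_status}")
    return new_status
```

Centralizing the transition check like this is one way to keep the persisted job status governable: a worker bug that tries to skip a phase or resurrect a terminal job fails loudly instead of writing an inconsistent state.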

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/093809b5-b30a-4b23-b02c-30a308ee7dea/card.svg)](https://aidesignblueprint.com/en/readiness-review/093809b5-b30a-4b23-b02c-30a308ee7dea)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/093809b5-b30a-4b23-b02c-30a308ee7dea.svg)](https://aidesignblueprint.com/en/readiness-review/093809b5-b30a-4b23-b02c-30a308ee7dea)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear

Rubric: 2026-05-04

Run your own AI Design Blueprint validation

Run ID: 093809b5-b30a-4b23-b02c-30a308ee7dea · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.