Governed

Alignment with the doctrine confirmed.

Agent Architecture Review, validation snapshot

Evaluated on 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: Aligned

74/100

Grade C

9 aligned · 1 hardening

Verdicts by principle

The submission is an autonomous, multi-step cohort validation workflow with durable job state, explicit blocker/failure states, source/audit traceability, and a typed untrusted-code boundary before the LLM validation call. The main remaining gap is steering: `abort_requested` is modeled and checked at step boundaries, but there is no durable operator command surface for requesting abort/retry or changing constraints while the job is underway, and long-running clone/validation phases remain effectively unsteerable once entered.

Iteration history

5 previous runs on this artifact. Each run_id opens its own readiness review.

| When | Score | Status | Run ID |
| --- | --- | --- | --- |
| 12 May 2026 (this run) | 74 / C | Aligned | 093809b5 |
| 12 May 2026 | 98 / A | Aligned | 4128f700 |
| 12 May 2026 | 98 / A | Aligned | 270e7ca6 |
| 12 May 2026 | 74 / C | High risk | 8364019d |
| 12 May 2026 | 74 / C | High risk | 14a3456f |
| 12 May 2026 | 74 / C | High risk | 3f3bb587 |

Cert downgraded to emerging · Attempt 1/3

Missed production blocker: `backend/app/db/models.py` references `Base` at class-definition time without importing it, so importing `app.db.models` crashes before either cohort CLI can run.

The downgrade is deliberate. The cert agent is a second, independent, adversarial reviewer that is deliberately stricter than the first pass. It found a production_blocker that the first pass had missed. Resolve the finding below, re-validate to get a new run_id, and recertify to earn the badge.

Blockers missed by the first pass

P8

Needs changes · Production blocker · 95/100

Model module crashes at import because `Base` is undefined

In `backend/app/db/models.py`, classes are declared as `class User(Base):`, `class CohortApplication(Base):`, `class CohortValidationJob(Base):`, and `class UserValidationRun(Base):`, but the file only imports `datetime`, SQLAlchemy column types, and `Mapped/mapped_column`; it never imports `Base` from `app.db.session`. Both `run_cohort_validate.py` and `approve_cohort_application.py` execute `from app.db import models`, which would evaluate these class definitions and raise `NameError: name 'Base' is not defined` at module load time.
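The failure mode can be reproduced in isolation. The sketch below is not the project's code: `BROKEN_SOURCE`, `FIXED_SOURCE`, `load_module_from_source`, and `try_load` are hypothetical names, and the in-module `class Base` is only a stand-in for the real `Base` that the finding places in `app.db.session`. It shows that a class statement evaluates its base classes when the module body runs, so the error surfaces at import time rather than at first use.

```python
import types

# A module body shaped like the one described in the finding:
# it subclasses Base without ever importing or defining it.
BROKEN_SOURCE = """
from datetime import datetime

class User(Base):
    pass
"""

# Stand-in fix for this sketch only: define Base before it is used.
# In the reviewed project the fix would be an import, not a local class.
FIXED_SOURCE = "class Base:\n    pass\n" + BROKEN_SOURCE


def load_module_from_source(name, source):
    """Execute source text as the body of a fresh module object."""
    module = types.ModuleType(name)
    exec(compile(source, f"<{name}>", "exec"), module.__dict__)
    return module


def try_load(name, source):
    """Return the NameError message raised while loading, or None on success."""
    try:
        load_module_from_source(name, source)
    except NameError as exc:
        return str(exc)
    return None


print(try_load("broken_models", BROKEN_SOURCE))
print(try_load("fixed_models", FIXED_SOURCE))
```

Because the error is raised while the module body executes, no function in the module needs to be called for the crash to occur, which is why both CLIs would fail before doing any work.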

Recommendation

Add an explicit `from app.db.session import Base` import in `backend/app/db/models.py` and include an import smoke test for `app.db.models`, `app.scripts.run_cohort_validate`, and `app.scripts.approve_cohort_application`.
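An import smoke test of the kind recommended above could look like the following minimal sketch. The helper names `import_failures` and `check_cohort_modules_import` are hypothetical; the module names are the three listed in the recommendation, and the sketch assumes they are importable from the test's working directory.

```python
import importlib


def import_failures(module_names):
    """Import each module and return {name: "ErrorType: message"} for failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # NameError, ImportError, SyntaxError, ...
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Module names taken from the recommendation; the project layout is assumed.
COHORT_MODULES = [
    "app.db.models",
    "app.scripts.run_cohort_validate",
    "app.scripts.approve_cohort_application",
]


def check_cohort_modules_import():
    """Intended to be called from a test in the project's own test suite."""
    failures = import_failures(COHORT_MODULES)
    assert not failures, f"modules failed to import: {failures}"
```

Collecting every failure before asserting means a single run reports all broken modules instead of stopping at the first one.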

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

P0

Needs changes · Hardening recommended · 35/100

Optimise for steering, not only initiating

The code models steering intent with `CohortValidationJob.abort_requested` and checks it in `mark_step_started()` plus once before `validating`, but there is no reviewed durable command surface such as `request_abort(job_id)` or `retry_failed_job(job_id)` to set that flag or initiate recovery. Once inside `fetch_public_repo()` or `asyncio.run(validate_code_against_principles(...))`, the workflow is effectively unsteerable except by killing the process; constraints also cannot be updated mid-run. Delta: this maintains the prior P10 `needs_changes` finding; the newly included stubs do not address the prior recommendation to move steering into a small durable service boundary.
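To illustrate what steerability inside an async phase might look like, the sketch below races a long-running coroutine against a periodic abort check and cancels it when the flag is set. All names here (`run_with_abort_polling`, `is_abort_requested`, `poll_interval`, `slow_validation`, `make_abort_flag`) are hypothetical, and `asyncio.sleep` stands in for the real validation call, which the review does not show. Cancellation only takes effect at `await` points, so a blocking clone step would need a different mechanism.

```python
import asyncio


async def run_with_abort_polling(work, is_abort_requested, poll_interval=0.01):
    """Await `work`, checking an abort flag every `poll_interval` seconds."""
    task = asyncio.ensure_future(work)
    while True:
        done, _pending = await asyncio.wait({task}, timeout=poll_interval)
        if done:
            return "completed", task.result()
        if is_abort_requested():
            task.cancel()
            try:
                await task  # let the task unwind before reporting the abort
            except asyncio.CancelledError:
                pass
            return "aborted", None


async def slow_validation():
    # Stand-in for a long-running validation call.
    await asyncio.sleep(10)
    return "verdict"


def make_abort_flag(polls_before_abort):
    """Hypothetical flag source that turns True after a number of polls."""
    calls = {"n": 0}

    def is_abort_requested():
        calls["n"] += 1
        return calls["n"] >= polls_before_abort

    return is_abort_requested


outcome = asyncio.run(run_with_abort_polling(slow_validation(), make_abort_flag(3)))
print(outcome)
```

In a real worker the flag source would read the persisted `abort_requested` value instead of counting polls.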

Recommendation

Move steering into a small durable service boundary outside the execution loop: expose command functions such as `request_abort(job_id)`, `retry_failed_job(job_id)`, and, if needed, `update_validation_constraints(job_id, constraints)`, then have the worker poll those persisted commands before and during long-running phases where cancellation is feasible.
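A durable command surface along these lines can be sketched with a small persisted table. This is an illustration under assumptions, not the project's schema: the `jobs` table layout, `open_store`, `run_job`, and the use of SQLite are all hypothetical, while `request_abort` and `retry_failed_job` take their names from the recommendation.

```python
import sqlite3

TERMINAL = ("completed", "blocked", "failed", "aborted")


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'queued',"
        " abort_requested INTEGER NOT NULL DEFAULT 0,"
        " retry_eligible INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def request_abort(conn, job_id):
    """Persist an abort request; False for unknown or already-terminal jobs."""
    cur = conn.execute(
        "UPDATE jobs SET abort_requested = 1 "
        "WHERE id = ? AND status NOT IN (?, ?, ?, ?)",
        (job_id, *TERMINAL),
    )
    conn.commit()
    return cur.rowcount == 1


def retry_failed_job(conn, job_id):
    """Re-queue a failed job, but only if it was marked retry-eligible."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'queued', abort_requested = 0 "
        "WHERE id = ? AND status = 'failed' AND retry_eligible = 1",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def _set_status(conn, job_id, status):
    conn.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
    conn.commit()


def run_job(conn, job_id, steps):
    """Worker loop: re-read the persisted abort flag before every step."""
    for name, step in steps:
        row = conn.execute(
            "SELECT abort_requested FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        if row is not None and row[0]:
            _set_status(conn, job_id, "aborted")
            return "aborted"
        _set_status(conn, job_id, name)
        step()
    _set_status(conn, job_id, "completed")
    return "completed"
```

Because the command functions only write to the store, an operator tool or API handler can call them from a different process than the worker, which is the point of moving steering outside the execution loop.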

P0

Aligned

Design for delegation rather than direct manipulation

Delegation is represented as an assigned workflow rather than manual step execution: `approve()` gates the cohort approval, then `run_cohort_validate.run()` creates a `CohortValidationJob` and `_execute_with_job()` performs clone, language detection, surface selection, bundling, validation, and persistence. Scope is bounded by `repo_url`, `ValidationContext(repository=namespace, files=list(selection.selected_paths))`, `MAX_FILES`, `MAX_BUNDLE_BYTES`, and selected file paths.
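The scope bounds named above can be pictured as a budgeted selection pass. The following is a sketch only: the limit values, the `select_within_budget` helper, and the skip-and-continue policy are assumptions, since the review names `MAX_FILES` and `MAX_BUNDLE_BYTES` but does not state their values or how the selector applies them.

```python
MAX_FILES = 3           # hypothetical value; not stated in the review
MAX_BUNDLE_BYTES = 100  # hypothetical value; not stated in the review


def select_within_budget(candidates, max_files=MAX_FILES, max_bytes=MAX_BUNDLE_BYTES):
    """Pick files in priority order without exceeding either limit.

    candidates: iterable of (path, size_in_bytes), highest priority first.
    Returns (selected_paths, skipped_paths, total_bytes).
    """
    selected, skipped, total = [], [], 0
    for path, size in candidates:
        if len(selected) < max_files and total + size <= max_bytes:
            selected.append(path)
            total += size
        else:
            skipped.append(path)
    return selected, skipped, total
```

Returning the skipped paths alongside the selection keeps the bound inspectable: a reviewer can see what was left out of the delegated scope, not only what went in.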

P0

Aligned

Ensure that background work remains perceptible

Background progress is durable and perceptible through `CohortValidationJob.status`, `queued_at`, `cloning_started_at`, `selecting_started_at`, `bundling_started_at`, `validating_started_at`, `terminal_at`, `failure_kind`, `safe_display_message`, and `retry_eligible`. `_mirror_terminal_to_app()` also mirrors terminal job outcomes into `CohortApplication.onboarding_state`, including the `finally` path in `_execute_with_job()`, so users/operators can leave and return without losing continuity.
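As a rough picture of how per-step timestamps keep progress perceptible, the sketch below uses an in-memory stand-in for the job row. The field names mirror those listed above, but `JobProgress`, `start_step`, and `describe` are hypothetical and simplified; the reviewed `mark_step_started()` also checks `abort_requested`, which this sketch omits.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

STEPS = ("cloning", "selecting", "bundling", "validating")


@dataclass
class JobProgress:
    """Hypothetical in-memory stand-in for the persisted job row."""
    status: str = "queued"
    queued_at: Optional[datetime] = None
    cloning_started_at: Optional[datetime] = None
    selecting_started_at: Optional[datetime] = None
    bundling_started_at: Optional[datetime] = None
    validating_started_at: Optional[datetime] = None
    terminal_at: Optional[datetime] = None


def start_step(job, step, now=None):
    """Record the step as current and stamp its start time."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    job.status = step
    setattr(job, f"{step}_started_at", now or datetime.now(timezone.utc))
    return job


def describe(job):
    """One-line status for someone returning to the job later."""
    if job.terminal_at is not None:
        since = job.terminal_at
    else:
        since = getattr(job, f"{job.status}_started_at", None) or job.queued_at
    stamp = since.isoformat() if since else "an unknown time"
    return f"{job.status} since {stamp}"
```

With timestamps stored per step, a status line can be rebuilt from the row alone, without any live connection to the worker.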

P0

Aligned

Align feedback with the user’s level of attention

Feedback is calibrated by separating concise user/operator states from diagnostic detail: routine progress uses `validation_queued`, `validation_complete`, and `validation_failed`; intervention cases use `safe_display_message`, `onboarding_failure_reason`, and typed failure states such as `firebase_user_failed`, `sign_in_link_failed`, `approval_email_failed`, and `validate_error`. Detailed validator metadata is summarized by `_summarize_validate_log()` instead of exposing raw logs as primary feedback.

P0

Aligned

Apply progressive disclosure to system agency

The workflow exposes minimal primary state via `onboarding_state` and job `status`, while preserving deeper inspection detail in the persisted `audit_object`. The audit separates `source`, `selection`, `validate`, and `job` details, including selected file hashes, skipped reads, bundle hash, latency, and log-signal summaries, which supports progressive disclosure rather than dumping internal complexity into the main state.

P0

Aligned

Replace implied magic with clear mental models

The code gives a clear mental model for what the system can and cannot do: `fetch_public_repo()` accepts only public HTTPS GitHub URLs via `_GITHUB_URL_RE`; `_detect_language()` only supports Python and TypeScript; `SelectionRejected`, `FAILURE_KINDS`, and messages like `No Py/TS files.` and `URL not public HTTPS GitHub.` make limits explicit. The LLM input is wrapped with `BOUNDARY_HEADER`, `ENVELOPE_SCHEMA`, and `ENVELOPE_ADVISORY`, making the untrusted-code boundary explicit.
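The two limits described above, URL shape and supported languages, might be expressed roughly as follows. The pattern and helpers are hypothetical: the review names `_GITHUB_URL_RE` and `_detect_language()` but does not show them, so the regex, the `.tsx` handling, and the Python-before-TypeScript precedence are assumptions. A pattern can only check the URL's shape; whether the repository is actually public has to be established separately.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern: https://github.com/<owner>/<repo>, optionally ending
# in ".git" or a trailing slash. Anything else is rejected.
GITHUB_URL_RE = re.compile(
    r"^https://github\.com/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+?(?:\.git)?/?$"
)


def is_supported_repo_url(url):
    """True only for HTTPS GitHub repository URLs of the expected shape."""
    return GITHUB_URL_RE.fullmatch(url) is not None


def detect_language(paths):
    """Return 'python', 'typescript', or None when neither is present."""
    suffixes = {PurePosixPath(p).suffix for p in paths}
    if ".py" in suffixes:
        return "python"  # precedence over TypeScript is an assumption
    if suffixes & {".ts", ".tsx"}:
        return "typescript"
    return None
```

Returning `None` rather than guessing a language lets the caller surface an explicit limit message such as `No Py/TS files.`.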

P0

Aligned

Expose meaningful operational state, not internal complexity

Operational state is expressed in actionable categories rather than raw implementation details: job states are `queued`, step names, and terminal states such as `completed`, `blocked`, `failed`, and `aborted`; application-level state is mapped to `validation_queued`, `validation_complete`, or `validation_failed`. Low-level details such as `commit_sha`, `bundle_sha256`, `selected_count`, and validator log summaries are retained for diagnostics without becoming the primary user-facing state.

P0

Aligned

Establish trust through inspectability

Inspectability is supported by concrete audit primitives: `build_file_envelope()` creates a typed envelope with `envelope_schema`, `boundary_contract`, per-file `sha256`, and an `envelope_hash`; `audit_object` records `repo_url`, `commit_sha`, selected file hashes, skipped reads, bundle byte counts, `bundle_sha256`, validation latency, boundary contract version, and envelope hash. These identifiers allow a reviewer to trace what source material was sent to validation and how the result was produced.
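The hashing scheme described above can be approximated as follows. This is a sketch under assumptions: `build_envelope_sketch` is a hypothetical stand-in for the reviewed `build_file_envelope()`, the schema and contract strings are invented placeholders, and the canonical-JSON step is one plausible way to make `envelope_hash` reproducible, not necessarily the one the project uses.

```python
import hashlib
import json

ENVELOPE_SCHEMA = "example-envelope/v1"  # placeholder, not the real value
BOUNDARY_CONTRACT = "untrusted-code/v1"  # placeholder, not the real value


def build_envelope_sketch(files):
    """Build a typed envelope from a mapping of path -> bytes content."""
    entries = [
        {
            "path": path,
            "sha256": hashlib.sha256(content).hexdigest(),
            "bytes": len(content),
        }
        for path, content in sorted(files.items())
    ]
    body = {
        "envelope_schema": ENVELOPE_SCHEMA,
        "boundary_contract": BOUNDARY_CONTRACT,
        "files": entries,
    }
    # Canonical serialization: sorted keys and fixed separators, so the same
    # inputs always produce the same envelope hash.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    envelope_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "envelope_hash": envelope_hash}
```

Since the envelope hash covers every per-file hash, changing a single byte in any selected file changes the envelope hash, which is what lets a reviewer tie an audit record to the exact material sent to validation.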

P0

Aligned

Make hand-offs, approvals, and blockers explicit

Approvals, hand-offs, and blockers are explicit. `approve()` requires an interactive confirmation unless `--yes` is supplied, catches Firebase/sign-in/email failures into typed `onboarding_state` values, and checks `OPENAI_API_KEY` before setting `validation_queued`. `_execute_with_job()` maps invalid repos, clone failures, selector rejection, read failures, validation failures, and persistence failures into `mark_blocked()` or `mark_failed()` with `failure_kind`, `safe_display_message`, and `retry_eligible`; the `finally` block mirrors terminal status back to the application.
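The control flow described above, typed terminal states plus a `finally` mirror, might be shaped like this. It is a simplified sketch using plain dicts: `RepoRejected`, `mark_terminal`, the `failure_kind` strings, and the fallback display message are hypothetical, and the mapping of every non-completed outcome to `validation_failed` is an assumption.

```python
class RepoRejected(Exception):
    """Hypothetical error: the input can never succeed as submitted."""


def mark_terminal(job, status, failure_kind, message, retry_eligible):
    job.update(
        status=status,
        failure_kind=failure_kind,
        safe_display_message=message,
        retry_eligible=retry_eligible,
    )


def mirror_terminal_to_app(job, app):
    """Copy the job outcome into the application-level state."""
    completed = job["status"] == "completed"
    app["onboarding_state"] = "validation_complete" if completed else "validation_failed"
    app["onboarding_failure_reason"] = job.get("safe_display_message")


def execute_with_job(job, app, work):
    try:
        work()
        job["status"] = "completed"
    except RepoRejected:
        # Blocked: retrying the same input cannot help.
        mark_terminal(job, "blocked", "invalid_repo",
                      "URL not public HTTPS GitHub.", False)
    except Exception:
        # Failed: unexpected error, so a retry is offered.
        mark_terminal(job, "failed", "unexpected_error",
                      "Validation failed; a retry may succeed.", True)
    finally:
        # Runs on every path, so the application never lags the job.
        mirror_terminal_to_app(job, app)
    return job
```

Separating `blocked` from `failed` gives the operator a concrete decision: fix the input in the first case, retry in the second.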

P0

Aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is modeled as a structured system, not a chat transcript. `CohortValidationJob` stores lifecycle, ownership, retry, abort, commit, bundle, and selection metadata; `UserValidationRun` stores the validator result separately; `_execute_with_job()` advances through explicit phases (`cloning`, `selecting`, `bundling`, `validating`) and persists terminal outcomes. The workflow therefore has a governable state machine and audit trail.
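The phase progression described above can be read as a small state machine. The sketch below encodes one plausible transition table: the phase and terminal state names come from the review, but `allowed_next`, `advance`, and the rule that any non-terminal state may move to `blocked`, `failed`, or `aborted` are assumptions.

```python
PHASES = ("queued", "cloning", "selecting", "bundling", "validating")
TERMINAL = {"completed", "blocked", "failed", "aborted"}


def allowed_next(state):
    """Return the set of states reachable in one transition from `state`."""
    if state in TERMINAL:
        return set()  # terminal states have no outgoing transitions
    if state not in PHASES:
        raise ValueError(f"unknown state: {state}")
    index = PHASES.index(state)
    forward = "completed" if index == len(PHASES) - 1 else PHASES[index + 1]
    return {forward, "blocked", "failed", "aborted"}


def advance(state, target):
    """Move to `target` if the transition is allowed, else raise ValueError."""
    if target not in allowed_next(state):
        raise ValueError(f"illegal transition: {state} -> {target}")
    return target
```

Rejecting illegal transitions at a single choke point is what makes the workflow governable: a job cannot skip a phase or leave a terminal state without an explicit, separate command.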

Add to your README

Two embeddable variants: a small one and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/093809b5-b30a-4b23-b02c-30a308ee7dea/card.svg)](https://aidesignblueprint.com/en/readiness-review/093809b5-b30a-4b23-b02c-30a308ee7dea)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/093809b5-b30a-4b23-b02c-30a308ee7dea.svg)](https://aidesignblueprint.com/en/readiness-review/093809b5-b30a-4b23-b02c-30a308ee7dea)
Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear
Rubric: 2026-05-04

Run ID: 093809b5-b30a-4b23-b02c-30a308ee7dea · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.