Reviewed

Assessment complete; awaiting evidence review.

Agent Architecture Review, Validation Snapshot

Assessed on 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High risk

74/100

Grade C

9 aligned · 1 production blocker
Verdicts by principle

The submission implements a credible delegated validation workflow: persistent job records, explicit phase states, typed failure kinds, source/audit hashes, and an untrusted-code envelope give the background LLM validation flow inspectable boundaries. The remaining gap is steering/recovery: abort/retry controls are not exposed as durable command primitives, validation cannot be cancelled once the external call starts, and partial onboarding failures after external side effects can strand an approved applicant without an in-code retry path.

Iteration history

5 previous runs on this artifact. Each run_id opens its own readiness review.

| When | Score | Status | Run ID |
| --- | --- | --- | --- |
| 12 May 2026 (this run) | 74 / C | High risk | 9caf9385 |
| 12 May 2026 | 74 / C | Aligned | 093809b5 |
| 12 May 2026 | 98 / A | Aligned | 4128f700 |
| 12 May 2026 | 98 / A | Aligned | 270e7ca6 |
| 12 May 2026 | 74 / C | High risk | 8364019d |
| 12 May 2026 | 74 / C | High risk | 14a3456f |

Findings by principle

10 principles assessed. Verdict, severity, evidence, and recommendation for each.

P0

Needs changes · Production blocker · 60/100

Optimise for steering, not only initiating

Steering is only partially implemented. `CohortValidationJob` has `abort_requested`, `retry_count`, and `retry_eligible`, and `mark_step_started()` checks aborts between phases, but there is no durable command boundary such as `request_abort(job_id)` or `retry_failed_job(job_id)` exposed outside the runner. Once `asyncio.run(validate_code_against_principles(request, api_key))` starts, the job has no visible timeout/cancel path in this code. The approval flow can also strand partial external side effects: after `approved_at` is set, failures in `generate_sign_in_link_for_email()` or `send_cohort_approval_email()` set failure states, but `approve()` cannot be rerun because it exits when `app.a…
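
The missing timeout path described above can be illustrated with `asyncio.wait_for`. This is a minimal sketch rather than the submission's code: `slow_validator`, `validate_with_timeout`, the `validate_timeout` failure kind, and the timeout value are hypothetical stand-ins chosen for illustration.

```python
import asyncio

# Hypothetical budget; kept tiny so the demonstration finishes quickly.
VALIDATION_TIMEOUT_S = 0.05


async def slow_validator() -> str:
    """Stand-in for the external validation call (hypothetical)."""
    await asyncio.sleep(1.0)
    return "ok"


async def validate_with_timeout(coro_factory, timeout_s: float) -> dict:
    """Run the external call with an upper bound on its duration."""
    try:
        # wait_for cancels the inner task when the timeout elapses.
        result = await asyncio.wait_for(coro_factory(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # "validate_timeout" is an assumed failure kind, not one named in the review.
        return {
            "status": "failed",
            "failure_kind": "validate_timeout",
            "retry_eligible": True,
        }
    return {"status": "completed", "result": result}


outcome = asyncio.run(validate_with_timeout(slow_validator, VALIDATION_TIMEOUT_S))
```

Returning a typed outcome instead of letting the exception escape keeps the timeout on the same footing as the other terminal states, so a retry command can act on it later.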

Recommendation

Move steering and recovery into a small durable service boundary outside the execution loop. Provide command functions such as `request_abort(job_id)`, `retry_failed_validation_job(job_id)`, and `retry_onboarding_handoff(application_id)` that operate on persisted states, are safe to call after partial Firebase/email/validation failures, and are polled or enforced with bounded timeouts around external calls.
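
A rough sketch of such a command boundary follows, assuming jobs live in a SQL table. The table layout, the `max_retries` limit, and the use of an in-memory SQLite database are assumptions for illustration; only the function and field names come from the review.

```python
import sqlite3

# Hypothetical schema: column names mirror fields named in the review,
# but the table layout itself is an assumption.
SCHEMA = """
CREATE TABLE jobs (
    job_id TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    abort_requested INTEGER NOT NULL DEFAULT 0,
    retry_count INTEGER NOT NULL DEFAULT 0,
    retry_eligible INTEGER NOT NULL DEFAULT 0
)
"""


def request_abort(conn: sqlite3.Connection, job_id: str) -> bool:
    """Flag a non-terminal job for abort. Safe to call repeatedly."""
    cur = conn.execute(
        "UPDATE jobs SET abort_requested = 1 "
        "WHERE job_id = ? "
        "AND status NOT IN ('completed', 'blocked', 'failed', 'aborted')",
        (job_id,),
    )
    conn.commit()
    return cur.rowcount == 1


def retry_failed_validation_job(
    conn: sqlite3.Connection, job_id: str, max_retries: int = 3
) -> bool:
    """Requeue a failed, retry-eligible job; refuse anything else."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'queued', abort_requested = 0, "
        "retry_count = retry_count + 1, retry_eligible = 0 "
        "WHERE job_id = ? AND status = 'failed' "
        "AND retry_eligible = 1 AND retry_count < ?",
        (job_id, max_retries),
    )
    conn.commit()
    return cur.rowcount == 1


# In-memory database for the demo; a real deployment would use a durable one.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute("INSERT INTO jobs (job_id, status) VALUES ('j1', 'validating')")
conn.execute(
    "INSERT INTO jobs (job_id, status, retry_eligible) VALUES ('j2', 'failed', 1)"
)

aborted = request_abort(conn, "j1")
first_retry = retry_failed_validation_job(conn, "j2")
second_retry = retry_failed_validation_job(conn, "j2")  # already requeued
```

Putting the guard conditions in the `WHERE` clause makes each command a single conditional update, so a second caller cannot requeue a job that has already been requeued.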

P0

Aligned

Design for delegation rather than direct manipulation

The workflow is designed around delegation of a cohort validation task rather than manual file-by-file operation: `approve()` records approval intent, then `run_cohort_validate()` creates a `CohortValidationJob`, clones the applicant repo, selects an agentic surface via `select_agentic_surface()`, builds a `ValidationRequest`, and calls `validate_code_against_principles()`. Scope constraints are encoded through selector limits such as `MAX_FILES`, `MAX_BUNDLE_BYTES`, `MAX_FILE_BYTES`, language detection in `_detect_language()`, and repository namespace construction in `_execute_with_job()`.
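
To make the selector limits concrete, here is a minimal sketch of how caps like these might be enforced. The constant values, the `select_files` helper, and the skip-reason strings are assumptions; the review names the constants but does not show their values or the real logic of `select_agentic_surface()`.

```python
from dataclasses import dataclass

# Illustrative values only; the submission's real limits are not shown.
MAX_FILES = 3
MAX_FILE_BYTES = 100
MAX_BUNDLE_BYTES = 150


@dataclass
class Selection:
    selected: list[str]
    skipped: list[tuple[str, str]]  # (path, reason)


def select_files(candidates: dict[str, bytes]) -> Selection:
    """Pick files in path order while respecting per-file and bundle caps."""
    selected: list[str] = []
    skipped: list[tuple[str, str]] = []
    total = 0
    for path in sorted(candidates):
        size = len(candidates[path])
        if size > MAX_FILE_BYTES:
            skipped.append((path, "file_too_large"))
        elif len(selected) >= MAX_FILES:
            skipped.append((path, "max_files_reached"))
        elif total + size > MAX_BUNDLE_BYTES:
            skipped.append((path, "bundle_budget_exceeded"))
        else:
            selected.append(path)
            total += size
    return Selection(selected, skipped)


selection = select_files(
    {
        "a.py": b"x" * 60,
        "b.py": b"x" * 120,
        "c.py": b"x" * 60,
        "d.py": b"x" * 60,
    }
)
```

Recording a reason for every skipped file keeps the scope decision inspectable instead of silently truncating the bundle.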

P0

Aligned

Ensure that background work remains perceptible

Background work is made perceptible through durable job state in `CohortValidationJob`: `status`, phase timestamps such as `cloning_started_at`, `selecting_started_at`, `bundling_started_at`, `validating_started_at`, `terminal_at`, and typed terminal fields like `failure_kind`, `safe_display_message`, and `retry_eligible`. The runner updates these via `mark_step_started()`, `mark_completed()`, `mark_blocked()`, `mark_failed()`, and mirrors terminal state back to the application through `_mirror_terminal_to_app()`.
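
The phase-tracking behaviour described here could look roughly like the sketch below. The `Job` class is a simplified, hypothetical stand-in for `CohortValidationJob`: timestamps are kept in a dict instead of separate columns, and the abort check is an assumption about how `mark_step_started()` behaves.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

PHASES = ("cloning", "selecting", "bundling", "validating")


@dataclass
class Job:
    """Simplified stand-in for CohortValidationJob (hypothetical layout)."""

    status: str = "queued"
    abort_requested: bool = False
    timestamps: dict = field(default_factory=dict)
    terminal_at: Optional[datetime] = None

    def mark_step_started(self, phase: str) -> bool:
        """Enter a phase, or stop at the boundary if an abort was requested."""
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        now = datetime.now(timezone.utc)
        if self.abort_requested:
            self.status = "aborted"
            self.terminal_at = now
            return False
        self.status = phase
        self.timestamps[f"{phase}_started_at"] = now
        return True


job = Job()
started_cloning = job.mark_step_started("cloning")
job.abort_requested = True  # e.g. set by an operator command
started_selecting = job.mark_step_started("selecting")
```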

P0

Aligned

Align feedback with the user’s level of attention

Feedback is tiered by attention level: the applicant/application surface receives coarse states such as `validation_queued`, `validation_complete`, and `validation_failed` through `onboarding_state`, while operators can inspect `failure_kind`, `safe_display_message`, `retry_eligible`, and the persisted audit object. Routine progress is represented by status transitions; higher-attention failures are escalated with typed messages such as `invalid_repo_url`, `selector_rejected`, `validate_error`, and `persist_error`.
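
As an illustration of the two attention tiers, the following sketch projects one record into an applicant view and an operator view. The record shape, the helper names, and the sample message are hypothetical; only the field names are taken from the finding.

```python
# Hypothetical flat record combining application and job fields.
record = {
    "onboarding_state": "validation_failed",
    "failure_kind": "selector_rejected",
    "safe_display_message": "No supported files were found in the repository.",
    "retry_eligible": False,
}

OPERATOR_FIELDS = (
    "onboarding_state",
    "failure_kind",
    "safe_display_message",
    "retry_eligible",
)


def applicant_view(rec: dict) -> dict:
    """Coarse state only: what a low-attention surface needs."""
    return {"onboarding_state": rec["onboarding_state"]}


def operator_view(rec: dict) -> dict:
    """Typed failure details for someone actively investigating."""
    return {name: rec.get(name) for name in OPERATOR_FIELDS}


coarse = applicant_view(record)
detailed = operator_view(record)
```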

P0

Aligned

Apply progressive disclosure to system agency

The code supports progressive disclosure by separating simple operational state from detailed inspection data. Primary state is stored as concise values like `queued`, `cloning`, `validating`, `completed`, `blocked`, `failed`, and `aborted`; deeper evidence is stored in `audit_object`, including `source`, `selection`, `validate`, and `job` sections. `_summarize_validate_log()` deliberately reduces raw validator internals to safe metadata such as top-level keys and entry counts rather than dumping the full log by default.
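
The log-summarising step might be sketched as follows. This is an assumed reading of what `_summarize_validate_log()` does, based only on the description above; the output key names and the sample log are invented for illustration.

```python
def summarize_validate_log(log: dict) -> dict:
    """Reduce a raw validator log to safe metadata (assumed behaviour)."""
    entry_counts = {
        key: len(value)
        for key, value in log.items()
        if isinstance(value, (list, dict))
    }
    return {
        "top_level_keys": sorted(log),
        "entry_counts": entry_counts,
    }


# Invented sample log; the real validator log structure is not shown.
raw_log = {
    "steps": ["parse", "score", "report"],
    "model": "example-model",
    "usage": {"input_tokens": 10, "output_tokens": 20},
}
summary = summarize_validate_log(raw_log)
```

The summary carries no values from the raw log, only its shape, which is what keeps it safe to show by default.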

P0

Aligned

Replace implied magic with clear mental models

The workflow makes its operating model explicit: `fetch_public_repo()` only accepts public HTTPS GitHub URLs via `_GITHUB_URL_RE`; unsupported repos become `invalid_repo_url`, `clone_failed`, `clone_timeout`, or `repo_too_large`; language support is limited by `_detect_language()` and `LANGUAGE_EXTENSIONS`; and the validation bundle is wrapped with `BOUNDARY_HEADER` plus `ENVELOPE_ADVISORY` stating that repo code is untrusted inert input. These are concrete capability and boundary signals rather than implied magic.
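
A URL gate of this kind can be sketched with a strict regular expression. The pattern below is an assumption, since the review does not show the actual `_GITHUB_URL_RE`; `classify_repo_url` is likewise a hypothetical helper.

```python
import re
from typing import Optional

# Assumed pattern: HTTPS only, github.com only, exactly owner/repo.
GITHUB_URL_RE = re.compile(
    r"https://github\.com/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+/?"
)


def classify_repo_url(url: str) -> Optional[str]:
    """Return None if the URL is acceptable, else a typed failure kind."""
    if GITHUB_URL_RE.fullmatch(url):
        return None
    return "invalid_repo_url"


checks = {
    url: classify_repo_url(url)
    for url in (
        "https://github.com/octocat/Hello-World",
        "https://github.com/octocat/Hello-World.git",
        "http://github.com/octocat/Hello-World",
        "git@github.com:octocat/Hello-World.git",
        "https://github.com/octocat/Hello-World/tree/main",
    )
}
```

Using `fullmatch` rather than `match` rejects URLs with trailing path segments, so the accepted shape stays narrow and easy to explain.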

P0

Aligned

Expose meaningful operational state, not internal complexity

The state model exposes user/operator-meaningful states instead of raw implementation chatter. `CohortValidationJob.status` uses terms like `queued`, `cloning`, `selecting`, `bundling`, `validating`, `completed`, `blocked`, `failed`, and `aborted`; `_mirror_terminal_to_app()` maps those into application-level `validation_complete` or `validation_failed`; and user-safe explanations are kept in `safe_display_message` rather than exposing subprocess output or raw validator logs.
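
The mirroring step can be pictured as a small lookup table. Which terminal statuses map to `validation_failed` is an assumption here; the review states only that terminal states are mapped into `validation_complete` or `validation_failed`.

```python
from typing import Optional

# Assumed mapping: everything terminal except "completed" mirrors as a failure.
APP_STATE_BY_TERMINAL_STATUS = {
    "completed": "validation_complete",
    "blocked": "validation_failed",
    "failed": "validation_failed",
    "aborted": "validation_failed",
}


def mirror_terminal_to_app(job_status: str) -> Optional[str]:
    """Return the application-level state, or None for non-terminal statuses."""
    return APP_STATE_BY_TERMINAL_STATUS.get(job_status)


mirrored = {
    status: mirror_terminal_to_app(status)
    for status in ("completed", "blocked", "validating")
}
```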

P0

Aligned

Establish trust through inspectability

Inspectability is backed by concrete trace artifacts. `build_file_envelope()` creates a typed `bridge.files.v2` envelope with per-file content, paths, byte sizes, and SHA-256 hashes, then computes an `envelope_hash`; `_execute_with_job()` records `commit_sha`, selected file hashes, skipped reads, bundle size, `bundle_sha256`, `latency_ms`, validator log signals, `BOUNDARY_CONTRACT_VERSION`, `ENVELOPE_SCHEMA`, and `envelope_hash` in `audit_object`, which is merged into `UserValidationRun.result_json`.
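
A hashed envelope along these lines could be built as in the sketch below. The dictionary layout and the canonical-JSON hashing scheme are assumptions; the review confirms only that per-file SHA-256 hashes and an `envelope_hash` exist under the `bridge.files.v2` schema name.

```python
import hashlib
import json

ENVELOPE_SCHEMA = "bridge.files.v2"


def build_file_envelope(files: dict[str, str]) -> dict:
    """Wrap file contents with per-file hashes and one envelope hash."""
    entries = []
    for path in sorted(files):  # stable order makes the hash reproducible
        data = files[path].encode("utf-8")
        entries.append(
            {
                "path": path,
                "bytes": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
                "content": files[path],
            }
        )
    body = {"schema": ENVELOPE_SCHEMA, "files": entries}
    # Canonical JSON (sorted keys, fixed separators) is an assumed scheme.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    envelope_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "envelope_hash": envelope_hash}


envelope = build_file_envelope({"b.py": "print('b')", "a.py": "abc"})
reordered = build_file_envelope({"a.py": "abc", "b.py": "print('b')"})
tampered = build_file_envelope({"a.py": "abd", "b.py": "print('b')"})
```

Because the hash is computed over a canonical serialisation, the same set of files yields the same `envelope_hash` regardless of input order, while any content change produces a different one.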

P0

Aligned

Make hand-offs, approvals, and blockers explicit

Approvals, hand-offs, and blockers are explicit. `approve()` requires an operator confirmation unless `--yes` is supplied, then separately records Firebase creation, sign-in link generation, email delivery, and validation queueing states. Validation blockers are typed through `FAILURE_KINDS` and set with `mark_blocked()` or `mark_failed()`, with distinct outcomes for policy/user constraints such as `invalid_repo_url`, `no_supported_language`, and `selector_rejected` versus tool failures such as `clone_timeout`, `validate_error`, and `persist_error`.
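
The split between policy blockers and tool failures might be encoded as two sets. The sketch below assumes that policy/user constraints end in `blocked` and tool failures end in `failed`; the grouping follows the finding, but the helper `terminal_status_for` is hypothetical and the sets list only the kinds named above, not the full `FAILURE_KINDS`.

```python
# Only the kinds named in the finding; the real FAILURE_KINDS may hold more.
POLICY_BLOCKERS = frozenset(
    {"invalid_repo_url", "no_supported_language", "selector_rejected"}
)
TOOL_FAILURES = frozenset({"clone_timeout", "validate_error", "persist_error"})
FAILURE_KINDS = POLICY_BLOCKERS | TOOL_FAILURES


def terminal_status_for(failure_kind: str) -> str:
    """Map a typed failure kind to a terminal job status (assumed rule)."""
    if failure_kind not in FAILURE_KINDS:
        raise ValueError(f"unknown failure kind: {failure_kind}")
    return "blocked" if failure_kind in POLICY_BLOCKERS else "failed"


policy_outcome = terminal_status_for("selector_rejected")
tool_outcome = terminal_status_for("clone_timeout")
```

Rejecting unknown kinds outright keeps the vocabulary closed, so a new failure mode has to be classified before it can be recorded.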

P0

Aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system rather than a conversation. The code has persistent system records (`CohortApplication`, `CohortValidationJob`, `UserValidationRun`), explicit phase transitions in `mark_step_started()`, terminal-state functions, retry metadata, abort metadata, timestamps for each stage, and a structured audit object. Execution state is separate from the LLM response and is persisted independently in the job table.

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/9caf9385-cd3b-4936-b455-87a916577e3a/card.svg)](https://aidesignblueprint.com/en/readiness-review/9caf9385-cd3b-4936-b455-87a916577e3a)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/9caf9385-cd3b-4936-b455-87a916577e3a.svg)](https://aidesignblueprint.com/en/readiness-review/9caf9385-cd3b-4936-b455-87a916577e3a)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Regressions (1)

P10 · Optimise for steering, not only initiating · needs_changes · needs_changes
Rubric: 2026-05-04 · Grade capped by 0 high-risk findings

Run ID: 9caf9385-cd3b-4936-b455-87a916577e3a · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.