Reviewed

Evaluation complete; awaiting evidence review.

Agent Architecture Review, Validation Snapshot

Evaluated on 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High risk

74/100

Grade C

9 aligned · 1 production blocker

Verdicts by principle

The submission implements a credible delegated validation workflow: persistent job records, explicit phase states, typed failure kinds, source/audit hashes, and an untrusted-code envelope give the background LLM validation flow inspectable boundaries. The remaining gap is steering/recovery: abort/retry controls are not exposed as durable command primitives, validation cannot be cancelled once the external call starts, and partial onboarding failures after external side effects can strand an approved applicant without an in-code retry path.

Iteration history

5 previous runs on this artifact. Each run_id opens its own readiness review.

When	Score	Status	Run ID
12 May 2026 (this run)	74 / C	High risk	9caf9385…
12 May 2026	74 / C	Aligned	093809b5…
12 May 2026	98 / A	Aligned	4128f700…
12 May 2026	98 / A	Aligned	270e7ca6…
12 May 2026	74 / C	High risk	8364019d…
12 May 2026	74 / C	High risk	14a3456f…

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

Needs changes · Production blocker · 60/100

Optimise for steering, not only initiating

Steering is only partially implemented. `CohortValidationJob` has `abort_requested`, `retry_count`, and `retry_eligible`, and `mark_step_started()` checks aborts between phases, but there is no durable command boundary such as `request_abort(job_id)` or `retry_failed_job(job_id)` exposed outside the runner. Once `asyncio.run(validate_code_against_principles(request, api_key))` starts, the job has no visible timeout/cancel path in this code. The approval flow can also strand partial external side effects: after `approved_at` is set, failures in `generate_sign_in_link_for_email()` or `send_cohort_approval_email()` set failure states, but `approve()` cannot be rerun because it exits when `app.a…

Recommendation

Move steering and recovery into a small durable service boundary outside the execution loop. Provide command functions such as `request_abort(job_id)`, `retry_failed_validation_job(job_id)`, and `retry_onboarding_handoff(application_id)` that operate on persisted states, are safe to call after partial Firebase/email/validation failures, and are polled or enforced with bounded timeouts around external calls.
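The recommendation above could be sketched roughly as follows. This is a minimal illustration, not the submission's code: the in-memory `JOBS` dict stands in for the persisted job table, `MAX_RETRIES` and `run_with_timeout()` are hypothetical, and only the field names `abort_requested`, `retry_count`, and `retry_eligible` come from the review. The bounded timeout uses `asyncio.wait_for`.

```python
import asyncio
from dataclasses import dataclass

TERMINAL = {"completed", "blocked", "failed", "aborted"}
MAX_RETRIES = 3  # assumed limit, not taken from the reviewed code


@dataclass
class Job:
    """Hypothetical stand-in for a persisted CohortValidationJob row."""
    job_id: str
    status: str = "queued"
    abort_requested: bool = False
    retry_count: int = 0
    retry_eligible: bool = False


JOBS: dict[str, Job] = {}  # stand-in for the durable job table


def request_abort(job_id: str) -> bool:
    """Record abort intent; refuses jobs that are already terminal."""
    job = JOBS[job_id]
    if job.status in TERMINAL:
        return False
    job.abort_requested = True
    return True


def retry_failed_validation_job(job_id: str) -> bool:
    """Re-queue a failed job only if it is marked retry-eligible."""
    job = JOBS[job_id]
    if job.status != "failed" or not job.retry_eligible:
        return False
    if job.retry_count >= MAX_RETRIES:
        return False
    job.retry_count += 1
    job.status = "queued"
    job.retry_eligible = False
    job.abort_requested = False
    return True


async def run_with_timeout(job_id: str, coro_factory, timeout_s: float):
    """Honour a pending abort, then bound the external call with a timeout."""
    job = JOBS[job_id]
    if job.abort_requested:
        job.status = "aborted"
        return None
    job.status = "validating"
    try:
        result = await asyncio.wait_for(coro_factory(), timeout=timeout_s)
    except asyncio.TimeoutError:
        job.status = "failed"
        job.retry_eligible = True
        return None
    job.status = "completed"
    return result
```

Because the commands only read and write persisted fields, an operator tool or a scheduler could call them without holding a reference to the running loop. A retry command for the onboarding hand-off would follow the same shape against the application record.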

Aligned

Design for delegation rather than direct manipulation

The workflow is designed around delegation of a cohort validation task rather than manual file-by-file operation: `approve()` records approval intent, then `run_cohort_validate()` creates a `CohortValidationJob`, clones the applicant repo, selects an agentic surface via `select_agentic_surface()`, builds a `ValidationRequest`, and calls `validate_code_against_principles()`. Scope constraints are encoded through selector limits such as `MAX_FILES`, `MAX_BUNDLE_BYTES`, `MAX_FILE_BYTES`, language detection in `_detect_language()`, and repository namespace construction in `_execute_with_job()`.
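As an illustration of how selector limits can bound a delegated task, the sketch below picks files under per-file, per-bundle, and count budgets. The numeric limits, the contents of `LANGUAGE_EXTENSIONS`, and the `select_files()` helper are assumptions made for this example; the reviewed selector may apply its limits differently.

```python
from pathlib import PurePosixPath

# Assumed values for illustration; the real limits are not given in the review.
MAX_FILES = 3
MAX_FILE_BYTES = 1000
MAX_BUNDLE_BYTES = 1500
LANGUAGE_EXTENSIONS = {"python": {".py"}, "typescript": {".ts", ".tsx"}}


def select_files(sizes: dict[str, int], language: str) -> list[str]:
    """Pick files for one language in path order, within the budgets."""
    extensions = LANGUAGE_EXTENSIONS[language]
    chosen: list[str] = []
    total = 0
    for path in sorted(sizes):
        size = sizes[path]
        # Skip files of another language or over the per-file limit.
        if PurePosixPath(path).suffix not in extensions or size > MAX_FILE_BYTES:
            continue
        # Stop at the first file that would exceed the count or bundle budget.
        if len(chosen) >= MAX_FILES or total + size > MAX_BUNDLE_BYTES:
            break
        chosen.append(path)
        total += size
    return chosen
```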

Aligned

Ensure that background work remains perceptible

Background work is made perceptible through durable job state in `CohortValidationJob`: `status`, phase timestamps such as `cloning_started_at`, `selecting_started_at`, `bundling_started_at`, `validating_started_at`, `terminal_at`, and typed terminal fields like `failure_kind`, `safe_display_message`, and `retry_eligible`. The runner updates these via `mark_step_started()`, `mark_completed()`, `mark_blocked()`, `mark_failed()`, and mirrors terminal state back to the application through `_mirror_terminal_to_app()`.
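A phase-transition helper of the kind described here might look like the sketch below. The `JobState` class and its `timestamps` dict are hypothetical simplifications (the review describes separate columns such as `cloning_started_at`), and the function body is a guess at the behaviour, not the submission's `mark_step_started()`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

PHASES = ("cloning", "selecting", "bundling", "validating")


@dataclass
class JobState:
    """Hypothetical, simplified view of the durable job record."""
    status: str = "queued"
    abort_requested: bool = False
    timestamps: dict = field(default_factory=dict)


def mark_step_started(job: JobState, phase: str) -> bool:
    """Enter a phase and stamp it, unless an abort was requested."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    now = datetime.now(timezone.utc)
    if job.abort_requested:
        # Abort is checked between phases, so it takes effect here.
        job.status = "aborted"
        job.timestamps["terminal_at"] = now
        return False
    job.status = phase
    job.timestamps[f"{phase}_started_at"] = now
    return True
```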

Aligned

Align feedback with the user’s level of attention

Feedback is tiered by attention level: the applicant/application surface receives coarse states such as `validation_queued`, `validation_complete`, and `validation_failed` through `onboarding_state`, while operators can inspect `failure_kind`, `safe_display_message`, `retry_eligible`, and the persisted audit object. Routine progress is represented by status transitions; higher-attention failures are escalated with typed messages such as `invalid_repo_url`, `selector_rejected`, `validate_error`, and `persist_error`.

Aligned

Apply progressive disclosure to system agency

The code supports progressive disclosure by separating simple operational state from detailed inspection data. Primary state is stored as concise values like `queued`, `cloning`, `validating`, `completed`, `blocked`, `failed`, and `aborted`; deeper evidence is stored in `audit_object`, including `source`, `selection`, `validate`, and `job` sections. `_summarize_validate_log()` deliberately reduces raw validator internals to safe metadata such as top-level keys and entry counts rather than dumping the full log by default.
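The log reduction described above can be illustrated with a small sketch. The function below is a hypothetical stand-in for `_summarize_validate_log()`, and the output field names `top_level_keys` and `entry_counts` are chosen for this example only.

```python
def summarize_validate_log(log) -> dict:
    """Reduce a raw validator log to safe metadata (keys and counts only)."""
    if not isinstance(log, dict):
        # Unknown shapes are reported by type name, never by content.
        return {"type": type(log).__name__}
    summary = {"top_level_keys": sorted(log), "entry_counts": {}}
    for key, value in log.items():
        if isinstance(value, (list, dict)):
            summary["entry_counts"][key] = len(value)
    return summary
```

No values from the log are copied into the summary, so prompt text or model output inside the raw log stays behind the detailed inspection path.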

Aligned

Replace implied magic with clear mental models

The workflow makes its operating model explicit: `fetch_public_repo()` only accepts public HTTPS GitHub URLs via `_GITHUB_URL_RE`; unsupported repos become `invalid_repo_url`, `clone_failed`, `clone_timeout`, or `repo_too_large`; language support is limited by `_detect_language()` and `LANGUAGE_EXTENSIONS`; and the validation bundle is wrapped with `BOUNDARY_HEADER` plus `ENVELOPE_ADVISORY` stating that repo code is untrusted inert input. These are concrete capability and boundary signals rather than implied magic.
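A URL gate of this kind can be sketched as below. The pattern is an assumption written for illustration; the review does not show the actual `_GITHUB_URL_RE`, and `classify_repo_url()` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed pattern: HTTPS only, github.com only, exactly owner/repo.
GITHUB_URL_RE = re.compile(
    r"https://github\.com/[A-Za-z0-9][A-Za-z0-9-]*/[A-Za-z0-9._-]+/?"
)


def classify_repo_url(url: str) -> Optional[str]:
    """Return None for an accepted URL, else the typed failure kind."""
    if GITHUB_URL_RE.fullmatch(url.strip()):
        return None
    return "invalid_repo_url"
```

Using `fullmatch` rather than `search` rejects URLs with extra path segments, such as links to a branch or subdirectory.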

Aligned

Expose meaningful operational state, not internal complexity

The state model exposes user/operator-meaningful states instead of raw implementation chatter. `CohortValidationJob.status` uses terms like `queued`, `cloning`, `selecting`, `bundling`, `validating`, `completed`, `blocked`, `failed`, and `aborted`; `_mirror_terminal_to_app()` maps those into application-level `validation_complete` or `validation_failed`; and user-safe explanations are kept in `safe_display_message` rather than exposing subprocess output or raw validator logs.
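The mapping from job status to application state could be as small as a lookup table. In the sketch below, treating `blocked` and `aborted` as `validation_failed` is an assumption; the review only states that terminal states map to `validation_complete` or `validation_failed`.

```python
from typing import Optional

# Assumed mapping; only the two target state names come from the review.
APP_STATE_BY_JOB_STATUS = {
    "completed": "validation_complete",
    "blocked": "validation_failed",
    "failed": "validation_failed",
    "aborted": "validation_failed",
}


def mirror_terminal_to_app(job_status: str) -> Optional[str]:
    """Return the application-level state, or None for non-terminal jobs."""
    return APP_STATE_BY_JOB_STATUS.get(job_status)
```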

Aligned

Establish trust through inspectability

Inspectability is backed by concrete trace artifacts. `build_file_envelope()` creates a typed `bridge.files.v2` envelope with per-file content, paths, byte sizes, and SHA-256 hashes, then computes an `envelope_hash`; `_execute_with_job()` records `commit_sha`, selected file hashes, skipped reads, bundle size, `bundle_sha256`, `latency_ms`, validator log signals, `BOUNDARY_CONTRACT_VERSION`, `ENVELOPE_SCHEMA`, and `envelope_hash` in `audit_object`, which is merged into `UserValidationRun.result_json`.
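To make the hashing scheme concrete, the sketch below builds an envelope with per-file SHA-256 digests and a digest over a canonical JSON serialisation. The schema string `bridge.files.v2` and the name `envelope_hash` come from the review; the field layout, the canonicalisation choice, and the function name `sketch_file_envelope()` are assumptions.

```python
import hashlib
import json

ENVELOPE_SCHEMA = "bridge.files.v2"  # schema name quoted in the review


def sketch_file_envelope(files: dict[str, bytes]) -> dict:
    """Build an envelope with per-file hashes and one overall hash."""
    entries = []
    for path in sorted(files):  # stable order keeps the hash reproducible
        data = files[path]
        entries.append({
            "path": path,
            "bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
            "content": data.decode("utf-8", errors="replace"),
        })
    envelope = {"schema": ENVELOPE_SCHEMA, "files": entries}
    # Hash a canonical serialisation (sorted keys, compact separators).
    canonical = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
    envelope["envelope_hash"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return envelope
```

Sorting paths and serialising canonically means two runs over the same commit should produce the same `envelope_hash`, which is what lets an auditor compare a stored hash against a re-derived one.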

Aligned

Make hand-offs, approvals, and blockers explicit

Approvals, hand-offs, and blockers are explicit. `approve()` requires an operator confirmation unless `--yes` is supplied, then separately records Firebase creation, sign-in link generation, email delivery, and validation queueing states. Validation blockers are typed through `FAILURE_KINDS` and set with `mark_blocked()` or `mark_failed()`, with distinct outcomes for policy/user constraints such as `invalid_repo_url`, `no_supported_language`, and `selector_rejected` versus tool failures such as `clone_timeout`, `validate_error`, and `persist_error`.
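The split between policy constraints and tool failures can be sketched as a typed lookup. The sets below contain only the six kinds named in this finding, so they are a subset of the real `FAILURE_KINDS`. Routing policy kinds to `blocked` and tool kinds to `failed` is an assumption about how `mark_blocked()` and `mark_failed()` are used.

```python
# Only the kinds named in this finding; the real FAILURE_KINDS has more.
POLICY_KINDS = frozenset({
    "invalid_repo_url", "no_supported_language", "selector_rejected",
})
TOOL_KINDS = frozenset({
    "clone_timeout", "validate_error", "persist_error",
})
FAILURE_KINDS = POLICY_KINDS | TOOL_KINDS


def terminal_status_for(failure_kind: str) -> str:
    """Map a typed failure kind to an assumed terminal job status."""
    if failure_kind not in FAILURE_KINDS:
        raise ValueError(f"unknown failure kind: {failure_kind}")
    # Assumption: policy/user constraints block; tool failures fail.
    return "blocked" if failure_kind in POLICY_KINDS else "failed"
```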

Aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system rather than a conversation. The code has persistent system records (`CohortApplication`, `CohortValidationJob`, `UserValidationRun`), explicit phase transitions in `mark_step_started()`, terminal-state functions, retry metadata, abort metadata, timestamps for each stage, and a structured audit object. Execution state is separate from the LLM response and is persisted independently in the job table.

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/9caf9385-cd3b-4936-b455-87a916577e3a/card.svg)](https://aidesignblueprint.com/en/readiness-review/9caf9385-cd3b-4936-b455-87a916577e3a)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/9caf9385-cd3b-4936-b455-87a916577e3a.svg)](https://aidesignblueprint.com/en/readiness-review/9caf9385-cd3b-4936-b455-87a916577e3a)

Baseline and iteration details

Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Regressions (1)

P10 · Optimise for steering, not only initiating · needs_changes · needs_changes

Rubric: 2026-05-04 · Grade capped by 0 high-risk findings

Run your own AI Design Blueprint validation

Run ID: 9caf9385-cd3b-4936-b455-87a916577e3a · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.