Vai al contenuto principaleVai al footer
Revisionato

Valutazione completata; in attesa di revisione delle prove.

Agent Architecture Review, Snapshot di validazione

Valutato il 12 maggio 2026 rispetto alla doctrine di AI Design Blueprint

Alto rischio

Stato: Alto rischio

30/100

Voto F

3 allineati7 blocker produzione
Verdetti per principio

This is a real multi-step agentic workflow: it approves an applicant, clones an external repo, selects files, calls an LLM validator, persists results, and sends externally visible email. The submitted code adds meaningful audit metadata, but production trust boundaries still fail around durable run lifecycle, explicit failure/blocker state, steering/recovery, and the raw handoff of untrusted repo content into validation.

Findings per principio

10 principi valutati. Verdict, severity, evidenza e raccomandazione per ognuno.

P0

Richiede modificheBlocker di produzione80/100

Make hand-offs, approvals, and blockers explicit

Some blockers are explicit now, including malformed owner/repo parsing routed to `_mark_no_agentic_surface(reason="invalid_repo_url")`, `SelectionRejected`, and typed `FetchError.kind`. But important handoff failures still stall silently: `run()` only catches `FetchError`, so validation, JSON serialization, DB flush/commit, and audit construction exceptions propagate; then `approve()` catches all exceptions around `run_cohort_validate(app.id)` and silently `pass`es. After Firebase user creation and email delivery, that leaves no durable blocker reason or required action.

Raccomandazione

Replace the broad silent catch in `approve()` with a typed failure transition on the validation job/application, and wrap the validation/persistence phase in `run()` so every non-success path records `blocked` or `failed` with a safe reason, retry instructions, and operator visibility.

P0

Richiede modificheBlocker di produzione70/100

Ensure that background work remains perceptible

Long-running validation is perceptible only through transient CLI `print()` calls and final DB fields. `approve()` sends the approval email and then runs `run_cohort_validate()` inline, while `run()` creates no durable `queued`/`active` run row before clone/selection/LLM validation; `UserValidationRun` is only added after the validator succeeds, and graceful failures only overload `app.rejection_reason`. If the process crashes or the LLM call hangs after the email is sent, the applicant can be left in a silent pending state.

Raccomandazione

Create a persistent validation job/run record before starting clone or validation, with durable states such as `queued`, `cloning`, `selecting_files`, `validating`, `blocked`, `failed`, and `complete`; have the approval path enqueue or hand off to that lifecycle rather than relying on inline CLI output.

P0

Richiede modificheBlocker di produzione70/100

Establish trust through inspectability

The audit envelope is a substantial improvement: `audit_object` records `schema_version`, `captured_at`, source `repo_url`/`commit_sha`, selected file hashes, skipped reads, `bundle_sha256`, validator latency, and shape-only log signals. However, the bridge still constructs the validator payload by concatenating raw untrusted repo content in `_build_implementation_context()` via `chunks.append(f"# === FILE: {path} ===\n{content}\n")` and then sends it as `ValidationRequest(code=bundle, ...)`. This file does not establish or record an inert untrusted-input boundary, so repo comments or filenames containing instruction-like text could steer the validator unless a hidden downstream service comp…

Raccomandazione

Make the untrusted-code boundary a validation-service primitive: pass files as structured quoted/JSON content with explicit inert semantics, sanitize or encode path headers, pin and record the prompt/boundary version in the audit object, and keep the audit ledger separate from mutable execution flow where possible.

P0

Richiede modificheBlocker di produzione65/100

Expose meaningful operational state, not internal complexity

Operational state is not represented in user-relevant durable terms. The code stores final success as `app.validation_run_id`, but blocked/failure states are encoded into `app.rejection_reason` strings like `bridge_exit:selector_rejected` and `fetch_failed:{kind}: ...`. There is no first-class status model for `active`, `blocked`, `failed`, `awaiting retry`, or `complete`, and no persisted state at all while clone/validate is running.

Raccomandazione

Move validation state out of overloaded `rejection_reason` strings into a structured job/run table or service-owned status object with user-relevant states, timestamps, failure kind, safe display message, and retry eligibility.

P0

Richiede modificheBlocker di produzione65/100

Represent delegated work as a system, not merely as a conversation

The delegated work is implemented as sequential script control flow rather than a durable system of tasks. The docstring lists eight phases, but the database only receives a final `UserValidationRun` after success or a coarse `rejection_reason` on selected graceful exits. Substeps such as clone, language detection, selection, file-read skipping, LLM validation, and persistence are not represented as resumable or inspectable task states while they are happening.

Raccomandazione

Represent the bridge as a service-owned validation job with explicit step records or status transitions for clone, inventory, selection, bundle build, validation, audit persist, and completion; the CLI should start or observe that system rather than being the system of record.

P0

Richiede modificheBlocker di produzione60/100

Align feedback with the user’s level of attention

Feedback is not calibrated to attention level. The active operator sees console messages such as `✓ Cloned`, `✗ Selector rejected`, and skipped-read warnings, but absent users receive only eventual dashboard effects. More seriously, `approve()` catches any exception from `run_cohort_validate(app.id)` and executes `pass`, so an intervention-worthy validation failure after account creation/email is neither logged nor persisted for the applicant or founder.

Raccomandazione

Route validation progress and failures through the durable run lifecycle: keep concise foreground CLI output, but persist intervention-worthy states and unexpected exceptions with an operator-visible failure reason and applicant-safe status.

P0

Richiede modificheBlocker di produzione60/100

Optimise for steering, not only initiating

The workflow supports initiation and manual rerun, but not steering while underway. Once `run_cohort_validate()` starts, there is no durable pause, abort, resume, retry-from-step, or constraint-update primitive; the only practical controls are Ctrl-C, waiting, or rerunning if `validation_run_id` was never set. The approval path also commits user creation before downstream email/validation phases, without a rollback or recovery state beyond comments about manual rerun.

Raccomandazione

Introduce explicit lifecycle controls on the validation job: cancellable queued/running states, retry from failed step, safe abort semantics after external actions, and a way to update selection/validation constraints before restarting the LLM call.

P0

Allineato

Design for delegation rather than direct manipulation

The workflow supports delegation rather than manual step execution: after founder confirmation in `approve()`, the system creates/links the user, sends the approval email, invokes `run_cohort_validate(app.id)`, clones the repo, selects the agentic surface with `select_agentic_surface`, calls `validate_code_against_principles`, and persists a `UserValidationRun`. Scope is bounded by checks in `run()` for `approved_at`, `user_id`, `repo_url`, and existing `validation_run_id`.

P0

Allineato

Apply progressive disclosure to system agency

The code separates summary from detail well. Runtime output prints compact status lines, while deeper inspection is stored in the schema-versioned `audit_object`. `_summarize_validate_log()` deliberately persists shape-only metadata such as `top_level_keys`, `entry_count`, and `type` rather than dumping prompt snippets or reasoning traces into `result_json`.

P0

Allineato

Replace implied magic with clear mental models

The implementation makes key capabilities and limits explicit in code paths and user-facing messages: `_detect_language()` only supports Python/TypeScript, `SelectionRejected` explains that include patterns are sorting signals rather than gates, `fetch_public_repo()` rejects non-public HTTPS GitHub URLs, and `run()` distinguishes `no_supported_language`, `selector_rejected`, `invalid_repo_url`, and typed `FetchError.kind` values. The selector constants also make bundle limits visible via `MAX_FILES`, `MAX_BUNDLE_BYTES`, `MAX_FILE_BYTES`, and `MIN_BUNDLE_BYTES`.

Aggiungi al tuo README

Due varianti embeddabili: una piccola e una a card più ricca.

Score card (consigliata)

Blueprint Readiness Score card
[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/86b2c59d-0f04-4b30-a891-796c4276fead/card.svg)](https://aidesignblueprint.com/en/readiness-review/86b2c59d-0f04-4b30-a891-796c4276fead)

Badge piatto

Blueprint Readiness Score badge
[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/86b2c59d-0f04-4b30-a891-796c4276fead.svg)](https://aidesignblueprint.com/en/readiness-review/86b2c59d-0f04-4b30-a891-796c4276fead)
Dettagli baseline e iterazione
Rubric: 2026-05-04

Run ID: 86b2c59d-0f04-4b30-a891-796c4276fead · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.