Reviewed

Evaluation complete; awaiting evidence review.

Agent Architecture Review, validation snapshot

Evaluated on 18 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High risk

74/100

Grade C

6 aligned · 1 production blocker · 3 hardening
Verdicts by principle

The submission is a real autonomous workflow with several strong architectural primitives: durable CohortValidationJob/UserValidationRun rows, explicit phase transitions, typed blocker/failure states, approval gates, retry/abort endpoints, and a reproducibility-oriented audit envelope. The main production gap is that long-running validation is dispatched via in-process FastAPI BackgroundTasks without a durable worker lease/heartbeat/stale-job transition, so Cloud Run process loss can leave users seeing queued/validating work indefinitely. Secondary hardening issues are stale/conflicting operator docs and stdout handling around background CLI execution.

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

P0

Changes required · Production blocker · 60/100

Ensure that background work remains perceptible

The code has good perceptibility primitives once execution is alive: `create_job(... status='queued')`, `mark_step_started(...)`, `_create_user_run_at_queued(... phase='queued')`, `_set_user_run_phase(...)`, and admin detail fields expose step timestamps and terminal status. However `post_approve()` dispatches the long validator through in-process `BackgroundTasks`, and `_run_validate_background()` explicitly notes that Cloud Run scale-down can kill the task; there is no lease, heartbeat, timeout, or stale-job reaper in the submitted code to turn a stranded `validation_queued`/`validating` state into an explicit blocked/failed state.

Recommendation

Move validation execution to a durable worker/queue boundary with a persisted lease/heartbeat and stale-job transition. The admin route should enqueue a job transactionally and return the persisted job id; a separate worker should own execution and mark timed-out or lost leases as retryable failures.
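
A minimal sketch of the lease pattern this recommendation describes, assuming a single SQL table. It uses in-memory SQLite so it runs standalone; the table and column names (`validation_job`, `lease_owner`, `lease_expires_at`) and the `lease_expired` failure kind are hypothetical, not the submission's `CohortValidationJob` schema. On PostgreSQL the claim step would more typically use `SELECT ... FOR UPDATE SKIP LOCKED`.

```python
import sqlite3
from typing import Optional

# Hypothetical schema for illustration; not the submission's CohortValidationJob table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS validation_job (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,            -- queued | running | failed | complete
    lease_owner TEXT,
    lease_expires_at REAL,
    failure_kind TEXT,
    retry_eligible INTEGER NOT NULL DEFAULT 0
)
"""


def enqueue(conn: sqlite3.Connection) -> int:
    """Admin route: persist a queued job in a transaction and return its id."""
    with conn:
        cur = conn.execute("INSERT INTO validation_job (status) VALUES ('queued')")
    return cur.lastrowid


def claim(conn: sqlite3.Connection, worker_id: str, lease_s: float, now: float) -> Optional[int]:
    """Worker: take the oldest queued job; the guarded UPDATE acts as compare-and-set."""
    with conn:
        row = conn.execute(
            "SELECT id FROM validation_job WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        won = conn.execute(
            "UPDATE validation_job SET status = 'running', lease_owner = ?, lease_expires_at = ? "
            "WHERE id = ? AND status = 'queued'",
            (worker_id, now + lease_s, row[0]),
        ).rowcount == 1
    return row[0] if won else None


def heartbeat(conn: sqlite3.Connection, job_id: int, worker_id: str, lease_s: float, now: float) -> bool:
    """Worker: extend the lease; False means the lease was lost and work must stop."""
    with conn:
        n = conn.execute(
            "UPDATE validation_job SET lease_expires_at = ? "
            "WHERE id = ? AND lease_owner = ? AND status = 'running'",
            (now + lease_s, job_id, worker_id),
        ).rowcount
    return n == 1


def reap_stale(conn: sqlite3.Connection, now: float) -> int:
    """Scheduled reaper: turn expired leases into explicit, retryable failures."""
    with conn:
        return conn.execute(
            "UPDATE validation_job SET status = 'failed', failure_kind = 'lease_expired', "
            "retry_eligible = 1, lease_owner = NULL "
            "WHERE status = 'running' AND lease_expires_at < ?",
            (now,),
        ).rowcount
```

The property that matters is that a worker which vanishes (for example on Cloud Run scale-down) stops heartbeating, and the next `reap_stale` pass turns its job into an explicit, retryable failure instead of leaving it `running` indefinitely. The reaper would run on a schedule that does not depend on any single worker staying alive.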

P0

Changes required · Hardening recommended · 40/100

Optimise for steering, not only initiating

The workflow has meaningful steering primitives: `post_abort_job()` delegates to `steer_cli.request_abort`, retry endpoints exist for validation and onboarding, and `_execute_with_job()` checks `job.abort_requested` before the LLM call and again after validation before persisting the result. The remaining gap is that the long `asyncio.run(validate_code_against_principles(request, api_key))` call has no visible timeout/cancellable worker boundary in this code, so an abort requested during a hung or very slow provider call cannot be honored until the call returns.

Recommendation

Put the validator call behind the same durable worker lease recommended for P2 (the background-perceptibility finding above), with an explicit timeout and a cancellation path that marks the job aborted or failed without waiting indefinitely on the provider call. Keep the existing pre- and post-persist abort checks as the final safety gate.
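
One way to add the timeout and cancellation path, sketched with the standard `asyncio` library. `run_cancellable` races the validator call against a deadline and periodically polls an abort flag; the function name, the two exception classes, and the polling interval are hypothetical. In a worker, the awaited argument would be the existing `validate_code_against_principles(request, api_key)` coroutine, and the flag callback would re-read `job.abort_requested` from the database.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class ValidationAborted(Exception):
    """Hypothetical: operator requested abort while the validator call was in flight."""


class ValidationTimedOut(Exception):
    """Hypothetical: the validator call exceeded its deadline."""


async def run_cancellable(
    call: Awaitable[T],
    abort_requested: Callable[[], bool],
    timeout_s: float,
    poll_s: float = 1.0,
) -> T:
    """Await `call`, but give up on timeout or as soon as abort_requested() is true."""
    loop = asyncio.get_running_loop()
    task = asyncio.ensure_future(call)
    deadline = loop.time() + timeout_s
    try:
        while True:
            remaining = deadline - loop.time()
            if remaining <= 0:
                raise ValidationTimedOut(f"no result after {timeout_s}s")
            done, _pending = await asyncio.wait({task}, timeout=min(poll_s, remaining))
            if task in done:
                return task.result()  # re-raises the call's own exception, if any
            if abort_requested():
                raise ValidationAborted("abort requested during validator call")
    finally:
        if not task.done():
            task.cancel()
            await asyncio.wait({task})  # let cancellation finish; asyncio.wait does not raise it
```

Cancellation only takes effect at an `await` point, so this helps only if the provider client is itself asynchronous. A client that blocks the event loop in synchronous I/O would still need a process-level timeout at the worker boundary.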

P0

Changes required · Hardening recommended · 35/100

Replace implied magic with clear mental models

The runtime mental model is mostly explicit through Mode A/B branching, `onboarding_state` values, `failure_kind`, `safe_display_message`, and OPENAI_API_KEY preflight handling. But operator-facing comments/docstrings are materially stale: the module docstring in `approve_cohort_application.py` says validate is not triggered by the CLI and that the founder runs `run_cohort_validate` separately, while the actual `main()` calls `approve()`, and `approve()` runs `run_cohort_validate(application_id)` synchronously after `approve_account()`. `_run_validate_background()` also says stdout is swapped for structured logger, but the implementation does not do that.

Recommendation

Update the founder/operator-facing docs and CLI help to match the current contract: account provisioning vs end-to-end approval vs background validation should be unambiguous. Remove or rewrite stale v1 comments rather than adding another wrapper layer.
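
A sketch of CLI help that names each contract explicitly, using `argparse` subcommands. The subcommand names (`provision`, `approve`, `validate`) are hypothetical; they correspond to `approve_account()`, the current end-to-end `approve()` behavior, and `run_cohort_validate` respectively, as described in the evidence above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical layout: one subcommand per contract so --help cannot be misread.
    parser = argparse.ArgumentParser(
        prog="approve_cohort_application",
        description="Cohort application admin CLI.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    provision = sub.add_parser(
        "provision",
        help="Provision the account only (approve_account). Does NOT run validation.",
    )
    approve = sub.add_parser(
        "approve",
        help="Provision the account, then run validation synchronously in this process.",
    )
    validate = sub.add_parser(
        "validate",
        help="Run validation only (run_cohort_validate) for an already provisioned application.",
    )

    for p in (provision, approve, validate):
        p.add_argument("application_id", help="Cohort application id.")
    # Only commands that change account state ask for confirmation.
    for p in (provision, approve):
        p.add_argument("--yes", action="store_true", help="Skip the confirmation prompt.")
    return parser
```

With this shape, `approve_cohort_application --help` lists the three contracts side by side, and the module docstring can simply point at that output instead of restating it.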

P0

Changes required · Hardening recommended · 30/100

Align feedback with the user’s level of attention

Foreground and background attention modes are mostly separated: `_run_cli()` captures stdout/stderr for immediate admin actions, while job state and safe messages are persisted for later polling. The hardening gap is that `_run_validate_background()` claims it swaps stdout for structured logging, but it simply calls `run_cohort_validate(application_id)`, whose implementation uses many `print(...)` calls; because `_run_cli()` redirects process-global stdout and `_CLI_INVOKE_LOCK` does not cover background validation, output from a background scan can be interleaved with or captured by a concurrent admin action response.

Recommendation

Route background validator output to structured logs or a persisted job-event table, and keep process-global stdout redirection out of concurrently running background work. Prefer deleting stdout capture from long-running paths in favor of the existing persisted job state.
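
A minimal sketch of the recommended direction using the standard `logging` module: the background path receives a job-scoped logger and never writes to `sys.stdout`, so a concurrent `redirect_stdout` in an admin request cannot capture or interleave its output. `run_validation_steps` is a hypothetical stand-in for `run_cohort_validate` with its `print(...)` calls replaced, and the logger name `cohort.validation` is an assumption.

```python
import logging


def make_job_logger(job_id: str) -> logging.LoggerAdapter:
    """Job-scoped logger: every record carries job_id as an attribute via `extra`."""
    return logging.LoggerAdapter(logging.getLogger("cohort.validation"), {"job_id": job_id})


def run_validation_steps(application_id: str, log: logging.LoggerAdapter) -> None:
    """Hypothetical stand-in for run_cohort_validate with print(...) replaced by log calls."""
    for phase in ("clone", "detect_language", "select", "bundle", "validate", "persist"):
        log.info("phase=%s application_id=%s", phase, application_id)
```

In deployment, a handler attached to this logger would format records (for example as JSON lines) for the log sink. A `logging.StreamHandler(sys.stdout)` keeps a reference to the stream object it was given when constructed, so a later `contextlib.redirect_stdout` in another request does not capture it.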

P0

Aligned

Design for delegation rather than direct manipulation

The workflow delegates the post-approval validation task instead of making the founder manually execute every scan step: `post_approve()` runs `approve_cli.approve_account` and dispatches `_run_validate_background`, while `run_cohort_validate.run()` orchestrates clone → language detect → select → bundle → validate → persist. Scope is bounded by `application_id`, `repo_url`, `user_id`, `ValidationContext(repository=namespace, files=...)`, and explicit approve/reject/retry/abort actions.
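
The clone → language detect → select → bundle → validate → persist sequence can be sketched as an explicit phase loop that records each transition before running the step and stops with a typed `failure_kind` when a step reports a blocker. Everything here (`PHASES`, `Blocked`, `JobRecord`, `run_pipeline`) is hypothetical and in-memory; the submission persists the equivalent phase transitions on its job rows.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical phase names mirroring clone -> detect -> select -> bundle -> validate -> persist.
PHASES = ("clone", "detect_language", "select", "bundle", "validate", "persist")

Step = Callable[[dict], dict]


class Blocked(Exception):
    """Raised by a step to stop the run with a typed, user-explainable failure_kind."""

    def __init__(self, failure_kind: str) -> None:
        super().__init__(failure_kind)
        self.failure_kind = failure_kind


@dataclass
class JobRecord:
    job_id: str
    status: str = "queued"
    phase: Optional[str] = None
    failure_kind: Optional[str] = None
    transitions: List[str] = field(default_factory=list)


def run_pipeline(job: JobRecord, steps: Dict[str, Step], ctx: dict) -> JobRecord:
    job.status = "running"
    for phase in PHASES:
        # Record the transition before doing the work (persisted in a real system),
        # so an observer always knows which phase is in flight.
        job.phase = phase
        job.transitions.append(phase)
        try:
            ctx = steps[phase](ctx)
        except Blocked as exc:
            job.status = "blocked"
            job.failure_kind = exc.failure_kind
            return job
    job.status = "complete"
    return job
```

Keeping the phase list as data, rather than as a chain of nested calls, is what lets the same record drive both the admin detail view and the user-facing status.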

P0

Aligned

Apply progressive disclosure to system agency

The API applies progressive disclosure: `list_applications()` returns compact `CohortApplicationSummaryOut` rows with latest status/id, while `get_application()` returns `CohortApplicationDetailOut` with full application fields and `jobs`. The job model separates summary status (`status`, `failure_kind`, `safe_display_message`, `retry_eligible`) from deeper diagnostic fields such as timestamps, `commit_sha`, `bundle_sha256`, and counts.
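
The summary/detail split can be illustrated with standard-library dataclasses in place of the submission's response models. The summary fields mirror those named above; the detail-only fields `files_selected`, `started_at`, and `finished_at` are hypothetical placeholders for the counts and timestamps the review mentions.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class JobSummary:
    # What a list row needs: enough to decide whether to look closer.
    status: str
    failure_kind: Optional[str]
    safe_display_message: Optional[str]
    retry_eligible: bool


@dataclass(frozen=True)
class JobDetail(JobSummary):
    # Diagnostic fields, shown only on the detail view.
    commit_sha: Optional[str]
    bundle_sha256: Optional[str]
    files_selected: int            # hypothetical count field
    started_at: Optional[str]      # hypothetical timestamp fields
    finished_at: Optional[str]


def summarize(detail: JobDetail) -> JobSummary:
    """Project a detail record down to the compact list-view shape."""
    return JobSummary(
        detail.status, detail.failure_kind, detail.safe_display_message, detail.retry_eligible
    )
```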

P0

Aligned

Expose meaningful operational state, not internal complexity

Operational state is represented in user/actionable terms rather than only internals: `onboarding_state` distinguishes `validation_queued`, `validation_complete`, and `validation_failed`; `CohortValidationJobOut` exposes `status`, `failure_kind`, `safe_display_message`, `retry_eligible`, `abort_requested`, and phase timestamps; `_JOB_FAILURE_TO_RUN_CODE` maps job failures to user-run failure codes. Deeper technical details such as hashes and commit SHA are reserved for admin/job detail and audit context.
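
A sketch of a failure-to-user-code mapping with a safe default, so a new internal `failure_kind` never surfaces raw to a user. The `failure_kind` keys and the three `onboarding_state` values come from this review; the user-facing codes, the job status names, and both helper functions are hypothetical, since the actual contents of `_JOB_FAILURE_TO_RUN_CODE` are not shown.

```python
# Hypothetical user-facing codes; keys are failure_kind values named in the review.
JOB_FAILURE_TO_RUN_CODE = {
    "invalid_repo_url": "repository_unavailable",
    "read_failed": "repository_unavailable",
    "selector_rejected": "no_eligible_files",
    "no_supported_language": "unsupported_language",
}
DEFAULT_RUN_CODE = "validation_error"

# Hypothetical job status names.
_IN_PROGRESS = {"queued", "running"}
_FAILED = {"blocked", "failed", "aborted"}


def run_code_for(failure_kind: str) -> str:
    """Map an internal failure_kind to a user-facing code; unknown kinds get the default."""
    return JOB_FAILURE_TO_RUN_CODE.get(failure_kind, DEFAULT_RUN_CODE)


def onboarding_state_for(job_status: str) -> str:
    """Collapse job statuses into the three onboarding_state values."""
    if job_status in _IN_PROGRESS:
        return "validation_queued"
    if job_status == "complete":
        return "validation_complete"
    if job_status in _FAILED:
        return "validation_failed"
    # Fail loudly on an unmapped status rather than guessing a user-facing state.
    raise ValueError(f"unknown job status: {job_status!r}")
```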

P0

Aligned

Establish trust through inspectability

Inspectability is supported by concrete reproducibility primitives. `_build_implementation_context()` constructs a typed file envelope with per-file `path`, `byte_size`, `sha256`, and `content`; `wrap_bundle_with_boundary()` applies an explicit untrusted-code boundary; the persisted `audit_object` records `commit_sha`, selected/skipped files, bundle SHA-256, `BOUNDARY_CONTRACT_VERSION`, `ENVELOPE_SCHEMA`, `envelope_hash`, validator latency, and log-shape signals. This gives reviewers traceability from result back to selected source material without dumping opaque logs.
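
A minimal sketch of a typed file envelope with per-file and whole-envelope SHA-256 hashes, using only `hashlib` and `json`. The field names `path`, `byte_size`, `sha256`, `content`, and `envelope_hash` follow the review; the schema string, the canonical JSON encoding (sorted keys, compact separators), and the helper names are assumptions rather than the submission's `ENVELOPE_SCHEMA` or `_build_implementation_context()`.

```python
import hashlib
import json
from typing import Any, Dict

ENVELOPE_SCHEMA = "example-envelope/v1"  # hypothetical schema identifier


def _sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _canonical_hash(body: Dict[str, Any]) -> str:
    # Canonical JSON: sorted keys and fixed separators so equal content hashes equally.
    encoded = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return _sha256_hex(encoded.encode("utf-8"))


def build_envelope(files: Dict[str, bytes]) -> Dict[str, Any]:
    """Build an envelope from path -> raw bytes; assumes UTF-8 text files."""
    entries = [
        {
            "path": path,
            "byte_size": len(data),
            "sha256": _sha256_hex(data),
            "content": data.decode("utf-8"),
        }
        for path, data in sorted(files.items())  # stable order, independent of input order
    ]
    body = {"schema": ENVELOPE_SCHEMA, "files": entries}
    return {**body, "envelope_hash": _canonical_hash(body)}


def verify_envelope(env: Dict[str, Any]) -> bool:
    """Recompute every per-file hash and the envelope hash."""
    for entry in env["files"]:
        if _sha256_hex(entry["content"].encode("utf-8")) != entry["sha256"]:
            return False
    body = {k: v for k, v in env.items() if k != "envelope_hash"}
    return _canonical_hash(body) == env.get("envelope_hash")
```

Because the envelope hash covers the sorted file list, a reviewer can recompute it from the stored files and compare it with the `envelope_hash` recorded in the audit object.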

P0

Aligned

Make hand-offs, approvals, and blockers explicit

Hand-offs and blockers are handled explicitly. The CLI approval path prompts for confirmation unless `yes=True`; admin approval sits behind `require_admin`; onboarding failures persist typed states such as `firebase_user_failed`, `sign_in_link_failed`, `approval_email_failed`, and `validation_failed`; validation blockers call `mark_blocked()` with specific `failure_kind` values like `invalid_repo_url`, `selector_rejected`, `no_supported_language`, and `read_failed`; unexpected paths call `mark_failed()` with a `safe_display_message` and retry eligibility. The admin API exposes abort, retry-validation, and retry-onboarding endpoints.
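
The CLI confirmation gate can be sketched as a small function that blocks on explicit operator input unless `yes=True` is passed. `require_confirmation` and `ApprovalDeclined` are hypothetical names; the input function is injectable so the gate can be exercised without a terminal.

```python
from typing import Callable


class ApprovalDeclined(Exception):
    """Hypothetical: the operator did not explicitly confirm the action."""


def require_confirmation(
    action: str,
    *,
    yes: bool = False,
    input_fn: Callable[[str], str] = input,
) -> None:
    """Return only after explicit confirmation; yes=True is the scripted-use escape hatch."""
    if yes:
        return
    answer = input_fn(f"{action}? Type 'yes' to continue: ").strip().lower()
    if answer != "yes":
        # Anything short of the full word is treated as a refusal.
        raise ApprovalDeclined(action)
```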

P0

Aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is modeled as a structured system, not a conversation. `CohortValidationJob` is the execution ledger, `UserValidationRun` is pre-created for user-facing continuity, `CohortApplication.validation_run_id` links the application to the result, and admin endpoints expose list/detail/action views over those structures. The orchestration function persists phase transitions and audit metadata instead of relying on a message transcript.

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/36163b90-74f5-47d6-b735-bdfb25be596e/card.svg)](https://aidesignblueprint.com/en/readiness-review/36163b90-74f5-47d6-b735-bdfb25be596e)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/36163b90-74f5-47d6-b735-bdfb25be596e.svg)](https://aidesignblueprint.com/en/readiness-review/36163b90-74f5-47d6-b735-bdfb25be596e)
Baseline and iteration details
Rubric: 2026-05-04 · Grade capped by 0 high-risk findings

Run ID: 36163b90-74f5-47d6-b735-bdfb25be596e · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.

Readiness review · 74/C · AI Design Blueprint