Assessment complete; evidence review pending.
Assessed on 18 May 2026 against the AI Design Blueprint doctrine
Emerging
Status: High risk
74/100
Grade C
The submission is a real autonomous workflow with several strong architectural primitives: durable CohortValidationJob/UserValidationRun rows, explicit phase transitions, typed blocker/failure states, approval gates, retry/abort endpoints, and a reproducibility-oriented audit envelope. The main production gap is that long-running validation is dispatched via in-process FastAPI BackgroundTasks without a durable worker lease/heartbeat/stale-job transition, so Cloud Run process loss can leave users seeing queued/validating work indefinitely. Secondary hardening issues are stale/conflicting operator docs and stdout handling around background CLI execution.
Findings by principle
10 principles assessed. Verdict, severity, evidence, and recommendation for each.
P0
Requires changes · Production blocker · 60/100 · Ensure that background work remains perceptible
The code has good perceptibility primitives once execution is alive: `create_job(... status='queued')`, `mark_step_started(...)`, `_create_user_run_at_queued(... phase='queued')`, `_set_user_run_phase(...)`, and admin detail fields expose step timestamps and terminal status. However `post_approve()` dispatches the long validator through in-process `BackgroundTasks`, and `_run_validate_background()` explicitly notes that Cloud Run scale-down can kill the task; there is no lease, heartbeat, timeout, or stale-job reaper in the submitted code to turn a stranded `validation_queued`/`validating` state into an explicit blocked/failed state.
Recommendation
Move validation execution to a durable worker/queue boundary with a persisted lease/heartbeat and stale-job transition. The admin route should enqueue a job transactionally and return the persisted job id; a separate worker should own execution and mark timed-out or lost leases as retryable failures.
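The lease, heartbeat, and stale-job transition described above could look roughly like the following. This is a minimal sketch, not the submission's code: the `jobs` table layout, the `lease_expired` failure kind, `LEASE_SECONDS`, and all function names are assumptions made for illustration, and SQLite stands in for whatever database the project actually uses.

```python
import sqlite3

LEASE_SECONDS = 60  # assumed lease length; tune to real validator latency


def init_db(conn):
    # Hypothetical schema: only the columns needed to show the lease idea.
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL,"
        " lease_owner TEXT, lease_expires_at REAL, failure_kind TEXT)"
    )


def enqueue(conn):
    """Persist a queued job and return its id (what the admin route would return)."""
    cur = conn.execute("INSERT INTO jobs (status) VALUES ('queued')")
    conn.commit()
    return cur.lastrowid


def claim(conn, worker_id, now):
    """Claim the oldest queued job; the status guard makes the claim exclusive."""
    row = conn.execute(
        "SELECT id FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    cur = conn.execute(
        "UPDATE jobs SET status = 'validating', lease_owner = ?, lease_expires_at = ?"
        " WHERE id = ? AND status = 'queued'",
        (worker_id, now + LEASE_SECONDS, row[0]),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None


def heartbeat(conn, job_id, worker_id, now):
    """Extend the lease; returns False if this worker no longer owns the job."""
    cur = conn.execute(
        "UPDATE jobs SET lease_expires_at = ?"
        " WHERE id = ? AND lease_owner = ? AND status = 'validating'",
        (now + LEASE_SECONDS, job_id, worker_id),
    )
    conn.commit()
    return cur.rowcount == 1


def reap_stale(conn, now):
    """Turn jobs whose lease expired into an explicit, retryable failure."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'failed', failure_kind = 'lease_expired',"
        " lease_owner = NULL WHERE status = 'validating' AND lease_expires_at < ?",
        (now,),
    )
    conn.commit()
    return cur.rowcount
```

A periodic task (or the next worker poll) would call `reap_stale`, so a job stranded by process loss surfaces as a failure instead of staying in `validating` indefinitely.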
P0
Requires changes · Hardening recommended · 40/100 · Optimise for steering, not only initiating
The workflow has meaningful steering primitives: `post_abort_job()` delegates to `steer_cli.request_abort`, retry endpoints exist for validation and onboarding, and `_execute_with_job()` checks `job.abort_requested` before the LLM call and again after validation before persisting the result. The remaining gap is that the long `asyncio.run(validate_code_against_principles(request, api_key))` call has no visible timeout/cancellable worker boundary in this code, so an abort requested during a hung or very slow provider call cannot be honored until the call returns.
Recommendation
Put the validator call behind the same durable worker lease recommended for P2, with an explicit timeout and cancellation path that marks the job aborted/failed without waiting indefinitely on the provider call. Keep the existing pre/post-persist abort checks as the final safety gate.
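One way to bound the validator call is to run it as an asyncio task and poll for both a deadline and an abort flag. The sketch below is illustrative only: `run_with_timeout_and_abort`, the `abort_requested` callable, and the returned status strings are hypothetical, and cancellation only takes effect if the provider call is a cooperative coroutine that yields at `await` points.

```python
import asyncio
import contextlib


async def run_with_timeout_and_abort(validator_coro, abort_requested, timeout_s, poll_s=0.05):
    """Run a validator coroutine, giving up on timeout or on an abort request.

    Returns ("completed", result), ("aborted", None), or ("timed_out", None).
    `abort_requested` is a zero-argument callable, e.g. one that re-reads
    the job's abort flag from the database (assumed, not from the source).
    """
    task = asyncio.ensure_future(validator_coro)
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    try:
        while True:
            remaining = deadline - loop.time()
            if remaining <= 0:
                return ("timed_out", None)
            # Wait briefly so the abort flag is re-checked every poll_s seconds.
            done, _ = await asyncio.wait({task}, timeout=min(poll_s, remaining))
            if done:
                return ("completed", task.result())
            if abort_requested():
                return ("aborted", None)
    finally:
        if not task.done():
            # Cancel the in-flight call and wait for it to unwind.
            task.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await task
```

The caller would map `"aborted"` and `"timed_out"` to the corresponding persisted job states, and the existing pre/post-persist abort checks would still run on the `"completed"` path.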
P0
Requires changes · Hardening recommended · 35/100 · Replace implied magic with clear mental models
The runtime mental model is mostly explicit through Mode A/B branching, `onboarding_state` values, `failure_kind`, `safe_display_message`, and OPENAI_API_KEY preflight handling. But operator-facing comments and docstrings are materially stale: the module docstring in `approve_cohort_application.py` says validate is not triggered by the CLI and that the founder runs `run_cohort_validate` separately, while the actual `main()` calls `approve()`, and `approve()` runs `run_cohort_validate(application_id)` synchronously after `approve_account()`. `_run_validate_background()` also says stdout is swapped for a structured logger, but the implementation does not do that.
Recommendation
Update the founder/operator-facing docs and CLI help to match the current contract: account provisioning vs end-to-end approval vs background validation should be unambiguous. Remove or rewrite stale v1 comments rather than adding another wrapper layer.
P0
Requires changes · Hardening recommended · 30/100 · Align feedback with the user’s level of attention
Foreground and background attention modes are mostly separated: `_run_cli()` captures stdout/stderr for immediate admin actions, while job state and safe messages are persisted for later polling. The hardening gap is that `_run_validate_background()` claims it swaps stdout for structured logging, but it simply calls `run_cohort_validate(application_id)`, whose implementation uses many `print(...)` calls; because `_run_cli()` redirects process-global stdout and `_CLI_INVOKE_LOCK` does not cover background validation, output from a background scan can be interleaved with or captured by a concurrent admin action response.
Recommendation
Route background validator output to structured logs or a persisted job-event table, and keep process-global stdout redirection out of concurrently running background work. Prefer deleting stdout capture from long-running paths in favor of the existing persisted job state.
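As a rough illustration of replacing `print(...)` with job-scoped structured events, the sketch below uses the standard `logging` module. The handler, the `cohort.validate` logger name, and `run_validate_step` are hypothetical; the in-memory `events` list merely stands in for a persisted job-event table.

```python
import logging


class JobEventHandler(logging.Handler):
    """Collects structured events; a stand-in for a persisted job-event table."""

    def __init__(self):
        super().__init__()
        self.events = []

    def emit(self, record):
        self.events.append(
            {
                "job_id": getattr(record, "job_id", None),
                "level": record.levelname,
                "message": record.getMessage(),
            }
        )


def job_logger(job_id, handler):
    """Return a logger that tags every record with the job id."""
    logger = logging.getLogger("cohort.validate")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep events off the root logger and stdout
    if handler not in logger.handlers:
        logger.addHandler(handler)
    # LoggerAdapter injects job_id into each record via the `extra` mechanism.
    return logging.LoggerAdapter(logger, {"job_id": job_id})


def run_validate_step(log, step):
    # Where the validator would previously have called print(...).
    log.info("step started: %s", step)
```

Because nothing here touches `sys.stdout`, a concurrent admin action that redirects process-global stdout cannot capture or interleave with these events.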
P0
Aligned · Design for delegation rather than direct manipulation
The workflow delegates the post-approval validation task instead of making the founder manually execute every scan step: `post_approve()` runs `approve_cli.approve_account` and dispatches `_run_validate_background`, while `run_cohort_validate.run()` orchestrates clone → language detect → select → bundle → validate → persist. Scope is bounded by `application_id`, `repo_url`, `user_id`, `ValidationContext(repository=namespace, files=...)`, and explicit approve/reject/retry/abort actions.
P0
Aligned · Apply progressive disclosure to system agency
The API applies progressive disclosure: `list_applications()` returns compact `CohortApplicationSummaryOut` rows with latest status/id, while `get_application()` returns `CohortApplicationDetailOut` with full application fields and `jobs`. The job model separates summary status (`status`, `failure_kind`, `safe_display_message`, `retry_eligible`) from deeper diagnostic fields such as timestamps, `commit_sha`, `bundle_sha256`, and counts.
P0
Aligned · Expose meaningful operational state, not internal complexity
Operational state is represented in user/actionable terms rather than only internals: `onboarding_state` distinguishes `validation_queued`, `validation_complete`, and `validation_failed`; `CohortValidationJobOut` exposes `status`, `failure_kind`, `safe_display_message`, `retry_eligible`, `abort_requested`, and phase timestamps; `_JOB_FAILURE_TO_RUN_CODE` maps job failures to user-run failure codes. Deeper technical details such as hashes and commit SHA are reserved for admin/job detail and audit context.
P0
Aligned · Establish trust through inspectability
Inspectability is supported by concrete reproducibility primitives. `_build_implementation_context()` constructs a typed file envelope with per-file `path`, `byte_size`, `sha256`, and `content`; `wrap_bundle_with_boundary()` applies an explicit untrusted-code boundary; the persisted `audit_object` records `commit_sha`, selected/skipped files, bundle SHA-256, `BOUNDARY_CONTRACT_VERSION`, `ENVELOPE_SCHEMA`, `envelope_hash`, validator latency, and log-shape signals. This gives reviewers traceability from result back to selected source material without dumping opaque logs.
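To make the envelope idea concrete, here is a minimal sketch of a per-file envelope with a deterministic envelope hash. It is not the submission's `_build_implementation_context()`: the `build_envelope` name, the canonical-JSON hashing scheme, and the UTF-8 decoding policy are assumptions; only the per-file field names (`path`, `byte_size`, `sha256`, `content`) come from the finding above.

```python
import hashlib
import json


def build_envelope(files: dict[str, bytes]) -> dict:
    """Build a reproducible file envelope from a mapping of path to raw bytes."""
    entries = []
    for path in sorted(files):  # sort so input order cannot change the hash
        data = files[path]
        entries.append(
            {
                "path": path,
                "byte_size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
                # Assumed policy: decode leniently so binary junk cannot crash the build.
                "content": data.decode("utf-8", errors="replace"),
            }
        )
    # Canonical JSON (sorted keys, no whitespace) gives a stable byte string to hash.
    canonical = json.dumps(entries, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {"files": entries, "envelope_hash": hashlib.sha256(canonical).hexdigest()}
```

A reviewer holding the same files at the same commit can recompute `envelope_hash` and compare it with the value stored in the audit object.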
P0
Aligned · Make hand-offs, approvals, and blockers explicit
Hand-offs and blockers are handled explicitly. CLI approval prompts unless `yes=True`; admin approval is behind `require_admin`; onboarding failures persist typed states such as `firebase_user_failed`, `sign_in_link_failed`, `approval_email_failed`, and `validation_failed`; validation blockers call `mark_blocked()` with specific `failure_kind` values like `invalid_repo_url`, `selector_rejected`, `no_supported_language`, and `read_failed`; unexpected paths call `mark_failed()` with `safe_display_message` and retry eligibility. The admin API exposes abort, retry validation, and retry onboarding endpoints.
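The typed-blocker pattern can be sketched as follows. The `failure_kind` values are the ones named in the finding, but the `Job` dataclass, this `mark_blocked` signature, and especially the choice of which kinds are retryable are assumptions for illustration, not the submission's actual rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureKind(str, Enum):
    INVALID_REPO_URL = "invalid_repo_url"
    SELECTOR_REJECTED = "selector_rejected"
    NO_SUPPORTED_LANGUAGE = "no_supported_language"
    READ_FAILED = "read_failed"


# Assumption: only transient read errors are worth retrying unchanged.
RETRYABLE = {FailureKind.READ_FAILED}


@dataclass
class Job:
    status: str = "queued"
    failure_kind: Optional[FailureKind] = None
    safe_display_message: Optional[str] = None
    retry_eligible: bool = False


def mark_blocked(job: Job, kind: FailureKind, safe_message: str) -> Job:
    """Record an explicit blocker with a user-safe message and retry eligibility."""
    job.status = "blocked"
    job.failure_kind = kind
    job.safe_display_message = safe_message
    job.retry_eligible = kind in RETRYABLE
    return job
```

Keeping the kinds in an enum means an unknown blocker is a programming error at the call site rather than a free-form string that the admin UI has to guess at.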
P0
Aligned · Represent delegated work as a system, not merely as a conversation
Delegated work is modeled as a structured system, not a conversation. `CohortValidationJob` is the execution ledger, `UserValidationRun` is pre-created for user-facing continuity, `CohortApplication.validation_run_id` links the application to the result, and admin endpoints expose list/detail/action views over those structures. The orchestration function persists phase transitions and audit metadata instead of relying on a message transcript.
Add to your README
Two embeddable variants: a small badge and a richer card.
Score card (recommended)
[](https://aidesignblueprint.com/en/readiness-review/36163b90-74f5-47d6-b735-bdfb25be596e)
Flat badge
[](https://aidesignblueprint.com/en/readiness-review/36163b90-74f5-47d6-b735-bdfb25be596e)
Run ID: 36163b90-74f5-47d6-b735-bdfb25be596e · Results expire after 90 days
Run by agents. Governed by humans. Validated by the AI Design Blueprint.