Reviewed

Assessment complete; awaiting evidence review.

Agent Architecture Review, Validation Snapshot

Assessed on 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High risk

74/100

Grade C

7 aligned · 2 production blockers · 1 hardening
Verdicts by principle

The submission is clearly an autonomous cohort-validation workflow and has strong primitives: a durable CohortValidationJob lifecycle, explicit terminal states, typed failure kinds, source/audit metadata, and a JSON file envelope with hashes. It materially improves the prior P7/P8 blockers, but it is not fully aligned yet: the selector can still omit high-signal agentic files because of a priority inversion, the approval/onboarding state machine does not durably capture Firebase-user-creation failure, and steering remains limited to partial between-step abort checks.

Iteration history

3 previous runs on this artifact. Each run_id opens its own readiness review.

When | Score | Status | Run ID
12 May 2026 (this run) | 74 / C | High risk | 3f3bb587
12 May 2026 | 74 / C | High risk | 659a695a
12 May 2026 | 67 / C | High risk | e476247c
12 May 2026 | 30 / F | High risk | 86b2c59d

Findings by principle

10 principles assessed. Verdict, severity, evidence, and recommendation for each.

P0

Needs changes · Production blocker · 50/100

Make hand-offs, approvals, and blockers explicit

The validation job handoffs are explicit through mark_blocked(), mark_failed(), mark_aborted(), and typed failure_kind values, and approve_cohort_application.approve() now records sign_in_link_failed, approval_email_failed, validation_queued, validation_complete, and validation_failed. But the first external approval handoff is still not durably protected: after committing onboarding_state='pending', approve() calls create_firebase_user_for_email(app.email, ...) without a try/except or typed failure transition. If Firebase creation fails or times out, the application can remain in pending with no onboarding_failure_reason or firebase_user_failed state. Delta vs prior baseline: the sign-in-li…

Recommendation

Move approval side effects behind a durable onboarding transition service/job, or at minimum wrap create_firebase_user_for_email() with a committed firebase_user_create_started/firebase_user_failed transition and reconciliation for the case where Firebase succeeds but the following database commit fails.
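
A minimal sketch of the "at minimum" option above, assuming the Firebase call and the database commit are injected as plain callables. The `Application` dataclass, the `approve_with_durable_firebase_step` helper, and the `firebase_user_created` state name are hypothetical; only `firebase_user_create_started`, `firebase_user_failed`, and `onboarding_failure_reason` come from the review text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Application:
    """Hypothetical stand-in for the cohort application row."""
    email: str
    onboarding_state: str = "pending"
    onboarding_failure_reason: Optional[str] = None
    firebase_uid: Optional[str] = None


def approve_with_durable_firebase_step(
    app: Application,
    create_user: Callable[[str], str],
    commit: Callable[[Application], None],
) -> bool:
    """Create the Firebase user behind committed, typed transitions.

    `create_user` stands in for create_firebase_user_for_email() and
    `commit` for the database commit; both are assumptions so the sketch
    runs without Firebase or a database.
    """
    # Commit the "started" marker first, so a crash or timeout leaves a
    # state that a reconciliation pass can find.
    app.onboarding_state = "firebase_user_create_started"
    commit(app)
    try:
        uid = create_user(app.email)
    except Exception as exc:
        # Typed failure transition instead of a row stuck in 'pending'.
        app.onboarding_state = "firebase_user_failed"
        app.onboarding_failure_reason = f"{type(exc).__name__}: {exc}"
        commit(app)
        return False
    app.firebase_uid = uid
    app.onboarding_state = "firebase_user_created"  # hypothetical state name
    commit(app)
    return True
```

If the final commit fails after Firebase succeeds, the row stays in `firebase_user_create_started`; a reconciliation pass could look the user up by email and complete or roll back the transition. That pass is not shown here.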

P0

Needs changes · Production blocker · 45/100

Establish trust through inspectability

The prior raw-concatenation blocker is materially improved by build_file_envelope(), which emits envelope_schema, boundary_contract, file path/byte_size/sha256/content records, and envelope_hash, and the audit stores commit_sha, bundle_sha256, envelope_schema, and envelope_hash. However, inspectability is still undermined by a concrete selector priority inversion: select_agentic_surface() sets priority = 50 for files without positive signal but priority = 99 for positive-signal files that do not match SORT_PRIORITY. INCLUDED_NAME_PATTERNS contains coordinator, planner, router, and evaluator, but SORT_PRIORITY omits them, so high-signal files with those names can sort behind generic fallback…

Recommendation

Simplify the selector boundary so every positive-signal file sorts before fallback files: either add coordinator/planner/router/evaluator to SORT_PRIORITY or make the unmatched-positive default priority lower than the fallback priority. Persist enough inventory or omitted-positive metadata in the audit so a reviewer can verify that the validation saw the relevant agentic surface.
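
The second option (an unmatched-positive default below the fallback priority) could look roughly like the sketch below. The contents of `SORT_PRIORITY` are illustrative, since the review does not show them, and `file_priority` and `order_candidates` are hypothetical helpers rather than the submission's actual functions.

```python
from typing import List, Tuple

# Illustrative values only; the real SORT_PRIORITY is not shown in the review.
SORT_PRIORITY = {"agent": 0, "orchestrator": 1, "workflow": 2}
FALLBACK_PRIORITY = 50            # files without positive signal
UNMATCHED_POSITIVE_PRIORITY = 40  # must stay below FALLBACK_PRIORITY


def file_priority(path: str, has_positive_signal: bool) -> int:
    """Lower number sorts first; positive-signal files always beat fallback."""
    if not has_positive_signal:
        return FALLBACK_PRIORITY
    name = path.lower()
    matches = [rank for key, rank in SORT_PRIORITY.items() if key in name]
    return min(matches) if matches else UNMATCHED_POSITIVE_PRIORITY


def order_candidates(candidates: List[Tuple[str, bool]]) -> List[str]:
    """Sort (path, has_positive_signal) pairs by priority, then by path."""
    ranked = sorted(candidates, key=lambda c: (file_priority(c[0], c[1]), c[0]))
    return [path for path, _ in ranked]
```

With this ordering, a positive-signal `planner.py` that matches nothing in `SORT_PRIORITY` gets priority 40 and still sorts ahead of a generic fallback file at 50, which is the invariant the recommendation asks for.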

P0

Needs changes · Hardening recommended · 35/100

Optimise for steering, not only initiating

The code has partial steering primitives: CohortValidationJob.abort_requested exists, mark_step_started() checks it, _execute_with_job() refreshes the job before validation, and mark_aborted() records an aborted terminal state. But the submitted code does not expose a service-owned request_abort(job_id), retry_failed_job(job_id), or resume/interrupt command; retry_count is incremented only on a manual CLI re-run, and long operations such as fetch_public_repo() and asyncio.run(validate_code_against_principles()) cannot be interrupted once started. Delta vs prior baseline: this maintains the prior partial between-step abort support but does not address the prior recommendation for external ste…

Recommendation

Move steering authority into a small service/API outside the execution loop: provide request_abort, retry_failed_job, and resume/interruption commands, and make long-running clone/validate boundaries poll or honour cancellation checkpoints instead of requiring direct DB edits or full restarts.
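
As a rough illustration of steering that lives outside the execution loop, the sketch below uses an in-memory `SteeringService` and splits long work into chunks with a cancellation checkpoint before each one. The `Job` dataclass, the class, and `run_with_checkpoints` are hypothetical; a real version would presumably be backed by the `CohortValidationJob` table rather than a dict.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Job:
    """Hypothetical, simplified job record."""
    job_id: str
    status: str = "queued"
    abort_requested: bool = False
    retry_count: int = 0


class SteeringService:
    """In-memory stand-in for a service-owned steering API (assumption)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}
        self._lock = threading.Lock()

    def register(self, job: Job) -> None:
        with self._lock:
            self._jobs[job.job_id] = job

    def request_abort(self, job_id: str) -> None:
        with self._lock:
            self._jobs[job_id].abort_requested = True

    def should_abort(self, job_id: str) -> bool:
        with self._lock:
            return self._jobs[job_id].abort_requested

    def retry_failed_job(self, job_id: str) -> bool:
        """Re-queue a failed job; refuse for any other status."""
        with self._lock:
            job = self._jobs[job_id]
            if job.status != "failed":
                return False
            job.status = "queued"
            job.abort_requested = False
            job.retry_count += 1
            return True


def run_with_checkpoints(
    job: Job, service: SteeringService, chunks: Iterable[Callable[[], None]]
) -> str:
    """Run work in chunks, honouring an abort request before each chunk."""
    job.status = "validating"
    for chunk in chunks:
        if service.should_abort(job.job_id):
            job.status = "aborted"
            return job.status
        chunk()
    job.status = "completed"
    return job.status
```

The checkpoint granularity is the design choice that matters: an abort only takes effect between chunks, so a clone or validation call that runs as one opaque chunk remains uninterruptible.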

P0

Aligned

Design for delegation rather than direct manipulation

The workflow is designed around delegated validation rather than manual step execution: approve_cohort_application.approve() gates approval, then run_cohort_validate.run() creates a CohortValidationJob and _execute_with_job() performs clone, language detection, file selection, bundling, LLM validation, and persistence. ValidationRequest.context records the delegated task, repository namespace, and selected files.

P0

Aligned

Ensure that background work remains perceptible

Background work is made durable and perceptible through CohortValidationJob.status, queued_at, cloning_started_at, selecting_started_at, bundling_started_at, validating_started_at, terminal_at, failure_kind, safe_display_message, retry_eligible, and abort_requested. create_job() commits a queued row before bridge work starts, and mark_step_started(), mark_completed(), mark_blocked(), mark_failed(), and mark_aborted() persist state transitions.
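
To make the transition pattern concrete, here is a small sketch of how a step-start transition might record a timestamp and honour `abort_requested`. The step and status names are taken from the review; the `JobSketch` dataclass and the in-memory `started_at` mapping are assumptions standing in for the per-step `*_started_at` columns and a committed database row.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional

STEPS = ("cloning", "selecting", "bundling", "validating")
TERMINAL_STATUSES = ("completed", "blocked", "failed", "aborted")


@dataclass
class JobSketch:
    """Hypothetical in-memory stand-in for CohortValidationJob."""
    status: str = "queued"
    abort_requested: bool = False
    started_at: Dict[str, datetime] = field(default_factory=dict)
    terminal_at: Optional[datetime] = None


def mark_step_started(job: JobSketch, step: str) -> bool:
    """Record that `step` began; return False if the job was aborted instead."""
    if step not in STEPS:
        raise ValueError(f"unknown step: {step}")
    if job.status in TERMINAL_STATUSES:
        raise RuntimeError(f"job already terminal: {job.status}")
    now = datetime.now(timezone.utc)
    if job.abort_requested:
        # Between-step abort check: stop before starting new work.
        job.status = "aborted"
        job.terminal_at = now
        return False
    job.status = step
    job.started_at[step] = now
    return True
```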

P0

Aligned

Align feedback with the user’s level of attention

Feedback is calibrated by separating user/input blockers from internal/transient failures: mark_blocked() sets status='blocked' and retry_eligible=False for invalid_repo_url, no_supported_language, selector_rejected, and read_failed, while mark_failed() marks retryable validate_error/unexpected_error paths. _summarize_validate_log() captures only log shape rather than leaking noisy content, and onboarding_failure_reason is set only on approval-side failures requiring attention.
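
The split described above can be summarised as a lookup from failure kind to status and retry eligibility. This is only a sketch: the failure kinds are the ones named in the review, while `classify_failure` and the two set constants are hypothetical names, and the submission's full FAILURE_KINDS vocabulary may contain more entries.

```python
from typing import Tuple

# User/input problems: blocked, not retry-eligible.
BLOCKED_KINDS = frozenset(
    {"invalid_repo_url", "no_supported_language", "selector_rejected", "read_failed"}
)
# Internal or transient problems: failed, retry-eligible.
RETRYABLE_KINDS = frozenset({"validate_error", "unexpected_error"})


def classify_failure(failure_kind: str) -> Tuple[str, bool]:
    """Return (status, retry_eligible) for a typed failure kind."""
    if failure_kind in BLOCKED_KINDS:
        return ("blocked", False)
    if failure_kind in RETRYABLE_KINDS:
        return ("failed", True)
    # Unknown kinds are rejected rather than silently mapped.
    raise ValueError(f"unknown failure_kind: {failure_kind}")
```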

P0

Aligned

Apply progressive disclosure to system agency

The primary operational surface is concise status/failure state, while deeper inspection is available in the persisted audit object. audit_object separates source, selection, validate, and job sections; _collect_job_transitions() provides timeline detail; _summarize_validate_log() exposes top-level log shape without dumping raw validator internals.

P0

Aligned

Replace implied magic with clear mental models

The code exposes a clear mental model through explicit constants and state vocabularies: STEPS=('cloning','selecting','bundling','validating'), TERMINAL_STATUSES, FAILURE_KINDS, and onboarding_state values such as pending, approved, validation_queued, validation_complete, and validation_failed. The file boundary is also explicit via envelope_schema='bridge.files.v2', boundary_contract='untrusted_user_code', and BOUNDARY_CONTRACT_VERSION.
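
For readers who have not seen the submission, a file envelope with the fields named in this review might be assembled along these lines. The field names and the two constant values come from the review; the `files` key, the canonical-JSON hashing scheme, and the function name `build_file_envelope_sketch` are assumptions, since the review does not say how `envelope_hash` is actually derived.

```python
import hashlib
import json
from typing import Dict

ENVELOPE_SCHEMA = "bridge.files.v2"
BOUNDARY_CONTRACT = "untrusted_user_code"


def build_file_envelope_sketch(files: Dict[str, bytes]) -> dict:
    """Wrap files in a hashed envelope (illustrative, not the real builder)."""
    records = [
        {
            "path": path,
            "byte_size": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
            "content": data.decode("utf-8", errors="replace"),
        }
        for path, data in sorted(files.items())  # stable order, stable hash
    ]
    body = {
        "envelope_schema": ENVELOPE_SCHEMA,
        "boundary_contract": BOUNDARY_CONTRACT,
        "files": records,
    }
    # Assumption: hash the canonical JSON of everything except the hash itself.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    envelope_hash = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "envelope_hash": envelope_hash}
```

Sorting the records and serialising with fixed separators means the same file set yields the same `envelope_hash` regardless of input order, which is what lets an auditor recompute and compare it.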

P0

Aligned

Expose meaningful operational state, not internal complexity

The workflow exposes meaningful operational states instead of raw internals: job.status uses queued/cloning/selecting/bundling/validating/completed/blocked/failed/aborted, while safe_display_message gives user-relevant explanations such as invalid repo URL, no supported language, clone failure, or validation error. Technical details like selected file hashes, latency_ms, and log_signals are reserved for the audit payload.

P0

Aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system rather than a conversation: CohortValidationJob stores lifecycle state and timestamps, _execute_with_job() advances explicit steps, and the persisted audit object separates source, selection, validate, and job transition data. The validator result is stored in UserValidationRun with repository, latency, score/grade/tier, and audit metadata rather than only a free-form message stream.

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/3f3bb587-5ad1-4417-b624-23441b39831e/card.svg)](https://aidesignblueprint.com/en/readiness-review/3f3bb587-5ad1-4417-b624-23441b39831e)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/3f3bb587-5ad1-4417-b624-23441b39831e.svg)](https://aidesignblueprint.com/en/readiness-review/3f3bb587-5ad1-4417-b624-23441b39831e)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Improvements (2)

P7 · Establish trust through inspectability · needs_changes · needs_changes
P8 · Make hand-offs, approvals, and blockers explicit · needs_changes · needs_changes
Rubric: 2026-05-04 · Grade capped by 0 high-risk findings

Run ID: 3f3bb587-5ad1-4417-b624-23441b39831e · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.