Reviewed

Assessment complete; awaiting evidence review.

Agent Architecture Review, validation snapshot

Assessed on 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High risk

74/100

Grade C

8 aligned · 1 production blocker · 1 hardening
Verdicts by principle

The submission is an autonomous multi-step validation workflow with strong primitives for durable job state, typed untrusted-code envelopes, file and content hashing, and auditability. Iter6 materially fixes the prior selector inspectability bug and the Firebase-create failure transition, but one production-relevant handoff can still stall silently: approval sets validation_queued and then calls run_cohort_validate, which may return without creating a job when OPENAI_API_KEY is missing. Steering remains a hardening gap for the next iteration rather than a production blocker.

Iteration history

4 previous runs on this artifact. Each run ID opens its own readiness review.

| When | Score | Status | Run ID |
|---|---|---|---|
| 12 May 2026 (this run) | 74 / C | High risk | 14a3456f |
| 12 May 2026 | 74 / C | High risk | 3f3bb587 |
| 12 May 2026 | 74 / C | High risk | 659a695a |
| 12 May 2026 | 67 / C | High risk | e476247c |
| 12 May 2026 | 30 / F | High risk | 86b2c59d |

Findings by principle

10 principles assessed. Verdict, severity, evidence, and recommendation for each.

P0

Needs changes · Production blocker · 45/100

Make hand-offs, approvals, and blockers explicit

Most handoffs and blockers are explicit: approve() prompts for confirmation unless --yes is passed, mark_blocked() and mark_failed() persist failure_kind and safe_display_message, and the previously bare create_firebase_user_for_email() call is now wrapped so that a failure sets onboarding_state='firebase_user_failed'. Delta: this addresses the prior Firebase-create failure path. However, approve() sets app.onboarding_state='validation_queued' before calling run_cohort_validate(app.id), and run() returns exit code 2 before create_job() when OPENAI_API_KEY is missing and dry_run is not set. approve() ignores that exit code, so when latest_job is None the application stays in validation_queued with no onboarding_failure_reason and no job-level blocker.

Recommendation

Move the validation kickoff behind a durable transition boundary: either validate the required runtime configuration before setting validation_queued, or create a CohortValidationJob and mark it blocked or failed when validator credentials are missing. Also handle a nonzero exit_code from run_cohort_validate, and the case where no latest_job exists, by setting onboarding_state='validation_failed' with a safe failure reason.
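The first option in the recommendation can be sketched as follows. This is a minimal illustration, not the reviewed code: Application, Job, REQUIRED_ENV, missing_config(), and the injected run_cohort_validate/latest_job callables are hypothetical stand-ins, and database persistence is omitted. What matters is the ordering: preconditions are checked before the state is announced, and the kickoff result is never ignored.

```python
import os
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Application:
    """Hypothetical stand-in for CohortApplication (persistence omitted)."""
    id: int
    onboarding_state: str = "approved"
    onboarding_failure_reason: Optional[str] = None


@dataclass
class Job:
    """Hypothetical stand-in for CohortValidationJob."""
    status: str


# Assumed list of settings the validator needs before any job can be created.
REQUIRED_ENV = ("OPENAI_API_KEY",)


def missing_config(env: dict) -> List[str]:
    """Return the names of required settings that are absent or empty."""
    return [name for name in REQUIRED_ENV if not env.get(name)]


def approve(app: Application,
            run_cohort_validate: Callable[[int], int],
            latest_job: Callable[[int], Optional[Job]],
            env: Optional[dict] = None) -> Application:
    env = dict(os.environ) if env is None else env
    # 1. Check preconditions *before* announcing that validation is queued.
    missing = missing_config(env)
    if missing:
        app.onboarding_state = "validation_failed"
        app.onboarding_failure_reason = (
            f"Validator not configured: missing {', '.join(missing)}")
        return app
    app.onboarding_state = "validation_queued"
    exit_code = run_cohort_validate(app.id)
    # 2. Never ignore the kickoff result: a nonzero exit or no job is a failure.
    if exit_code != 0 or latest_job(app.id) is None:
        app.onboarding_state = "validation_failed"
        app.onboarding_failure_reason = (
            f"Validation did not start (exit_code={exit_code})")
    return app
```

The alternative in the recommendation, creating the job first and marking it blocked, moves the same check onto the job record instead of the application.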

P0

Needs changes · Recommended hardening · 35/100

Optimise for steering, not only initiating

The code has some steering primitives: CohortValidationJob.abort_requested exists, mark_step_started() aborts instead of advancing when the flag is set, _execute_with_job() refreshes the job and checks abort_requested before validating, and retry_count is tracked. However, the submitted code exposes no request_abort, retry, or resume service or command, and cancellation is not observed inside long-running boundaries such as fetch_public_repo() or validate_code_against_principles(); in practice, steering is limited to manual DB mutation and step boundaries. Delta: the prior P10 hardening finding still stands.

Recommendation

Expose a small steering service/API outside the execution loop with request_abort, retry_failed_job, and resume/requeue commands, and have clone/validate boundaries observe persisted cancellation at safe checkpoints. Keep it as a state primitive rather than adding wrapper orchestration around the current CLI.
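A steering surface of the kind recommended might look like the sketch below. SteeringService, checkpoint(), AbortRequested, and the in-memory job store are assumptions for illustration; in the reviewed system the flags would live on the persisted CohortValidationJob row, and checkpoint() would re-read that row before each chunk of work.

```python
from dataclasses import dataclass
from typing import Dict

TERMINAL_STATUSES = {"completed", "blocked", "failed", "aborted"}


class AbortRequested(Exception):
    """Raised at a safe checkpoint when an operator has asked to stop."""


@dataclass
class Job:
    """Hypothetical in-memory stand-in for CohortValidationJob."""
    id: int
    status: str = "queued"
    abort_requested: bool = False
    retry_eligible: bool = False
    retry_count: int = 0


class SteeringService:
    """Hypothetical steering API kept outside the execution loop."""

    def __init__(self) -> None:
        self.jobs: Dict[int, Job] = {}

    def request_abort(self, job_id: int) -> Job:
        job = self.jobs[job_id]
        if job.status not in TERMINAL_STATUSES:
            # Only set the flag; the worker observes it at its next checkpoint.
            job.abort_requested = True
        return job

    def retry_failed_job(self, job_id: int) -> Job:
        job = self.jobs[job_id]
        if job.status not in ("failed", "blocked") or not job.retry_eligible:
            raise ValueError(f"job {job_id} is not retry-eligible (status={job.status})")
        # Requeue the same job record so its history and retry_count survive.
        job.status, job.abort_requested = "queued", False
        job.retry_count += 1
        return job


def checkpoint(service: SteeringService, job_id: int) -> None:
    """Call between chunks of long work, e.g. per file while cloning or validating."""
    job = service.jobs[job_id]
    if job.abort_requested:
        job.status = "aborted"
        raise AbortRequested(job_id)
```

Calling checkpoint() between files inside the clone and validate boundaries is what lets an abort take effect mid-step rather than only at step transitions.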

P0

Aligned

Design for delegation rather than direct manipulation

The workflow is designed around delegated work rather than manual execution: approve() delegates repository validation after an application approval, create_job() persists application_id/user_id/repo_url, and _execute_with_job() carries out cloning → selecting → bundling → validating. Scope and constraints are explicit through repo_url, ValidationRequest(context=ValidationContext(...)), MAX_FILES, MAX_BUNDLE_BYTES, MAX_FILE_BYTES, and the selected file list.
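To illustrate how explicit limits constrain the delegated bundling step, here is a minimal sketch of a greedy bundler. The review names MAX_FILES, MAX_FILE_BYTES, and MAX_BUNDLE_BYTES but not their values, so the numbers below are placeholders, and bundle() and its skip-reason strings are hypothetical. Candidates are assumed to arrive already sorted by priority.

```python
from typing import List, Tuple

# Placeholder values: the review names these constants but not their settings.
MAX_FILES = 40
MAX_FILE_BYTES = 64_000
MAX_BUNDLE_BYTES = 400_000


def bundle(candidates: List[Tuple[str, bytes]]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Keep files in the given (already prioritised) order within the limits.

    Returns the selected paths and a list of (path, reason) pairs for skipped
    files, so selected_count and skipped_count can be reported alongside them.
    """
    selected: List[str] = []
    skipped: List[Tuple[str, str]] = []
    total = 0
    for path, content in candidates:
        size = len(content)
        if len(selected) >= MAX_FILES:
            skipped.append((path, "max_files"))
        elif size > MAX_FILE_BYTES:
            skipped.append((path, "file_too_large"))
        elif total + size > MAX_BUNDLE_BYTES:
            skipped.append((path, "bundle_full"))
        else:
            selected.append(path)
            total += size
    return selected, skipped
```

Recording a reason for every skipped file keeps the scope of the delegated work visible instead of silently truncating it.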

P0

Aligned

Ensure that background work remains perceptible

Background work is made perceptible through CohortValidationJob.status plus queued_at, cloning_started_at, selecting_started_at, bundling_started_at, validating_started_at, terminal_at, failure_kind, retry_eligible, and abort_requested. mark_step_started(), mark_completed(), mark_blocked(), mark_failed(), and mark_aborted() commit durable transitions so the workflow can be inspected after the initiating CLI/process moves on.

P0

Aligned

Align feedback with the user’s level of attention

Feedback is calibrated by audience and attention level: CohortApplication.onboarding_state gives a coarse applicant/founder-facing state, CohortValidationJob.failure_kind and safe_display_message expose actionable blocker/failure information, and audit_object plus _summarize_validate_log() preserve deeper diagnostic detail without forcing it into the primary status path.

P0

Aligned

Apply progressive disclosure to system agency

The code separates primary status from inspection detail. Primary fields include onboarding_state, job.status, safe_display_message, selected_count, skipped_count, commit_sha, and bundle_sha256; detailed evidence is available in audit_object with source, selection, validate, and job sections, and in the typed file envelope built by build_file_envelope().

P0

Aligned

Replace implied magic with clear mental models

The workflow provides clear mental models through explicit state-machine comments, STEPS, TERMINAL_STATUSES, FAILURE_KINDS, onboarding_state values, BOUNDARY_CONTRACT_VERSION, ENVELOPE_SCHEMA, BOUNDARY_HEADER, and ENVELOPE_ADVISORY. The code distinguishes approval, validation queueing, blocked validation, failed validation, and completed validation rather than presenting the validator as opaque magic.

P0

Aligned

Expose meaningful operational state, not internal complexity

Operational state is expressed in user-relevant terms such as queued, cloning, selecting, bundling, validating, completed, blocked, failed, aborted, validation_queued, validation_complete, and validation_failed. Low-level details like log_context are summarized by _summarize_validate_log(), while safe_display_message gives a cleaner operator-facing explanation.

P0

Aligned

Establish trust through inspectability

Inspectability is supported by load-bearing primitives: RepoSnapshot.commit_sha captures git rev-parse HEAD, _build_implementation_context() records per-file sha256 values, build_file_envelope() creates an envelope_hash over canonical JSON, and audit_object records source, selected files, skipped reads, bundle_sha256, envelope_schema, envelope_hash, latency, and job transitions. The selector is now deterministic and fixes the prior priority inversion: SORT_PRIORITY includes coordinator/planner/router/evaluator and unmatched positive-signal files default to priority 12, ahead of fallback priority 50. Delta: this improves the prior P7 needs_changes finding by addressing the cited selector bug.
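The two mechanisms cited above, priority-ordered selection and a hash over canonical JSON, can be sketched as follows. Only the 12 and 50 defaults come from the review; the ranks for named roles, the POSITIVE_SIGNALS keywords, and the function names are assumptions. Serialising with sorted keys and fixed separators means logically equal envelopes always hash to the same SHA-256 digest.

```python
import hashlib
import json
from typing import Dict, List

# Assumed ranks for the named roles; the review lists the roles but not their ranks.
SORT_PRIORITY: Dict[str, int] = {"coordinator": 1, "planner": 2, "router": 3, "evaluator": 4}
POSITIVE_SIGNAL_DEFAULT = 12   # from the review: unmatched positive-signal files
FALLBACK_PRIORITY = 50         # from the review: everything else
POSITIVE_SIGNALS = ("agent", "workflow", "tool")  # hypothetical keywords


def priority(path: str) -> int:
    name = path.lower()
    for role, rank in SORT_PRIORITY.items():
        if role in name:
            return rank
    if any(signal in name for signal in POSITIVE_SIGNALS):
        return POSITIVE_SIGNAL_DEFAULT
    return FALLBACK_PRIORITY


def order_files(paths: List[str]) -> List[str]:
    # Sorting on (priority, path) breaks ties by path, so the order is deterministic.
    return sorted(paths, key=lambda p: (priority(p), p))


def envelope_hash(envelope: dict) -> str:
    # Canonical JSON: sorted keys, no insignificant whitespace, UTF-8 bytes.
    canonical = json.dumps(envelope, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because positive-signal files default to 12 rather than 50, they can no longer be crowded out by generic fallback files, which is the inversion the review says was fixed.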

P0

Aligned

Represent delegated work as a system, not merely as a conversation

Delegated work is represented as a structured system, not a message stream. CohortValidationJob models the job lifecycle, cohort_validation_job.py defines explicit state transitions, _collect_job_transitions() records the timeline, and audit_object preserves source, selection, validation, and job substructures for later review.
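A timeline like the one _collect_job_transitions() records could be derived from per-step timestamp fields such as queued_at and cloning_started_at. This sketch assumes the job is available as a plain dict of those fields and that the caller supplies the terminal status; collect_job_transitions and TIMESTAMP_FIELDS are hypothetical names, not the reviewed implementation.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from state name to the job field that timestamps it.
TIMESTAMP_FIELDS = (
    ("queued", "queued_at"),
    ("cloning", "cloning_started_at"),
    ("selecting", "selecting_started_at"),
    ("bundling", "bundling_started_at"),
    ("validating", "validating_started_at"),
)


def collect_job_transitions(job: Dict[str, Optional[datetime]],
                            final_status: str) -> List[dict]:
    """Build an ordered, serialisable timeline from the steps that actually ran."""
    events: List[Tuple[datetime, str]] = [
        (job[field], state)
        for state, field in TIMESTAMP_FIELDS
        if job.get(field) is not None
    ]
    if job.get("terminal_at") is not None:
        events.append((job["terminal_at"], final_status))
    events.sort(key=lambda event: event[0])
    return [{"state": state, "at": ts.isoformat()} for ts, state in events]
```

Steps that never started are simply absent from the timeline, so a failure during cloning reads as queued, cloning, failed.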

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/14a3456f-44db-448a-a47b-5e9637944ea6/card.svg)](https://aidesignblueprint.com/en/readiness-review/14a3456f-44db-448a-a47b-5e9637944ea6)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/14a3456f-44db-448a-a47b-5e9637944ea6.svg)](https://aidesignblueprint.com/en/readiness-review/14a3456f-44db-448a-a47b-5e9637944ea6)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Improvements (2)

P7 Establish trust through inspectability: needs_changes → aligned
P8 Make hand-offs, approvals, and blockers explicit: needs_changes → needs_changes
Rubric: 2026-05-04 · Grade capped by 0 high-risk findings

Run ID: 14a3456f-44db-448a-a47b-5e9637944ea6 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.