Reviewed

Evaluation complete; awaiting evidence review.

Agent Architecture Review, Validation Snapshot

Evaluated on 12 May 2026 against the AI Design Blueprint doctrine

Emerging

Status: High risk

67/100

Grade C

6 aligned · 3 production blockers · 1 hardening
Verdicts by principle

The submission materially improves the cohort bridge with a durable `CohortValidationJob` state machine, committed step transitions, typed failure fields, and an audit envelope. However, production trust boundaries still fail in two places: arbitrary repo content is still passed to the validator as a raw same-channel text bundle with only a natural-language header, and several blocker/error paths can be misclassified or lose their durable failure transition.

Iteration history

1 previous run on this artifact. Each run_id opens its own readiness review.

When | Score | Status | Run ID
12 May 2026 (this run) | 67 / C | High risk | e476247c
12 May 2026 | 30 / F | High risk | 86b2c59d

Findings by principle

10 principles evaluated. Verdict, severity, evidence, and recommendation for each.

P0

Needs changes · Production blocker · 75/100

Make hand-offs, approvals, and blockers explicit

The workflow now has explicit handoff helpers (`mark_blocked`, `mark_failed`, `mark_completed`, `mark_aborted`), but several blockers can still be mishandled. The `invalid_url`/`invalid_repo_url` mismatch can turn an applicant-fixable URL issue into `unexpected_error`. In the persistence block, `except Exception` calls `mark_failed()` on the same SQLAlchemy session after a possible `db.flush()` or `db.commit()` failure without first rolling back, so the failure transition itself can fail and leave the job stuck. Similarly, if `mark_completed()` mutates `job.status='completed'` and its commit fails, the outer handler may see the in-memory terminal status and skip `mark_failed()`. Read failure…

Recommendation

Separate result persistence from failure-state persistence with rollback-safe transaction boundaries: call `db.rollback()` before marking `persist_error`, make terminal transition commits atomic and verifiable, normalize all external blocker kinds before `mark_blocked()`, and convert unrecoverable read/bundle failures into `blocked` states before validator invocation.
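The rollback-before-failure-write ordering can be sketched as follows. This is a minimal illustration, not the submission's code: `Job`, `FlakySession`, and `finish_job` are hypothetical stand-ins, and the session double only mimics the `commit()`/`rollback()` surface of a SQLAlchemy session so the example runs without a database.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Job:
    # Hypothetical stand-in for CohortValidationJob; only the fields used here.
    status: str = "validating"
    failure_kind: Optional[str] = None


@dataclass
class FlakySession:
    # Hypothetical test double: the first `fail_commits` commits raise.
    fail_commits: int = 0
    calls: List[str] = field(default_factory=list)

    def commit(self) -> None:
        if self.fail_commits > 0:
            self.fail_commits -= 1
            self.calls.append("commit_failed")
            raise RuntimeError("simulated commit failure")
        self.calls.append("commit")

    def rollback(self) -> None:
        self.calls.append("rollback")


def finish_job(db, job: Job, persist_result: Callable[[], None]) -> str:
    """Persist the result and terminal state; return the status that was committed."""
    try:
        persist_result()
        job.status = "completed"
        db.commit()
        return "completed"  # reported only after the commit succeeded
    except Exception:
        # Clear the broken transaction before writing anything else.
        db.rollback()
    # The handler never consults job.status, which may already read
    # "completed" in memory; reaching this point means no commit was confirmed.
    job.status = "failed"
    job.failure_kind = "persist_error"
    db.commit()  # failure state goes out in its own transaction
    return "failed"
```

The function returns the status it actually managed to commit, so a caller does not have to infer success from in-memory attributes. If the second commit also fails, the exception propagates and the caller can escalate.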

P0

Needs changes · Production blocker · 65/100

Establish trust through inspectability

The audit chain is strong: `commit_sha`, per-file `content_sha256`, `bundle_sha256`, `schema_version='bridge.audit.v1.1'`, and `boundary_contract_version` make the result traceable. The remaining trust-boundary issue is that `wrap_bundle_with_boundary(raw_bundle)` prepends `BOUNDARY_HEADER` into the same `code` string that contains arbitrary repo content, and `_build_implementation_context()` concatenates raw `# === FILE: {path} ===` headers plus unescaped file contents. A natural-language header inside the payload is helpful, but it is not the same as a trusted validation-service envelope; malicious file paths or contents still occupy the same instruction channel. This partially addresses t…

Recommendation

Move the inertness contract to the validation service’s trusted prompt/schema boundary: pass files as a typed JSON/file array with encoded paths and content, persist the envelope version/hash, and treat the rendered bundle only as an inspectable artifact rather than the authority boundary.
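One way to read this recommendation is a typed file array in which paths and contents travel as encoded data fields rather than as text in the prompt channel. The sketch below is an assumption-laden illustration: `build_file_envelope`, `ENVELOPE_VERSION`, the field names, and the way `bundle_sha256` is derived here are all hypothetical and may differ from how the submission computes its hashes.

```python
import base64
import hashlib
import json
from typing import Dict

ENVELOPE_VERSION = "example.envelope.v1"  # hypothetical version label


def build_file_envelope(files: Dict[str, bytes]) -> dict:
    """Encode repo files as data fields so no raw repo text reaches the wire."""
    entries = []
    for path in sorted(files):  # sorted for a deterministic bundle hash
        content = files[path]
        entries.append({
            "path_b64": base64.b64encode(path.encode("utf-8")).decode("ascii"),
            "content_b64": base64.b64encode(content).decode("ascii"),
            "content_sha256": hashlib.sha256(content).hexdigest(),
        })
    # Assumption: the bundle hash covers the canonical JSON of the entries.
    canonical = json.dumps(entries, sort_keys=True, separators=(",", ":"))
    return {
        "envelope_version": ENVELOPE_VERSION,
        "files": entries,
        "bundle_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }
```

With this shape, a file named to look like a `# === FILE: ... ===` header, or containing instruction-like text, appears only as base64 inside a typed field. The validation service would still need to decode and present the files under its own trusted instructions; the encoding alone does not make the content inert.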

P0

Needs changes · Production blocker · 60/100

Expose meaningful operational state, not internal complexity

The structured status model is a major improvement, but some current paths can still expose misleading operational state. `fetch_public_repo()` raises `FetchError(..., kind='invalid_url')`, while `FAILURE_KINDS` contains `invalid_repo_url` rather than `invalid_url`; `_execute_with_job()` may pass `invalid_url` into `mark_blocked()`, causing a `ValueError` and eventual `unexpected_error` instead of a user-actionable URL blocker. `_build_implementation_context()` also records `skipped_during_read` but the workflow can still proceed to validation and `mark_completed()` even if selected files failed to read, so the job may say completed when the validation input was materially incomplete. This i…

Recommendation

Make failure taxonomy a single shared typed primitive across fetcher and job state, map `invalid_url` to `invalid_repo_url`, and block the job when the post-read bundle has no successfully read source files or falls below the minimum useful source threshold.
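A shared taxonomy with an alias map and a post-read gate might look like the sketch below. The kind names `invalid_url`, `invalid_repo_url`, and `unexpected_error` come from the finding; `FailureKind`, `normalize_failure_kind`, `bundle_blocker`, the `insufficient_source` kind, and the threshold value are hypothetical.

```python
from enum import Enum
from typing import Iterable, Optional


class FailureKind(str, Enum):
    INVALID_REPO_URL = "invalid_repo_url"
    INSUFFICIENT_SOURCE = "insufficient_source"  # hypothetical kind
    UNEXPECTED_ERROR = "unexpected_error"


# Legacy or external names mapped onto the shared taxonomy.
_ALIASES = {"invalid_url": FailureKind.INVALID_REPO_URL}


def normalize_failure_kind(raw: str) -> FailureKind:
    """Map any incoming kind string to a member of the shared enum."""
    if raw in _ALIASES:
        return _ALIASES[raw]
    try:
        return FailureKind(raw)
    except ValueError:
        return FailureKind.UNEXPECTED_ERROR


MIN_READ_FILES = 1  # hypothetical minimum useful source threshold


def bundle_blocker(
    selected: Iterable[str],
    skipped_during_read: Iterable[str],
    min_read: int = MIN_READ_FILES,
) -> Optional[FailureKind]:
    """Return a blocker kind when too few selected files were actually read."""
    skipped = set(skipped_during_read)
    read_ok = [path for path in selected if path not in skipped]
    if len(read_ok) < min_read:
        return FailureKind.INSUFFICIENT_SOURCE
    return None
```

Calling `normalize_failure_kind()` at the fetcher boundary means `mark_blocked()` only ever sees kinds it knows, and checking `bundle_blocker()` before validator invocation keeps a job from reporting `completed` on an empty or near-empty bundle.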

P0

Needs changes · Hardening recommended · 45/100

Optimise for steering, not only initiating

`abort_requested` and `mark_aborted()` are the right primitive direction, and `mark_step_started()` checks for aborts between steps. The submitted code does not show an operator-safe mutation surface such as `request_abort(job_id)` or `retry_job(job_id)`, `retry_count` is defined but never incremented, and there is no row refresh/lock before checking `job.abort_requested`, so an external abort may depend on SQLAlchemy session-expiration behavior. The current implementation improves the prior lack of steering, but it is not yet a complete steering surface.

Recommendation

Add a small service-owned steering API/CLI outside the execution loop for `request_abort`, `retry_failed_job`, and `mark_interrupted`; refresh or lock the job row before each irreversible/external step, and increment/link retries rather than relying on manual reruns.
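A steering surface of this kind can be sketched with the standard-library `sqlite3` module standing in for the real database. The table layout, `open_store`, `start_step`, and the exact column set are assumptions made for illustration; the function names `request_abort` and `retry_failed_job` follow the recommendation, and `mark_interrupted` is left out for brevity.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory job table (hypothetical schema for this sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " abort_requested INTEGER NOT NULL DEFAULT 0,"
        " retry_count INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def request_abort(conn: sqlite3.Connection, job_id: int) -> bool:
    """Flag a non-terminal job for abort; return False if nothing changed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE jobs SET abort_requested = 1 WHERE id = ?"
            " AND status NOT IN ('completed', 'failed', 'blocked', 'aborted')",
            (job_id,),
        )
    return cur.rowcount == 1


def retry_failed_job(conn: sqlite3.Connection, job_id: int) -> bool:
    """Requeue a failed job and count the retry in the same statement."""
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = 'queued', abort_requested = 0,"
            " retry_count = retry_count + 1 WHERE id = ? AND status = 'failed'",
            (job_id,),
        )
    return cur.rowcount == 1


def start_step(conn: sqlite3.Connection, job_id: int, step: str) -> bool:
    """Enter `step` unless an abort is pending, decided against the stored row."""
    with conn:
        conn.execute(
            "UPDATE jobs SET status = CASE WHEN abort_requested = 1"
            " THEN 'aborted' ELSE ? END WHERE id = ?",
            (step, job_id),
        )
        row = conn.execute(
            "SELECT status FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
    if row is None:
        raise KeyError(job_id)
    return row[0] == step
```

Because `start_step()` decides inside a single `UPDATE`, the abort check reads the stored flag rather than a possibly stale in-memory attribute. With an ORM session the equivalent would be refreshing or locking the row before each external step, as the recommendation says.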

P0

Aligned

Design for delegation rather than direct manipulation

`approve()` lets the founder approve an application once, then delegates the repo scan to `run_cohort_validate(app.id)`. `_execute_with_job` owns the operational sequence (parse repo, clone, select, bundle, validate, persist), and `ValidationRequest` carries the task, repository namespace, and selected files rather than requiring the operator to perform each step manually. This maintains the prior aligned result.

P0

Aligned

Ensure that background work remains perceptible

`create_job()` inserts and commits a `CohortValidationJob` at `status='queued'` before clone or validation begins. `mark_step_started()` records `<step>_started_at`, updates `status`, and commits for `cloning`, `selecting`, `bundling`, and `validating`; terminal helpers persist `completed`, `blocked`, `failed`, or `aborted`. The audit object also records `job_id` and `_collect_job_transitions(job)`. This addresses the prior blocker around no durable run row before clone/validation.
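The lifecycle described here can be pictured as a small transition table. The status names are taken from the finding; the ordering rule, the restriction that `completed` follows only `validating`, and the helper names `allowed_next` and `transition` are assumptions of this sketch, which uses a plain dict in place of the persisted job row.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

# Status names come from the review; the ordering rule is an assumption.
ORDER = ("queued", "cloning", "selecting", "bundling", "validating")
TERMINAL = frozenset({"completed", "blocked", "failed", "aborted"})


def allowed_next(status: str) -> frozenset:
    """Statuses reachable from `status` under this sketch's rules."""
    if status in TERMINAL:
        return frozenset()
    nxt = {"blocked", "failed", "aborted"}
    idx = ORDER.index(status)
    # Assumption: "completed" is only reachable from the last step.
    nxt.add(ORDER[idx + 1] if idx + 1 < len(ORDER) else "completed")
    return frozenset(nxt)


def transition(job: Dict, new_status: str, now: Optional[datetime] = None) -> Dict:
    """Apply one transition and append it to the job's transition log."""
    if new_status not in allowed_next(job["status"]):
        raise ValueError(f"illegal transition {job['status']} -> {new_status}")
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    job["transitions"].append({"from": job["status"], "to": new_status, "at": stamp})
    job["status"] = new_status
    return job
```

In the real workflow each transition would also be committed before the next external action starts, which is what makes the background work perceptible to an operator reading the table.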

P0

Aligned

Align feedback with the user’s level of attention

Foreground output is concise (`Created CohortValidationJob#...`, terminal status, failure kind), while durable detail lives in `safe_display_message`, `failure_kind`, timestamps, and the audit object. `_summarize_validate_log()` deliberately stores structural log shape rather than dumping full content, which keeps routine operation quiet while preserving escalation detail. This improves the prior feedback/lifecycle gap.

P0

Aligned

Apply progressive disclosure to system agency

The primary operational layer is the small job state surface (`status`, `failure_kind`, `safe_display_message`, `retry_eligible`), while deeper inspection is available through `result_json['audit']`, including source commit, selected file hashes, bundle hash, validator latency, log-signal summary, and job transitions. This preserves progressive disclosure rather than exposing raw logs as the default view.

P0

Aligned

Replace implied magic with clear mental models

The code gives operators a clear model of what the workflow does and cannot do: `cohort_validation_job.py` documents the state machine and the distinction between `blocked`, `failed`, and `aborted`; `agentic_surface_selector.py` documents that include patterns are sort preferences rather than gates; `FetchError.kind`, `FAILURE_KINDS`, and `retry_eligible` make dependencies and recovery expectations explicit.

P0

Aligned

Represent delegated work as a system, not merely as a conversation

`CohortValidationJob` represents the delegated work as a system with durable state, timestamps, terminal statuses, retry metadata, abort metadata, linkage to `UserValidationRun`, and audit quick-access fields. `_collect_job_transitions()` mirrors the lifecycle into the persisted audit object, separating execution state from the founder CLI’s conversational/console output. This addresses the prior recommendation to represent the bridge as a service-owned validation job.

Add to your README

Two embeddable variants: a small badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/e476247c-d9c0-44f4-a25f-dbbdb7eb7b15/card.svg)](https://aidesignblueprint.com/en/readiness-review/e476247c-d9c0-44f4-a25f-dbbdb7eb7b15)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/e476247c-d9c0-44f4-a25f-dbbdb7eb7b15.svg)](https://aidesignblueprint.com/en/readiness-review/e476247c-d9c0-44f4-a25f-dbbdb7eb7b15)
Baseline and iteration details
Baseline: used · Doctrine: same doctrine · Race: checked clear

Iteration delta

Improvements (7)

P2 | Ensure that background work remains perceptible | needs_changes | aligned
P3 | Align feedback with the user’s level of attention | needs_changes | aligned
P6 | Expose meaningful operational state, not internal complexity | needs_changes | needs_changes
P7 | Establish trust through inspectability | needs_changes | needs_changes
P8 | Make hand-offs, approvals, and blockers explicit | needs_changes | needs_changes
P9 | Represent delegated work as a system, not merely as a conversation | needs_changes | aligned
P10 | Optimise for steering, not only initiating | needs_changes | needs_changes
Rubric: 2026-05-04

Run ID: e476247c-d9c0-44f4-a25f-dbbdb7eb7b15 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.