Agent Architecture Review, Validation snapshot

Evaluated 4 May 2026 against the AI Design Blueprint doctrine

Production-ready

Status: Needs Changes

86/100

Grade B

5 aligned · 4 not applicable

The prior-run baseline injection design is largely strong for a synchronous validation component: it has typed parse boundaries, catalog-whitelisted inert JSON rendering, doctrine compatibility metadata, race status, parse status, reproducibility anchoring, and structured regression/improvement surfaces. The main applicable gap is a mental-model issue around soft deletion: user-facing delete routes hide runs from the user while the public validation URL deliberately remains alive, but that distinction is not exposed as a clear product state or separate control.

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

Replace implied magic with clear mental models

needs changes · 38/100

The prior-baseline behavior is mostly explicit through `baseline_status`, `baseline_parse_reason`, `baseline_compatibility`, `baseline_race_status`, and the `_SYSTEM_PROMPT` scoring contract. However, the delete semantics create a user mental-model gap: `delete_validation_run` returns `204 No Content` after setting `row.is_user_deleted = True`, while `get_validation_run` explicitly says the public surface deliberately ignores `is_user_deleted` so case-study links remain alive. That retention rule is only an internal docstring/comment, not a user-visible state or separate operation, so a user can reasonably infer that deletion removes the public artifact when it does not.

Recommendation

Separate and expose the retention semantics explicitly: make `/me/validation-runs` deletion read as “remove from my account/history,” add a distinct unpublish/revoke-public-artifact control if public visibility can be withdrawn, and surface a user-visible `public_url_active` or equivalent state instead of relying on an internal comment.
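
One way to make the retention rule visible is sketched below. It uses a simplified in-memory record in place of `UserValidationRun`. The names `RunRecord`, `is_public_revoked`, `remove_from_history`, and `revoke_public_artifact` are hypothetical and do not come from the reviewed code; only `is_user_deleted` and the suggested `public_url_active` field appear in the review.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Simplified stand-in for UserValidationRun (assumption, not the real model)."""
    run_id: str
    is_user_deleted: bool = False    # hidden from the owner's history
    is_public_revoked: bool = False  # hypothetical flag: public artifact withdrawn

    @property
    def public_url_active(self) -> bool:
        # Public visibility depends only on the revoke flag, never on soft deletion.
        return not self.is_public_revoked


def remove_from_history(run: RunRecord) -> dict:
    """Soft delete: hide the run from the owner's list and keep the public URL."""
    run.is_user_deleted = True
    return {"removed_from_history": True, "public_url_active": run.public_url_active}


def revoke_public_artifact(run: RunRecord) -> dict:
    """Separate, explicit control that withdraws the public artifact."""
    run.is_public_revoked = True
    return {
        "removed_from_history": run.is_user_deleted,
        "public_url_active": run.public_url_active,
    }
```

The reviewed route returns `204 No Content`; returning a small body here is a design choice made only so that the remaining public state is visible to the caller.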

Apply progressive disclosure to system agency

aligned

The response model separates high-level result surfaces from inspection details: `assessment`, `readiness`, `freshness`, and `outdated_pattern_signal` sit alongside deeper structures such as `matched_principles`, `recommended_examples`, `reproducibility`, `processing`, `regressions`, and `improvements`. The prior-baseline rendering also keeps the LLM payload minimized: `_render_prior_baseline_section` omits the prior run id while the handler adds `prior_run_id` only under `result["reproducibility"]` for caller-side audit.
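
The split between the LLM-facing payload and the caller-side audit field can be illustrated with a minimal sketch. The helper names and the shape of the `prior` dictionary are assumptions made for illustration; only `result["reproducibility"]["prior_run_id"]` is taken from the review.

```python
import json


def render_baseline_for_prompt(prior: dict) -> str:
    """Build the LLM-facing payload; the run id is deliberately left out."""
    payload = {key: value for key, value in prior.items() if key != "run_id"}
    return json.dumps(payload, sort_keys=True)


def attach_audit_anchor(result: dict, prior: dict) -> dict:
    """Record the prior run id only under result['reproducibility']."""
    result.setdefault("reproducibility", {})["prior_run_id"] = prior["run_id"]
    return result
```

Under these assumptions the prompt text never carries the identifier, while a caller inspecting the response can still trace which run served as the baseline.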

Expose meaningful operational state, not internal complexity

aligned

The baseline feature exposes actionable operational states rather than raw internals. `baseline_status` is constrained to five user-relevant outcomes (`used`, `none_found`, `skipped_private_session`, `unavailable_db_error`, `unavailable_corrupt_payload`); `baseline_parse_reason` distinguishes `parsed`, `missing_payload`, `corrupt_json`, `shape_mismatch`, `construction_error`, and `no_lookup`; and `baseline_race_status` distinguishes `checked_clear`, `race_detected`, and `not_checked`. The DB lookup and race-check branches map `SQLAlchemyError` and concurrent-run changes into these states instead of leaking stack mechanics.
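
The mapping from lookup outcomes to the five `baseline_status` values might look roughly like the sketch below. `StoreUnavailable` stands in for `SQLAlchemyError` so the example runs without a database; `resolve_baseline_status`, its parameters, and the `findings` shape check are hypothetical.

```python
from typing import Callable, Optional


class StoreUnavailable(Exception):
    """Stand-in for sqlalchemy.exc.SQLAlchemyError (assumption)."""


def resolve_baseline_status(
    is_private_session: bool,
    fetch_prior_payload: Callable[[], Optional[dict]],
) -> str:
    """Translate lookup outcomes into one of the five documented states."""
    if is_private_session:
        return "skipped_private_session"
    try:
        payload = fetch_prior_payload()
    except StoreUnavailable:
        # The failure is reported as a state; no stack details reach the caller.
        return "unavailable_db_error"
    if payload is None:
        return "none_found"
    if not isinstance(payload.get("findings"), list):
        # Hypothetical shape check standing in for the real parse boundary.
        return "unavailable_corrupt_payload"
    return "used"
```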

Establish trust through inspectability

aligned

The code contains several real inspectability primitives. `PriorRunParseResult` creates a typed parse boundary; `_render_prior_baseline_section` emits a deterministic JSON payload via `json.dumps(..., sort_keys=True)`; prior findings are sorted by `principle_id`; slug/title are taken from the trusted current catalog rather than from the stored prior run; doctrine drift is surfaced through `prior_doctrine_fingerprint`, `current_doctrine_fingerprint`, and `baseline_compatibility`; dropped catalog ids are exposed through `dropped_baseline_principle_ids`; and the durable anchor is preserved as `result["reproducibility"]["prior_run_id"]`. The submitted code/context also contains self-certifying s…
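
A minimal sketch of the catalog-whitelisted, deterministic rendering described above follows. It is not the reviewed `_render_prior_baseline_section`; the function name, the finding and catalog shapes, and the `"match"`/`"drift"` values for `baseline_compatibility` are assumptions.

```python
import json
from typing import Dict, List


def render_prior_findings(
    prior_findings: List[dict],
    catalog: Dict[str, dict],
    prior_fingerprint: str,
    current_fingerprint: str,
) -> str:
    """Render prior findings as inert, deterministic JSON."""
    kept, dropped = [], []
    for finding in sorted(prior_findings, key=lambda f: f["principle_id"]):
        pid = finding["principle_id"]
        entry = catalog.get(pid)
        if entry is None:
            dropped.append(pid)  # id no longer present in the current catalog
            continue
        kept.append({
            "principle_id": pid,
            # Slug and title come from the trusted catalog, not the stored run.
            "slug": entry["slug"],
            "title": entry["title"],
            # Copied as-is here; a real implementation would validate it too.
            "verdict": finding["verdict"],
        })
    payload = {
        # Hypothetical values; the review does not list the real ones.
        "baseline_compatibility": "match" if prior_fingerprint == current_fingerprint else "drift",
        "prior_doctrine_fingerprint": prior_fingerprint,
        "current_doctrine_fingerprint": current_fingerprint,
        "dropped_baseline_principle_ids": dropped,
        "findings": kept,
    }
    return json.dumps(payload, sort_keys=True)
```

Because findings are sorted and keys are emitted in sorted order, the same inputs produce byte-identical output regardless of the order in which the stored findings arrive.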

Make hand-offs, approvals, and blockers explicit

aligned

Where this synchronous component can encounter blockers, it makes them explicit. Baseline lookup failures become `baseline_status = "unavailable_db_error"`; malformed stored payloads become `"unavailable_corrupt_payload"` plus a specific `baseline_parse_reason`; private sessions become `"skipped_private_session"`; absent prior runs become `"none_found"`; and concurrent baseline changes become `baseline_race_status = "race_detected"`. Delete routes also fail closed with `HTTPException(404, "Not found.")` when the row is absent, belongs to another user, or is already user-deleted.
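
The fail-closed delete behaviour can be sketched without a web framework. `NotFound` stands in for `HTTPException(404, "Not found.")`, and `Row` and `delete_run` are hypothetical simplifications of the reviewed route and model.

```python
from dataclasses import dataclass
from typing import Dict


class NotFound(Exception):
    """Stand-in for HTTPException(404, "Not found.") (assumption)."""
    status_code = 404


@dataclass
class Row:
    owner_id: str
    is_user_deleted: bool = False


def delete_run(rows: Dict[str, Row], run_id: str, user_id: str) -> int:
    """Soft-delete a run, raising the same 404 for every refusal case."""
    row = rows.get(run_id)
    # Absent, owned by someone else, or already deleted: one identical answer.
    if row is None or row.owner_id != user_id or row.is_user_deleted:
        raise NotFound("Not found.")
    row.is_user_deleted = True
    return 204  # mirrors the 204 No Content response in the reviewed route
```

Collapsing the three refusal cases into a single response means a caller cannot tell from the reply whether a run exists under another account.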

Represent delegated work as a system, not merely as a conversation

aligned

The feature represents validation history as structured system state rather than as conversational memory. `PriorRunSummary` carries prior score, grade, assessment status, doctrine fingerprint, and normalized findings; `compute_principle_deltas` is imported for structured `regressions`/`improvements`; `baseline_compatibility` and `dropped_baseline_principle_ids` make catalog drift explicit; `baseline_anchor_run_id` and `baseline_race_status` guard against concurrent-run ambiguity; and `UserValidationRun.is_user_deleted` separates private listing state from public artifact continuity.
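
The review names `compute_principle_deltas` but does not show its signature, so the sketch below is only a guess at the general idea: comparing per-principle verdicts between two runs to produce structured `regressions` and `improvements`. The function name, the verdict strings, and their ordering are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical verdict ordering (assumption); a higher rank is better.
_RANK = {"needs_changes": 0, "aligned": 1}


def compute_deltas_sketch(
    prior: Dict[str, str],
    current: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Return (regressions, improvements) for principles present in both runs."""
    regressions: List[str] = []
    improvements: List[str] = []
    for pid in sorted(prior.keys() & current.keys()):
        before = _RANK.get(prior[pid])
        after = _RANK.get(current[pid])
        if before is None or after is None:
            continue  # e.g. "not_applicable": nothing to compare
        if after < before:
            regressions.append(pid)
        elif after > before:
            improvements.append(pid)
    return regressions, improvements
```

Principles that appear in only one run are skipped here; in the reviewed design that kind of catalog drift is reported separately through `dropped_baseline_principle_ids`.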

Design for delegation rather than direct manipulation

not applicable

This component does not itself implement a delegated-work experience; it is a request/response validation-support surface around prior-run parsing, prompt rendering, response shaping, and validation-run deletion. There are no delegated task controls, authority scopes, background run controls, or autonomous execution lifecycle primitives in the shown code, so P1 is architecturally outside this code slice.

Ensure that background work remains perceptible

not applicable

The shown paths are synchronous database/query/response operations: `parse_prior_run`, `_render_prior_baseline_section`, `delete_validation_run`, `delete_validation_run_project`, and `get_validation_run`. Although the schema includes `processing`, this submission does not show background execution, queued work, or a persistent worker whose progress must remain perceptible.

Align feedback with the user’s level of attention

not applicable

The code does not implement foreground/background attention modes, notifications, monitoring levels, or escalation cadence. It returns structured response fields synchronously, but there is no user-attention state machine in this component to assess under P3.

Optimise for steering, not only initiating

not applicable

This code does not implement an in-flight autonomous workflow that a user can steer while it runs. It supports iteration across validation runs through prior-baseline context and deltas, but there are no mid-process controls such as pause, resume, abort, reprioritize, or constraint-update hooks in the shown synchronous request/response component.

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/6d70eb0c-ec05-4ed4-b882-e30d8ca400de/card.svg)](https://aidesignblueprint.com/en/readiness-review/6d70eb0c-ec05-4ed4-b882-e30d8ca400de)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/6d70eb0c-ec05-4ed4-b882-e30d8ca400de.svg)](https://aidesignblueprint.com/en/readiness-review/6d70eb0c-ec05-4ed4-b882-e30d8ca400de)

Baseline and iteration details

Rubric: 2026-05


Run ID: 6d70eb0c-ec05-4ed4-b882-e30d8ca400de · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.