Reviewed

Assessment complete; awaiting evidence revision.

Agent Architecture Review, Validation snapshot

Evaluated 14 May 2026 against the AI Design Blueprint doctrine

Status: High Risk

58/100

Grade D

5 aligned · 4 production blockers · 1 hardening
Per-principle verdicts

The A2A executor uses real governance primitives (task state, INPUT_REQUIRED approval, WORKING updates, cancellation, and completion artifacts), but the destructive-action reference is not yet doctrine-aligned: the approval is not bound to a concrete target, it accepts ambiguous confirmations, and it reports validation and deletion work that the code does not actually perform. The stdio proxy is mostly a synchronous bridge, but it provides no meaningful call-level inspectability when used as part of a governed workflow.

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

P0

needs changes · production blocker · 80/100

Make hand-offs, approvals, and blockers explicit

The handoff is explicit at the protocol level because the first call emits `TASK_STATE_INPUT_REQUIRED` with a concrete instruction to reply `confirm`, and the abort path emits `TASK_STATE_CANCELED`. However, the approval gate is unsafe. The check `if "confirm" not in user_input.lower()` is a substring match, so replies such as `do not confirm` or `confirm nothing` are treated as approval and execution proceeds. The approval message is also not bound to a specific file path or operation instance.

Recommendation

Use an exact structured approval decision rather than substring matching, reject ambiguous responses, and bind the pending approval to the concrete target and operation so the resumed task can only execute what the user approved.
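A minimal sketch of this recommendation, assuming a hypothetical `PendingApproval` record and `decide` helper (neither comes from the reviewed code or the A2A SDK). It contrasts the substring check quoted in the evidence with an exact-match decision that is tied to one target and one operation instance through a nonce.

```python
from dataclasses import dataclass
from enum import Enum
import secrets


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    AMBIGUOUS = "ambiguous"


@dataclass(frozen=True)
class PendingApproval:
    """Hypothetical record created when the task pauses for input."""
    operation: str  # e.g. "delete"
    target: str     # the concrete path shown to the user
    nonce: str      # ties the reply to this one operation instance

    def prompt(self) -> str:
        return (
            f"This will {self.operation} {self.target}. "
            f"Reply 'confirm {self.nonce}' to proceed or 'cancel' to abort."
        )


def new_pending_approval(operation: str, target: str) -> PendingApproval:
    return PendingApproval(operation, target, secrets.token_hex(4))


def substring_gate(user_input: str) -> bool:
    """The check quoted in the evidence: passes if 'confirm' appears anywhere."""
    return "confirm" in user_input.lower()


def decide(pending: PendingApproval, user_input: str) -> Decision:
    """Approve only the exact phrase 'confirm <nonce>'; 'cancel' rejects."""
    reply = user_input.strip().lower()
    if reply == f"confirm {pending.nonce}":
        return Decision.APPROVED
    if reply == "cancel":
        return Decision.REJECTED
    return Decision.AMBIGUOUS  # caller should re-prompt, never execute
```

With this shape, `substring_gate("do not confirm")` still passes, while `decide` returns `Decision.AMBIGUOUS` for the same reply and the executor can re-prompt instead of proceeding.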

P0

needs changes · production blocker · 70/100

Replace implied magic with clear mental models

The mental model is materially misleading. The approval text says `This will permanently delete the requested file`, the progress message says `Validating target path...`, and the result says `File deleted successfully.`, but the code contains no parsed target path, no validation primitive, and no deletion call. Users or implementers could believe the example demonstrates a complete governed deletion when it is only emitting status text.

Recommendation

Either make the sample explicitly non-destructive by wording the result as a simulation, or implement the real primitives behind the claims: parse/display the target, validate it, perform or mock the deletion explicitly, and report the actual outcome.
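One way to back the status messages with real primitives is sketched below. The names `validate_target`, `delete_file`, and `DeletionOutcome` are illustrative assumptions, not part of the reviewed sample. The default is a dry run, so the result text says plainly when nothing was deleted.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DeletionOutcome:
    target: str
    validated: bool
    deleted: bool
    message: str


def validate_target(raw: str, allowed_root: Path) -> Path:
    """Resolve the path and require an existing regular file inside allowed_root."""
    root = allowed_root.resolve()
    target = (root / raw).resolve()
    if not target.is_relative_to(root):  # Python 3.9+
        raise ValueError(f"{raw!r} escapes the allowed root")
    if not target.is_file():
        raise ValueError(f"{raw!r} is not an existing regular file")
    return target


def delete_file(raw: str, allowed_root: Path, dry_run: bool = True) -> DeletionOutcome:
    """Validate first, then either simulate or perform the deletion."""
    try:
        target = validate_target(raw, allowed_root)
    except ValueError as exc:
        return DeletionOutcome(raw, False, False, f"Validation failed: {exc}")
    if dry_run:
        return DeletionOutcome(
            str(target), True, False, f"Simulation only: {target.name} was not deleted."
        )
    target.unlink()
    return DeletionOutcome(str(target), True, True, f"Deleted {target.name}.")


# Usage against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "report.txt").write_text("data")
    simulated = delete_file("report.txt", root)            # dry run
    escaped = delete_file("../outside.txt", root)          # rejected by validation
    real = delete_file("report.txt", root, dry_run=False)  # actually deletes
    still_there = (root / "report.txt").exists()
```

The reported message is derived from the outcome rather than hard-coded, so a status such as "File deleted successfully." can only appear when `unlink()` actually ran.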

P0

needs changes · production blocker · 65/100

Establish trust through inspectability

The workflow does not provide an inspectable production trace for an accountability-sensitive action. `TaskArtifactUpdateEvent` only contains `new_text_artifact(name="result", text="File deleted successfully.")`; there is no audit record of which file was requested, what validation occurred, who confirmed, what exact confirmation was accepted, or whether the final operation actually changed anything. The MCP proxy similarly forwards `_call_tool()` to `client.call_tool(name, arguments or {})` and returns only `result.content`, with no call correlation or audit envelope in this code.

Recommendation

Move auditability into a durable task/event ledger or structured artifact: record task id, actor, requested target, validation outcome, approval text/decision, executed operation, remote tool name where relevant, and final result.
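The fields listed in this recommendation could be captured as one JSON line per event. The sketch below assumes a hypothetical `AuditRecord` type and `append_audit_record` helper; an in-memory buffer stands in for the durable ledger, which in practice might be a file opened in append mode or a database table.

```python
import io
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional, TextIO


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical audit envelope; field names follow the recommendation."""
    task_id: str
    actor: str
    requested_target: str
    validation_outcome: str
    approval_text: str
    approval_decision: str
    executed_operation: str
    final_result: str
    remote_tool_name: Optional[str] = None  # set when an MCP tool call is involved


def append_audit_record(ledger: TextIO, record: AuditRecord) -> str:
    """Write one timestamped JSON line and return the line that was written."""
    payload = asdict(record)
    payload["recorded_at"] = datetime.now(timezone.utc).isoformat()
    line = json.dumps(payload, sort_keys=True)
    ledger.write(line + "\n")
    ledger.flush()
    return line


# Usage: io.StringIO is a stand-in for a durable append-only store.
ledger = io.StringIO()
append_audit_record(
    ledger,
    AuditRecord(
        task_id="task-1",
        actor="operator@example.com",
        requested_target="/srv/data/report.csv",
        validation_outcome="passed",
        approval_text="confirm abcd1234",
        approval_decision="approved",
        executed_operation="delete (dry run)",
        final_result="not deleted",
    ),
)
first_entry = json.loads(ledger.getvalue().splitlines()[0])
```

Keeping the accepted approval text alongside the decision lets a reviewer see exactly which reply was treated as consent.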

P0

needs changes · production blocker · 50/100

Design for delegation rather than direct manipulation

The executor is structured around delegated work rather than manual steps: `GovernedFileAgent.execute()` creates or resumes an A2A task, emits `TASK_STATE_WORKING`, pauses for approval, resumes, and completes with an artifact. However, delegated authority is still represented only as free text from `context.get_user_input()` / `context.message`; there is no structured target file, constraint envelope, or explicit scope of authority attached to the task before asking the user to confirm deletion.

Recommendation

Represent the delegated job as structured task state: target path, requested operation, allowed scope, and approval status should be explicit fields or durable task metadata before execution resumes, not inferred from conversation text alone.
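As a rough illustration of explicit task state, the sketch below models the delegated job with a hypothetical `DelegatedJob` dataclass. How such metadata would be attached to an A2A task is not shown and is left as an assumption; the point is that execution is gated on fields rather than on conversation text.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath


class ApprovalStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class DelegatedJob:
    """Hypothetical structured description of what was delegated."""
    task_id: str
    operation: str
    target_path: str
    allowed_scope: str
    approval_status: ApprovalStatus = ApprovalStatus.PENDING

    def in_scope(self) -> bool:
        """Lexical check: target lies under allowed_scope and has no '..' parts."""
        target = PurePosixPath(self.target_path)
        scope = PurePosixPath(self.allowed_scope)
        return ".." not in target.parts and target.is_relative_to(scope)

    def may_execute(self) -> bool:
        """Execution requires both an explicit approval and an in-scope target."""
        return self.approval_status is ApprovalStatus.APPROVED and self.in_scope()

    def to_metadata(self) -> dict:
        """Plain dict suitable for storing as durable task metadata."""
        return {
            "task_id": self.task_id,
            "operation": self.operation,
            "target_path": self.target_path,
            "allowed_scope": self.allowed_scope,
            "approval_status": self.approval_status.value,
        }


# Usage: the job cannot run until approval is recorded on the job itself.
job = DelegatedJob("task-1", "delete", "/srv/data/report.csv", "/srv/data")
blocked_before_approval = not job.may_execute()
job.approval_status = ApprovalStatus.APPROVED
allowed_after_approval = job.may_execute()
```

The scope check here is purely lexical; a real executor would also need to resolve symlinks on the file system before acting.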

P0

needs changes · hardening recommended · 35/100

Apply progressive disclosure to system agency

The default status messages are concise, but there is no deeper inspection layer when confidence or intervention matters. The workflow emits only generic messages such as `Validating target path...`, `Executing file deletion...`, and an artifact with `File deleted successfully.`; it does not expose the target path, validation result, approval record, or action details as inspectable task detail.

Recommendation

Keep the primary status concise, but add an inspectable detail artifact or task metadata containing the concrete target, validation checks, approval decision, and final operation result.
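The two layers described here might be produced from a single source of truth, as in the sketch below. `ActionDetail` is a hypothetical type: `summary()` yields the concise primary status and `detail_text()` yields the body of an inspectable detail artifact.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionDetail:
    """Hypothetical detail record behind a concise status line."""
    target: str
    validation_checks: dict  # check name -> passed?
    approval_decision: str
    operation_result: str

    def summary(self) -> str:
        """One-line status for the primary task message."""
        failed = sorted(name for name, ok in self.validation_checks.items() if not ok)
        if failed:
            return f"Validation failed: {', '.join(failed)}."
        return f"{self.operation_result.capitalize()}: {self.target}"

    def detail_text(self) -> str:
        """Full detail as JSON, intended for a separate inspectable artifact."""
        return json.dumps(
            {
                "target": self.target,
                "validation_checks": self.validation_checks,
                "approval_decision": self.approval_decision,
                "operation_result": self.operation_result,
            },
            indent=2,
        )


# Usage: the status stays short, the artifact carries the specifics.
detail = ActionDetail(
    target="/srv/data/report.csv",
    validation_checks={"inside_allowed_root": True, "is_regular_file": True},
    approval_decision="approved",
    operation_result="simulated deletion",
)
status_line = detail.summary()
detail_artifact = detail.detail_text()
```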

P0

aligned

Ensure that background work remains perceptible

Background work is made perceptible through A2A task events: the code enqueues the task, emits `TaskStatusUpdateEvent` with `TASK_STATE_WORKING` for validation and execution, emits a `TaskArtifactUpdateEvent` named `result`, and finishes with `TASK_STATE_COMPLETED`. The use of `task_id`, `context_id`, and `context.current_task` gives the protocol a continuity model for the pause/resume flow.

P0

aligned

Align feedback with the user’s level of attention

Feedback is proportionate for this short workflow: the user gets a high-salience `TASK_STATE_INPUT_REQUIRED` message only when approval is needed, concise `TASK_STATE_WORKING` updates during routine progress, a clear cancel message on non-confirmation, and a terminal completion state. The code does not expose excessive internal mechanics during routine execution.

P0

aligned

Expose meaningful operational state, not internal complexity

The A2A executor exposes meaningful operational states instead of internal complexity: `TASK_STATE_WORKING`, `TASK_STATE_INPUT_REQUIRED`, `TASK_STATE_CANCELED`, and `TASK_STATE_COMPLETED` are paired with user-relevant messages like `Confirmed. Validating target path...` and `Action aborted — no confirmation received. File was not modified.` It does not leak stack traces or low-level SDK mechanics into the task messages.

P0

aligned

Represent delegated work as a system, not merely as a conversation

The A2A executor represents the work as task state rather than only as a chat transcript. It creates or resumes a task with `new_task_from_user_message(context.message)`, uses `context.task_id` and `context.context_id`, separates status updates from the final artifact, and distinguishes paused, working, canceled, and completed states.

P0

aligned

Optimise for steering, not only initiating

The workflow supports steering after initiation for the scope of this short destructive task. It pauses for approval on the first call, resumes based on the user response, treats non-confirmation as an abort, and implements `cancel()` to emit `TASK_STATE_CANCELED` with `Cancelled by operator before execution — file was not modified.`

Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/ca187db7-82d3-41eb-8c2d-57890d954fa7/card.svg)](https://aidesignblueprint.com/en/readiness-review/ca187db7-82d3-41eb-8c2d-57890d954fa7)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/ca187db7-82d3-41eb-8c2d-57890d954fa7.svg)](https://aidesignblueprint.com/en/readiness-review/ca187db7-82d3-41eb-8c2d-57890d954fa7)
Baseline and iteration details
Rubric: 2026-05-04

Run ID: ca187db7-82d3-41eb-8c2d-57890d954fa7 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.