Reviewed

Assessment complete; awaiting evidence revision.

Agent Architecture Review, Validation snapshot

Evaluated 12 May 2026 against the AI Design Blueprint doctrine

Status: High Risk

22/100

Grade F

0 aligned · 7 production blockers · 3 hardening · 1 high risk
Per-principle verdicts

The submitted SDK glue layer is an autonomous agentic workflow: it streams Claude Agent SDK sessions with broad tools, executes event-driven listener scripts, exposes email mutation helpers, and manages action/component/state extensions. It has useful local-demo primitives such as listener/action logs and UI state persistence, but the production trust boundaries do not hold: untrusted email bodies are fed to an agent with Bash/Edit/Write/Web tools, automated listeners can mutate email state without an approval break, and long-running work lacks persistent run lifecycle, steering, and auditable correlation.

Per-principle findings

10 principles evaluated. Verdict, severity, evidence and recommendation for each.

P0

high risk · production blocker · 95/100

Make hand-offs, approvals, and blockers explicit

Approval and blocker handling are the critical failing boundary. Automatic listener handlers receive `archiveEmail`, `starEmail`, `markAsRead`, and `addLabel` in `createContext()` and can mutate mailbox state without a hard `awaiting_approval` break. `AIClient.defaultOptions.allowedTools` permits `Bash`, `Edit`, `Write`, `WebFetch`, and `Task`; the `PreToolUse` hook only blocks `.js`/`.ts` writes outside `agent/custom_scripts`, so other writes, shell commands, and network actions can proceed. `custom-tools.ts` injects untrusted email bodies into the agent context, creating a prompt-injection path from external email content to powerful tools. Blockers are mostly `console.error` or text error…

Recommendation

Insert hard approval/state breaks before every externally visible or destructive side effect, including email mutations, file writes, shell commands, and network actions. Treat email bodies as untrusted input, require policy evaluation before tool use, and persist blocked/awaiting-approval states with the specific user action needed.
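One way to picture such an approval break is a gate that parks a side effect and returns an `awaiting_approval` result instead of running it. This is a minimal sketch under stated assumptions: `ApprovalGate`, `SideEffect`, and `GateResult` are hypothetical names that do not appear in the reviewed code, and the in-memory map stands in for the persistent store the recommendation calls for.

```typescript
// Hypothetical sketch: ApprovalGate and its types are not part of the reviewed code.
type SideEffect = "email_mutation" | "file_write" | "shell" | "network";

type GateResult =
  | { status: "awaiting_approval"; approvalId: number; actionNeeded: string }
  | { status: "complete"; output: string };

interface Pending {
  effect: SideEffect;
  actionNeeded: string; // the specific user action needed to unblock
  run: () => string;    // deferred side effect; runs only after approval
}

class ApprovalGate {
  // In-memory for illustration only; a real gate would persist this state.
  private pending = new Map<number, Pending>();
  private nextId = 1;
  private autoApproved: Set<SideEffect>;

  constructor(autoApproved: SideEffect[]) {
    this.autoApproved = new Set(autoApproved);
  }

  request(effect: SideEffect, actionNeeded: string, run: () => string): GateResult {
    if (this.autoApproved.has(effect)) {
      return { status: "complete", output: run() };
    }
    const approvalId = this.nextId++;
    this.pending.set(approvalId, { effect, actionNeeded, run });
    return { status: "awaiting_approval", approvalId, actionNeeded };
  }

  approve(approvalId: number): GateResult {
    const item = this.pending.get(approvalId);
    if (!item) throw new Error(`No pending approval with id ${approvalId}`);
    this.pending.delete(approvalId);
    return { status: "complete", output: item.run() };
  }

  reject(approvalId: number): boolean {
    return this.pending.delete(approvalId);
  }
}

// Usage: nothing is auto-approved, so the archive request is parked.
const archived: string[] = [];
const gate = new ApprovalGate([]);
const first = gate.request("email_mutation", "Approve archiving msg-1", () => {
  archived.push("msg-1");
  return "archived msg-1";
});
```

The design choice here is that the caller hands over a closure rather than performing the mutation itself, so the gate, not the listener, decides when the side effect happens.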

P0

needs changes · production blocker · 75/100

Optimise for steering, not only initiating

The workflow can be initiated and streamed, but it is not steerable while underway. `AIClient.queryStream` passes options such as `maxTurns` and `resume`, but there is no abort controller, pause/resume command, dynamic constraint update, or side-effect checkpoint in this layer. `ListenersManager.checkEvent` runs matching listeners to completion once triggered, and email mutation helpers do not check for cancellation or changed policy before acting. `ActionsManager.executeAction` executes a handler once and returns a result; retry, rollback, reprioritisation, and mid-run correction are not represented.

Recommendation

Add a persistent execution controller with pause, resume, abort, retry, and rollback transitions that are checked before each tool call or email mutation; steering should update the active run’s policy and state rather than forcing the user to restart.
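The checkpoint idea can be sketched as a controller that every step consults before it acts. This is an illustrative sketch only: `ExecutionController`, `Step`, and `runSteps` are hypothetical names, state is held in memory rather than persisted, and retry and rollback are left out to keep the example short.

```typescript
// Hypothetical sketch: none of these names exist in the reviewed layer.
type RunState = "active" | "paused" | "aborted";
type Verdict = "proceed" | "wait" | "stop" | "denied";

class ExecutionController {
  private state: RunState = "active";
  private blockedTools = new Set<string>();

  pause(): void { if (this.state === "active") this.state = "paused"; }
  resume(): void { if (this.state === "paused") this.state = "active"; }
  abort(): void { this.state = "aborted"; }
  // Steering: tighten policy on the live run instead of restarting it.
  blockTool(tool: string): void { this.blockedTools.add(tool); }

  // Checkpoint to call before every tool call or email mutation.
  check(tool: string): Verdict {
    if (this.state === "aborted") return "stop";
    if (this.state === "paused") return "wait";
    return this.blockedTools.has(tool) ? "denied" : "proceed";
  }
}

interface Step { tool: string; run: () => void; }

// Runs steps until one cannot proceed; returns the index to resume from.
function runSteps(ctrl: ExecutionController, steps: Step[], from = 0): number {
  for (let i = from; i < steps.length; i++) {
    const verdict = ctrl.check(steps[i].tool);
    if (verdict === "denied") continue;  // skip the blocked tool, keep going
    if (verdict !== "proceed") return i; // paused or aborted: stop here
    steps[i].run();
  }
  return steps.length;
}

// Usage: the first step simulates a user pausing the run while it is underway.
const log: string[] = [];
const ctrl = new ExecutionController();
const steps: Step[] = [
  { tool: "markAsRead", run: () => { log.push("read"); ctrl.pause(); } },
  { tool: "archiveEmail", run: () => { log.push("archive"); } },
  { tool: "addLabel", run: () => { log.push("label"); } },
];
const stoppedAt = runSteps(ctrl, steps);           // 1: paused before archiveEmail
ctrl.blockTool("archiveEmail");                    // mid-run policy change
ctrl.resume();
const finished = runSteps(ctrl, steps, stoppedAt); // 3: archiveEmail was skipped
```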

P0

needs changes · production blocker · 70/100

Establish trust through inspectability

The code has partial inspectability through `LogWriter.appendLog`, `ActionsManager.logExecution`, `UIStateManager.logStateUpdate`, and `search_inbox` log files, but the audit boundary is insufficient for accountable autonomous work. There is no persistent `run_id` correlating `AIClient.queryStream` messages, listener invocations, email mutations, UI state updates, and action executions. `callAgent()` does not record the prompt, model, schema, response, or tool-use rationale; listener log writes are fire-and-forget; and JSONL files are not tamper-evident. `custom-tools.ts` also writes full formatted email results, including `body`, to local log files, which improves traceability but creates a…

Recommendation

Move audit authority out of the execution loop into an append-only ledger keyed by `run_id` and step IDs; record prompts, model/version, tool calls, source email IDs, approval decisions, mutations, result diffs, and sanitized evidence with tamper-evident integrity checks.
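A hash chain is one common way to make an append-only ledger tamper-evident: each entry stores the hash of the previous entry, so editing any record breaks every hash after it. The sketch below is illustrative and rests on assumptions: `AuditLedger` and `LedgerEntry` are hypothetical names, entries live in memory, and the bundled `fnv1a` function is a non-cryptographic placeholder so the example stays self-contained. A real ledger would inject a cryptographic hash such as SHA-256.

```typescript
// Hypothetical sketch: AuditLedger is not part of the reviewed code.
type HashFn = (input: string) => string;

// Demo-only 32-bit FNV-1a. NOT cryptographic; swap in SHA-256 for real use.
const fnv1a: HashFn = (input) => {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16);
};

type EntryKind = "prompt" | "tool_call" | "approval" | "mutation";

interface LedgerEntry {
  runId: string;
  stepId: number;
  kind: EntryKind;
  detail: string;
  prevHash: string;
  hash: string;
}

class AuditLedger {
  private entries: LedgerEntry[] = [];
  private hashFn: HashFn;

  constructor(hashFn: HashFn) {
    this.hashFn = hashFn;
  }

  append(runId: string, kind: EntryKind, detail: string): LedgerEntry {
    const last = this.entries[this.entries.length - 1];
    const prevHash = last ? last.hash : "genesis";
    const stepId = this.entries.length + 1;
    const hash = this.hashFn([runId, stepId, kind, detail, prevHash].join("|"));
    const entry: LedgerEntry = { runId, stepId, kind, detail, prevHash, hash };
    this.entries.push(entry);
    return entry;
  }

  // Recomputes the chain; returns the stepId of the first broken entry, or null.
  verify(): number | null {
    let prev = "genesis";
    for (const e of this.entries) {
      const expected = this.hashFn([e.runId, e.stepId, e.kind, e.detail, prev].join("|"));
      if (e.prevHash !== prev || e.hash !== expected) return e.stepId;
      prev = e.hash;
    }
    return null;
  }

  // Exposed only so the example can simulate tampering.
  raw(): LedgerEntry[] {
    return this.entries;
  }
}

// Usage
const ledger = new AuditLedger(fnv1a);
ledger.append("run-1", "prompt", "triage inbox");
ledger.append("run-1", "mutation", "archive msg-1");
const intact = ledger.verify();           // null: chain is consistent
ledger.raw()[1].detail = "archive msg-2"; // simulate an edit after the fact
const broken = ledger.verify();           // 2: first entry that fails the check
```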

P0

needs changes · production blocker · 65/100

Design for delegation rather than direct manipulation

The code supports delegation mechanically through `ListenersManager.checkEvent`, `ActionsManager.executeAction`, and the prompt’s listener/action creation model, but delegated authority is not represented as a governed scope. `createContext()` hands every enabled listener helpers such as `archiveEmail`, `starEmail`, `markAsRead`, `addLabel`, `callAgent`, and `uiState.set` without a persistent authority envelope, per-listener permission scope, or lifecycle controls beyond `config.enabled`. `AIClient.defaultOptions.allowedTools` also grants broad tools including `Task`, `Bash`, `Edit`, `Write`, `WebFetch`, and `WebSearch` without tying them to user-stated constraints.

Recommendation

Move delegated authority into a persistent policy/permission layer: each listener, action, and agent run should have an explicit scope, allowed side effects, expiry, owner, and pause/stop controls before execution context tools are exposed.
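As a rough illustration, a per-listener scope can decide which helpers reach the handler at all. The sketch assumes a hypothetical `ListenerScope` record and a `createScopedContext` function. The helper names come from the findings above, but their signatures are simplified to a single `emailId` argument and are not the real `createContext()` signatures.

```typescript
// Hypothetical sketch: ListenerScope and createScopedContext are assumptions.
type Capability = "archiveEmail" | "starEmail" | "markAsRead" | "addLabel";

interface ListenerScope {
  listenerId: string;
  owner: string;
  allowed: Capability[]; // side effects this listener may perform
  expiresAt: number;     // epoch milliseconds
  paused: boolean;
}

// Simplified: every helper takes an email ID and returns a description.
type Helpers = Record<Capability, (emailId: string) => string>;

// Exposes only the helpers the scope grants; a paused or expired scope gets none.
function createScopedContext(scope: ListenerScope, all: Helpers, now: number): Partial<Helpers> {
  const ctx: Partial<Helpers> = {};
  if (scope.paused || now >= scope.expiresAt) return ctx;
  for (const cap of scope.allowed) {
    ctx[cap] = all[cap];
  }
  return ctx;
}

// Usage
const helpers: Helpers = {
  archiveEmail: (id) => `archived ${id}`,
  starEmail: (id) => `starred ${id}`,
  markAsRead: (id) => `read ${id}`,
  addLabel: (id) => `labelled ${id}`,
};
const scope: ListenerScope = {
  listenerId: "newsletter-filter",
  owner: "user-1",
  allowed: ["markAsRead"],
  expiresAt: 2000,
  paused: false,
};
const live = createScopedContext(scope, helpers, 1000);    // only markAsRead is present
const expired = createScopedContext(scope, helpers, 3000); // empty context
```

Withholding the helper entirely, rather than checking permission inside it, means a handler cannot call a mutation it was never granted.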

P0

needs changes · production blocker · 60/100

Ensure that background work remains perceptible

Background work is only partially perceptible. `ListenersManager` writes a `ListenerLogEntry` after a handler finishes and optionally calls `logBroadcastCallback`, while `ActionsManager.logExecution` appends JSONL logs, but there is no persistent run record showing `queued`, `active`, `blocked`, `awaiting approval`, or `failed` while work is in progress. `this.logWriter.appendLog(...).catch(...)` is fire-and-forget, so audit/status failures are not part of task state. `MessageQueue.close()` sets `closed = true` and drops `resolvers` without resolving pending `next()` calls, which can strand waiters silently.

Recommendation

Represent every listener/action/agent invocation as a durable run record with a status lifecycle and progress timestamps; make log-write failures part of that lifecycle, and change `MessageQueue.close()` to resolve pending waiters with `{ done: true }` or an explicit abort result.
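The `close()` change can be sketched as follows. The reviewed `MessageQueue` source is not reproduced in this report, so `ClosableQueue` below is an assumed shape built only from the details named in the finding (`closed`, `resolvers`, `next()`); it is not the original class.

```typescript
// Hypothetical reconstruction: the real MessageQueue may differ in shape.
type QueueResult<T> = { done: false; value: T } | { done: true };

class ClosableQueue<T> {
  private items: T[] = [];
  private resolvers: Array<(r: QueueResult<T>) => void> = [];
  private closed = false;

  push(item: T): void {
    if (this.closed) return; // sketch choice: items pushed after close are dropped
    const waiter = this.resolvers.shift();
    if (waiter) {
      waiter({ done: false, value: item });
    } else {
      this.items.push(item);
    }
  }

  next(): Promise<QueueResult<T>> {
    if (this.items.length > 0) {
      const buffered: QueueResult<T> = { done: false, value: this.items.shift() as T };
      return Promise.resolve(buffered);
    }
    if (this.closed) {
      const finished: QueueResult<T> = { done: true };
      return Promise.resolve(finished);
    }
    return new Promise<QueueResult<T>>((resolve) => {
      this.resolvers.push(resolve);
    });
  }

  close(): void {
    this.closed = true;
    // Settle every pending next() instead of dropping the resolvers.
    for (const resolve of this.resolvers.splice(0)) {
      resolve({ done: true });
    }
  }
}

// Usage
const queue = new ClosableQueue<string>();
const waiting = queue.next(); // no items yet, so this parks a resolver
queue.close();                // settles `waiting` with { done: true }
```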

P0

needs changes · production blocker · 60/100

Replace implied magic with clear mental models

The prompt in `EMAIL_AGENT_PROMPT` gives a helpful high-level mental model and distinguishes `Listeners = Automatic/event-triggered` from `Actions = User-triggered/on-demand`, but the actual runtime capabilities are much broader than that model. `AIClient.defaultOptions.allowedTools` includes `Bash`, `Edit`, `Write`, `Task`, `WebFetch`, and `WebSearch`; `custom-tools.ts` returns untrusted email `body` content into the agent context; and listener handlers receive email mutation functions without an explicit permissions explanation. Users can therefore underestimate what the agent can execute and what conditions govern automation.

Recommendation

Generate a capability and permission manifest from the actual runtime configuration: disclose email read/search, file write, shell/network tools, listener automation, mutation permissions, and approval requirements in the product surface before delegation starts.
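A manifest of this kind could be derived directly from the configuration the runtime actually uses, so that the disclosure cannot drift from the real tool list. In the sketch below, `RuntimeConfig`, `buildManifest`, and the category labels are assumptions made for illustration; only the tool and helper names are taken from the findings.

```typescript
// Hypothetical sketch: the config shape and category labels are assumptions.
type Category = "shell" | "file write" | "network" | "sub-agent" | "email mutation" | "other";

interface RuntimeConfig {
  allowedTools: string[];     // for example, the contents of allowedTools
  listenerHelpers: string[];  // mutation helpers handed to listeners
  approvalRequired: string[]; // capabilities that need user approval
}

interface ManifestLine {
  capability: string;
  category: Category;
  needsApproval: boolean;
}

// Assumed mapping from tool name to a user-facing category.
const TOOL_CATEGORIES = new Map<string, Category>([
  ["Bash", "shell"],
  ["Edit", "file write"],
  ["Write", "file write"],
  ["WebFetch", "network"],
  ["WebSearch", "network"],
  ["Task", "sub-agent"],
]);

function buildManifest(cfg: RuntimeConfig): ManifestLine[] {
  const approval = new Set(cfg.approvalRequired);
  const tools = cfg.allowedTools.map((t): ManifestLine => ({
    capability: t,
    category: TOOL_CATEGORIES.get(t) ?? "other",
    needsApproval: approval.has(t),
  }));
  const helpers = cfg.listenerHelpers.map((h): ManifestLine => ({
    capability: h,
    category: "email mutation",
    needsApproval: approval.has(h),
  }));
  return tools.concat(helpers);
}

function render(line: ManifestLine): string {
  const approvalText = line.needsApproval ? "approval required" : "no approval";
  return `${line.capability} (${line.category}): ${approvalText}`;
}

// Usage
const manifest = buildManifest({
  allowedTools: ["Bash", "Write", "WebFetch"],
  listenerHelpers: ["archiveEmail"],
  approvalRequired: ["Bash", "archiveEmail"],
});
const lines = manifest.map(render);
// lines[0] is "Bash (shell): approval required"
```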

P0

needs changes · production blocker · 60/100

Represent delegated work as a system, not merely as a conversation

The code represents parts of the system as managers (`ListenersManager`, `ActionsManager`, `ComponentManager`, `UIStateManager`) rather than only as chat, but delegated work is not modeled as a coherent execution system. `AIClient.queryStream` yields SDK messages, `ActionsManager.instances` and `ComponentManager.instances` are in-memory maps, listener invocations are per-event loops, and logs are separate JSONL files. There is no run graph, task timeline, dependency model, or shared orchestration record tying an agent session to listener-created actions, component state, email reads, and email mutations.

Recommendation

Create a persisted run graph with `run_id`, step IDs, dependencies, owners, statuses, tool calls, listener/action/component relationships, and outcomes; the conversation should be one view over this system, not the source of truth.
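To make the run-graph idea concrete, the sketch below models steps with explicit dependencies and derives which steps are ready to run. `RunGraph`, `RunStep`, and the helper functions are hypothetical, and the JSON-lines output stands in for whatever persistence layer is chosen.

```typescript
// Hypothetical sketch: the run graph types are assumptions, not reviewed code.
type StepStatus = "queued" | "active" | "complete" | "failed";
type StepKind = "agent_session" | "listener" | "action" | "email_read" | "email_mutation";

interface RunStep {
  stepId: string;
  kind: StepKind;
  dependsOn: string[];
  status: StepStatus;
}

interface RunGraph {
  runId: string;
  owner: string;
  steps: RunStep[];
}

// Adds a queued step; dependencies must already exist in the graph.
function addStep(graph: RunGraph, stepId: string, kind: StepKind, dependsOn: string[] = []): RunStep {
  for (const dep of dependsOn) {
    if (!graph.steps.some((s) => s.stepId === dep)) {
      throw new Error(`Unknown dependency ${dep} for step ${stepId}`);
    }
  }
  const step: RunStep = { stepId, kind, dependsOn, status: "queued" };
  graph.steps.push(step);
  return step;
}

// IDs of queued steps whose dependencies have all completed.
function readySteps(graph: RunGraph): string[] {
  const done = new Set(
    graph.steps.filter((s) => s.status === "complete").map((s) => s.stepId),
  );
  return graph.steps
    .filter((s) => s.status === "queued" && s.dependsOn.every((d) => done.has(d)))
    .map((s) => s.stepId);
}

// One JSON line per step, so the graph can be persisted and replayed.
function serialise(graph: RunGraph): string {
  return graph.steps
    .map((s) => JSON.stringify({ runId: graph.runId, ...s }))
    .join("\n");
}

// Usage
const graph: RunGraph = { runId: "run-1", owner: "user-1", steps: [] };
const session = addStep(graph, "s1", "agent_session");
addStep(graph, "s2", "email_read", ["s1"]);
addStep(graph, "s3", "email_mutation", ["s2"]);
const before = readySteps(graph); // ["s1"]
session.status = "complete";
const after = readySteps(graph);  // ["s2"]
```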

P0

needs changes · hardening recommended · 45/100

Expose meaningful operational state, not internal complexity

Operational state is mixed with implementation details. `ListenerLogEntry` includes user-relevant fields such as `emailSubject`, `emailFrom`, `executed`, and `reason`, but action logs expose `instanceId`, `templateId`, `sessionId`, raw `params`, and `result` without a user-facing state model. The code does not define stable operational statuses such as `queued`, `active`, `awaiting_approval`, `blocked`, `complete`, or `failed`; instead, state is inferred from logs, console errors, in-memory maps such as `instances`, and SDK message streams.

Recommendation

Introduce a small persistent status enum for runs and actions, and keep IDs, template names, raw params, stack errors, and log file paths in a diagnostic view rather than making them the primary operational model.
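A small status model might look like the sketch below, which uses the six statuses named in the finding. The transition table, the `RunRecord` shape, and the function names are assumptions for illustration; the diagnostic field names mirror those listed in the finding.

```typescript
// Hypothetical sketch: the transition table is an assumption, not reviewed code.
type RunStatus = "queued" | "active" | "awaiting_approval" | "blocked" | "complete" | "failed";

// Allowed next states; terminal states have no outgoing transitions.
const TRANSITIONS: Record<RunStatus, RunStatus[]> = {
  queued: ["active", "failed"],
  active: ["awaiting_approval", "blocked", "complete", "failed"],
  awaiting_approval: ["active", "failed"],
  blocked: ["active", "failed"],
  complete: [],
  failed: [],
};

interface RunRecord {
  status: RunStatus;
  summary: string; // user-facing description of the work
  diagnostics: {   // kept out of the primary operational view
    instanceId: string;
    templateId: string;
    sessionId: string;
    params: unknown;
  };
}

// Returns an updated copy, or throws if the transition is not allowed.
function transition(record: RunRecord, next: RunStatus): RunRecord {
  if (TRANSITIONS[record.status].indexOf(next) === -1) {
    throw new Error(`Illegal transition ${record.status} -> ${next}`);
  }
  return { ...record, status: next };
}

// The primary view: status and summary only, no internal identifiers.
function operationalView(record: RunRecord): { status: RunStatus; summary: string } {
  return { status: record.status, summary: record.summary };
}

// Usage
const queued: RunRecord = {
  status: "queued",
  summary: "Archive newsletters older than 30 days",
  diagnostics: { instanceId: "inst-1", templateId: "tmpl-1", sessionId: "sess-1", params: {} },
};
const active = transition(queued, "active");
const waiting = transition(active, "awaiting_approval");
const view = operationalView(waiting);
```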

P0

needs changes · hardening recommended · 35/100

Align feedback with the user’s level of attention

Feedback is not calibrated to user attention. `notify()` accepts `options?.priority` but only forwards a generic `listener_notification`; listener execution always emits the same log shape regardless of risk, uncertainty, or required intervention. Most failures are written to `console.error` in `loadAllListeners`, `loadListener`, `watchListeners`, `loadTemplate`, and tool handlers rather than being escalated as user-actionable states. There is no distinction between foreground monitoring, passive background operation, and absent-user escalation.

Recommendation

Add an attention model to the execution layer: routine completions should remain low-noise, while risky actions, failures, missing credentials, blocked tools, and approval-required states should be escalated through a durable notification/status channel.
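An attention model can start as a single routing function that maps an event to a delivery channel. The sketch is illustrative: `AgentEvent`, `Channel`, and `route` are hypothetical names, and the three channels are an assumed simplification of foreground monitoring, passive background operation, and absent-user escalation.

```typescript
// Hypothetical sketch: event kinds and channels are assumptions for illustration.
type EventKind =
  | "completed"
  | "failed"
  | "missing_credentials"
  | "tool_blocked"
  | "approval_required";

type Channel = "log_only" | "in_app" | "escalate";

interface AgentEvent {
  kind: EventKind;
  risky: boolean;       // the action had an externally visible side effect
  userPresent: boolean; // the user is currently watching the product surface
}

function route(e: AgentEvent): Channel {
  // Routine, low-risk completions stay quiet.
  const needsUser = e.kind !== "completed" || e.risky;
  if (!needsUser) return "log_only";
  // Anything that needs the user is surfaced now, or escalated if they are away.
  return e.userPresent ? "in_app" : "escalate";
}

// Usage
const routine = route({ kind: "completed", risky: false, userPresent: true });        // "log_only"
const failure = route({ kind: "failed", risky: false, userPresent: true });           // "in_app"
const approval = route({ kind: "approval_required", risky: true, userPresent: false }); // "escalate"
```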

P0

needs changes · hardening recommended · 35/100

Apply progressive disclosure to system agency

There are some disclosure building blocks, such as `search_inbox` returning a summary plus `logFilePath` and listener/action JSONL logs preserving details, but the layer does not consistently separate primary outcome from diagnostic detail. `read_emails` returns full email bodies directly to the agent, `logBroadcastCallback` receives raw `actions` and `error`, and there is no structured user-facing summary/detail boundary for tool usage, decision logic, or confidence-sensitive inspection.

Recommendation

Separate the primary user-facing status/outcome view from diagnostic inspection: persist detailed tool/email evidence, but expose concise intent, status, outcome, and risk by default with explicit expansion into logs, prompts, tool calls, and source emails.
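The summary/detail boundary can be sketched as a function that returns a concise summary plus opaque references, with the full evidence stored separately for explicit expansion. `EvidenceStore`, `Disclosure`, and `discloseEmails` are hypothetical names, and the in-memory map stands in for persisted evidence.

```typescript
// Hypothetical sketch: these types are assumptions, not part of the reviewed code.
interface Email {
  id: string;
  from: string;
  subject: string;
  body: string;
}

interface Disclosure {
  summary: string;      // shown by default
  detailRefs: string[]; // opaque handles for explicit expansion
}

class EvidenceStore {
  // In-memory for illustration only; real evidence would be persisted.
  private details = new Map<string, string>();

  put(ref: string, detail: string): void {
    this.details.set(ref, detail);
  }

  expand(ref: string): string | undefined {
    return this.details.get(ref);
  }
}

// Stores full bodies behind references and returns only a short summary.
function discloseEmails(emails: Email[], store: EvidenceStore): Disclosure {
  const detailRefs = emails.map((e) => {
    const ref = `email:${e.id}`;
    store.put(ref, e.body);
    return ref;
  });
  const senders = Array.from(new Set(emails.map((e) => e.from)));
  return {
    summary: `${emails.length} email(s) from ${senders.join(", ")}`,
    detailRefs,
  };
}

// Usage
const store = new EvidenceStore();
const disclosure = discloseEmails(
  [
    { id: "m1", from: "a@example.com", subject: "Invoice", body: "Full body one" },
    { id: "m2", from: "a@example.com", subject: "Reminder", body: "Full body two" },
  ],
  store,
);
// disclosure.summary is "2 email(s) from a@example.com"
const expanded = store.expand(disclosure.detailRefs[0]); // "Full body one"
```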

Adversarial-surface findings

1 principle where the review engaged with specific adversarial mechanisms (prompt injection, role spoofing, encoding bypass, tool backdoor, approval bypass).

Prompt injection

P0

high risk · production blocker · 95/100

Make hand-offs, approvals, and blockers explicit


Embed in your README

Two embeddable variants: a small flat shield and a richer score card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/056929ab-c0d5-40c8-be74-5d816128a389/card.svg)](https://aidesignblueprint.com/en/readiness-review/056929ab-c0d5-40c8-be74-5d816128a389)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/056929ab-c0d5-40c8-be74-5d816128a389.svg)](https://aidesignblueprint.com/en/readiness-review/056929ab-c0d5-40c8-be74-5d816128a389)
Baseline and iteration details
Rubric: 2026-05-04

Run ID: 056929ab-c0d5-40c8-be74-5d816128a389 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.