Reviewed

Assessment complete; evidence review pending.

Agent Architecture Review, validation snapshot

Assessed on 12 May 2026 against the AI Design Blueprint doctrine

Status: High risk

Score: 22/100

Grade: F

0 aligned · 7 production blockers · 3 hardening recommended · 1 high risk

Verdicts by principle

The submitted SDK glue layer is an autonomous agentic workflow: it streams Claude Agent SDK sessions with broad tools, executes event-driven listener scripts, exposes email mutation helpers, and manages action/component/state extensions. It has useful local-demo primitives such as listener/action logs and UI state persistence, but the production trust boundaries do not hold: untrusted email bodies are fed to an agent with Bash/Edit/Write/Web tools, automated listeners can mutate email state without an approval break, and long-running work lacks persistent run lifecycle, steering, and auditable correlation.

Findings by principle

10 principles assessed. Verdict, severity, evidence, and recommendation for each.

P0

High risk · Production blocker · 95/100

Make hand-offs, approvals, and blockers explicit

Approval and blocker handling are the critical failing boundary. Automatic listener handlers receive `archiveEmail`, `starEmail`, `markAsRead`, and `addLabel` in `createContext()` and can mutate mailbox state without a hard `awaiting_approval` break. `AIClient.defaultOptions.allowedTools` permits `Bash`, `Edit`, `Write`, `WebFetch`, and `Task`; the `PreToolUse` hook only blocks `.js`/`.ts` writes outside `agent/custom_scripts`, so other writes, shell commands, and network actions can proceed. `custom-tools.ts` injects untrusted email bodies into the agent context, creating a prompt-injection path from external email content to powerful tools. Blockers are mostly `console.error` or text error…

Recommendation

Insert hard approval/state breaks before every externally visible or destructive side effect, including email mutations, file writes, shell commands, and network actions. Treat email bodies as untrusted input, require policy evaluation before tool use, and persist blocked/awaiting-approval states with the specific user action needed.
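A minimal sketch of such a hard break, assuming a hypothetical `ApprovalStore` and `executeIfApproved` helper (neither exists in the reviewed code): a side effect is first recorded as `awaiting_approval`, and it runs only after a human decision is stored, and never more than once. An in-memory map stands in for the persistent store.

```typescript
// Hypothetical approval gate; names are illustrative, not from the reviewed codebase.
type ApprovalStatus = "awaiting_approval" | "approved" | "rejected" | "executed";

interface PendingEffect {
  id: string;
  description: string; // shown to the user, e.g. "archiveEmail(msg-1)"
  status: ApprovalStatus;
}

class ApprovalStore {
  // In-memory stand-in; a production system would persist these records durably.
  private effects = new Map<string, PendingEffect>();
  private nextId = 1;

  request(description: string): PendingEffect {
    const effect: PendingEffect = { id: `fx-${this.nextId++}`, description, status: "awaiting_approval" };
    this.effects.set(effect.id, effect);
    return effect;
  }

  decide(id: string, approved: boolean): void {
    const effect = this.effects.get(id);
    if (!effect || effect.status !== "awaiting_approval") throw new Error(`no pending effect ${id}`);
    effect.status = approved ? "approved" : "rejected";
  }

  get(id: string): PendingEffect | undefined {
    return this.effects.get(id);
  }
}

// Runs the side effect only if it was approved, then marks it executed so it cannot run twice.
function executeIfApproved(store: ApprovalStore, id: string, run: () => void): boolean {
  const effect = store.get(id);
  if (!effect || effect.status !== "approved") return false;
  run();
  effect.status = "executed";
  return true;
}
```

In this shape a listener would call `store.request(...)` instead of invoking `archiveEmail` directly, and the pending record doubles as the persisted "blocked, waiting for you" state the recommendation asks for.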

P0

Needs changes · Production blocker · 75/100

Optimise for steering, not only initiating

The workflow can be initiated and streamed, but it is not steerable while underway. `AIClient.queryStream` passes options such as `maxTurns` and `resume`, but there is no abort controller, pause/resume command, dynamic constraint update, or side-effect checkpoint in this layer. `ListenersManager.checkEvent` runs matching listeners to completion once triggered, and email mutation helpers do not check for cancellation or changed policy before acting. `ActionsManager.executeAction` executes a handler once and returns a result; retry, rollback, reprioritisation, and mid-run correction are not represented.

Recommendation

Add a persistent execution controller with pause, resume, abort, retry, and rollback transitions that are checked before each tool call or email mutation; steering should update the active run’s policy and state rather than forcing the user to restart.
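One possible shape for that controller, sketched under the assumption of a hypothetical `RunController` whose `checkpoint()` is awaited before every tool call or email mutation. Pausing parks callers at the checkpoint, aborting wakes and rejects them, and `blockedTools` stands in for a policy the user can tighten mid-run. State is in memory here; persistence and retry/rollback are left out.

```typescript
// Hypothetical steering controller; not part of the reviewed code or the Claude Agent SDK.
type RunState = "running" | "paused" | "aborted";

class RunController {
  private state: RunState = "running";
  private waiters: Array<() => void> = [];
  // Constraint the user can change while the run is underway.
  readonly blockedTools = new Set<string>();

  get current(): RunState {
    return this.state;
  }

  pause(): void {
    if (this.state === "running") this.state = "paused";
  }

  resume(): void {
    if (this.state !== "paused") return;
    this.state = "running";
    this.releaseWaiters();
  }

  abort(): void {
    this.state = "aborted";
    this.releaseWaiters(); // wake paused callers so they observe the abort
  }

  // Await this before each tool call or side effect.
  async checkpoint(toolName: string): Promise<void> {
    while (this.state === "paused") {
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    if (this.state === "aborted") throw new Error("run aborted");
    if (this.blockedTools.has(toolName)) throw new Error(`tool ${toolName} blocked by current policy`);
  }

  private releaseWaiters(): void {
    const waiters = this.waiters;
    this.waiters = [];
    waiters.forEach((wake) => wake());
  }
}
```

Because the policy is read at each checkpoint rather than at session start, a correction takes effect on the next step without restarting the run.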

P0

Needs changes · Production blocker · 70/100

Establish trust through inspectability

The code has partial inspectability through `LogWriter.appendLog`, `ActionsManager.logExecution`, `UIStateManager.logStateUpdate`, and `search_inbox` log files, but the audit boundary is insufficient for accountable autonomous work. There is no persistent `run_id` correlating `AIClient.queryStream` messages, listener invocations, email mutations, UI state updates, and action executions. `callAgent()` does not record the prompt, model, schema, response, or tool-use rationale; listener log writes are fire-and-forget; and JSONL files are not tamper-evident. `custom-tools.ts` also writes full formatted email results, including `body`, to local log files, which improves traceability but creates a…

Recommendation

Move audit authority out of the execution loop into an append-only ledger keyed by `run_id` and step IDs; record prompts, model/version, tool calls, source email IDs, approval decisions, mutations, result diffs, and sanitized evidence with tamper-evident integrity checks.
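A minimal sketch of tamper evidence through hash chaining, assuming a hypothetical `AuditLedger` class: each entry stores the SHA-256 of its predecessor's hash plus its own content, so `verify()` fails if any stored entry is edited or reordered. It uses Node's `crypto.createHash`; the entry fields are illustrative, and a real ledger would live outside the agent process.

```typescript
import { createHash } from "node:crypto";

// Hypothetical append-only ledger entry; fields are illustrative.
interface LedgerEntry {
  runId: string;
  stepId: string;
  kind: string; // e.g. "prompt", "tool_call", "approval", "mutation"
  payload: unknown; // sanitized evidence, not raw email bodies
  prevHash: string;
  hash: string;
}

class AuditLedger {
  private entries: LedgerEntry[] = [];

  append(runId: string, stepId: string, kind: string, payload: unknown): LedgerEntry {
    const prevHash = this.entries.length ? this.entries[this.entries.length - 1].hash : "genesis";
    const hash = AuditLedger.digest(prevHash, runId, stepId, kind, payload);
    const entry: LedgerEntry = { runId, stepId, kind, payload, prevHash, hash };
    this.entries.push(entry);
    return entry;
  }

  // Recomputes the chain; an edited, reordered, or mid-chain-deleted entry breaks it.
  verify(): boolean {
    let prev = "genesis";
    for (const e of this.entries) {
      if (e.prevHash !== prev) return false;
      if (e.hash !== AuditLedger.digest(prev, e.runId, e.stepId, e.kind, e.payload)) return false;
      prev = e.hash;
    }
    return true;
  }

  // Correlates everything that happened within one run.
  forRun(runId: string): LedgerEntry[] {
    return this.entries.filter((e) => e.runId === runId);
  }

  private static digest(prev: string, runId: string, stepId: string, kind: string, payload: unknown): string {
    return createHash("sha256").update(JSON.stringify([prev, runId, stepId, kind, payload])).digest("hex");
  }
}
```

Chaining alone cannot detect truncation of the newest entries; periodically anchoring the latest hash somewhere the agent cannot write closes that gap.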

P0

Needs changes · Production blocker · 65/100

Design for delegation rather than direct manipulation

The code supports delegation mechanically through `ListenersManager.checkEvent`, `ActionsManager.executeAction`, and the prompt’s listener/action creation model, but delegated authority is not represented as a governed scope. `createContext()` hands every enabled listener helpers such as `archiveEmail`, `starEmail`, `markAsRead`, `addLabel`, `callAgent`, and `uiState.set` without a persistent authority envelope, per-listener permission scope, or lifecycle controls beyond `config.enabled`. `AIClient.defaultOptions.allowedTools` also grants broad tools including `Task`, `Bash`, `Edit`, `Write`, `WebFetch`, and `WebSearch` without tying them to user-stated constraints.

Recommendation

Move delegated authority into a persistent policy/permission layer: each listener, action, and agent run should have an explicit scope, allowed side effects, expiry, owner, and pause/stop controls before execution context tools are exposed.
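As an illustration, a per-listener authority envelope can wrap the raw mutation helpers so every call is checked at call time. `AuthorityEnvelope`, `authorize`, and `scopedContext` are hypothetical names, and the helper signatures are simplified to a single email ID; the real `createContext()` helpers may take more arguments.

```typescript
// Hypothetical authority envelope; helper signatures are simplified for illustration.
type SideEffect = "archiveEmail" | "starEmail" | "markAsRead" | "addLabel";
type Helper = (emailId: string) => void;

interface AuthorityEnvelope {
  listenerId: string;
  owner: string; // who granted the authority
  allowed: ReadonlySet<SideEffect>; // side effects this listener may perform
  expiresAt: number; // epoch ms after which the grant lapses
  paused: boolean; // owner-controlled stop switch
}

// Returns a denial reason, or null when the side effect is authorized.
function authorize(env: AuthorityEnvelope, effect: SideEffect, now: number): string | null {
  if (env.paused) return `${env.listenerId} is paused by ${env.owner}`;
  if (now >= env.expiresAt) return `${env.listenerId} authority expired`;
  if (!env.allowed.has(effect)) return `${effect} is outside ${env.listenerId}'s scope`;
  return null;
}

// Wraps raw helpers so each call is re-checked against the envelope when it happens.
function scopedContext(
  env: AuthorityEnvelope,
  raw: Record<SideEffect, Helper>,
  clock: () => number = Date.now,
): Record<SideEffect, Helper> {
  const wrap = (effect: SideEffect): Helper => (emailId) => {
    const denial = authorize(env, effect, clock());
    if (denial) throw new Error(denial);
    raw[effect](emailId);
  };
  return {
    archiveEmail: wrap("archiveEmail"),
    starEmail: wrap("starEmail"),
    markAsRead: wrap("markAsRead"),
    addLabel: wrap("addLabel"),
  };
}
```

Checking at call time rather than at context creation means pausing or expiring a grant takes effect even for a handler that is already running.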

P0

Needs changes · Production blocker · 60/100

Ensure that background work remains perceptible

Background work is only partially perceptible. `ListenersManager` writes a `ListenerLogEntry` after a handler finishes and optionally calls `logBroadcastCallback`, while `ActionsManager.logExecution` appends JSONL logs, but there is no persistent run record showing `queued`, `active`, `blocked`, `awaiting approval`, or `failed` while work is in progress. `this.logWriter.appendLog(...).catch(...)` is fire-and-forget, so audit/status failures are not part of task state. `MessageQueue.close()` sets `closed = true` and drops `resolvers` without resolving pending `next()` calls, which can strand waiters silently.

Recommendation

Represent every listener/action/agent invocation as a durable run record with a status lifecycle and progress timestamps; make log-write failures part of that lifecycle, and change `MessageQueue.close()` to resolve pending waiters with `{ done: true }` or an explicit abort result.
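The `MessageQueue.close()` fix can be sketched as follows. This is a hypothetical reconstruction based only on the fields named in the finding (`closed`, `resolvers`, `next()`); the reviewed class may differ. The key change is that `close()` settles every pending waiter with `{ done: true }` instead of discarding it.

```typescript
// Hypothetical reconstruction of MessageQueue; the reviewed class's internals may differ.
class MessageQueue<T> {
  private items: T[] = [];
  private resolvers: Array<(result: IteratorResult<T>) => void> = [];
  private closed = false;

  push(item: T): void {
    if (this.closed) throw new Error("queue is closed");
    const resolve = this.resolvers.shift();
    if (resolve) resolve({ value: item, done: false }); // hand directly to a waiting consumer
    else this.items.push(item);
  }

  next(): Promise<IteratorResult<T>> {
    if (this.items.length > 0) {
      return Promise.resolve<IteratorResult<T>>({ value: this.items.shift() as T, done: false });
    }
    if (this.closed) return Promise.resolve<IteratorResult<T>>({ value: undefined, done: true });
    return new Promise<IteratorResult<T>>((resolve) => this.resolvers.push(resolve));
  }

  // Fix: settle every pending waiter instead of silently dropping it.
  close(): void {
    this.closed = true;
    const pending = this.resolvers;
    this.resolvers = [];
    for (const resolve of pending) resolve({ value: undefined, done: true });
  }
}
```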

P0

Needs changes · Production blocker · 60/100

Replace implied magic with clear mental models

The prompt in `EMAIL_AGENT_PROMPT` gives a helpful high-level mental model and distinguishes `Listeners = Automatic/event-triggered` from `Actions = User-triggered/on-demand`, but the actual runtime capabilities are much broader than that model. `AIClient.defaultOptions.allowedTools` includes `Bash`, `Edit`, `Write`, `Task`, `WebFetch`, and `WebSearch`; `custom-tools.ts` returns untrusted email `body` content into the agent context; and listener handlers receive email mutation functions without an explicit permissions explanation. Users can therefore underestimate what the agent can execute and what conditions govern automation.

Recommendation

Generate a capability and permission manifest from the actual runtime configuration: disclose email read/search, file write, shell/network tools, listener automation, mutation permissions, and approval requirements in the product surface before delegation starts.
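A minimal sketch of deriving the manifest from configuration instead of from prose, assuming a hypothetical `buildManifest` that takes the same list passed as `allowedTools` plus a set of tools gated behind approval. The risk categories are an assumption; the tool names are the ones cited in this review.

```typescript
// Hypothetical manifest generator; risk categories are illustrative.
type Risk = "read" | "mutates_local" | "executes_code" | "network" | "delegates";

const TOOL_RISK: Partial<Record<string, Risk>> = {
  read_emails: "read",
  search_inbox: "read",
  Edit: "mutates_local",
  Write: "mutates_local",
  Bash: "executes_code",
  WebFetch: "network",
  WebSearch: "network",
  Task: "delegates",
};

interface ManifestLine {
  capability: string;
  risk: Risk | "unknown";
  requiresApproval: boolean;
}

// Built from the runtime tool list, so the disclosure cannot drift from the configuration.
function buildManifest(allowedTools: string[], approvalRequired: ReadonlySet<string>): ManifestLine[] {
  return allowedTools.map((tool) => ({
    capability: tool,
    risk: TOOL_RISK[tool] ?? "unknown",
    requiresApproval: approvalRequired.has(tool),
  }));
}

function renderManifest(lines: ManifestLine[]): string {
  return lines
    .map((l) => `${l.capability}: ${l.risk}${l.requiresApproval ? " (approval required)" : " (runs without approval)"}`)
    .join("\n");
}
```

Tools missing from the risk table surface as `unknown` rather than disappearing, which keeps a newly added tool visible to the user.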

P0

Needs changes · Production blocker · 60/100

Represent delegated work as a system, not merely as a conversation

The code represents parts of the system as managers (`ListenersManager`, `ActionsManager`, `ComponentManager`, `UIStateManager`) rather than only as chat, but delegated work is not modeled as a coherent execution system. `AIClient.queryStream` yields SDK messages, `ActionsManager.instances` and `ComponentManager.instances` are in-memory maps, listener invocations are per-event loops, and logs are separate JSONL files. There is no run graph, task timeline, dependency model, or shared orchestration record tying an agent session to listener-created actions, component state, email reads, and email mutations.

Recommendation

Create a persisted run graph with `run_id`, step IDs, dependencies, owners, statuses, tool calls, listener/action/component relationships, and outcomes; the conversation should be one view over this system, not the source of truth.
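A minimal in-memory sketch of such a run graph, assuming a hypothetical `RunGraph` class with the step kinds named in the finding. Dependencies must already exist when a step is added, which keeps the graph acyclic; `ready()` returns the queued steps whose dependencies are complete, and `timeline()` is one possible user-facing view. Persistence is omitted.

```typescript
// Hypothetical run graph; step kinds and fields are illustrative.
type StepStatus = "queued" | "active" | "blocked" | "awaiting_approval" | "complete" | "failed";
type StepKind =
  | "agent_session" | "listener" | "action" | "tool_call"
  | "email_read" | "email_mutation" | "component_state";

interface Step {
  stepId: string;
  kind: StepKind;
  owner: string;
  status: StepStatus;
  dependsOn: string[];
  outcome?: string;
}

class RunGraph {
  readonly steps = new Map<string, Step>();

  constructor(readonly runId: string) {}

  // Dependencies must already exist, so cycles cannot be created.
  add(step: Omit<Step, "status">): Step {
    for (const dep of step.dependsOn) {
      if (!this.steps.has(dep)) throw new Error(`unknown dependency ${dep}`);
    }
    const full: Step = { ...step, status: "queued" };
    this.steps.set(step.stepId, full);
    return full;
  }

  // Steps that may start now: queued, with every dependency complete.
  ready(): Step[] {
    return Array.from(this.steps.values()).filter(
      (s) => s.status === "queued" && s.dependsOn.every((d) => this.steps.get(d)?.status === "complete"),
    );
  }

  // One view over the system; a chat transcript would be another.
  timeline(): string[] {
    return Array.from(this.steps.values()).map((s) => `${this.runId}/${s.stepId} [${s.kind}] ${s.status}`);
  }
}
```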

P0

Needs changes · Hardening recommended · 45/100

Expose meaningful operational state, not internal complexity

Operational state is mixed with implementation details. `ListenerLogEntry` includes user-relevant fields such as `emailSubject`, `emailFrom`, `executed`, and `reason`, but action logs expose `instanceId`, `templateId`, `sessionId`, raw `params`, and `result` without a user-facing state model. The code does not define stable operational statuses such as `queued`, `active`, `awaiting_approval`, `blocked`, `complete`, or `failed`; instead, state is inferred from logs, console errors, in-memory maps such as `instances`, and SDK message streams.

Recommendation

Introduce a small persistent status enum for runs and actions, and keep IDs, template names, raw params, stack errors, and log file paths in a diagnostic view rather than making them the primary operational model.
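A sketch of what the small status enum could look like, with an explicit transition table and a split between the operational view and diagnostics. The status names come from the finding; the allowed transitions and the `ActionRecord` shape are assumptions for illustration.

```typescript
// Hypothetical status model; the transition table is an assumption, not from the reviewed code.
const TRANSITIONS = {
  queued: ["active", "failed"],
  active: ["awaiting_approval", "blocked", "complete", "failed"],
  awaiting_approval: ["active", "failed"],
  blocked: ["active", "failed"],
  complete: [],
  failed: [],
} as const;

type RunStatus = keyof typeof TRANSITIONS;

interface ActionRecord {
  status: RunStatus;
  summary: string; // user-facing, e.g. "Archive 3 newsletters"
  // Diagnostic detail, kept out of the primary view.
  diagnostics: { instanceId: string; templateId: string; params: unknown; error?: string };
}

// Rejects transitions the table does not allow, so status cannot be inferred ad hoc from logs.
function transition(record: ActionRecord, next: RunStatus): void {
  const allowed: readonly RunStatus[] = TRANSITIONS[record.status];
  if (!allowed.includes(next)) throw new Error(`illegal transition ${record.status} -> ${next}`);
  record.status = next;
}

// The primary operational model: status and summary only.
function operationalView(record: ActionRecord): { status: RunStatus; summary: string } {
  return { status: record.status, summary: record.summary };
}
```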

P0

Needs changes · Hardening recommended · 35/100

Align feedback with the user’s level of attention

Feedback is not calibrated to user attention. `notify()` accepts `options?.priority` but only forwards a generic `listener_notification`; listener execution always emits the same log shape regardless of risk, uncertainty, or required intervention. Most failures are written to `console.error` in `loadAllListeners`, `loadListener`, `watchListeners`, `loadTemplate`, and tool handlers rather than being escalated as user-actionable states. There is no distinction between foreground monitoring, passive background operation, and absent-user escalation.

Recommendation

Add an attention model to the execution layer: routine completions should remain low-noise, while risky actions, failures, missing credentials, blocked tools, and approval-required states should be escalated through a durable notification/status channel.
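One way to express the attention model is a pure routing function, sketched here with hypothetical event kinds, presence states, and channel names. Events that need the user always escalate; failures and risky outcomes show in the live feed when the user is watching and escalate otherwise; routine completions stay low-noise.

```typescript
// Hypothetical attention router; event kinds and channel names are illustrative.
type Channel = "log_only" | "status_feed" | "escalate";
type UserPresence = "watching" | "background" | "away";

interface ExecutionEvent {
  kind: "completed" | "failed" | "approval_required" | "blocked_tool" | "missing_credentials";
  risky: boolean; // e.g. a destructive email mutation
}

function routeAttention(event: ExecutionEvent, presence: UserPresence): Channel {
  // Anything that cannot proceed without the user is escalated through a durable channel.
  if (event.kind === "approval_required" || event.kind === "missing_credentials" || event.kind === "blocked_tool") {
    return "escalate";
  }
  // Failures and risky outcomes: visible live if the user is watching, escalated otherwise.
  if (event.kind === "failed" || event.risky) {
    return presence === "watching" ? "status_feed" : "escalate";
  }
  // Routine completions stay low-noise.
  return presence === "watching" ? "status_feed" : "log_only";
}
```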

P0

Needs changes · Hardening recommended · 35/100

Apply progressive disclosure to system agency

There are some disclosure building blocks, such as `search_inbox` returning a summary plus `logFilePath` and listener/action JSONL logs preserving details, but the layer does not consistently separate primary outcome from diagnostic detail. `read_emails` returns full email bodies directly to the agent, `logBroadcastCallback` receives raw `actions` and `error`, and there is no structured user-facing summary/detail boundary for tool usage, decision logic, or confidence-sensitive inspection.

Recommendation

Separate the primary user-facing status/outcome view from diagnostic inspection: persist detailed tool/email evidence, but expose concise intent, status, outcome, and risk by default with explicit expansion into logs, prompts, tool calls, and source emails.
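A minimal sketch of that summary/detail boundary, assuming a hypothetical `ToolUseRecord` that persists the full evidence while `present()` renders only intent, status, outcome, and risk unless detail is requested explicitly. Field names and the example log path are illustrative.

```typescript
// Hypothetical two-layer record: concise default view plus on-demand diagnostics.
interface ToolUseRecord {
  intent: string;
  status: "complete" | "failed" | "awaiting_approval";
  outcome: string;
  risk: "low" | "medium" | "high";
  detail: {
    toolCalls: { name: string; input: unknown }[];
    sourceEmailIds: string[];
    logFilePath?: string;
  };
}

type DisclosureLevel = "summary" | "detail";

function present(record: ToolUseRecord, level: DisclosureLevel = "summary"): string {
  const headline = `[${record.status}] ${record.intent}: ${record.outcome} (risk: ${record.risk})`;
  if (level === "summary") return headline;
  // Explicit expansion into diagnostics.
  const d = record.detail;
  return [
    headline,
    `tools: ${d.toolCalls.map((c) => c.name).join(", ")}`,
    `source emails: ${d.sourceEmailIds.join(", ")}`,
    `log: ${d.logFilePath ?? "none"}`,
  ].join("\n");
}
```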

Adversarial surface findings

1 principle where the review engaged specific adversarial mechanisms (prompt injection, role spoofing, encoding bypass, tool backdoor, approval bypass).

Prompt injection

P0

High risk · Production blocker · 95/100

Make hand-offs, approvals, and blockers explicit

Same finding and recommendation as the identically named P0 principle above: untrusted email bodies reach an agent that holds `Bash`, `Edit`, `Write`, and network tools, with no hard approval break before side effects.
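For the prompt-injection path specifically, one way to treat email bodies as untrusted input is taint tracking: once an email-reading tool returns content, high-risk tools stop running automatically and require a human decision. A minimal sketch, assuming a hypothetical `evaluateToolUse` function that a `PreToolUse`-style hook could call; it is not the Claude Agent SDK's hook API, and the tool names simply mirror those cited in this review.

```typescript
// Hypothetical policy evaluator; not the SDK's hook signature.
type Decision = { allow: true } | { allow: false; reason: string; needsApproval: boolean };

// Tools that can cause external or irreversible effects.
const HIGH_RISK = new Set(["Bash", "Edit", "Write", "WebFetch", "WebSearch", "Task"]);

interface SessionTaint {
  untrustedContentLoaded: boolean; // set once email content enters the agent context
}

function evaluateToolUse(tool: string, session: SessionTaint, allowlist: ReadonlySet<string>): Decision {
  // Default deny: anything not explicitly allowlisted is refused.
  if (!allowlist.has(tool)) return { allow: false, reason: `${tool} is not allowlisted`, needsApproval: false };
  // After untrusted email content is in context, high-risk tools need a human decision.
  if (session.untrustedContentLoaded && HIGH_RISK.has(tool)) {
    return { allow: false, reason: `${tool} after reading untrusted email content`, needsApproval: true };
  }
  return { allow: true };
}

// Marks the session tainted whenever an email-reading tool returns content.
function recordToolResult(tool: string, session: SessionTaint): void {
  if (tool === "read_emails" || tool === "search_inbox") session.untrustedContentLoaded = true;
}
```

Unlike a path-based check on `.js`/`.ts` writes, this keys the decision on what has entered the context, so shell, network, and non-script writes are covered by the same rule.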

Add to your README

Two embeddable variants: a compact badge and a richer card.

Score card (recommended)

[![Blueprint Readiness Score card](https://aidesignblueprint.com/api/badge/run/056929ab-c0d5-40c8-be74-5d816128a389/card.svg)](https://aidesignblueprint.com/en/readiness-review/056929ab-c0d5-40c8-be74-5d816128a389)

Flat badge

[![Blueprint Readiness Score](https://aidesignblueprint.com/api/badge/run/056929ab-c0d5-40c8-be74-5d816128a389.svg)](https://aidesignblueprint.com/en/readiness-review/056929ab-c0d5-40c8-be74-5d816128a389)

Baseline and iteration details
Rubric: 2026-05-04

Run ID: 056929ab-c0d5-40c8-be74-5d816128a389 · Results expire after 90 days

Run by agents. Governed by humans. Validated by the AI Design Blueprint.