Application Guide: Security application

Making trust load-bearing against adversarial pressure

The blueprint's principles of progressive disclosure, inspectability, and explicit approvals define the conditions under which agentic systems can be trusted. Security application means making those conditions load-bearing — not merely aspirational for cooperative users, but structurally enforced against active adversarial pressure.

Key Facts

Active attack vectors: Memory injection · Backdoor poisoning · Experience grafting
Core insight: No single control is sufficient — layered defence across four boundaries
Load-bearing principles: Principles 4, 7, and 8
Critical boundary: Action boundary — confirmation gating at tool-call level, not task initiation

The active threat model

Research has documented three concrete attack vectors with measured success rates. Memory injection achieves over a 95% success rate through normal user queries. Backdoor poisoning corrupts retrieval stores using optimised trigger tokens. Experience grafting plants fabricated precedents that agents retrieve and replicate. These are not theoretical — they operate against systems that treat external data as trusted.

What is memory injection?

Attackers poison persistent memory through normal user queries — no special access required. The injected memory is then retrieved and acted upon by the agent as if it were legitimate context. Systems that store retrieved content directly into memory without validation are the most exposed.

Validate external inputs against defined schemas before storing
Apply TTL-based expiration to long-lived memory entries
Enforce per-user memory isolation at the storage level
Flag content with instruction-like patterns before acting on it
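The screening steps above can be sketched as a single gate applied before anything is persisted. This is a minimal illustration, not a complete defence: the field names, regex patterns, and TTL value are all assumptions, and a production system would pair this with per-user storage isolation.

```python
import re
import time

# Illustrative schema: field names are assumptions, not a prescribed format.
ALLOWED_FIELDS = {"user_id", "content", "source", "created_at"}

# Naive patterns suggesting embedded instructions rather than data.
INSTRUCTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"you (must|should) now", re.I),
    re.compile(r"system prompt", re.I),
]

TTL_SECONDS = 7 * 24 * 3600  # assumed one-week expiry for long-lived entries


def screen_entry(entry: dict) -> dict:
    """Schema-validate, flag instruction-like content, and stamp a TTL
    before the entry is allowed into persistent memory."""
    extra = set(entry) - ALLOWED_FIELDS
    if extra or not {"user_id", "content"} <= set(entry):
        raise ValueError(f"schema violation: {extra or 'missing required fields'}")
    flagged = any(p.search(entry["content"]) for p in INSTRUCTION_PATTERNS)
    return {
        **entry,
        "flagged": flagged,                       # routed for review before the agent acts on it
        "expires_at": time.time() + TTL_SECONDS,  # TTL-based expiration
    }
```

Flagged entries are not silently dropped; they are marked so the agent never acts on them without review, which keeps false positives recoverable.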
What is experience grafting?

Fabricated 'successful experiences' are planted in memory stores and retrieved by the agent as legitimate precedent. The agent then repeats the fabricated behaviour, believing it to be an established pattern. This vector exploits retrieval-augmented systems that privilege historical experience.

Use cryptographic integrity checks on persisted context
Apply trust scoring to retrieved experiences before acting on them
Audit memory stores periodically for anomalous entries
Prefer structured records over free-form narrative in memory
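One way to realise the integrity check above is an HMAC tag over each persisted record, so a grafted or tampered experience fails verification on retrieval. This is a minimal sketch under assumed conditions: the key would come from a secrets manager, and the record fields are illustrative.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-managed-key"  # assumption: sourced from a secrets manager


def seal(record: dict) -> dict:
    """Attach an HMAC-SHA256 tag computed over the canonical JSON of the record."""
    payload = json.dumps(record, sort_keys=True).encode()
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return {"record": record, "tag": tag}


def verify(sealed: dict) -> bool:
    """Recompute the tag on retrieval; a planted or modified record fails here
    and should be discarded rather than treated as precedent."""
    payload = json.dumps(sealed["record"], sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sealed["tag"])
```

Because the attacker in the grafting scenario writes through normal channels without the signing key, any record they plant lacks a valid tag.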
What does a layered defence require?

Four boundaries must be enforced independently. Input boundary: validate all external data before it reaches the reasoning layer. Storage boundary: isolate, expire, and integrity-check persisted context. Access boundary: enforce a minimal permission footprint acquired at task time. Action boundary: gate irreversible actions at the tool-call level, not only at task initiation.

No boundary alone is sufficient — all four must be active
Injected instructions must not be able to bypass confirmation gates
Permissions should be acquired at task time and expired after use
Treat tool responses as external data — schema-validate before reasoning
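The action-boundary requirement can be illustrated with a per-call gate. The tool names, the reversibility classification, and the approver callback are assumptions for the sketch; the point is that the check runs on every invocation, so an injected instruction cannot ride an already-approved task to an unapproved action.

```python
# Assumed classification: which tools are irreversible is system-specific.
IRREVERSIBLE_TOOLS = {"send_email", "delete_records", "transfer_funds"}


def invoke_tool(name: str, args: dict, approver) -> str:
    """Gate each individual tool call, not just task initiation.

    `approver` is a hypothetical callback to the human (or policy engine)
    that returns True only on explicit approval of this specific call.
    """
    if name in IRREVERSIBLE_TOOLS and not approver(name, args):
        return f"blocked: {name} requires explicit approval"
    return f"executed: {name}"
```

Read-only tools pass through without friction; only the irreversible set pays the confirmation cost, which keeps the gate usable enough that it stays enabled.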
What does auditability require?

Principle 7 (inspectability) and Principle 4 (progressive disclosure) together require that the active permission set, tool invocations, and reasoning inputs are logged in a queryable format at task time. An agent that cannot be reconstructed after an incident cannot be governed.

Log the active permission set at the start of each task
Emit structured audit events for every tool call with full context
Preserve logs for a retention period consistent with your regulatory environment
Design logs to be queryable and replay-capable, not append-only text streams
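A structured audit event covering the points above might look like the following. The field set is an assumption, not a prescribed schema; what matters is that each tool call emits one machine-queryable record that includes the active permission set, so the task can be reconstructed after an incident.

```python
import json
import time
import uuid


def audit_event(task_id: str, tool: str, args: dict, permissions: list) -> str:
    """Serialise one structured event per tool call, including the active
    permission set, as a queryable JSON line rather than free text."""
    event = {
        "event_id": str(uuid.uuid4()),   # unique, for replay and deduplication
        "task_id": task_id,
        "timestamp": time.time(),
        "tool": tool,
        "args": args,                    # full call context
        "active_permissions": permissions,
    }
    return json.dumps(event, sort_keys=True)
```

Emitting JSON lines keeps the log append-friendly while remaining queryable and replay-capable, unlike unstructured text streams.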
Browse inspectability examples · Read Principle 8 — explicit approvals