Making trust load-bearing against adversarial pressure
The blueprint's principles of progressive disclosure, inspectability, and explicit approvals define the conditions under which agentic systems can be trusted. Security application means making those conditions load-bearing — not merely aspirational for cooperative users, but structurally enforced against active adversarial pressure.
Key Facts
- Active attack vectors
- Memory injection · Backdoor poisoning · Experience grafting
- Core insight
- No single control is sufficient — layered defence across four boundaries
- Load-bearing principles
- Principles 4, 7, and 8
- Critical boundary
- Action boundary — confirmation gating at tool-call level, not task initiation
The active threat model
Research has documented three concrete attack vectors with measured success rates. Memory injection achieves above 95% injection success through normal user queries. Backdoor poisoning corrupts retrieval stores using optimised trigger tokens. Experience grafting plants fabricated precedents that agents retrieve and replicate. These are not theoretical — they operate against systems that treat external data as trusted.