Model context

The deployment decision that changes everything

Choosing where a model runs is an infrastructure and governance decision, not a performance optimisation. It changes your privacy posture, your reliability obligations, your compliance evidence trail, and how much your orchestration layer needs to compensate for capability gaps.

Key Facts

Deployment modes: Local / open-source · Self-hosted · Managed API
Dimensions compared: 12 design and governance dimensions
Applies to: All 10 Blueprint principles
Primary question: Where does the data go, and what happens when the model fails?

The decision, not the hype

Model choice for agentic systems is about control, privacy, and what breaks — not about benchmark scores. The relevant questions are: Where does the data go? Who can observe the inference? What happens when the model or the provider is unavailable? What compliance evidence can you produce?

Deployment mode comparison

| Dimension | Local / open-source | Self-hosted | Managed API |
| --- | --- | --- | --- |
| Data boundary | Data never leaves device or org | Data stays on org infrastructure | Data is processed by provider |
| Privacy posture | Strongest — no external exposure | Strong — depends on infrastructure controls | Requires provider trust and DPA |
| Latency | Variable — hardware-dependent | Controlled — predictable on known infrastructure | Provider-dependent — varies by region and load |
| Cost behaviour | Fixed compute cost | Fixed compute plus operational overhead | Variable — billed per token |
| Tool calling | Limited in most open models | Model-dependent | Strong in frontier models |
| Context window | Often smaller | Model-dependent | Largest available |
| Reliability tier | Experimental to production-capable | Production-capable | Production to enterprise-governed |
| Offline capability | Full | Partial | None |
| Vendor dependency | None on model weights | Infrastructure dependency only | High — provider availability is a dependency |
| Compliance evidence | Org-controlled audit trail | Org-controlled audit trail | Provider attestation required |
| Review burden | Higher — no provider audit trail | Medium | Lower — provider handles infrastructure audit |
| Orchestration needs | Higher — compensates for capability gaps | Medium | Lower — offloads reasoning complexity |

Design consequences for agentic systems

Tool calling with real-world side effects

Tool calling is more dangerous with weaker models that may hallucinate tool parameters. When using local or experimental models, orchestration guardrails must compensate for the reduced reliability of tool parameter generation.

- Add explicit confirmation surfaces before each mutation
- Validate tool parameters structurally before execution
- Set retry limits to prevent silent compounding failures
- Log every tool call with parameters for post-hoc review
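The guardrails above can be sketched as a thin wrapper around tool execution. This is a minimal illustration, not a specific framework's API: `ToolCall`, `SCHEMAS`, and `execute_with_guardrails` are all hypothetical names, and the confirmation and execution callables stand in for whatever your orchestration layer provides.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str
    params: dict


# Structural schemas: required parameter names and their expected types.
# A hypothetical mutating tool is used as the example.
SCHEMAS = {
    "delete_record": {"record_id": str, "reason": str},
}


def validate_params(call: ToolCall) -> list:
    """Return a list of structural errors; empty means the call passes."""
    schema = SCHEMAS.get(call.tool)
    if schema is None:
        return [f"unknown tool: {call.tool}"]
    errors = []
    for name, expected in schema.items():
        if name not in call.params:
            errors.append(f"missing parameter: {name}")
        elif not isinstance(call.params[name], expected):
            errors.append(f"{name} should be {expected.__name__}")
    return errors


def execute_with_guardrails(call, execute, confirm, audit_log, max_retries=2):
    """Validate, confirm, log, and retry-cap a mutating tool call."""
    errors = validate_params(call)
    if errors:
        audit_log.append({"call": call.tool, "outcome": "rejected", "errors": errors})
        return None
    if not confirm(call):  # explicit confirmation surface before the mutation
        audit_log.append({"call": call.tool, "outcome": "declined"})
        return None
    for attempt in range(max_retries + 1):
        try:
            result = execute(call)
            audit_log.append({"call": call.tool, "params": call.params,
                              "outcome": "ok", "attempt": attempt})
            return result
        except RuntimeError as exc:
            audit_log.append({"call": call.tool, "outcome": "error",
                              "attempt": attempt, "detail": str(exc)})
    return None  # retry limit reached: failure is visible in the log, never silent
```

Note that a structurally invalid call is rejected before any side effect occurs, and every path, including rejection, leaves an audit record for post-hoc review.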
Long-running background agents

Long-running agents built on managed APIs inherit provider outages. Fallback patterns, resumption checkpoints, and explicit timeout handling are not optional when the model is a remote service that can become unavailable mid-run.

- Checkpoint agent state at logical boundaries
- Design explicit fallback routes for provider unavailability
- Surface interruption to the user rather than silently retrying
- Test resumption paths as first-class scenarios
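A checkpoint-and-resume loop along these lines can be sketched as follows. All names here (`run_agent`, `CheckpointStore`, `ProviderUnavailable`) are illustrative assumptions, and the JSON-file store stands in for whatever durable storage your system uses.

```python
import json
from pathlib import Path


class ProviderUnavailable(Exception):
    """Raised when the remote model cannot be reached."""


class CheckpointStore:
    """Durable store for agent state; a JSON file keeps the sketch simple."""

    def __init__(self, path: Path):
        self.path = path

    def save(self, state: dict) -> None:
        self.path.write_text(json.dumps(state))

    def load(self):
        if self.path.exists():
            return json.loads(self.path.read_text())
        return None


def run_agent(steps, call_model, store):
    """Execute steps in order, checkpointing after each completed step.

    On provider failure the interruption is surfaced to the caller; a later
    invocation resumes from the last completed step instead of restarting.
    """
    state = store.load() or {"done": [], "results": {}}
    for step in steps:
        if step in state["done"]:
            continue  # already completed in a previous run
        try:
            state["results"][step] = call_model(step)
        except ProviderUnavailable:
            store.save(state)
            raise  # surface the interruption rather than silently retrying
        state["done"].append(step)
        store.save(state)  # checkpoint at the logical boundary
    return state["results"]
```

The resumption path here is exercised simply by calling `run_agent` again after an interruption, which is exactly the path worth testing as a first-class scenario.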
Human-in-the-loop checkpoints

Human-in-the-loop checkpoints become more load-bearing when model quality is lower or unpredictable. The same principle requires heavier implementation — more frequent approval gates, more explicit uncertainty surfacing — when the model may produce lower-quality outputs.

- Increase approval gate frequency proportionally to model uncertainty
- Surface explicit confidence signals rather than hiding them
- Make the quality floor of each model visible to reviewers
- Do not reduce checkpoint frequency as a performance optimisation
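One way to make the proportionality concrete is a gating policy whose cadence tightens as confidence drops. The tiers and thresholds below are illustrative assumptions only; the point is that the interval never reaches zero gates, so the checkpoint is never optimised away.

```python
def approval_gate_interval(confidence: float, base_interval: int = 8) -> int:
    """Return how many agent actions may run between human approval gates.

    Lower model confidence yields a shorter interval, i.e. more frequent
    review. Tiers and thresholds are illustrative, not prescribed values.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence < 0.5:
        return 1  # low confidence: every action needs approval
    if confidence < 0.8:
        return max(1, base_interval // 4)  # medium confidence: tighter cadence
    return base_interval  # high confidence: baseline cadence, never zero gates


def needs_approval(action_index: int, confidence: float) -> bool:
    """Gate decision for the nth action, driven by the surfaced confidence."""
    return action_index % approval_gate_interval(confidence) == 0
```

Surfacing the confidence value alongside the gate decision, rather than hiding it inside the policy, keeps the quality floor visible to reviewers.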
Observability

Observability is harder with local models because you cannot inspect provider telemetry. You own the full observation stack: instrumentation, logging, and alert design are your responsibility, not your provider's.

- Instrument at every model call boundary
- Log structured inputs and outputs for post-hoc analysis
- Design alert thresholds before the model goes into production
- Own the full observation stack — there is no provider fallback
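Instrumenting the model call boundary can be as simple as a wrapper that emits one structured record per call. This is a minimal sketch: the logger name, the latency threshold, and the optional in-process `records` list are all assumptions, stand-ins for whatever telemetry pipeline you operate.

```python
import json
import logging
import time

logger = logging.getLogger("model_calls")


def observed_call(model_fn, prompt: str, alert_ms: float = 2000.0,
                  records=None) -> str:
    """Call the model, logging structured input, output, and latency."""
    start = time.perf_counter()
    output = model_fn(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    record = {
        "prompt": prompt,
        "output": output,
        "latency_ms": round(latency_ms, 1),
        # The alert threshold is decided before production, not after.
        "alert": latency_ms > alert_ms,
    }
    logger.info(json.dumps(record))  # one structured line per call boundary
    if records is not None:
        records.append(record)  # in-process copy, e.g. for metrics or tests
    return output
```

Because every call passes through one boundary, post-hoc analysis and alerting both read from the same structured stream, with no provider telemetry to fall back on.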
The governance gradient

Organisations move across this spectrum. A team might start on managed API, move to hybrid for cost and privacy, and eventually self-host for regulated workloads. The Blueprint principles apply at every point — but what each principle requires of your implementation shifts with the deployment posture.

- Use the model context filter in the example library to find patterns appropriate for your current deployment mode
- Model-agnostic patterns work regardless of where inference runs — prioritise these for portability
- Local-viable patterns work without managed API dependencies — prioritise these for privacy-first and regulated contexts
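The portability a model-agnostic pattern buys can be sketched as orchestration code written against a small interface, so inference can move between local, self-hosted, and managed deployments without rewrites. Every name below (`ModelClient`, `LocalModel`, `ManagedAPIModel`, `summarise`) is hypothetical, and the two client classes are stand-ins rather than real integrations.

```python
from typing import Protocol


class ModelClient(Protocol):
    """The one surface orchestration code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class LocalModel:
    """Stand-in for an in-process open-source model."""

    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


class ManagedAPIModel:
    """Stand-in for a remote provider client."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint  # hypothetical provider endpoint

    def complete(self, prompt: str) -> str:
        return f"[managed] {prompt}"


def summarise(client: ModelClient, text: str) -> str:
    """Orchestration logic written once, regardless of where inference runs."""
    return client.complete(f"Summarise: {text}")
```

Swapping the client is then a deployment decision, not a code change, which is what lets a team move along the governance gradient without rewriting its orchestration layer.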