Model context

The deployment decision that changes everything

Choosing where a model runs is an infrastructure and governance decision, not a performance optimisation. It changes your privacy posture, your reliability obligations, your compliance evidence trail, and how much your orchestration layer needs to compensate for capability gaps.

Key Facts

Deployment modes: Local / open-source · Self-hosted · Managed API
Dimensions compared: 12 design and governance dimensions
Applies to: All 10 Blueprint principles
Primary question: Where does the data go, and what happens when the model fails?

The decision, not the hype

Model choice for agentic systems is about control, privacy, and what breaks — not about benchmark scores. The relevant questions are: Where does the data go? Who can observe the inference? What happens when the model or the provider is unavailable? What compliance evidence can you produce?

Deployment mode comparison

| Dimension | Local / open-source | Self-hosted | Managed API |
| --- | --- | --- | --- |
| Data boundary | Data never leaves device or org | Data stays on org infrastructure | Data is processed by provider |
| Privacy posture | Strongest — no external exposure | Strong — depends on infrastructure controls | Requires provider trust and DPA |
| Latency | Variable — hardware-dependent | Controlled — predictable on known infrastructure | Provider-dependent — varies by region and load |
| Cost behaviour | Fixed compute cost | Fixed compute plus operational overhead | Variable — billed per token |
| Tool calling | Limited in most open models | Model-dependent | Strong in frontier models |
| Context window | Often smaller | Model-dependent | Largest available |
| Reliability tier | Experimental to production-capable | Production-capable | Production to enterprise-governed |
| Offline capability | Full | Partial | None |
| Vendor dependency | None on model weights | Infrastructure dependency only | High — provider availability is a dependency |
| Compliance evidence | Org-controlled audit trail | Org-controlled audit trail | Provider attestation required |
| Review burden | Higher — no provider audit trail | Medium | Lower — provider handles infrastructure audit |
| Orchestration needs | Higher — compensates for capability gaps | Medium | Lower — offloads reasoning complexity |

Design consequences for agentic systems

Tool calling with real-world side effects

Tool calling is more dangerous with weaker models that may hallucinate tool parameters. When using local or experimental models, orchestration guardrails must compensate for the reduced reliability of tool parameter generation.

- Add explicit confirmation surfaces before each mutation
- Validate tool parameters structurally before execution
- Set retry limits to prevent silent compounding failures
- Log every tool call with parameters for post-hoc review
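The guardrails above can be sketched as a thin wrapper around tool execution. This is a minimal illustration, not a specific framework's API: `ToolCall`, `SCHEMAS`, and `execute_with_guardrails` are all hypothetical names, and the confirmation and execution callables stand in for whatever your orchestration layer provides.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str
    params: dict


# Structural schemas: required parameter names and their expected types.
# A hypothetical mutating tool is used as the example.
SCHEMAS = {
    "delete_record": {"record_id": str, "reason": str},
}


def validate_params(call: ToolCall) -> list:
    """Return a list of structural errors; empty means the call passes."""
    schema = SCHEMAS.get(call.tool)
    if schema is None:
        return [f"unknown tool: {call.tool}"]
    errors = []
    for name, expected in schema.items():
        if name not in call.params:
            errors.append(f"missing parameter: {name}")
        elif not isinstance(call.params[name], expected):
            errors.append(f"{name} should be {expected.__name__}")
    return errors


def execute_with_guardrails(call, execute, confirm, audit_log, max_retries=2):
    """Validate, confirm, log, and retry-cap a mutating tool call."""
    errors = validate_params(call)
    if errors:
        audit_log.append({"call": call.tool, "outcome": "rejected", "errors": errors})
        return None
    if not confirm(call):  # explicit confirmation surface before the mutation
        audit_log.append({"call": call.tool, "outcome": "declined"})
        return None
    for attempt in range(max_retries + 1):
        try:
            result = execute(call)
            audit_log.append({"call": call.tool, "params": call.params,
                              "outcome": "ok", "attempt": attempt})
            return result
        except RuntimeError as exc:
            audit_log.append({"call": call.tool, "outcome": "error",
                              "attempt": attempt, "detail": str(exc)})
    return None  # retry limit reached: failure is visible in the log, never silent
```

Note that a structurally invalid call is rejected before any side effect occurs, and every path, including rejection, leaves an audit record for post-hoc review.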
Long-running background agents

Long-running agents built on managed APIs inherit provider outages. Fallback patterns, resumption checkpoints, and explicit timeout handling are not optional when the model is a remote service that can become unavailable mid-run.

- Checkpoint agent state at logical boundaries
- Design explicit fallback routes for provider unavailability
- Surface interruption to the user rather than silently retrying
- Test resumption paths as first-class scenarios
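A checkpoint-and-resume loop along these lines can be sketched as follows. All names here (`run_agent`, `CheckpointStore`, `ProviderUnavailable`) are illustrative assumptions, and the JSON-file store stands in for whatever durable storage your system uses.

```python
import json
from pathlib import Path


class ProviderUnavailable(Exception):
    """Raised when the remote model cannot be reached."""


class CheckpointStore:
    """Durable store for agent state; a JSON file keeps the sketch simple."""

    def __init__(self, path: Path):
        self.path = path

    def save(self, state: dict) -> None:
        self.path.write_text(json.dumps(state))

    def load(self):
        if self.path.exists():
            return json.loads(self.path.read_text())
        return None


def run_agent(steps, call_model, store):
    """Execute steps in order, checkpointing after each completed step.

    On provider failure the interruption is surfaced to the caller; a later
    invocation resumes from the last completed step instead of restarting.
    """
    state = store.load() or {"done": [], "results": {}}
    for step in steps:
        if step in state["done"]:
            continue  # already completed in a previous run
        try:
            state["results"][step] = call_model(step)
        except ProviderUnavailable:
            store.save(state)
            raise  # surface the interruption rather than silently retrying
        state["done"].append(step)
        store.save(state)  # checkpoint at the logical boundary
    return state["results"]
```

The resumption path here is exercised simply by calling `run_agent` again after an interruption, which is exactly the path worth testing as a first-class scenario.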
Human-in-the-loop checkpoints

Human-in-the-loop checkpoints become more load-bearing when model quality is lower or unpredictable. The same principle requires heavier implementation — more frequent approval gates, more explicit uncertainty surfacing — when the model may produce lower-quality outputs.

- Increase approval gate frequency proportionally to model uncertainty
- Surface explicit confidence signals rather than hiding them
- Make the quality floor of each model visible to reviewers
- Do not reduce checkpoint frequency as a performance optimisation
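One way to make the proportionality concrete is a gating policy whose cadence tightens as confidence drops. The tiers and thresholds below are illustrative assumptions only; the point is that the interval never reaches zero gates, so the checkpoint is never optimised away.

```python
def approval_gate_interval(confidence: float, base_interval: int = 8) -> int:
    """Return how many agent actions may run between human approval gates.

    Lower model confidence yields a shorter interval, i.e. more frequent
    review. Tiers and thresholds are illustrative, not prescribed values.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence < 0.5:
        return 1  # low confidence: every action needs approval
    if confidence < 0.8:
        return max(1, base_interval // 4)  # medium confidence: tighter cadence
    return base_interval  # high confidence: baseline cadence, never zero gates


def needs_approval(action_index: int, confidence: float) -> bool:
    """Gate decision for the nth action, driven by the surfaced confidence."""
    return action_index % approval_gate_interval(confidence) == 0
```

Surfacing the confidence value alongside the gate decision, rather than hiding it inside the policy, keeps the quality floor visible to reviewers.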
Observability

Observability is harder with local models because you cannot inspect provider telemetry. You own the full observation stack: instrumentation, logging, and alert design are your responsibility, not your provider's.

- Instrument at every model call boundary
- Log structured inputs and outputs for post-hoc analysis
- Design alert thresholds before the model goes into production
- Own the full observation stack — there is no provider fallback
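Instrumenting the model call boundary can be as simple as a wrapper that emits one structured record per call. This is a minimal sketch: the logger name, the latency threshold, and the optional in-process `records` list are all assumptions, stand-ins for whatever telemetry pipeline you operate.

```python
import json
import logging
import time

logger = logging.getLogger("model_calls")


def observed_call(model_fn, prompt: str, alert_ms: float = 2000.0,
                  records=None) -> str:
    """Call the model, logging structured input, output, and latency."""
    start = time.perf_counter()
    output = model_fn(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    record = {
        "prompt": prompt,
        "output": output,
        "latency_ms": round(latency_ms, 1),
        # The alert threshold is decided before production, not after.
        "alert": latency_ms > alert_ms,
    }
    logger.info(json.dumps(record))  # one structured line per call boundary
    if records is not None:
        records.append(record)  # in-process copy, e.g. for metrics or tests
    return output
```

Because every call passes through one boundary, post-hoc analysis and alerting both read from the same structured stream, with no provider telemetry to fall back on.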
The governance gradient

Organisations move across this spectrum. A team might start on managed API, move to hybrid for cost and privacy, and eventually self-host for regulated workloads. The Blueprint principles apply at every point — but what each principle requires of your implementation shifts with the deployment posture.

- Use the model context filter in the example library to find patterns appropriate for your current deployment mode
- Model-agnostic patterns work regardless of where inference runs — prioritise these for portability
- Local-viable patterns work without managed API dependencies — prioritise these for privacy-first and regulated contexts
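The portability a model-agnostic pattern buys can be sketched as orchestration code written against a small interface, so inference can move between local, self-hosted, and managed deployments without rewrites. Every name below (`ModelClient`, `LocalModel`, `ManagedAPIModel`, `summarise`) is hypothetical, and the two client classes are stand-ins rather than real integrations.

```python
from typing import Protocol


class ModelClient(Protocol):
    """The one surface orchestration code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class LocalModel:
    """Stand-in for an in-process open-source model."""

    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


class ManagedAPIModel:
    """Stand-in for a remote provider client."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint  # hypothetical provider endpoint

    def complete(self, prompt: str) -> str:
        return f"[managed] {prompt}"


def summarise(client: ModelClient, text: str) -> str:
    """Orchestration logic written once, regardless of where inference runs."""
    return client.complete(f"Summarise: {text}")
```

Swapping the client is then a deployment decision, not a code change, which is what lets a team move along the governance gradient without rewriting its orchestration layer.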