
Level 4: Agent Harness — Full Runtime Access

Give the agent a full runtime via the agent runtime SDK. It can search files, read docs, and reason through problems autonomously.

Key Facts

Level: advanced
Runtime: Python • Pydantic + Python Dotenv
Pattern: Runtime-backed investigation with explicit review artifacts
Interaction: Live sandbox • Script
Updated: 14 March 2026

High-level flow

How this example moves from input to execution and reviewable output
Level 4: Agent Harness —… -> Run the agent task -> Investigation scope -> Runtime activity -> Structured handoff -> Knowledge access and external…

Trigger

Level 4: Agent Harness —…

Runtime

Run the agent task

Outcome

Investigation scope

Why this page exists

This example is shown as both real source code and a product-facing interaction pattern so learners can connect implementation, UX, and doctrine without leaving the library.

Visual flow • Real source • Sandbox or walkthrough • MCP access

How should this example be used in the platform?

Use the sandbox to understand the experience pattern first, then inspect the source to see how the product boundary, model boundary, and doctrine boundary are actually implemented.

UX pattern: Runtime-backed investigation with explicit review artifacts
Knowledge access and external tools in one runtime
Explicit permission and budget settings
Structured output keeps a wide-capability agent reviewable

Source references

Library entry: agents-agent-complexity-4-agent-harness
Source path: content/example-library/sources/agents/agent-complexity/4-agent-harness.py
Libraries: pydantic, python-dotenv
Runtime requirements: Local repo environment
Related principles: Design for delegation rather than direct manipulation; Replace implied magic with clear mental models; Represent delegated work as a system, not merely as a conversation; Optimise for steering, not only initiating

Model context

Model-agnostic • Local-viable • Wrapped tool calling acceptable • Medium reasoning requirement • Orchestration compensates

The harness pattern adds structural guardrails that compensate for lower model quality. Local models are viable when the harness validates outputs before acting.
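
As a minimal sketch of what "validates outputs before acting" could look like, the snippet below checks a raw result against the same schema the example declares and refuses to proceed when validation fails or the agent itself reports a policy problem. The `gate_output` helper and its return shape are assumptions made for illustration; they are not part of the example source.

```python
from pydantic import BaseModel, ValidationError


class CustomerEmail(BaseModel):
    subject: str
    body: str


class HarnessOutput(BaseModel):
    action_taken: str
    refund_amount: float
    refund_id: str
    policy_compliant: bool
    customer_email: CustomerEmail


def gate_output(raw: dict) -> tuple[bool, str]:
    """Hypothetical gate: return (approved, reason) for a raw agent result."""
    try:
        output = HarnessOutput.model_validate(raw)
    except ValidationError as exc:
        # A weaker model may emit missing or mistyped fields; stop here.
        return False, f"schema validation failed ({exc.error_count()} error(s))"
    if not output.policy_compliant:
        return False, "agent reported the action as not policy compliant"
    return True, "ok"
```

The product would only surface or act on the result when the first element is `True`; anything else would be routed to a human reviewer.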

4-agent-harness.py

python
"""
Level 4: Agent Harness — Full Runtime Access
Give the agent a full runtime via the Claude Agent SDK.
It can search files, read docs, and reason through problems autonomously.

NOTE: Run with `python 4-agent-harness.py` (not IPython/Jupyter).
The Claude Agent SDK uses anyio TaskGroups incompatible with nest_asyncio.

https://platform.claude.com/docs/en/agent-sdk/python
"""

import asyncio
import json
from pathlib import Path

from pydantic import BaseModel
from claude_agent_sdk import (
    AssistantMessage,
    ClaudeAgentOptions,
    ClaudeSDKClient,
    ResultMessage,
    TextBlock,
    ToolUseBlock,
    tool,
    create_sdk_mcp_server,
)
from dotenv import load_dotenv

load_dotenv()

KNOWLEDGE_DIR = Path(__file__).parent / "knowledge"


@tool(
    "check_payment_gateway",
    "Check payment processor for transaction status and refund eligibility",
    {"transaction_date": str, "amount": str},
)
async def check_payment_gateway(args):
    return {
        "content": [
            {
                "type": "text",
                "text": (
                    f"Payment Gateway Response for {args['transaction_date']} — ${args['amount']}:\n"
                    "- Transaction ID: txn_8f3k2j1\n"
                    "- Status: SETTLED\n"
                    "- Refund eligible: YES\n"
                    "- Original payment method: Visa ending in 4242\n"
                    "- Settlement date: 2025-02-02"
                ),
            }
        ]
    }


@tool(
    "issue_refund",
    "Process a refund through the payment gateway",
    {"amount": str, "reason": str, "customer_id": str},
)
async def issue_refund(args):
    return {
        "content": [
            {
                "type": "text",
                "text": (
                    "Refund processed successfully:\n"
                    f"- Customer: {args['customer_id']}\n"
                    f"- Amount: ${args['amount']}\n"
                    f"- Reason: {args['reason']}\n"
                    "- Refund ID: ref_9x2m4p7\n"
                    "- ETA: 3-5 business days"
                ),
            }
        ]
    }


class CustomerEmail(BaseModel):
    subject: str
    body: str


class HarnessOutput(BaseModel):
    action_taken: str
    refund_amount: float
    refund_id: str
    policy_compliant: bool
    customer_email: CustomerEmail


async def run_harness(task: str):
    server = create_sdk_mcp_server(
        name="billing-api",
        version="1.0.0",
        tools=[check_payment_gateway, issue_refund],
    )

    options = ClaudeAgentOptions(
        system_prompt=(
            "You are a senior support analyst with access to:\n\n"
            f"1. A knowledge base at: {KNOWLEDGE_DIR}\n"
            "   - policies/ — refund policy, escalation matrix, subscription management\n"
            "   - customers/ — customer profiles with transaction history\n"
            "   - templates/ — response templates\n\n"
            "2. External billing API tools:\n"
            "   - check_payment_gateway — verify transaction status\n"
            "   - issue_refund — process refunds\n\n"
            "Investigate issues by reading the relevant files AND calling the billing API. "
            "Always verify the transaction with the payment gateway before issuing a refund.\n"
            "Think step by step about what information you need before acting."
        ),
        allowed_tools=[
            "Read",
            "Glob",
            "Grep",
            "mcp__billing-api__check_payment_gateway",
            "mcp__billing-api__issue_refund",
        ],
        mcp_servers={"billing-api": server},
        output_format={
            "type": "json_schema",
            "schema": HarnessOutput.model_json_schema(),
        },
        permission_mode="acceptEdits",
        max_turns=15,
        max_budget_usd=1.00,
        model="sonnet",
        cwd=str(KNOWLEDGE_DIR),
    )

    async with ClaudeSDKClient(options=options) as client:
        await client.query(task)
        async for message in client.receive_response():
            if isinstance(message, AssistantMessage):
                for block in message.content:
                    if isinstance(block, TextBlock):
                        print(block.text)
                    elif isinstance(block, ToolUseBlock):
                        print(f"\n  [Tool] {block.name}({block.input})")
            elif isinstance(message, ResultMessage):
                # Compare against None so a reported cost of 0.0 is still printed.
                cost = (
                    f"${message.total_cost_usd:.4f}"
                    if message.total_cost_usd is not None
                    else "n/a"
                )
                print(f"\n--- Done in {message.num_turns} turns | cost: {cost} ---")
                if message.structured_output:
                    raw = (
                        json.loads(message.structured_output)
                        if isinstance(message.structured_output, str)
                        else message.structured_output
                    )
                    output = HarnessOutput.model_validate(raw)
                    print("\nStructured output:")
                    print(output.model_dump_json(indent=2))


if __name__ == "__main__":
    asyncio.run(
        run_harness(
            "Customer cust_12345 reports a duplicate charge on their February bill. "
            "Investigate using the knowledge base, determine the right action per policy, "
            "and draft a personalized response using the appropriate template."
        )
    )

What should the learner inspect in the code?

Look for the exact place where system scope is bounded: schema definitions, prompt framing, runtime configuration, and the call site that turns user intent into a concrete model or workflow action.

create_sdk_mcp_server(
allowed_tools=[
output_format={
async with ClaudeSDKClient
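
One detail worth noticing in `allowed_tools` is that the billing tools appear as `mcp__billing-api__check_payment_gateway` and `mcp__billing-api__issue_refund`, i.e. the server name and tool name joined with double underscores. The sketch below rebuilds that list from its parts. Both helper names are hypothetical and exist only to make the naming pattern explicit.

```python
# Built-in file tools the example allows (read-only search and inspection).
FILE_TOOLS = ["Read", "Glob", "Grep"]


def mcp_tool_name(server: str, tool: str) -> str:
    """Hypothetical helper: qualified name for a tool on an in-process MCP server."""
    return f"mcp__{server}__{tool}"


def build_allowed_tools(server: str, tool_names: list[str]) -> list[str]:
    """Hypothetical helper: file tools plus the qualified MCP tool names."""
    return FILE_TOOLS + [mcp_tool_name(server, name) for name in tool_names]


allowed = build_allowed_tools(
    "billing-api", ["check_payment_gateway", "issue_refund"]
)
print(allowed)
```

Generating the names from the server name keeps `allowed_tools` and the `mcp_servers` key from drifting apart if the server is renamed.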
How does the sandbox relate to the source?

The sandbox should make the UX legible: what the user sees, what the system is deciding, and how the result becomes reviewable. The source then shows how that behavior is actually implemented.

Launch an investigation task.
Inspect the runtime trace across knowledge files and billing tools.
Review the final structured output and drafted customer email.

Sandbox: Runtime-backed investigation with explicit review artifacts
Full runtime investigation surface

This simulation shows what changes when an agent can read files, search internal knowledge, and call external tools inside a full runtime harness.

UX explanation

The experience should show that the system is not merely answering from a prompt. It is investigating across multiple resources, so the user needs durable visibility into what was searched, what was verified, and what action was finally taken.

AI design explanation

A harness grants much broader capability than tool-calling alone. That makes inspectability, permission boundaries, and structured output even more important, because the system can now move across a real working environment.
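
In the source, the rule "verify the transaction before issuing a refund" lives only in the system prompt. As a rough sketch of a permission boundary enforced in code instead, the class below tracks whether verification has happened and caps the refund amount. `RefundGuard`, its `max_refund` limit, and the `(allowed, reason)` return shape are all assumptions for illustration; how such a check would be wired into the SDK is not shown in the original.

```python
class RefundGuard:
    """Hypothetical guard: refunds need a prior gateway check and stay under a cap."""

    def __init__(self, max_refund: float):
        self.max_refund = max_refund
        self.verified = False

    def on_tool_call(self, name: str, args: dict) -> tuple[bool, str]:
        # Accept both bare names and MCP-qualified names such as
        # "mcp__billing-api__issue_refund".
        short = name.rsplit("__", 1)[-1]
        if short == "check_payment_gateway":
            self.verified = True
            return True, "verification recorded"
        if short == "issue_refund":
            if not self.verified:
                return False, "refund blocked: transaction not verified"
            if float(args["amount"]) > self.max_refund:
                return False, "refund blocked: amount exceeds limit"
            return True, "refund allowed"
        return True, "not a gated tool"


guard = RefundGuard(max_refund=100.0)
print(guard.on_tool_call("mcp__billing-api__issue_refund", {"amount": "49.00"}))
```

A code-level check like this does not replace the prompt instruction; it gives the product a deterministic fallback when the model skips a step.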

Interaction walkthrough

  1. Launch an investigation task.
  2. Inspect the runtime trace across knowledge files and billing tools.
  3. Review the final structured output and drafted customer email.

Runtime task

Customer cust_12345 reports a duplicate charge on their February bill. Investigate and draft the right response.

Knowledge files • MCP billing tools • Structured output

Runtime trace

The harness trace should expose both knowledge lookup and external tool use.
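
One way to make that split visible is to sort each tool call into a knowledge lookup or an external call as it streams past. The sketch below assumes the tool names used in the source (`Read`, `Glob`, `Grep`, and the `mcp__` prefix for billing tools); the `RuntimeTrace` class and the sample inputs, including the file path, are hypothetical.

```python
from dataclasses import dataclass, field

# File tools the source allows for knowledge-base access.
KNOWLEDGE_TOOLS = {"Read", "Glob", "Grep"}


@dataclass
class RuntimeTrace:
    """Hypothetical recorder that groups tool calls for display in a trace view."""

    knowledge_lookups: list[str] = field(default_factory=list)
    external_calls: list[str] = field(default_factory=list)
    other: list[str] = field(default_factory=list)

    def record(self, tool_name: str, tool_input: dict) -> None:
        entry = f"{tool_name}({tool_input})"
        if tool_name in KNOWLEDGE_TOOLS:
            self.knowledge_lookups.append(entry)
        elif tool_name.startswith("mcp__"):
            self.external_calls.append(entry)
        else:
            self.other.append(entry)

    def summary(self) -> str:
        return (
            f"{len(self.knowledge_lookups)} knowledge lookup(s), "
            f"{len(self.external_calls)} external call(s), "
            f"{len(self.other)} other"
        )


trace = RuntimeTrace()
trace.record("Grep", {"pattern": "refund"})  # hypothetical input
trace.record("Read", {"file_path": "policies/refund-policy.md"})  # hypothetical path
trace.record(
    "mcp__billing-api__check_payment_gateway",
    {"transaction_date": "2025-02-01", "amount": "49.00"},  # hypothetical values
)
print(trace.summary())
```

In the source, `record` could plausibly be called where `ToolUseBlock` instances are printed, passing `block.name` and `block.input`.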

Structured handoff

A wide-capability agent still needs a final reviewable package for the product to present.

Why this needs stronger UX

  • Knowledge access and external tools in one runtime
  • Explicit permission and budget settings
  • Structured output keeps a wide-capability agent reviewable
Used in courses and paths

This example currently stands on its own in the library, but it still connects to the principle system and the broader example family.

Related principles

Runtime architecture

Use this example in your agents

This example is also available through the blueprint’s agent-ready layer. See the "For agents" page for the public MCP, deterministic exports, and Claude/Cursor setup.

Define triggers, context, and boundaries before increasing autonomy
Make control, observability, and recovery explicit in the runtime
Choose the right operational patterns before delegating to workflows