
Level 4: Agent Harness - Full Runtime Access

Give the agent a full runtime through the Claude Agent SDK. It can search files, read documents, and reason through problems autonomously.

Key facts

Level: advanced
Runtime: Python • Pydantic + Python Dotenv
Pattern: Runtime-backed investigation with explicit review artifacts
Interaction: Live sandbox • Script
Updated: March 14, 2026

Navigate this example

Quick flow view

How this example moves between input, execution, and reviewable result:
Level 4: Agent Harness… -> Run the agent task -> Investigation scope -> Runtime activity -> Structured handoff -> Knowledge access and external…

Trigger: Level 4: Agent Harness…
Runtime: Run the agent task
Outcome: Investigation scope

Why this page exists

This example is presented both as real source code and as a product-oriented interaction pattern, so learners can connect implementation, UX, and doctrine without leaving the library.

Visual flow • Real code • Sandbox or walkthrough • MCP access
How should this example be used in the platform?

Start with the sandbox to understand the experience pattern, then inspect the source to see how the product boundary, the model boundary, and the doctrine boundary are actually implemented.

UX pattern: Runtime-backed investigation with explicit review artifacts
Knowledge access and external tools in one runtime
Explicit permission and budget settings
Structured output keeps a wide-capability agent reviewable
Source references
Library entry: agents-agent-complexity-4-agent-harness
Source path: content/example-library/sources/agents/agent-complexity/4-agent-harness.py
Libraries: pydantic, python-dotenv
Runtime requirements: Local repository environment
Related principles: Design for delegation rather than direct manipulation; Replace implicit magic with clear mental models; Represent delegated work as a system, not just a conversation; Optimize for steering, not just initiation

4-agent-harness.py

python
"""
Level 4: Agent Harness — Full Runtime Access
Give the agent a full runtime via the Claude Agent SDK.
It can search files, read docs, and reason through problems autonomously.

NOTE: Run with `python 4-agent-harness.py` (not IPython/Jupyter).
The Claude Agent SDK uses anyio TaskGroups incompatible with nest_asyncio.

https://platform.claude.com/docs/en/agent-sdk/python
"""

import asyncio
import json
from pathlib import Path

from pydantic import BaseModel
from claude_agent_sdk import (
    AssistantMessage,
    ClaudeAgentOptions,
    ClaudeSDKClient,
    ResultMessage,
    TextBlock,
    ToolUseBlock,
    tool,
    create_sdk_mcp_server,
)
from dotenv import load_dotenv

load_dotenv()

KNOWLEDGE_DIR = Path(__file__).parent / "knowledge"


@tool(
    "check_payment_gateway",
    "Check payment processor for transaction status and refund eligibility",
    {"transaction_date": str, "amount": str},
)
async def check_payment_gateway(args):
    return {
        "content": [
            {
                "type": "text",
                "text": (
                    f"Payment Gateway Response for {args['transaction_date']} — ${args['amount']}:\n"
                    "- Transaction ID: txn_8f3k2j1\n"
                    "- Status: SETTLED\n"
                    "- Refund eligible: YES\n"
                    "- Original payment method: Visa ending in 4242\n"
                    "- Settlement date: 2025-02-02"
                ),
            }
        ]
    }


@tool(
    "issue_refund",
    "Process a refund through the payment gateway",
    {"amount": str, "reason": str, "customer_id": str},
)
async def issue_refund(args):
    # Mock tool: returns a canned confirmation rather than calling a real processor.
    return {
        "content": [
            {
                "type": "text",
                "text": (
                    "Refund processed successfully:\n"
                    f"- Customer: {args['customer_id']}\n"
                    f"- Amount: ${args['amount']}\n"
                    f"- Reason: {args['reason']}\n"
                    "- Refund ID: ref_9x2m4p7\n"
                    "- ETA: 3-5 business days"
                ),
            }
        ]
    }


class CustomerEmail(BaseModel):
    subject: str
    body: str


class HarnessOutput(BaseModel):
    action_taken: str
    refund_amount: float
    refund_id: str
    policy_compliant: bool
    customer_email: CustomerEmail


async def run_harness(task: str):
    server = create_sdk_mcp_server(
        name="billing-api",
        version="1.0.0",
        tools=[check_payment_gateway, issue_refund],
    )

    options = ClaudeAgentOptions(
        system_prompt=(
            "You are a senior support analyst with access to:\n\n"
            f"1. A knowledge base at: {KNOWLEDGE_DIR}\n"
            "   - policies/ — refund policy, escalation matrix, subscription management\n"
            "   - customers/ — customer profiles with transaction history\n"
            "   - templates/ — response templates\n\n"
            "2. External billing API tools:\n"
            "   - check_payment_gateway — verify transaction status\n"
            "   - issue_refund — process refunds\n\n"
            "Investigate issues by reading the relevant files AND calling the billing API. "
            "Always verify the transaction with the payment gateway before issuing a refund.\n"
            "Think step by step about what information you need before acting."
        ),
        allowed_tools=[
            "Read",
            "Glob",
            "Grep",
            "mcp__billing-api__check_payment_gateway",
            "mcp__billing-api__issue_refund",
        ],
        mcp_servers={"billing-api": server},
        output_format={
            "type": "json_schema",
            "schema": HarnessOutput.model_json_schema(),
        },
        permission_mode="acceptEdits",
        max_turns=15,
        max_budget_usd=1.00,
        model="sonnet",
        cwd=str(KNOWLEDGE_DIR),
    )

    async with ClaudeSDKClient(options=options) as client:
        await client.query(task)
        async for message in client.receive_response():
            if isinstance(message, AssistantMessage):
                for block in message.content:
                    if isinstance(block, TextBlock):
                        print(block.text)
                    elif isinstance(block, ToolUseBlock):
                        print(f"\n  [Tool] {block.name}({block.input})")
            elif isinstance(message, ResultMessage):
                # Compare against None so a reported cost of 0.0 is still printed.
                cost = (
                    f"${message.total_cost_usd:.4f}"
                    if message.total_cost_usd is not None
                    else "n/a"
                )
                print(f"\n--- Done in {message.num_turns} turns | cost: {cost} ---")
                if message.structured_output:
                    raw = (
                        json.loads(message.structured_output)
                        if isinstance(message.structured_output, str)
                        else message.structured_output
                    )
                    output = HarnessOutput.model_validate(raw)
                    print("\nStructured output:")
                    print(output.model_dump_json(indent=2))


if __name__ == "__main__":
    asyncio.run(
        run_harness(
            "Customer cust_12345 reports a duplicate charge on their February bill. "
            "Investigate using the knowledge base, determine the right action per policy, "
            "and draft a personalized response using the appropriate template."
        )
    )
What should the learner inspect in the code?

Look for the exact points where the system's scope is bounded: schema definitions, prompt setup, runtime configuration, and the call site that turns the user's intent into a concrete model or workflow action.

create_sdk_mcp_server(
allowed_tools=[
output_format={
async with ClaudeSDKClient
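
The billing entries in the example's allowed_tools list follow an mcp__<server>__<tool> pattern, so the server name passed to create_sdk_mcp_server and the allow-list have to stay in sync. A minimal sketch of deriving those names in one place follows. The helpers mcp_tool_name and build_allowed_tools are hypothetical and not part of the SDK; they only reproduce the naming seen in the source.

```python
from typing import Iterable, List

# Built-in tools the example allows for read-only knowledge access.
BUILTIN_READ_TOOLS = ["Read", "Glob", "Grep"]


def mcp_tool_name(server_name: str, tool_name: str) -> str:
    """Hypothetical helper: build the allow-list name for an MCP tool.

    Assumes the mcp__<server>__<tool> pattern visible in the example's
    allowed_tools list.
    """
    return f"mcp__{server_name}__{tool_name}"


def build_allowed_tools(server_name: str, tool_names: Iterable[str]) -> List[str]:
    """Combine the built-in read tools with namespaced MCP tool names."""
    return BUILTIN_READ_TOOLS + [mcp_tool_name(server_name, t) for t in tool_names]


allowed = build_allowed_tools(
    "billing-api", ["check_payment_gateway", "issue_refund"]
)
print(allowed)
```

Building the list from the server name means a rename of "billing-api" cannot silently leave the billing tools outside the allow-list.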
How does the sandbox relate to the source?

The sandbox should make the UX legible: what the user sees, what the system is deciding, and how the result becomes reviewable. The source then shows how that behavior is actually implemented.

Launch an investigation task.
Inspect the runtime trace across knowledge files and billing tools.
Review the final structured output and drafted customer email.
Sandbox: Runtime-backed investigation with explicit review artifacts
Full runtime investigation surface

This simulation shows what changes when an agent can read files, search internal knowledge, and call external tools inside a full runtime harness.

UX explanation

The experience should show that the system is not merely answering from a prompt. It is investigating across multiple resources, so the user needs durable visibility into what was searched, what was verified, and what action was finally taken.
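
One way to get that durable visibility is to record each tool call as it streams by instead of only printing it. The sketch below is illustrative only: RuntimeTrace and TraceEntry are hypothetical names, the split into "knowledge" and "external" is an assumption based on the tools the example allows, and the file name and transaction values are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Assumption: the built-in file tools count as knowledge lookups,
# everything else (the MCP billing tools) counts as external.
KNOWLEDGE_TOOLS = {"Read", "Glob", "Grep"}


@dataclass
class TraceEntry:
    tool: str
    tool_input: Dict[str, Any]
    kind: str  # "knowledge" or "external"


@dataclass
class RuntimeTrace:
    entries: List[TraceEntry] = field(default_factory=list)

    def record(self, tool: str, tool_input: Dict[str, Any]) -> TraceEntry:
        """Store one tool call and classify it by tool name."""
        kind = "knowledge" if tool in KNOWLEDGE_TOOLS else "external"
        entry = TraceEntry(tool=tool, tool_input=tool_input, kind=kind)
        self.entries.append(entry)
        return entry

    def summary(self) -> Dict[str, List[str]]:
        """Group recorded tool names by kind, in call order."""
        grouped: Dict[str, List[str]] = {"knowledge": [], "external": []}
        for entry in self.entries:
            grouped[entry.kind].append(entry.tool)
        return grouped


trace = RuntimeTrace()
trace.record("Grep", {"pattern": "cust_12345"})
# Hypothetical file name inside the policies/ folder.
trace.record("Read", {"file_path": "policies/refund-policy.md"})
# Hypothetical transaction values.
trace.record(
    "mcp__billing-api__check_payment_gateway",
    {"transaction_date": "2025-02-01", "amount": "49.00"},
)
print(trace.summary())
```

In the example's message loop, a call such as trace.record(block.name, block.input) could sit next to the existing print for ToolUseBlock, so the product keeps the trace after the run ends.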

AI design explanation

A harness grants much broader capability than tool-calling alone. That makes inspectability, permission boundaries, and structured output even more important, because the system can now move across a real working environment.
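
The example's system prompt asks the agent to verify the transaction with the payment gateway before issuing a refund, but a prompt is a request, not a guarantee. A product could check the recorded tool order after the run, as in the rough sketch below. The function refund_was_verified is hypothetical, and it assumes the tool names used in the example's allowed_tools list.

```python
from typing import List

CHECK_TOOL = "mcp__billing-api__check_payment_gateway"
REFUND_TOOL = "mcp__billing-api__issue_refund"


def refund_was_verified(tool_calls: List[str]) -> bool:
    """Return True if every refund call comes after at least one gateway check.

    tool_calls is the ordered list of tool names from one run.
    A run with no refund call passes trivially.
    """
    verified = False
    for name in tool_calls:
        if name == CHECK_TOOL:
            verified = True
        elif name == REFUND_TOOL and not verified:
            return False
    return True


good_run = ["Grep", "Read", CHECK_TOOL, REFUND_TOOL]
bad_run = ["Read", REFUND_TOOL, CHECK_TOOL]
print(refund_was_verified(good_run), refund_was_verified(bad_run))
```

A check like this runs after the fact, so it supports review rather than prevention; blocking the call up front would need a permission hook, which this sketch does not attempt.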

Interaction guide

  1. Launch an investigation task.
  2. Inspect the runtime trace across knowledge files and billing tools.
  3. Review the final structured output and drafted customer email.

Runtime task

Customer cust_12345 reports a duplicate charge on their February bill. Investigate and draft the right response.

Knowledge files • MCP billing tools • Structured output

Runtime trace

The harness trace should expose both knowledge lookup and external tool use.

Structured handoff

A wide-capability agent still needs a final reviewable package for the product to present.
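
As a sketch of what "reviewable" could mean in practice, the function below turns the structured output into a list of flags for a human reviewer. It works on a plain dict with the same field names as the example's HarnessOutput model. The function review_flags and the auto_approve_limit threshold are hypothetical, and the sample values are illustrative.

```python
from typing import Any, Dict, List

# Field names taken from the example's HarnessOutput model.
REQUIRED_FIELDS = [
    "action_taken",
    "refund_amount",
    "refund_id",
    "policy_compliant",
    "customer_email",
]


def review_flags(output: Dict[str, Any], auto_approve_limit: float = 50.0) -> List[str]:
    """Return reasons a human should look at this handoff; empty means none found."""
    missing = [name for name in REQUIRED_FIELDS if name not in output]
    if missing:
        return ["missing fields: " + ", ".join(missing)]

    flags: List[str] = []
    if not output["policy_compliant"]:
        flags.append("agent reported a policy exception")
    if output["refund_amount"] > auto_approve_limit:
        flags.append("refund exceeds auto-approve limit")
    if not output["customer_email"].get("body", "").strip():
        flags.append("customer email body is empty")
    return flags


sample = {
    "action_taken": "refund_issued",
    "refund_amount": 29.99,
    "refund_id": "ref_9x2m4p7",
    "policy_compliant": True,
    "customer_email": {"subject": "Your refund", "body": "Hello, ..."},
}
print(review_flags(sample))
```

In the example itself, the dict could come from output.model_dump() after HarnessOutput.model_validate succeeds, so schema validation and review rules stay separate steps.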

Why this needs stronger UX

  • Knowledge access and external tools in one runtime
  • Explicit permission and budget settings
  • Structured output keeps a wide-capability agent reviewable
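
The example sets max_turns=15 and max_budget_usd=1.00 and prints the turn count and cost from the final result message. A product surface might also show how much of each limit a run consumed. The helper below is a hypothetical sketch of that arithmetic, with defaults copied from the example's options and illustrative input numbers.

```python
from typing import Dict, Optional


def budget_usage(
    num_turns: int,
    total_cost_usd: Optional[float],
    max_turns: int = 15,
    max_budget_usd: float = 1.00,
) -> Dict[str, Optional[float]]:
    """Return the fraction of each limit a finished run consumed.

    cost_fraction is None when the run did not report a cost.
    """
    cost_fraction = (
        None if total_cost_usd is None else total_cost_usd / max_budget_usd
    )
    return {
        "turns_fraction": num_turns / max_turns,
        "cost_fraction": cost_fraction,
    }


# Illustrative numbers, not measured results.
usage = budget_usage(num_turns=6, total_cost_usd=0.25)
print(usage)
```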
Used in courses and paths

This example currently stands alone in the library, but it still connects to the principle system and the wider family of examples.

Related principles

Runtime architecture

Use this example in your agents

This example is also available through the blueprint's agent-ready layer. Use the For agents page to retrieve the public MCP, deterministic exports, and setup for Claude or Cursor.

Define triggers, context, and boundaries before increasing autonomy
Make control, observability, and recovery explicit in the runtime
Choose the right operational patterns before delegating to workflows