Example • script • advanced • Runnable • agent-harness

Level 4: Agent Harness — Full Runtime Access

Give the agent a full runtime through the agent runtime SDK. It can search files, read documents, and reason through problems autonomously.

Key facts

Level: advanced
Runtime: Python • Pydantic + Python Dotenv
Pattern: Runtime-backed investigation with explicit review artifacts
Interaction: Live sandbox • Script
Updated: March 14, 2026

Quick view of the flow

How this example moves from input through execution to a reviewable result

Level 4: Agent Harness —… -> Run the agent task -> Investigation scope -> Runtime activity -> Structured handoff -> Knowledge access and external…

Start: Level 4: Agent Harness —…
Checkpoint: Run the agent task
Outcome: Investigation scope

Why this page exists

This example is shown both as real source code and as a product-oriented interaction pattern, so learners can connect implementation, UX, and doctrine without leaving the library.

Visual flow • Real code • Sandbox or walkthrough • MCP access

How should this example be used on the platform?

Use the sandbox first to understand the experience pattern, then inspect the source to see how the product boundary, the model boundary, and the doctrine boundary are actually implemented.

UX pattern: Runtime-backed investigation with explicit review artifacts
Knowledge access and external tools in one runtime
Explicit permission and budget settings
Structured output keeps a wide-capability agent reviewable

Source references

Library entry: agents-agent-complexity-4-agent-harness
Source path: content/example-library/sources/agents/agent-complexity/4-agent-harness.py
Libraries: pydantic, python-dotenv
Runtime requirements: Local repository environment
Related principles: Design for delegation rather than direct manipulation; Replace implicit magic with clear mental models; Represent delegated work as a system, not just a conversation; Optimize for steering, not just initiation

Model context

Model-agnostic • Runs locally • Wrapped tool calling acceptable • Medium reasoning • Orchestration compensates

The harness pattern adds structural guardrails that compensate for lower model quality. Local models are viable when the harness validates outputs before acting.
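
The idea that a harness validates outputs before acting can be sketched without any SDK. The following is a minimal, dependency-free illustration. `parse_recommendation`, `REQUIRED_FIELDS`, and `ALLOWED_DECISIONS` are hypothetical names that stand in for the Pydantic validation used in the source below; they are not part of any library.

```python
import json
from typing import Optional

# Hypothetical stand-in for a schema; the example source uses Pydantic instead.
REQUIRED_FIELDS = {"decision": str, "customer_id": str, "rationale": str}
ALLOWED_DECISIONS = {"approve_refund", "deny_refund", "needs_more_info"}


def parse_recommendation(raw_text: str) -> Optional[dict]:
    """Return the parsed recommendation, or None if it must not be acted on."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError:
        return None  # the model produced non-JSON output: refuse to act
    if not isinstance(data, dict):
        return None  # valid JSON, but not an object
    for name, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected_type):
            return None  # missing or wrongly typed field
    if data["decision"] not in ALLOWED_DECISIONS:
        return None  # unknown decision value
    return data


good = '{"decision": "approve_refund", "customer_id": "cust_12345", "rationale": "duplicate charge"}'
bad = '{"decision": "refund everything", "customer_id": "cust_12345", "rationale": "?"}'

for candidate in (good, bad, "not json"):
    result = parse_recommendation(candidate)
    print("act on it" if result else "blocked")
```

A weaker local model may emit malformed output more often; with a gate like this, the cost of that is a blocked run rather than an incorrect action.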

4-agent-harness.py

python
"""
Level 4: Agent Harness — Full Runtime Access (governed pattern)
Give the agent a full runtime via the Claude Agent SDK for INVESTIGATION, and
keep destructive execution behind a separate approval gate.

NOTE: Run with `python 4-agent-harness.py` (not IPython/Jupyter).
The Claude Agent SDK uses anyio TaskGroups incompatible with nest_asyncio.

https://platform.claude.com/docs/en/agent-sdk/python

Design notes (iter-2, 2026-05-22):
The autonomous agent has READ-ONLY billing access — it can verify a
transaction but cannot issue a refund. The agent emits a typed
`RefundRecommendation`; a separate `execute_refund_with_approval()`
function runs the destructive action behind a caller-supplied
`approval_callback`. This makes the automation boundary machine-enforced
(not prompt-enforced) per Blueprint principles P5 (clear mental models)
and P8 (explicit hand-offs / approvals / blockers).

Audit trail (P7) captures every phase transition + the approval reason.
Operational phases (P6) are user-meaningful, not raw SDK message types.
"""

import asyncio
import json
import uuid
from datetime import datetime, timezone
from enum import Enum
from pathlib import Path
from typing import Awaitable, Callable

from pydantic import BaseModel
from claude_agent_sdk import (
    AssistantMessage,
    ClaudeAgentOptions,
    ClaudeSDKClient,
    ResultMessage,
    TextBlock,
    ToolUseBlock,
    tool,
    create_sdk_mcp_server,
)
from dotenv import load_dotenv

load_dotenv()

KNOWLEDGE_DIR = Path(__file__).parent / "knowledge"


# ── Tools exposed to the autonomous agent ────────────────────────────
# Only READ-ONLY billing access. `issue_refund` is INTENTIONALLY
# NOT registered as a tool — destructive execution is gated by
# `execute_refund_with_approval()` below.


@tool(
    "check_payment_gateway",
    "Check payment processor for transaction status and refund eligibility",
    {"transaction_date": str, "amount": str},
)
async def check_payment_gateway(args):
    return {
        "content": [
            {
                "type": "text",
                "text": (
                    f"Payment Gateway Response for {args['transaction_date']} — ${args['amount']}:\n"
                    "- Transaction ID: txn_8f3k2j1\n"
                    "- Status: SETTLED\n"
                    "- Refund eligible: YES\n"
                    "- Original payment method: Visa ending in 4242\n"
                    "- Settlement date: 2025-02-02"
                ),
            }
        ]
    }


# ── Typed schemas ────────────────────────────────────────────────────


class CustomerEmail(BaseModel):
    subject: str
    body: str


class RefundDecision(str, Enum):
    """What the AUTONOMOUS agent recommends. Execution is a separate step."""
    APPROVE = "approve_refund"
    DENY = "deny_refund"
    NEEDS_MORE_INFO = "needs_more_info"


class ApprovalDecision(str, Enum):
    """What the human/policy approval gate returns. Captured in audit."""
    APPROVED = "approved"
    REJECTED = "rejected"
    NEEDS_REVIEW = "needs_review"


class HarnessPhase(str, Enum):
    """User-meaningful operational states (P6)."""
    INVESTIGATING = "investigating"
    VERIFYING_TRANSACTION = "verifying_transaction"
    DRAFTING_RECOMMENDATION = "drafting_recommendation"
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVAL_DECIDED = "approval_decided"
    REFUND_EXECUTED = "refund_executed"
    REFUND_BLOCKED = "refund_blocked"
    COMPLETED = "completed"
    FAILED = "failed"


class AuditEvent(BaseModel):
    """Append-only audit record (P7)."""
    timestamp: datetime
    phase: HarnessPhase
    detail: str


class RefundRecommendation(BaseModel):
    """What the autonomous agent outputs. Bound to evidence."""
    decision: RefundDecision
    amount_usd: float | None
    customer_id: str
    rationale: str
    policy_rule_applied: str  # the rule NAME, not the model's claim
    evidence_files: list[str]  # knowledge/ paths the model consulted
    payment_gateway_result: str  # verbatim output from check_payment_gateway
    customer_email: CustomerEmail


class ApprovalResult(BaseModel):
    """What the approval callback returns. `reason` is captured in audit."""
    decision: ApprovalDecision
    reason: str


class HarnessOutput(BaseModel):
    """Final harness output. `recommendation` is always set on success;
    `refund_execution_id` is ONLY set if the approval gate returned
    APPROVED and execution succeeded."""
    run_id: str
    recommendation: RefundRecommendation | None
    audit_trail: list[AuditEvent]
    final_state: HarnessPhase
    refund_execution_id: str | None = None


# ── Investigation phase: autonomous agent, READ-ONLY billing ────────


async def run_harness(task: str) -> HarnessOutput:
    """Run the autonomous investigation agent. Emits a recommendation;
    NEVER calls a refund. Caller is responsible for passing the result
    through `execute_refund_with_approval()` if execution is intended.
    """
    server = create_sdk_mcp_server(
        name="billing-api",
        version="1.0.0",
        tools=[check_payment_gateway],  # ← refund execution removed (P8)
    )

    options = ClaudeAgentOptions(
        system_prompt=(
            "You are a senior support analyst with access to:\n\n"
            f"1. A knowledge base at: {KNOWLEDGE_DIR}\n"
            "   - policies/ — refund policy, escalation matrix, subscription management\n"
            "   - customers/ — customer profiles with transaction history\n"
            "   - templates/ — response templates\n\n"
            "2. External billing API (READ-ONLY for you):\n"
            "   - check_payment_gateway — verify transaction status\n\n"
            "You CANNOT execute refunds. A separate approval step (outside your "
            "control) decides whether to execute. Your job is to investigate, "
            "then emit a RefundRecommendation. Cite every knowledge/ file you "
            "consulted by path and quote the policy rule by name. If the "
            "evidence does not support a clean approve/deny, return "
            "NEEDS_MORE_INFO with the missing facts spelled out."
        ),
        allowed_tools=[
            "Read",
            "Glob",
            "Grep",
            "mcp__billing-api__check_payment_gateway",
            # `mcp__billing-api__issue_refund` INTENTIONALLY OMITTED (P8)
        ],
        mcp_servers={"billing-api": server},
        output_format={
            "type": "json_schema",
            "schema": RefundRecommendation.model_json_schema(),
        },
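        # NOTE: "acceptEdits" auto-approves file edits. For a strictly
        # read-only investigation, consider "default" instead.
        permission_mode="acceptEdits",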
        max_turns=15,
        max_budget_usd=1.00,
        model="sonnet",
        cwd=str(KNOWLEDGE_DIR),
    )

    run_id = str(uuid.uuid4())
    audit: list[AuditEvent] = []

    def _log(phase: HarnessPhase, detail: str) -> None:
        audit.append(
            AuditEvent(
                timestamp=datetime.now(timezone.utc),
                phase=phase,
                detail=detail,
            )
        )

    _log(HarnessPhase.INVESTIGATING, f"task: {task[:120]}")

    recommendation: RefundRecommendation | None = None

    async with ClaudeSDKClient(options=options) as client:
        await client.query(task)
        async for message in client.receive_response():
            if isinstance(message, AssistantMessage):
                for block in message.content:
                    if isinstance(block, ToolUseBlock):
                        phase = (
                            HarnessPhase.VERIFYING_TRANSACTION
                            if "check_payment_gateway" in block.name
                            else HarnessPhase.INVESTIGATING
                        )
                        _log(phase, f"tool: {block.name}({block.input})")
                    elif isinstance(block, TextBlock):
                        # Foreground stream — concise, no console flood.
                        print(block.text)
            elif isinstance(message, ResultMessage):
                cost = (
                    f"${message.total_cost_usd:.4f}"
                    if message.total_cost_usd is not None
                    else "n/a"
                )
                _log(
                    HarnessPhase.DRAFTING_RECOMMENDATION,
                    f"agent done · turns={message.num_turns} · cost={cost}",
                )
                if message.structured_output:
                    raw = (
                        json.loads(message.structured_output)
                        if isinstance(message.structured_output, str)
                        else message.structured_output
                    )
                    try:
                        recommendation = RefundRecommendation.model_validate(raw)
                    except Exception as exc:
                        _log(
                            HarnessPhase.FAILED,
                            f"recommendation parse failed: {exc}",
                        )

    if recommendation is None:
        _log(
            HarnessPhase.REFUND_BLOCKED,
            "no valid structured recommendation (missing or failed validation)",
        )
        return HarnessOutput(
            run_id=run_id,
            recommendation=None,
            audit_trail=audit,
            final_state=HarnessPhase.REFUND_BLOCKED,
        )

    _log(
        HarnessPhase.AWAITING_APPROVAL,
        f"decision={recommendation.decision.value} · amount={recommendation.amount_usd}",
    )
    return HarnessOutput(
        run_id=run_id,
        recommendation=recommendation,
        audit_trail=audit,
        final_state=HarnessPhase.AWAITING_APPROVAL,
    )


# ── Execution phase: SEPARATE, gated by approval callback ───────────


async def execute_refund_with_approval(
    output: HarnessOutput,
    approval_callback: Callable[[RefundRecommendation], Awaitable[ApprovalResult]],
) -> HarnessOutput:
    """The ONLY path to actual refund execution. Gated by caller-supplied
    `approval_callback` — could be a human-in-loop CLI prompt, a policy
    service decision, or a teams-channel approval bot. The boundary is
    machine-enforced: even if the agent recommended APPROVE, the refund
    is not executed unless the approval callback returns APPROVED.
    """
    audit = output.audit_trail

    def _log(phase: HarnessPhase, detail: str) -> None:
        audit.append(
            AuditEvent(
                timestamp=datetime.now(timezone.utc),
                phase=phase,
                detail=detail,
            )
        )

    if output.recommendation is None:
        _log(HarnessPhase.REFUND_BLOCKED, "no recommendation to act on")
        output.final_state = HarnessPhase.REFUND_BLOCKED
        return output

    if output.recommendation.decision is not RefundDecision.APPROVE:
        _log(
            HarnessPhase.REFUND_BLOCKED,
            f"agent did not recommend approval: {output.recommendation.decision.value}",
        )
        output.final_state = HarnessPhase.REFUND_BLOCKED
        return output

    approval = await approval_callback(output.recommendation)
    _log(
        HarnessPhase.APPROVAL_DECIDED,
        f"approval={approval.decision.value} · reason={approval.reason}",
    )

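    if approval.decision is not ApprovalDecision.APPROVED:
        # Log the transition so the audit trail records why execution stopped.
        _log(
            HarnessPhase.REFUND_BLOCKED,
            f"approval gate returned {approval.decision.value}",
        )
        output.final_state = HarnessPhase.REFUND_BLOCKED
        return output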

    # MOCK: in production, replace with real payment gateway call.
    # This boundary is where execution becomes irreversible — keep it
    # isolated, audited, and behind the approval gate above. Wrap real
    # gateway calls in retries/idempotency keys; persist
    # `refund_execution_id` durably before returning to the caller.
    output.refund_execution_id = "ref_9x2m4p7"
    _log(
        HarnessPhase.REFUND_EXECUTED,
        f"refund_execution_id={output.refund_execution_id}",
    )
    output.final_state = HarnessPhase.REFUND_EXECUTED
    return output


# ── Example wiring: CLI approval gate ───────────────────────────────


async def _cli_approval_callback(
    recommendation: RefundRecommendation,
) -> ApprovalResult:
    """Example approval gate — prints the recommendation and asks for
    explicit confirmation. In production, replace with the appropriate
    surface (policy service, teams-channel bot, human-in-loop UI)."""
    print("\n══ AWAITING APPROVAL ══")
    print(f"Decision recommended: {recommendation.decision.value}")
    print(f"Amount: ${recommendation.amount_usd}")
    print(f"Policy rule: {recommendation.policy_rule_applied}")
    print(f"Rationale: {recommendation.rationale}")
    print(f"Evidence files: {recommendation.evidence_files}")
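    # input() blocks, so run it in a worker thread to keep the event loop free.
    raw = (
        await asyncio.to_thread(
            input, "Approve? (approved / rejected / needs_review): "
        )
    ).strip().lower()
    reason = (
        await asyncio.to_thread(input, "Reason (captured in audit): ")
    ).strip() or "no reason given"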
    try:
        decision = ApprovalDecision(raw)
    except ValueError:
        decision = ApprovalDecision.NEEDS_REVIEW
        reason = f"invalid input '{raw}' coerced to needs_review · {reason}"
    return ApprovalResult(decision=decision, reason=reason)


async def main() -> None:
    output = await run_harness(
        "Customer cust_12345 reports a duplicate charge on their February bill. "
        "Investigate using the knowledge base, determine the right action per policy, "
        "and draft a personalized response using the appropriate template."
    )
    output = await execute_refund_with_approval(output, _cli_approval_callback)

    print("\n══ FINAL OUTPUT ══")
    print(output.model_dump_json(indent=2))


if __name__ == "__main__":
    asyncio.run(main())
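
The docstring of `execute_refund_with_approval()` notes that the approval gate could be a policy service rather than a CLI prompt. A minimal sketch of such a non-interactive gate follows, using only the standard library. `Recommendation` and `Approval` are simplified stand-ins for `RefundRecommendation` and `ApprovalResult`. The `AUTO_APPROVE_LIMIT_USD` threshold and the file path in the sample data are assumptions made for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

AUTO_APPROVE_LIMIT_USD = 50.0  # hypothetical policy threshold


@dataclass
class Recommendation:  # simplified stand-in for RefundRecommendation
    decision: str
    amount_usd: Optional[float]
    evidence_files: list


@dataclass
class Approval:  # simplified stand-in for ApprovalResult
    decision: str
    reason: str


async def policy_approval_callback(rec: Recommendation) -> Approval:
    """Approve only small, evidence-backed refunds; escalate or reject the rest."""
    if rec.amount_usd is None:
        return Approval("needs_review", "no amount on recommendation")
    if not rec.evidence_files:
        return Approval("rejected", "recommendation cites no evidence files")
    if rec.amount_usd > AUTO_APPROVE_LIMIT_USD:
        return Approval(
            "needs_review",
            f"amount exceeds ${AUTO_APPROVE_LIMIT_USD:.2f} auto-approve limit",
        )
    return Approval("approved", "within auto-approve limit with cited evidence")


# Illustrative inputs; the path is a made-up example, not a file from the source.
small = Recommendation("approve_refund", 19.99, ["policies/refund_policy.md"])
large = Recommendation("approve_refund", 499.0, ["policies/refund_policy.md"])

print(asyncio.run(policy_approval_callback(small)).decision)
print(asyncio.run(policy_approval_callback(large)).decision)
```

Because the gate returns a reason with every decision, the audit trail can record why a refund was escalated as well as why one was approved.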

What should the learner inspect in the code?

Look for the exact point where the system's scope is bounded: schema definitions, prompt setup, runtime configuration, and the call site that turns the user's intent into a concrete model or workflow action.

create_sdk_mcp_server(
allowed_tools=[
output_format={
async with ClaudeSDKClient
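
One way to make the `allowed_tools=[` boundary checkable is a startup assertion over the allowlist. The sketch below assumes the `mcp__<server>__<tool>` naming visible in the source (`mcp__billing-api__check_payment_gateway`). `mcp_tool_name`, `assert_read_only`, and the `DESTRUCTIVE_TOOLS` deny list are hypothetical helpers, not SDK functions.

```python
READ_ONLY_BUILTINS = {"Read", "Glob", "Grep"}
DESTRUCTIVE_TOOLS = {"issue_refund"}  # hypothetical deny list


def mcp_tool_name(server: str, tool: str) -> str:
    """Build a qualified name in the mcp__<server>__<tool> form the source uses."""
    return f"mcp__{server}__{tool}"


def assert_read_only(allowed_tools: list) -> None:
    """Raise ValueError if the allowlist contains a destructive or unknown tool."""
    for name in allowed_tools:
        if name.startswith("mcp__"):
            parts = name.split("__", 2)
            if len(parts) != 3:
                raise ValueError(f"malformed MCP tool name: {name}")
            if parts[2] in DESTRUCTIVE_TOOLS:
                raise ValueError(f"destructive tool on allowlist: {name}")
        elif name not in READ_ONLY_BUILTINS:
            raise ValueError(f"non-read-only built-in on allowlist: {name}")


allowed = ["Read", "Glob", "Grep", mcp_tool_name("billing-api", "check_payment_gateway")]
assert_read_only(allowed)  # passes silently

try:
    assert_read_only(allowed + [mcp_tool_name("billing-api", "issue_refund")])
except ValueError as exc:
    print(f"rejected: {exc}")
```

Running a check like this before constructing the client turns the comment "INTENTIONALLY OMITTED" into something a test can enforce.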

How does the sandbox relate to the source?

The sandbox should make the UX legible: what the user sees, what the system is deciding, and how the result becomes reviewable. The source then shows how that behavior is actually implemented.

Sandbox: Runtime-backed investigation with explicit review artifacts

Full runtime investigation surface

This simulation shows what changes when an agent can read files, search internal knowledge, and call external tools inside a full runtime harness.

UX explanation

The experience should show that the system is not merely answering from a prompt. It is investigating across multiple resources, so the user needs durable visibility into what was searched, what was verified, and what action was finally taken.
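
A small sketch of how an audit trail could be turned into the three views described above: what was searched, what was verified, and where the run ended. The events are illustrative `(phase, detail)` pairs shaped like the source's `AuditEvent`. The sample details and the `summarize` helper are assumptions for demonstration, not output from a real run.

```python
from collections import defaultdict

# Illustrative events only; phases reuse the HarnessPhase values from the source.
trail = [
    ("investigating", "tool: Grep"),
    ("investigating", "tool: Read"),
    ("verifying_transaction", "tool: mcp__billing-api__check_payment_gateway"),
    ("awaiting_approval", "decision=approve_refund"),
]


def summarize(events):
    """Group audit events into searched / verified / final-state views."""
    if not events:
        return {"searched": [], "verified": [], "final_state": None, "final_detail": None}
    grouped = defaultdict(list)
    for phase, detail in events:
        grouped[phase].append(detail)
    last_phase, last_detail = events[-1]
    return {
        "searched": grouped["investigating"],
        "verified": grouped["verifying_transaction"],
        "final_state": last_phase,
        "final_detail": last_detail,
    }


summary = summarize(trail)
print(f"searched: {len(summary['searched'])} step(s)")
print(f"verified: {len(summary['verified'])} step(s)")
print(f"final state: {summary['final_state']}")
```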

AI design explanation

A harness grants much broader capability than tool-calling alone. That makes inspectability, permission boundaries, and structured output even more important, because the system can now move across a real working environment.

Interaction guide

  1. Launch an investigation task.
  2. Inspect the runtime trace across knowledge files and billing tools.
  3. Review the final structured output and drafted customer email.

Runtime task

Customer cust_12345 reports a duplicate charge on their February bill. Investigate and draft the right response.

Knowledge files • MCP billing tools • Structured output

Runtime trace

The harness trace should expose both knowledge lookup and external tool use.

Structured handoff

A wide-capability agent still needs a final reviewable package for the product to present.
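
One possible shape for such a package is sketched here: it pairs the recommendation with a check that every file the agent cites was actually read during the run. `unverified_citations` and `build_handoff` are hypothetical helpers, and the file paths are made-up examples.

```python
def unverified_citations(evidence_files, read_paths):
    """Return cited files that never appear among the files read during the run."""
    seen = set(read_paths)
    return [path for path in evidence_files if path not in seen]


def build_handoff(recommendation: dict, read_paths: list) -> dict:
    """Assemble a reviewable package and flag citations the trace cannot back up."""
    missing = unverified_citations(recommendation["evidence_files"], read_paths)
    return {
        "decision": recommendation["decision"],
        "evidence_files": recommendation["evidence_files"],
        "unverified_citations": missing,
        "reviewable": not missing,
    }


# Illustrative data; these paths are assumptions, not files from the source.
recommendation = {
    "decision": "approve_refund",
    "evidence_files": ["policies/refund_policy.md", "customers/cust_12345.md"],
}
read_paths = ["policies/refund_policy.md"]

handoff = build_handoff(recommendation, read_paths)
print(handoff["reviewable"])
print(handoff["unverified_citations"])
```

A reviewer who sees a non-empty `unverified_citations` list knows the agent claimed evidence it never opened, which is a reason to send the recommendation back rather than approve it.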

Why this needs stronger UX

  • Knowledge access and external tools in one runtime
  • Explicit permission and budget settings
  • Structured output keeps a wide-capability agent reviewable
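
The source passes `max_turns=15` and `max_budget_usd=1.00` to the SDK. As a rough illustration of what explicit budget settings mean, here is a hypothetical local guard. `RunBudget` and `BudgetExceeded` are made-up names, and this is not how the SDK enforces its own limits.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run crosses its turn or cost limit."""


@dataclass
class RunBudget:
    max_turns: int
    max_budget_usd: float
    turns: int = 0
    spent_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one turn and its cost; raise once either limit is crossed."""
        self.turns += 1
        self.spent_usd += cost_usd
        if self.turns > self.max_turns:
            raise BudgetExceeded(f"turn limit {self.max_turns} exceeded")
        if self.spent_usd > self.max_budget_usd:
            raise BudgetExceeded(f"budget ${self.max_budget_usd:.2f} exceeded")


budget = RunBudget(max_turns=15, max_budget_usd=1.00)  # values mirror the source
budget.charge(0.40)
budget.charge(0.40)
try:
    budget.charge(0.40)  # third charge pushes spend past the limit
except BudgetExceeded as exc:
    print(f"stopped: {exc}")
```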

Used in courses and paths

This example currently stands alone in the library, but it still connects to the principle system and the broader family of examples.

Related principles

Runtime architecture

Use this example in your agents

This example is also available through the blueprint's agent-ready layer. Use the For agents page to retrieve the public MCP, deterministic exports, and setup for Claude or Cursor.

Define triggers, context, and boundaries before increasing autonomy
Make control, observability, and recovery explicit in the runtime
Choose the right operating patterns before delegating to workflows