Chunking
Runnable intermediate-level script example that uses docling and ipykernel.
Key Facts
- Level: intermediate
- Runtime: Python • OpenAI API
- Pattern: Context-backed research with explicit evidence
- Interaction: Live sandbox • Script
- Updated: 14 March 2026
2-chunking.py
python
from docling.chunking import HybridChunker
from docling.document_converter import DocumentConverter
from dotenv import load_dotenv
from openai import OpenAI
from utils.tokenizer import OpenAITokenizerWrapper
load_dotenv()
# Initialize OpenAI client (make sure you have OPENAI_API_KEY in your environment variables)
client = OpenAI()
tokenizer = OpenAITokenizerWrapper() # Load our custom tokenizer for OpenAI
MAX_TOKENS = 8191 # text-embedding-3-large's maximum context length
# --------------------------------------------------------------
# Extract the data
# --------------------------------------------------------------
converter = DocumentConverter()
result = converter.convert("https://arxiv.org/pdf/2408.09869")
# --------------------------------------------------------------
# Apply hybrid chunking
# --------------------------------------------------------------
chunker = HybridChunker(
    tokenizer=tokenizer,
    max_tokens=MAX_TOKENS,
    merge_peers=True,
)
chunk_iter = chunker.chunk(dl_doc=result.document)
chunks = list(chunk_iter)
print(len(chunks))  # number of chunks produced (a bare len() only displays in an interactive session)
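To make the roles of `max_tokens` and `merge_peers=True` more concrete, here is a minimal pure-Python sketch of the general split-then-merge idea behind token-budgeted chunking. It is not docling's implementation. The whitespace word count stands in for the real tokenizer, and the function names (`count_tokens`, `split_oversized`, `merge_small_peers`, `toy_hybrid_chunk`) are hypothetical. It also ignores document structure such as headings, which a structure-aware chunker would take into account when deciding what to merge.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Stand-in tokenizer (assumption): one token per whitespace-separated word."""
    return len(text.split())


def split_oversized(text: str, max_tokens: int) -> List[str]:
    """Split one passage into consecutive pieces of at most max_tokens words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def merge_small_peers(pieces: List[str], max_tokens: int) -> List[str]:
    """Greedily merge neighbouring pieces while the result stays within budget."""
    merged: List[str] = []
    for piece in pieces:
        if merged and count_tokens(merged[-1]) + count_tokens(piece) <= max_tokens:
            merged[-1] = merged[-1] + " " + piece
        else:
            merged.append(piece)
    return merged


def toy_hybrid_chunk(sections: List[str], max_tokens: int) -> List[str]:
    """Pass 1 splits oversized sections; pass 2 merges undersized neighbours."""
    pieces: List[str] = []
    for section in sections:
        pieces.extend(split_oversized(section, max_tokens))
    return merge_small_peers(pieces, max_tokens)


# Hypothetical input: four "sections" of different lengths.
sections = [
    "one two three",
    "four five",
    "six seven eight nine ten eleven twelve",
    "thirteen",
]
toy_chunks = toy_hybrid_chunk(sections, max_tokens=5)
for chunk in toy_chunks:
    print(count_tokens(chunk), chunk)
```

With a budget of 5, the long third section is split in two, and the short neighbours are merged, so no chunk exceeds the limit and fewer tiny chunks are produced. In the script above, the same budget is set with `MAX_TOKENS`, chosen to match the embedding model's input limit.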
Related principles
- P4 (trust): Apply progressive disclosure to system agency. Provide the minimum information necessary by default, while enabling users to inspect additional detail when confidence, understanding, or intervention is required.
- P6 (visibility): Expose meaningful operational state, not internal complexity. Present the state of the system in language and structures that are relevant to the user, rather than exposing low-level internals that do not support action or understanding.
- P7 (trust): Establish trust through inspectability. Users should be able to examine how a result was produced when confidence, accountability, or decision quality is important.