Key Concepts¶
Learn the fundamental concepts of DSPydantic.
What is DSPydantic?¶
DSPydantic = DSPy (language model prompt optimization) + Pydantic (data validation).
It automatically optimizes your prompts using examples, improving extraction accuracy without manual prompt engineering.
Core Components¶
Prompter¶
The main class for extraction and optimization.
from dspydantic import Prompter
prompter = Prompter(
    model=MyPydanticModel,
    model_id="openai/gpt-4o-mini",
)
# Optimize with examples
result = prompter.optimize(examples=examples)
# Extract from new data
extracted = prompter.run("text")
Key methods:
- optimize() — Learn from examples
- run() — Extract from new data
- predict_batch() — Batch extraction
- save() / load() — Persist for production
Example¶
Input-output pairs for optimization.
from dspydantic import Example
Example(
    text="input text",
    expected_output={...},  # dict or Pydantic model
)
Examples teach the optimizer what good extraction looks like.
OptimizationResult¶
Returned by prompter.optimize().
result = prompter.optimize(examples=examples)
result.baseline_score # Score before optimization
result.optimized_score # Score after optimization
result.api_calls # API calls made
result.total_tokens # Tokens used
result.optimized_descriptions # Optimized field descriptions
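As a rough illustration of reading these fields, the sketch below reports the score gain alongside its cost. ResultStandIn is a hypothetical stand-in, not the library's class. It assumes scores are fractions between 0 and 1 (consistent with the :.0% formatting in the Quick Example) and that api_calls and total_tokens are integers.

```python
from dataclasses import dataclass


@dataclass
class ResultStandIn:
    """Hypothetical stand-in mirroring some of the fields listed above."""
    baseline_score: float   # assumed to be a fraction in [0, 1]
    optimized_score: float  # assumed to be a fraction in [0, 1]
    api_calls: int
    total_tokens: int


def summarize(result: ResultStandIn) -> str:
    """Report the absolute score gain and what it cost to obtain."""
    gain = result.optimized_score - result.baseline_score
    return (
        f"{result.baseline_score:.0%} -> {result.optimized_score:.0%} "
        f"({gain:+.0%}) using {result.api_calls} calls, "
        f"{result.total_tokens} tokens"
    )


demo = ResultStandIn(
    baseline_score=0.5, optimized_score=0.75, api_calls=40, total_tokens=12000
)
print(summarize(demo))  # 50% -> 75% (+25%) using 40 calls, 12000 tokens
```

Comparing the gain against the calls and tokens spent is one way to decide whether another optimization run is worth the cost.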
The Optimization Loop¶
DSPydantic optimizes three things:
- Field Descriptions — Makes them clearer for the LLM
- System Prompt — Sets overall context
- Instruction Prompt — Guides extraction step-by-step
How it works:
Define Model
↓
Create Examples
↓
Optimize (test variations, measure accuracy)
↓
Optimized Prompter (ready to use)
↓
Extract from New Data
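The "test variations, measure accuracy" step can be pictured with a toy loop. This is only a sketch: fake_model is a hypothetical stand-in for an LLM call, the candidate descriptions are made up, and DSPy's actual optimizers are considerably more sophisticated than picking the best of a fixed list.

```python
import re
from typing import Optional

# Toy labelled examples: (input text, expected age).
EXAMPLES = [("John is 30 years old", 30), ("Sarah is 25", 25)]


def fake_model(description: str, text: str) -> Optional[int]:
    """Toy stand-in for an LLM: it only finds the number when the field
    description mentions 'age'. A real model behaves far less tidily."""
    if "age" not in description.lower():
        return None
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def accuracy(description: str) -> float:
    """Fraction of examples where the toy model returns the expected value."""
    hits = sum(
        fake_model(description, text) == expected for text, expected in EXAMPLES
    )
    return hits / len(EXAMPLES)


def pick_best(candidates: list[str]) -> tuple[str, float]:
    """Score every candidate description and keep the highest scorer."""
    scored = [(accuracy(candidate), candidate) for candidate in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


best, score = pick_best(["A number", "Age in years", "Some value"])
print(best, score)  # Age in years 1.0
```

The shape is the same as in the diagram: each variation is scored against the examples, and the best-scoring one is kept for later extraction.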
Evaluators¶
Measure extraction quality during optimization.
| Evaluator | Best For |
|---|---|
| exact | IDs, exact matches |
| levenshtein | Names (allow variations) |
| text_similarity | Descriptions |
| score_judge | Complex evaluation |
| Custom | Domain-specific logic |
The optimizer tests field descriptions and prompts against examples, using evaluators to measure quality.
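To make the difference between exact and levenshtein concrete, here is a minimal sketch of both scoring styles in plain Python. It is not DSPydantic's implementation; in particular, normalising the edit distance by the longer string's length is an assumption made for illustration.

```python
def exact_score(predicted: str, expected: str) -> float:
    """All-or-nothing: suits IDs and other values that must match exactly."""
    return 1.0 if predicted == expected else 0.0


def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_score(predicted: str, expected: str) -> float:
    """Turn the distance into a 0..1 similarity (1.0 means identical).
    Dividing by the longer length is an assumed normalisation."""
    longest = max(len(predicted), len(expected))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(predicted, expected) / longest


print(exact_score("Jon Smith", "John Smith"))
print(levenshtein_score("Jon Smith", "John Smith"))
```

A one-letter slip in a name scores zero under exact matching but stays close to 1 under the edit-distance score, which is why the table suggests levenshtein for names.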
Input Types¶
DSPydantic supports multiple input modalities:
| Type | Example |
|---|---|
| Text | text="sample text" |
| Image | image_path="image.png" or image_base64=... |
| PDF | pdf_path="document.pdf" |
| Dict | text={"key": "value"} for templates |
Output Types¶
With a Pydantic model, run() returns an instance of that model.
Without a model, the output is plain text.
Production Features¶
| Feature | Purpose |
|---|---|
| Save/Load | Persist optimized prompters |
| Batch | Extract from multiple inputs |
| Async | Non-blocking extraction |
| Confidence | Measure extraction certainty |
| Caching | Reduce API calls |
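The idea behind the Caching row can be sketched with the standard library: repeated inputs are answered from memory instead of triggering another model call. fake_extract is a hypothetical stand-in for a real extraction call, and this does not show how DSPydantic itself configures caching.

```python
from functools import lru_cache

call_count = 0  # counts how often the (pretend) model is actually invoked


def fake_extract(text: str) -> str:
    """Hypothetical stand-in for a model call."""
    global call_count
    call_count += 1
    return text.upper()


@lru_cache(maxsize=1024)
def cached_extract(text: str) -> str:
    """Identical inputs are served from the cache after the first call."""
    return fake_extract(text)


results = [cached_extract(text) for text in ["a", "b", "a"]]
print(results, call_count)  # ['A', 'B', 'A'] 2
```

Three inputs produce only two underlying calls because the repeated input is a cache hit.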
Quick Example¶
from pydantic import BaseModel, Field
from dspydantic import Example, Prompter
import dspy
# 1. Define model
class Person(BaseModel):
    name: str = Field(description="Full name")
    age: int = Field(description="Age in years")
# 2. Create examples
examples = [
    Example(text="John is 30 years old", expected_output={"name": "John", "age": 30}),
    Example(text="Sarah is 25", expected_output={"name": "Sarah", "age": 25}),
]
# 3. Configure and optimize
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
prompter = Prompter(model=Person)
result = prompter.optimize(examples=examples)
print(f"Accuracy: {result.optimized_score:.0%}")
# 4. Extract from new data
person = prompter.run("Alice is 28 years old")
print(person) # Person(name='Alice', age=28)
Next Steps¶
- How Optimization Works — Deep dive into optimization
- Understanding Evaluators — Evaluation mechanics
- Choosing an Evaluator — Evaluator selection guide