Optimize With a Pydantic Schema

Pass a Pydantic model to Prompter to optimize for structured output. DSPydantic optimizes the model's field descriptions and the prompts to maximize extraction accuracy. Any input modality works; format examples as described in Optimization Modalities.

When to use

You want structured dict/object output
Examples have expected_output as a dict matching your model
You need typed, validated extraction

Workflow

1. Define your model

Create a Pydantic model with field descriptions:

from pydantic import BaseModel, Field
from typing import Literal

class Review(BaseModel):
    """Extract review data."""
    sentiment: Literal["positive", "negative", "neutral"] = Field(
        description="Overall sentiment of the review"
    )
    rating: int = Field(description="Rating from 1 to 5")
    summary: str = Field(description="Brief summary of the review")
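
Since field descriptions are what gets optimized, it can help to look at them as the model exposes them. The following is a minimal sketch, assuming Pydantic v2; `model_fields` is a Pydantic attribute, not part of DSPydantic, and the `descriptions` variable is only for illustration.

```python
from typing import Literal

from pydantic import BaseModel, Field


class Review(BaseModel):
    """Extract review data."""

    sentiment: Literal["positive", "negative", "neutral"] = Field(
        description="Overall sentiment of the review"
    )
    rating: int = Field(description="Rating from 1 to 5")
    summary: str = Field(description="Brief summary of the review")


# Map each field name to the description declared in Field(...).
descriptions = {
    name: field.description for name, field in Review.model_fields.items()
}

for name, text in descriptions.items():
    print(f"{name}: {text}")
```

A field that prints `None` here has no description yet, which makes it a good candidate for a more specific one before optimizing.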

2. Create examples

Give each example an expected_output dict whose keys match your model's fields:

from dspydantic import Example

examples = [
    Example(
        text="Amazing product! Works perfectly and exceeded expectations.",
        expected_output={
            "sentiment": "positive",
            "rating": 5,
            "summary": "Product exceeded expectations"
        }
    ),
    Example(
        text="Broke after two weeks. Complete waste of money.",
        expected_output={
            "sentiment": "negative",
            "rating": 1,
            "summary": "Product broke quickly"
        }
    ),
    Example(
        text="It's okay. Does what it says but nothing special.",
        expected_output={
            "sentiment": "neutral",
            "rating": 3,
            "summary": "Average product, meets basic expectations"
        }
    ),
]
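
Because every expected_output has to match the model, a quick check with plain Pydantic before optimizing can catch typos early. This is a sketch under assumptions: it uses Pydantic v2's `model_validate`, works on bare dicts rather than `Example` objects, and `find_invalid_outputs` is a hypothetical helper, not a DSPydantic function.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class Review(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"] = Field(
        description="Overall sentiment of the review"
    )
    rating: int = Field(description="Rating from 1 to 5")
    summary: str = Field(description="Brief summary of the review")


def find_invalid_outputs(model, expected_outputs):
    """Return (index, error message) pairs for dicts that fail validation."""
    problems = []
    for index, expected in enumerate(expected_outputs):
        try:
            model.model_validate(expected)
        except ValidationError as exc:
            problems.append((index, str(exc)))
    return problems


expected_outputs = [
    {"sentiment": "positive", "rating": 5, "summary": "Product exceeded expectations"},
    # "mixed" is not one of the allowed Literal values.
    {"sentiment": "mixed", "rating": 3, "summary": "Average product"},
    # The required "rating" field is missing.
    {"sentiment": "negative", "summary": "Product broke quickly"},
]

problems = find_invalid_outputs(Review, expected_outputs)
for index, message in problems:
    print(f"expected_output #{index} does not match Review:\n{message}\n")
```

An empty `problems` list means every dict can be loaded into the model as written.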

3. Optimize

import dspy
from dspydantic import Prompter

# Set the default language model for DSPy.
dspy.configure(lm=dspy.LM("openai/gpt-4o", api_key="your-api-key"))

# Review and examples come from steps 1 and 2.
prompter = Prompter(model=Review)
result = prompter.optimize(examples=examples)

4. Run

data = prompter.run("This is the best purchase I've ever made!")
print(data)
# e.g. Review(sentiment='positive', rating=5, summary='Best purchase ever')
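
Since the result is an instance of your model, the usual Pydantic methods apply to it. The sketch below assumes Pydantic v2 and builds a `Review` by hand as a stand-in for the value returned by `prompter.run(...)`, so it makes no LLM call.

```python
from typing import Literal

from pydantic import BaseModel, Field


class Review(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"] = Field(
        description="Overall sentiment of the review"
    )
    rating: int = Field(description="Rating from 1 to 5")
    summary: str = Field(description="Brief summary of the review")


# Stand-in for the object returned by prompter.run(...); constructed manually here.
data = Review(sentiment="positive", rating=5, summary="Best purchase ever")

# Typed attribute access.
print(data.sentiment, data.rating)

# Convert to a plain dict or a JSON string for storage or downstream code.
as_dict = data.model_dump()
as_json = data.model_dump_json()
print(as_dict)
print(as_json)
```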

Images and PDFs

The same pattern applies to image and PDF inputs: keep expected_output as a dict matching your model.

from pydantic import BaseModel, Field
from dspydantic import Example, Prompter

class Digit(BaseModel):
    digit: int = Field(description="The handwritten digit (0-9)")

class Invoice(BaseModel):
    invoice_number: str = Field(description="Invoice ID")
    total: str = Field(description="Total amount")

# Images
examples = [
    Example(image_path="digit_5.png", expected_output={"digit": 5}),
    Example(image_path="digit_3.png", expected_output={"digit": 3}),
]

prompter = Prompter(model=Digit)
result = prompter.optimize(examples=examples)
digit = prompter.run(image_path="new_digit.png")  # e.g. Digit(digit=7)

# PDFs
examples = [
    Example(pdf_path="invoice.pdf", expected_output={"invoice_number": "INV-001", "total": "$500"}),
]

prompter = Prompter(model=Invoice)
result = prompter.optimize(examples=examples)
inv = prompter.run(pdf_path="new_invoice.pdf")  # Invoice(...)

How it works

| Step | What happens |
| --- | --- |
| `model=YourModel` | Schema fields are used for structured extraction |
| Optimize | Field descriptions and prompts are optimized for accuracy |
| Run | `prompter.run(...)` returns an instance of your model |
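
This page does not spell out how accuracy is scored. As an illustration only, one simple way to score a structured extraction is per-field exact match against the expected dict. The `field_accuracy` function below is a hypothetical helper written for this sketch and is not claimed to be the metric DSPydantic uses.

```python
def field_accuracy(predicted: dict, expected: dict) -> float:
    """Fraction of expected fields whose predicted value matches exactly."""
    if not expected:
        return 1.0
    matches = sum(
        1
        for key, value in expected.items()
        if key in predicted and predicted[key] == value
    )
    return matches / len(expected)


expected = {"sentiment": "negative", "rating": 1, "summary": "Product broke quickly"}
predicted = {"sentiment": "negative", "rating": 2, "summary": "Product broke quickly"}

# Two of the three fields match; the rating differs.
score = field_accuracy(predicted, expected)
print(f"{score:.2f}")
```

Exact match is strict for free-text fields such as summary, so a real setup may want a softer comparison there.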

What gets optimized

| What | Impact |
| --- | --- |
| Field descriptions | High |
| System / instruction prompts | Medium |

Tips

Every example must have expected_output as a dict matching your model
Field descriptions guide the LLM — be specific
Use Literal for categorical fields with known values
Use | None for optional fields
For string output without a schema, use Without Pydantic Schema
Reference: Example