
Types

Core types and data structures.

Example

Example(expected_output=None, text=None, image_path=None, image_base64=None, pdf_path=None, pdf_dpi=300)

Example data for optimization.

This class automatically prepares input data from various input types:

  • Plain text
  • Images (from file path or base64 string)
  • PDFs (converted to images at specified DPI)

Examples:

# Plain text
Example(
    text="John Doe, 30 years old",
    expected_output={"name": "John Doe", "age": 30}
)

# Text dict for template formatting
Example(
    text={"name": "John Doe", "location": "New York"},
    expected_output={"name": "John Doe", "age": 30}
)

# Image from file
Example(
    image_path="document.png",
    expected_output={"name": "John Doe", "age": 30}
)

# PDF (converted to 300 DPI images)
Example(
    pdf_path="document.pdf",
    pdf_dpi=300,
    expected_output={"name": "John Doe", "age": 30}
)

# Combined text and image
Example(
    text="Extract information from this document",
    image_path="document.png",
    expected_output={"name": "John Doe", "age": 30}
)

# Image from base64 string
Example(
    image_base64="iVBORw0KG...",
    expected_output={"name": "John Doe", "age": 30}
)

# Without expected_output (uses LLM judge for evaluation)
Example(
    text="John Doe, 30 years old",
    expected_output=None
)

Attributes:

Name Type Description
input_data

Input data dictionary (automatically generated from input parameters).

text_dict

Dictionary of text values for template formatting. Used to format instruction prompt templates with placeholders like "{key}". Set automatically when text parameter is a dict.

expected_output

Expected output. Can be a str, dict, or Pydantic model matching the target schema. If a string, it will be wrapped in a single-field model with field name "output". If a Pydantic model, it will be converted to a dict for comparison. If None, evaluation will use an LLM judge or custom evaluation function instead of comparing against expected output.
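The normalization rules for expected_output described above can be pictured with a small stand-alone helper. This is only a sketch of the documented behavior, not the library's code: normalize_expected is a hypothetical name, the string case is shown as a plain {"output": ...} dict rather than a real single-field model, and Pydantic models are detected by duck-typing on model_dump() (the Pydantic v2 method).

```python
from __future__ import annotations

from typing import Any


def normalize_expected(expected: Any) -> dict[str, Any] | None:
    """Hypothetical helper mirroring the documented rules for expected_output."""
    if expected is None:
        # No reference answer: evaluation relies on an LLM judge or a custom function.
        return None
    if isinstance(expected, str):
        # A bare string is treated as a single field named "output".
        return {"output": expected}
    if hasattr(expected, "model_dump"):
        # Pydantic v2 models expose model_dump(); convert to a dict for comparison.
        return expected.model_dump()
    # Anything else is assumed to be a mapping already.
    return dict(expected)


print(normalize_expected("positive"))
print(normalize_expected({"name": "John Doe", "age": 30}))
print(normalize_expected(None))
```

Reducing every case to a dict (or None) means a comparison step only has to handle one shape.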

Initialize an Example.

Parameters:

Name Type Description Default
expected_output str | dict[str, Any] | BaseModel | None

Expected output. Can be a str, dict, or Pydantic model. If a string, it will be wrapped in a single-field model with field name "output". If None, evaluation will use an LLM judge or custom evaluation function.

None
text str | dict[str, str] | None

Plain text input (str) or dictionary of text values for template formatting (dict). If a dict, keys correspond to placeholders in instruction prompt templates (e.g., {"key": "value"}). For input_data text extraction, known keys "text", "review", "content", "input" are checked first. If none match, values are joined with spaces as fallback.

None
image_path str | Path | None

Path to an image file to convert to base64.

None
image_base64 str | None

Base64-encoded image string.

None
pdf_path str | Path | None

Path to a PDF file to convert to images.

None
pdf_dpi int

DPI for PDF conversion (default: 300).

300

Raises:

Type Description
ValueError

If no input parameters are provided.

Source code in src/dspydantic/types.py
def __init__(
    self,
    expected_output: str | dict[str, Any] | BaseModel | None = None,
    text: str | dict[str, str] | None = None,
    image_path: str | Path | None = None,
    image_base64: str | None = None,
    pdf_path: str | Path | None = None,
    pdf_dpi: int = 300,
) -> None:
    """Initialize an Example.

    Args:
        expected_output: Expected output. Can be a str, dict, or Pydantic model.
            If a string, it will be wrapped in a single-field model with field name "output".
            If None, evaluation will use an LLM judge or custom evaluation function.
        text: Plain text input (str) or dictionary of text values for template
            formatting (dict). If a dict, keys correspond to placeholders in
            instruction prompt templates (e.g., {"key": "value"}). For input_data
            text extraction, known keys "text", "review", "content", "input" are
            checked first. If none match, values are joined with spaces as fallback.
        image_path: Path to an image file to convert to base64.
        image_base64: Base64-encoded image string.
        pdf_path: Path to a PDF file to convert to images.
        pdf_dpi: DPI for PDF conversion (default: 300).

    Raises:
        ValueError: If no input parameters are provided.
    """
    self.expected_output = expected_output

    if isinstance(text, dict):
        self.text_dict = text
        text_string = (
            text.get("text")
            or text.get("review")
            or text.get("content")
            or text.get("input")
            or None
        )
        if text_string is None and text:
            text_string = " ".join(str(v) for v in text.values())
    else:
        self.text_dict = {}
        text_string = text

    # Use prepare_input_data to create input_data from parameters
    # If text_string is None and no other inputs, we'll set input_data manually later
    try:
        self.input_data = prepare_input_data(
            text=text_string,
            image_path=image_path,
            image_base64=image_base64,
            pdf_path=pdf_path,
            pdf_dpi=pdf_dpi,
        )
    except ValueError:
        # If no inputs provided and text is a dict, create empty input_data
        # It can be set manually later if needed
        if isinstance(text, dict):
            self.input_data = {}
        else:
            raise
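
A dict passed as text plays two roles: its keys fill prompt-template placeholders, and a single string is derived from it for input_data. The sketch below restates the resolution logic from the source above as a stand-alone function so it can be run without the library. resolve_text and the template string are hypothetical, and str.format is assumed as the substitution mechanism because the documented placeholder syntax is "{key}".

```python
from __future__ import annotations

KNOWN_KEYS = ("text", "review", "content", "input")


def resolve_text(text: str | dict[str, str] | None) -> str | None:
    """Stand-alone restatement of the text resolution in Example.__init__."""
    if not isinstance(text, dict):
        return text
    for key in KNOWN_KEYS:
        value = text.get(key)
        if value:  # the source chains the lookups with `or`, so empty values are skipped
            return value
    # No known key matched: join all values with spaces (None for an empty dict).
    return " ".join(str(v) for v in text.values()) if text else None


fields = {"name": "John Doe", "location": "New York"}
template = "Extract details for {name}, based in {location}."  # hypothetical template

print(template.format(**fields))  # Extract details for John Doe, based in New York.
print(resolve_text(fields))  # John Doe New York
print(resolve_text({"review": "Great product", "stars": "5"}))  # Great product
```

Note that when a known key is present, the other values in the dict do not contribute to the derived text; they remain available only for template formatting.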

OptimizationResult dataclass

OptimizationResult(optimized_descriptions, optimized_system_prompt, optimized_instruction_prompt, metrics, baseline_score, optimized_score, optimized_demos=None, api_calls=0, total_tokens=0, estimated_cost_usd=None)

Result of Pydantic model optimization.

Attributes:

Name Type Description
optimized_descriptions dict[str, str]

Dictionary mapping field paths to optimized descriptions.

optimized_system_prompt str | None

Optimized system prompt (if provided).

optimized_instruction_prompt str | None

Optimized instruction prompt (if provided).

optimized_demos list[dict[str, Any]] | None

Few-shot examples (input_data, expected_output) for the extraction prompt.

metrics dict[str, Any]

Dictionary containing optimization metrics (score, improvement, etc.).

baseline_score float

Baseline score before optimization.

optimized_score float

Score after optimization.

api_calls int

Total number of API calls made during optimization.

total_tokens int

Total tokens used during optimization (if available).

estimated_cost_usd float | None

Estimated cost in USD (if available).

ExtractionResult dataclass

ExtractionResult(data, confidence=None, raw_output=None)

Result of extraction with optional metadata.

Attributes:

Name Type Description
data BaseModel

The extracted Pydantic model instance.

confidence float | None

Confidence score (0.0-1.0) if requested.

raw_output str | None

Raw LLM output text.

PrompterState dataclass

PrompterState(model_schema, optimized_descriptions, optimized_system_prompt, optimized_instruction_prompt, model_id, model_config, version, metadata, optimized_demos=None)

State of a Prompter instance for serialization.

This class contains all the information needed to save and restore a Prompter instance.

Attributes:

Name Type Description
model_schema dict[str, Any]

JSON schema of the Pydantic model.

optimized_descriptions dict[str, str]

Dictionary of optimized field descriptions.

optimized_system_prompt str | None

Optimized system prompt (if any).

optimized_instruction_prompt str | None

Optimized instruction prompt (if any).

model_id str

LLM model identifier.

model_config dict[str, Any]

Model configuration (API base, version, etc.).

version str

dspydantic version for compatibility checking.

metadata dict[str, Any]

Additional metadata (timestamp, optimization metrics, etc.).

FieldOptimizationProgress dataclass

FieldOptimizationProgress(phase, score_before, score_after, improved, total_fields, field_path=None, field_index=None, elapsed_seconds=0.0, optimized_value=None)

Progress update emitted during field-by-field optimization.

Attributes:

Name Type Description
phase str

Current optimization phase. Valid values:

  • "baseline": Initial evaluation before optimization
  • "fields": Field description optimization
  • "skipped": Field was skipped (already above threshold)
  • "system_prompt": System prompt optimization
  • "instruction_prompt": Instruction prompt optimization
  • "complete": Optimization finished

score_before float

Score before this optimization step.

score_after float

Score after this optimization step.

improved bool

True if score improved.

total_fields int

Total number of fields being optimized.

field_path str | None

Dot-notation path of the field just optimized (None for non-field phases).

field_index int | None

1-based index of the field (None for non-field phases).

elapsed_seconds float

Wall-clock seconds elapsed since optimization started.

optimized_value str | None

The optimized description or prompt text (None for non-field/non-prompt phases).

Example

The Example class represents a single example for optimization. It supports multiple input types:

  • Text: Plain text string or dictionary for prompt templates
  • Images: File path (image_path) or base64-encoded string (image_base64)
  • PDFs: File path (pdf_path) - automatically converted to images at specified DPI (default: 300)
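
For callers who already hold image bytes and want to pass image_base64 directly, the encoding step can be sketched with the standard library alone. encode_image_file is a hypothetical helper, not part of dspydantic, and whether the library expects a bare base64 string or a data-URL prefix is not stated here, so the bare string is an assumption.

```python
from __future__ import annotations

import base64
import tempfile
from pathlib import Path


def encode_image_file(path: str | Path) -> str:
    """Read an image file and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


# The 8-byte PNG signature; every PNG file starts with these bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as tmp:
    image_file = Path(tmp) / "document.png"
    image_file.write_bytes(PNG_SIGNATURE)  # header bytes only, not a full image
    encoded = encode_image_file(image_file)

print(encoded)  # iVBORw0KGgo=
```

This is why base64-encoded PNG data, such as the "iVBORw0KG..." value in the examples, always begins with the same characters.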

PDFs are converted to images page by page for processing. Use the pdf_dpi parameter to control conversion quality (default: 300 DPI).
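
To get a feel for what a given pdf_dpi value means in practice, the pixel size of a rendered page follows from simple geometry: inches times dots per inch. The helper below is a hypothetical illustration and says nothing about which PDF renderer the library uses; the page size is US Letter (8.5 × 11 inches).

```python
def page_pixels(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


for dpi in (150, 300):
    width, height = page_pixels(8.5, 11.0, dpi)
    print(f"{dpi} DPI: {width} x {height} px ({width * height:,} pixels)")
# 150 DPI: 1275 x 1650 px (2,103,750 pixels)
# 300 DPI: 2550 x 3300 px (8,415,000 pixels)
```

Doubling the DPI quadruples the pixel count per page, so lower values may be worth trying when documents are long and the text is large enough to stay legible.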

OptimizationResult

The OptimizationResult dataclass contains the results of optimization:

  • optimized_descriptions: Dictionary mapping field paths to optimized descriptions
  • optimized_system_prompt: Optimized system prompt (if provided)
  • optimized_instruction_prompt: Optimized instruction prompt (if provided)
  • metrics: Dictionary containing optimization metrics
  • baseline_score: Baseline score before optimization
  • optimized_score: Score after optimization
  • api_calls: Total API calls made during optimization
  • total_tokens: Total tokens used during optimization
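
A common first step after optimization is to turn these fields into a few derived figures. The sketch below uses a SimpleNamespace as a stand-in for a real OptimizationResult so that it runs on its own; summarize is a hypothetical helper and the numbers are made up for illustration.

```python
from __future__ import annotations

from types import SimpleNamespace


def summarize(result) -> dict[str, float | None]:
    """Derive gain and cost figures from the documented result fields."""
    gain = result.optimized_score - result.baseline_score
    # Relative gain is undefined when the baseline is zero.
    relative = gain / result.baseline_score if result.baseline_score else None
    # estimated_cost_usd may be None when cost data is unavailable.
    has_cost = result.estimated_cost_usd is not None and result.api_calls
    cost_per_call = result.estimated_cost_usd / result.api_calls if has_cost else None
    return {"absolute_gain": gain, "relative_gain": relative, "cost_per_call": cost_per_call}


# Stand-in carrying only the attributes the helper reads (illustrative values).
stand_in = SimpleNamespace(
    baseline_score=0.5,
    optimized_score=0.75,
    api_calls=40,
    total_tokens=120_000,
    estimated_cost_usd=0.5,
)

summary = summarize(stand_in)
print(summary)
```

Because the helper only reads attributes, the same function should work on a real result object as long as the field names match those listed above.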

ExtractionResult

The ExtractionResult dataclass is returned by predict_with_confidence():

  • data: The extracted Pydantic model instance
  • confidence: Confidence score (0.0-1.0)
  • raw_output: Raw LLM output text (optional)
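
Since confidence is optional, code that consumes an ExtractionResult has to handle None as well as low scores. The following sketch routes results to manual review under those two conditions. StandInResult mirrors the three documented fields so the example runs without the library, and both needs_review and the 0.8 threshold are hypothetical choices.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class StandInResult:
    """Stand-in with the same three fields as ExtractionResult."""

    data: Any
    confidence: float | None = None
    raw_output: str | None = None


def needs_review(result, threshold: float = 0.8) -> bool:
    """Flag results whose confidence is missing or below the threshold."""
    return result.confidence is None or result.confidence < threshold


results = [
    StandInResult(data={"name": "John Doe"}, confidence=0.95),
    StandInResult(data={"name": "J. Doe"}, confidence=0.42),
    StandInResult(data={"name": "Jane Doe"}),  # confidence was not requested
]

flags = [needs_review(r) for r in results]
print(flags)  # [False, True, True]
```

Treating a missing score as "needs review" is the conservative choice; a pipeline that never requests confidence would want the opposite default.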

PrompterState

The PrompterState dataclass contains all information needed to save and restore a Prompter instance.
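
Because every documented field is a string, dict, list, or None, a state object of this shape is naturally JSON-serializable. The sketch below shows a round trip with a stand-in dataclass that copies the documented field names. It is not the library's own save or load mechanism, which is not described here, and the model_id and version values are placeholders.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class StandInState:
    """Stand-in with the same fields as PrompterState."""

    model_schema: dict[str, Any]
    optimized_descriptions: dict[str, str]
    optimized_system_prompt: str | None
    optimized_instruction_prompt: str | None
    model_id: str
    model_config: dict[str, Any]
    version: str
    metadata: dict[str, Any]
    optimized_demos: list[dict[str, Any]] | None = None


state = StandInState(
    model_schema={"type": "object", "properties": {"name": {"type": "string"}}},
    optimized_descriptions={"name": "Full name of the person"},
    optimized_system_prompt=None,
    optimized_instruction_prompt="Extract the person's details.",
    model_id="example-model",  # placeholder identifier
    model_config={},
    version="0.0.0",  # placeholder version string
    metadata={},
)

payload = json.dumps(asdict(state), indent=2)  # serialize to a JSON string
restored = StandInState(**json.loads(payload))  # rebuild from the parsed dict
print(restored == state)  # True
```

Storing the version alongside the prompts is what makes a compatibility check possible when the state is loaded by a different release.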

FieldOptimizationProgress

The FieldOptimizationProgress dataclass is emitted by the on_progress callback during optimization to track progress:

  • phase: Current optimization phase ("baseline", "fields", "skipped", "system_prompt", "instruction_prompt", "complete")
  • score_before: Score before this optimization step
  • score_after: Score after this optimization step
  • improved: Whether the score improved
  • total_fields: Total number of fields being optimized
  • field_path: Dot-notation path of the field being optimized (None for non-field phases)
  • field_index: 1-based index of the field (None for non-field phases)
  • elapsed_seconds: Wall-clock seconds elapsed since optimization started
  • optimized_value: The actual optimized description or prompt text (available since v0.1.3)

Usage with Callbacks

def my_progress_callback(progress: FieldOptimizationProgress):
    if progress.phase == "fields":
        print(f"{progress.field_path}: {progress.score_before:.0%} -> {progress.score_after:.0%}")
        if progress.optimized_value:
            print(f"  Optimized to: {progress.optimized_value!r}")

optimizer.on_progress = my_progress_callback

When verbose=True, callbacks are invoked automatically, and the rich-formatted output shows the optimized values.
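
A callback does not have to print; it can also collect events for later inspection. The sketch below is a callable class that records every update and reports which fields improved. StandInProgress mirrors the documented fields so the example runs without the library, ProgressLog is a hypothetical name, and the events are fed in by hand rather than by a real optimizer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StandInProgress:
    """Stand-in with the same fields as FieldOptimizationProgress."""

    phase: str
    score_before: float
    score_after: float
    improved: bool
    total_fields: int
    field_path: str | None = None
    field_index: int | None = None
    elapsed_seconds: float = 0.0
    optimized_value: str | None = None


class ProgressLog:
    """Callable that stores each progress update it receives."""

    def __init__(self) -> None:
        self.events: list = []

    def __call__(self, progress) -> None:
        self.events.append(progress)

    def improved_fields(self) -> list[str]:
        # Only "fields" events with improved=True count as improvements.
        return [e.field_path for e in self.events if e.phase == "fields" and e.improved]


log = ProgressLog()
for event in (
    StandInProgress("baseline", 0.5, 0.5, False, 2),
    StandInProgress("fields", 0.5, 0.6, True, 2, "name", 1, 3.2, "Full legal name"),
    StandInProgress("skipped", 0.6, 0.6, False, 2, "age", 2, 3.3),
    StandInProgress("complete", 0.5, 0.6, True, 2, elapsed_seconds=4.0),
):
    log(event)  # simulates the optimizer invoking the callback

print(log.improved_fields())  # ['name']
```

An instance such as log could then be registered wherever a plain function callback is accepted, since it is called with a single progress argument.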
