Prompter

Unified class for optimizing and extracting with Pydantic models.

Prompter

Prompter(model=None, model_id=None, api_key=None, cache=False, system_prompt=None, instruction_prompt=None, optimized_descriptions=None, optimized_system_prompt=None, optimized_instruction_prompt=None, optimized_demos=None)

Unified class for optimizing and extracting with Pydantic models.

This class combines optimization and extraction functionality in a single interface. It wraps PydanticOptimizer and adds extraction capabilities along with save/load support.

Examples:

Simple usage with model_id (recommended):

from dspydantic import Prompter
from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(description="User name")
    age: int = Field(description="User age")

# Create prompter with model_id - auto-configures DSPy
prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")

# Extract directly (no optimization required)
data = prompter.run("John Doe, 30 years old")
print(data.name, data.age)  # John Doe 30

With optimization:

from dspydantic import Example

examples = [
    Example(text="John Doe, 30", expected_output={"name": "John Doe", "age": 30})
]
result = prompter.optimize(examples=examples)

# Extract with optimized prompts
data = prompter.run("Jane Smith, 25")

Manual DSPy configuration:

import dspy
lm = dspy.LM("openai/gpt-4o", api_key="your-key")
dspy.configure(lm=lm)

prompter = Prompter(model=User)  # Uses existing DSPy config

Initialize Prompter.

Parameters:

Name Type Description Default
model type[BaseModel] | None

Pydantic model class for extraction schema.

None
model_id str | None

LiteLLM model identifier (e.g., "openai/gpt-4o-mini", "anthropic/claude-3-sonnet"). If provided, automatically configures DSPy. Supports all models via LiteLLM.

None
api_key str | None

API key for the model provider. If None, uses environment variable (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.).

None
cache bool | str

Enable caching. True uses default ".dspydantic_cache", or provide path string.

False
system_prompt str | None

Initial system prompt for extraction.

None
instruction_prompt str | None

Initial instruction prompt for extraction.

None
optimized_descriptions dict[str, str] | None

Pre-optimized field descriptions (for loading).

None
optimized_system_prompt str | None

Pre-optimized system prompt (for loading).

None
optimized_instruction_prompt str | None

Pre-optimized instruction prompt (for loading).

None
optimized_demos list[dict[str, Any]] | None

Pre-optimized few-shot examples (for loading).

None
Example

>>> prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP
>>> data = prompter.run("John Doe, 30")  # doctest: +SKIP

Source code in src/dspydantic/prompter.py
def __init__(
    self,
    model: type[BaseModel] | None = None,
    model_id: str | None = None,
    api_key: str | None = None,
    cache: bool | str = False,
    system_prompt: str | None = None,
    instruction_prompt: str | None = None,
    optimized_descriptions: dict[str, str] | None = None,
    optimized_system_prompt: str | None = None,
    optimized_instruction_prompt: str | None = None,
    optimized_demos: list[dict[str, Any]] | None = None,
) -> None:
    """Initialize Prompter.

    Args:
        model: Pydantic model class for extraction schema.
        model_id: LiteLLM model identifier (e.g., "openai/gpt-4o-mini", "anthropic/claude-3-sonnet").
            If provided, automatically configures DSPy. Supports all models via LiteLLM.
        api_key: API key for the model provider. If None, uses environment variable
            (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.).
        cache: Enable caching. True uses default ".dspydantic_cache", or provide path string.
        system_prompt: Initial system prompt for extraction.
        instruction_prompt: Initial instruction prompt for extraction.
        optimized_descriptions: Pre-optimized field descriptions (for loading).
        optimized_system_prompt: Pre-optimized system prompt (for loading).
        optimized_instruction_prompt: Pre-optimized instruction prompt (for loading).
        optimized_demos: Pre-optimized few-shot examples (for loading).

    Example:
        >>> prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP
        >>> data = prompter.run("John Doe, 30")  # doctest: +SKIP
    """
    self.model = model
    self.model_id = model_id
    self.system_prompt = system_prompt
    self.instruction_prompt = instruction_prompt

    # Optimized state (set after optimization or loading)
    self.optimized_descriptions = optimized_descriptions or {}
    self.optimized_system_prompt = optimized_system_prompt
    self.optimized_instruction_prompt = optimized_instruction_prompt
    self.optimized_demos = optimized_demos

    # Internal state
    self._optimizer: PydanticOptimizer | None = None

    # Handle caching
    cache_dir = None
    if cache:
        cache_dir = cache if isinstance(cache, str) else ".dspydantic_cache"

    # Auto-configure DSPy if model_id provided
    _configure_dspy_if_needed(model_id, api_key, cache_dir)
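The cache handling above reduces to a small resolution rule: `False` disables caching, `True` selects the default directory, and a string is used as the path. As a standalone sketch (the function name is illustrative, not part of the API):

```python
def resolve_cache_dir(cache):
    """Map the `cache` argument to a cache directory, mirroring Prompter's rule.

    False -> caching disabled (None); True -> default directory; str -> that path.
    """
    if not cache:
        return None  # caching disabled
    return cache if isinstance(cache, str) else ".dspydantic_cache"

print(resolve_cache_dir(False))      # -> None
print(resolve_cache_dir(True))       # -> .dspydantic_cache
print(resolve_cache_dir("/tmp/c"))   # -> /tmp/c
```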

Functions

optimize

optimize(examples, evaluate_fn=None, optimizer=None, train_split=0.8, num_threads=4, verbose=False, exclude_fields=None, include_fields=None, evaluator_config=None, sequential=False, parallel_fields=True, max_val_examples=None, skip_score_threshold=None, on_progress=None, **kwargs)

Optimize prompts and field descriptions.

Uses PydanticOptimizer internally to perform optimization.

If model is None and examples have string expected_output values, a model with a single "output" field will be automatically created.

Parameters:

Name Type Description Default
examples list[Example]

List of examples for optimization.

required
evaluate_fn Callable[[Example, dict[str, str], str | None, str | None], float] | Callable[[Example, dict[str, Any], dict[str, str], str | None, str | None], float] | LM | str | None

Evaluation function or string metric.

None
optimizer str | Any | None

Optimizer name or instance (auto-selects if None).

None
train_split float

Training split fraction (default: 0.8).

0.8
num_threads int

Number of threads (default: 4).

4
verbose bool

Print progress (default: False).

False
exclude_fields list[str] | None

Field names to exclude from evaluation.

None
include_fields list[str] | None

Field names to include (only these are optimized/scored).

None
evaluator_config dict[str, Any] | None

Evaluator configuration dict.

None
sequential bool

If False (default), use single-pass optimization (all fields together). If True, optimize each field independently (deepest-first).

False
parallel_fields bool

If True (default), parallelize field optimization when sequential=True. Has no effect when sequential=False.

True
max_val_examples int | None

Optional cap on validation set size per field.

None
skip_score_threshold float | None

Optional threshold to skip high-scoring fields (sequential mode only).

None
on_progress Callable[[FieldOptimizationProgress], None] | None

Optional callback to receive FieldOptimizationProgress updates. Called automatically when verbose=True.

None
**kwargs Any

Additional kwargs passed to PydanticOptimizer.

{}

Returns:

Type Description
OptimizationResult

OptimizationResult with optimized descriptions and prompts.

Source code in src/dspydantic/prompter.py
def optimize(
    self,
    examples: list[Example],
    evaluate_fn: Callable[[Example, dict[str, str], str | None, str | None], float]
    | Callable[[Example, dict[str, Any], dict[str, str], str | None, str | None], float]
    | dspy.LM
    | str
    | None = None,
    optimizer: str | Any | None = None,
    train_split: float = 0.8,
    num_threads: int = 4,
    verbose: bool = False,
    exclude_fields: list[str] | None = None,
    include_fields: list[str] | None = None,
    evaluator_config: dict[str, Any] | None = None,
    sequential: bool = False,
    parallel_fields: bool = True,
    max_val_examples: int | None = None,
    skip_score_threshold: float | None = None,
    on_progress: Callable[[FieldOptimizationProgress], None] | None = None,
    **kwargs: Any,
) -> OptimizationResult:
    """Optimize prompts and field descriptions.

    Uses PydanticOptimizer internally to perform optimization.

    If model is None and examples have string expected_output values,
    a model with a single "output" field will be automatically created.

    Args:
        examples: List of examples for optimization.
        evaluate_fn: Evaluation function or string metric.
        optimizer: Optimizer name or instance (auto-selects if None).
        train_split: Training split fraction (default: 0.8).
        num_threads: Number of threads (default: 4).
        verbose: Print progress (default: False).
        exclude_fields: Field names to exclude from evaluation.
        include_fields: Field names to include (only these are optimized/scored).
        evaluator_config: Evaluator configuration dict.
        sequential: If False (default), use single-pass optimization (all fields together).
            If True, optimize each field independently (deepest-first).
        parallel_fields: If True (default), parallelize field optimization
            when sequential=True. Has no effect when sequential=False.
        max_val_examples: Optional cap on validation set size per field.
        skip_score_threshold: Optional threshold to skip high-scoring fields
            (sequential mode only).
        on_progress: Optional callback to receive FieldOptimizationProgress updates.
            Called automatically when verbose=True.
        **kwargs: Additional kwargs passed to PydanticOptimizer.

    Returns:
        OptimizationResult with optimized descriptions and prompts.
    """
    optimizer_instance = PydanticOptimizer(
        model=self.model,
        examples=examples,
        evaluate_fn=evaluate_fn,
        system_prompt=self.system_prompt,
        instruction_prompt=self.instruction_prompt,
        num_threads=num_threads,
        verbose=verbose,
        optimizer=optimizer,
        train_split=train_split,
        exclude_fields=exclude_fields,
        include_fields=include_fields,
        evaluator_config=evaluator_config,
        sequential=sequential,
        parallel_fields=parallel_fields,
        max_val_examples=max_val_examples,
        skip_score_threshold=skip_score_threshold,
        on_progress=on_progress,
        **kwargs,
    )

    # Run optimization
    result = optimizer_instance.optimize()

    # Update internal state
    # Store the model from optimizer (may be auto-created OutputModel)
    self.model = optimizer_instance.model
    self.optimized_descriptions = result.optimized_descriptions
    self.optimized_system_prompt = result.optimized_system_prompt
    self.optimized_instruction_prompt = result.optimized_instruction_prompt
    self.optimized_demos = result.optimized_demos

    return result

predict

predict(text=None, image_path=None, image_base64=None, pdf_path=None, pdf_dpi=300)

Extract structured data from input.

Works with or without prior optimization. If not optimized, uses the original field descriptions from the Pydantic model.

Parameters:

Name Type Description Default
text str | dict[str, str] | None

Input text (str) or dict for template formatting.

None
image_path str | Path | None

Path to image file.

None
image_base64 str | None

Base64-encoded image string.

None
pdf_path str | Path | None

Path to PDF file.

None
pdf_dpi int

DPI for PDF conversion (default: 300).

300

Returns:

Type Description
BaseModel

Pydantic model instance with extracted data.

Raises:

Type Description
ValueError

If model is not set, no input provided, or LLM not configured.

ValidationError

If extracted data doesn't match model schema.

Example

>>> prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP
>>> user = prompter.predict(text="John Doe, 30 years old")  # doctest: +SKIP
>>> print(user.name, user.age)  # doctest: +SKIP
John Doe 30

Source code in src/dspydantic/prompter.py
def predict(
    self,
    text: str | dict[str, str] | None = None,
    image_path: str | Path | None = None,
    image_base64: str | None = None,
    pdf_path: str | Path | None = None,
    pdf_dpi: int = 300,
) -> BaseModel:
    """Extract structured data from input.

    Works with or without prior optimization. If not optimized, uses the
    original field descriptions from the Pydantic model.

    Args:
        text: Input text (str) or dict for template formatting.
        image_path: Path to image file.
        image_base64: Base64-encoded image string.
        pdf_path: Path to PDF file.
        pdf_dpi: DPI for PDF conversion (default: 300).

    Returns:
        Pydantic model instance with extracted data.

    Raises:
        ValueError: If model is not set, no input provided, or LLM not configured.
        ValidationError: If extracted data doesn't match model schema.

    Example:
        >>> prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP
        >>> user = prompter.predict(text="John Doe, 30 years old")  # doctest: +SKIP
        >>> print(user.name, user.age)  # doctest: +SKIP
        John Doe 30
    """
    if self.model is None:
        raise ValueError(
            "model is required for extraction.\n\n"
            "Provide a Pydantic model when creating the Prompter:\n"
            "    prompter = Prompter(model=MyModel, model_id='openai/gpt-4o-mini')"
        )

    self._ensure_configured()

    # Prepare input data
    text_string = text if isinstance(text, str) else None
    text_dict = text if isinstance(text, dict) else None

    try:
        input_data = prepare_input_data(
            text=text_string,
            image_path=image_path,
            image_base64=image_base64,
            pdf_path=pdf_path,
            pdf_dpi=pdf_dpi,
        )
    except ValueError as e:
        if text_dict is not None:
            input_data = {}
        else:
            raise ValueError(
                "No input provided. Provide at least one of:\n"
                "  - text: str or dict\n"
                "  - image_path: path to image file\n"
                "  - image_base64: base64-encoded image\n"
                "  - pdf_path: path to PDF file"
            ) from e

    # Get descriptions (optimized or original from model)
    descriptions = self.optimized_descriptions or extract_field_descriptions(self.model)

    # Get prompts
    system_prompt = self.optimized_system_prompt or self.system_prompt
    instruction_prompt = self.optimized_instruction_prompt or self.instruction_prompt

    # Format instruction prompt if template
    if instruction_prompt and text_dict:
        instruction_prompt = (
            format_instruction_prompt_template(instruction_prompt, text_dict)
            or instruction_prompt
        )

    # Build extraction prompt
    modified_schema = apply_optimized_descriptions(self.model, descriptions)

    prompt_parts = []
    if system_prompt:
        prompt_parts.append(f"System: {system_prompt}")
    if instruction_prompt:
        prompt_parts.append(f"Instruction: {instruction_prompt}")

    prompt_parts.append(f"\nJSON Schema:\n{json.dumps(modified_schema, indent=2)}")

    # Few-shot examples
    if self.optimized_demos:
        prompt_parts.append("\nExamples:")
        for i, d in enumerate(self.optimized_demos, 1):
            inp = d.get("input_data") or {}
            out = d.get("expected_output")
            inp_desc = format_demo_input(inp)
            out_str = json.dumps(out) if out is not None else "{}"
            prompt_parts.append(f"  Example {i}:\n    Input: {inp_desc}\n    Output: {out_str}")

    # Add input data
    if isinstance(input_data, dict):
        if "text" in input_data:
            prompt_parts.append(f"\nInput text: {input_data['text']}")
        if "images" in input_data:
            prompt_parts.append(
                f"\nInput images: {len(input_data['images'])} image(s) provided"
            )
    else:
        prompt_parts.append(f"\nInput: {str(input_data)}")

    prompt_parts.append(
        "\nExtract the structured data according to the JSON schema above "
        "and return it as valid JSON."
    )
    full_prompt = "\n\n".join(prompt_parts)
    json_prompt = f"{full_prompt}\n\nReturn only valid JSON, no other text."

    # Handle images
    images = input_data.get("images") if isinstance(input_data, dict) else None
    dspy_images = None
    if images:
        dspy_images = convert_images_to_dspy_images(images)

    # Build signature and run predictor
    signature, extractor_kwargs = build_image_signature_and_kwargs(dspy_images)
    extractor = dspy.ChainOfThought(signature)
    extractor_kwargs["prompt"] = json_prompt
    result = extractor(**extractor_kwargs)

    # Parse output
    output_text = str(result.json_output) if hasattr(result, "json_output") else str(result)

    # Try to parse JSON
    extracted_data = self._parse_json_output(output_text)

    if extracted_data is None:
        raise ValueError(
            f"Failed to extract valid JSON from LLM output.\n\n"
            f"Output received: {output_text[:300]}...\n\n"
            f"This may indicate the model struggled with the extraction task. "
            f"Try optimizing with examples to improve accuracy."
        )

    # Create optimized model and validate
    OptimizedModel = create_optimized_model(self.model, descriptions)
    return OptimizedModel.model_validate(extracted_data)
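The final parsing step relies on the internal `_parse_json_output` helper, whose implementation is not shown here. A lenient parser in the same spirit (an assumption, not the library's actual code) would strip Markdown code fences and fall back to the first JSON object found in the text:

```python
import json
import re

def parse_json_output(text):
    """Best-effort JSON extraction from LLM output (illustrative sketch)."""
    # Strip Markdown code fences such as ```json ... ```
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # Fall back to the first {...} span in the text
    match = re.search(r"\{.*\}", cleaned, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            return None
    return None

print(parse_json_output('```json\n{"name": "John Doe", "age": 30}\n```'))
```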

run

run(text=None, image_path=None, image_base64=None, pdf_path=None, pdf_dpi=300)

Alias for predict() - extract structured data from input.

Parameters:

Name Type Description Default
text str | dict[str, str] | None

Input text (str) or dict for template formatting.

None
image_path str | Path | None

Path to image file.

None
image_base64 str | None

Base64-encoded image string.

None
pdf_path str | Path | None

Path to PDF file.

None
pdf_dpi int

DPI for PDF conversion (default: 300).

300

Returns:

Type Description
BaseModel

Pydantic model instance with extracted data.

Source code in src/dspydantic/prompter.py
def run(
    self,
    text: str | dict[str, str] | None = None,
    image_path: str | Path | None = None,
    image_base64: str | None = None,
    pdf_path: str | Path | None = None,
    pdf_dpi: int = 300,
) -> BaseModel:
    """Alias for predict() - extract structured data from input.

    Args:
        text: Input text (str) or dict for template formatting.
        image_path: Path to image file.
        image_base64: Base64-encoded image string.
        pdf_path: Path to PDF file.
        pdf_dpi: DPI for PDF conversion (default: 300).

    Returns:
        Pydantic model instance with extracted data.
    """
    return self.predict(
        text=text,
        image_path=image_path,
        image_base64=image_base64,
        pdf_path=pdf_path,
        pdf_dpi=pdf_dpi,
    )

predict_with_confidence

predict_with_confidence(text=None, image_path=None, image_base64=None, pdf_path=None, pdf_dpi=300)

Extract structured data with confidence score.

Uses a second LLM call to assess extraction confidence based on how well the input matches the extracted fields.

Parameters:

Name Type Description Default
text str | dict[str, str] | None

Input text (str) or dict for template formatting.

None
image_path str | Path | None

Path to image file.

None
image_base64 str | None

Base64-encoded image string.

None
pdf_path str | Path | None

Path to PDF file.

None
pdf_dpi int

DPI for PDF conversion (default: 300).

300

Returns:

Type Description
ExtractionResult

ExtractionResult with data, confidence (0.0-1.0), and raw output.

Example

>>> result = prompter.predict_with_confidence("John Doe, 30")  # doctest: +SKIP
>>> print(f"{result.data.name}: {result.confidence:.0%} confident")  # doctest: +SKIP
John Doe: 95% confident

Source code in src/dspydantic/prompter.py
def predict_with_confidence(
    self,
    text: str | dict[str, str] | None = None,
    image_path: str | Path | None = None,
    image_base64: str | None = None,
    pdf_path: str | Path | None = None,
    pdf_dpi: int = 300,
) -> ExtractionResult:
    """Extract structured data with confidence score.

    Uses a second LLM call to assess extraction confidence based on
    how well the input matches the extracted fields.

    Args:
        text: Input text (str) or dict for template formatting.
        image_path: Path to image file.
        image_base64: Base64-encoded image string.
        pdf_path: Path to PDF file.
        pdf_dpi: DPI for PDF conversion (default: 300).

    Returns:
        ExtractionResult with data, confidence (0.0-1.0), and raw output.

    Example:
        >>> result = prompter.predict_with_confidence("John Doe, 30")  # doctest: +SKIP
        >>> print(f"{result.data.name}: {result.confidence:.0%} confident")  # doctest: +SKIP
        John Doe: 95% confident
    """
    data = self.predict(
        text=text,
        image_path=image_path,
        image_base64=image_base64,
        pdf_path=pdf_path,
        pdf_dpi=pdf_dpi,
    )

    confidence = self._assess_confidence(text, data)

    return ExtractionResult(data=data, confidence=confidence)

predict_batch

predict_batch(inputs, max_workers=4, on_error='raise')

Extract structured data from multiple inputs in parallel.

Parameters:

Name Type Description Default
inputs list[str | dict[str, str]]

List of input texts (str) or dicts for template formatting.

required
max_workers int

Maximum number of parallel workers (default: 4).

4
on_error str

Error handling strategy:
- "raise": Raise first exception encountered
- "return": Return exceptions in results list

'raise'

Returns:

Type Description
list[BaseModel | Exception]

List of extracted Pydantic model instances (or exceptions if on_error="return").

Example

>>> prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP
>>> texts = ["John Doe, 30", "Jane Smith, 25", "Bob Wilson, 40"]  # doctest: +SKIP
>>> results = prompter.predict_batch(texts)  # doctest: +SKIP
>>> for user in results:  # doctest: +SKIP
...     print(user.name, user.age)  # doctest: +SKIP

Source code in src/dspydantic/prompter.py
def predict_batch(
    self,
    inputs: list[str | dict[str, str]],
    max_workers: int = 4,
    on_error: str = "raise",
) -> list[BaseModel | Exception]:
    """Extract structured data from multiple inputs in parallel.

    Args:
        inputs: List of input texts (str) or dicts for template formatting.
        max_workers: Maximum number of parallel workers (default: 4).
        on_error: Error handling strategy:
            - "raise": Raise first exception encountered
            - "return": Return exceptions in results list

    Returns:
        List of extracted Pydantic model instances (or exceptions if on_error="return").

    Example:
        >>> prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP
        >>> texts = ["John Doe, 30", "Jane Smith, 25", "Bob Wilson, 40"]  # doctest: +SKIP
        >>> results = prompter.predict_batch(texts)  # doctest: +SKIP
        >>> for user in results:  # doctest: +SKIP
        ...     print(user.name, user.age)  # doctest: +SKIP
    """
    results: list[BaseModel | Exception] = [None] * len(inputs)  # type: ignore

    def process_item(index: int, item: str | dict[str, str]) -> tuple[int, Any]:
        try:
            result = self.predict(text=item)
            return (index, result)
        except Exception as e:
            if on_error == "raise":
                raise
            return (index, e)

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(process_item, i, item): i for i, item in enumerate(inputs)}

        for future in as_completed(futures):
            index, result = future.result()
            results[index] = result

    return results
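With on_error="return", the results list preserves input order but mixes model instances with exceptions. A small helper (hypothetical, not part of the API) can separate successes from failures while keeping track of which input failed:

```python
def split_batch_results(results):
    """Partition a predict_batch(on_error="return") result list.

    Returns (successes, failures), where failures pairs each exception
    with the index of the input that produced it.
    """
    successes = [r for r in results if not isinstance(r, Exception)]
    failures = [(i, r) for i, r in enumerate(results) if isinstance(r, Exception)]
    return successes, failures

# Simulated batch output: two extracted values and one failure at index 1
mixed = [{"name": "Alice"}, ValueError("bad input"), {"name": "Carol"}]
ok, failed = split_batch_results(mixed)
print(len(ok), [i for i, _ in failed])  # -> 2 [1]
```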

apredict async

apredict(text=None, image_path=None, image_base64=None, pdf_path=None, pdf_dpi=300)

Async version of predict() for concurrent extraction.

Parameters:

Name Type Description Default
text str | dict[str, str] | None

Input text (str) or dict for template formatting.

None
image_path str | Path | None

Path to image file.

None
image_base64 str | None

Base64-encoded image string.

None
pdf_path str | Path | None

Path to PDF file.

None
pdf_dpi int

DPI for PDF conversion (default: 300).

300

Returns:

Type Description
BaseModel

Pydantic model instance with extracted data.

Example

>>> async def main():  # doctest: +SKIP
...     prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")
...     user = await prompter.apredict(text="John Doe, 30")
...     print(user.name)

Source code in src/dspydantic/prompter.py
async def apredict(
    self,
    text: str | dict[str, str] | None = None,
    image_path: str | Path | None = None,
    image_base64: str | None = None,
    pdf_path: str | Path | None = None,
    pdf_dpi: int = 300,
) -> BaseModel:
    """Async version of predict() for concurrent extraction.

    Args:
        text: Input text (str) or dict for template formatting.
        image_path: Path to image file.
        image_base64: Base64-encoded image string.
        pdf_path: Path to PDF file.
        pdf_dpi: DPI for PDF conversion (default: 300).

    Returns:
        Pydantic model instance with extracted data.

    Example:
        >>> async def main():  # doctest: +SKIP
        ...     prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")
        ...     user = await prompter.apredict(text="John Doe, 30")
        ...     print(user.name)
    """
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        None,
        lambda: self.predict(
            text=text,
            image_path=image_path,
            image_base64=image_base64,
            pdf_path=pdf_path,
            pdf_dpi=pdf_dpi,
        ),
    )

apredict_batch async

apredict_batch(inputs, max_concurrency=4, on_error='raise')

Async batch extraction with controlled concurrency.

Parameters:

Name Type Description Default
inputs list[str | dict[str, str]]

List of input texts (str) or dicts for template formatting.

required
max_concurrency int

Maximum concurrent requests (default: 4).

4
on_error str

Error handling strategy ("raise" or "return").

'raise'

Returns:

Type Description
list[BaseModel | Exception]

List of extracted Pydantic model instances.

Example

>>> async def main():  # doctest: +SKIP
...     prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")
...     texts = ["John Doe, 30", "Jane Smith, 25"]
...     results = await prompter.apredict_batch(texts)

Source code in src/dspydantic/prompter.py
async def apredict_batch(
    self,
    inputs: list[str | dict[str, str]],
    max_concurrency: int = 4,
    on_error: str = "raise",
) -> list[BaseModel | Exception]:
    """Async batch extraction with controlled concurrency.

    Args:
        inputs: List of input texts (str) or dicts for template formatting.
        max_concurrency: Maximum concurrent requests (default: 4).
        on_error: Error handling strategy ("raise" or "return").

    Returns:
        List of extracted Pydantic model instances.

    Example:
        >>> async def main():  # doctest: +SKIP
        ...     prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")
        ...     texts = ["John Doe, 30", "Jane Smith, 25"]
        ...     results = await prompter.apredict_batch(texts)
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def process_with_semaphore(index: int, item: str | dict[str, str]) -> tuple[int, Any]:
        async with semaphore:
            try:
                result = await self.apredict(text=item)
                return (index, result)
            except Exception as e:
                if on_error == "raise":
                    raise
                return (index, e)

    tasks = [process_with_semaphore(i, item) for i, item in enumerate(inputs)]
    completed = await asyncio.gather(*tasks)

    results: list[BaseModel | Exception] = [None] * len(inputs)  # type: ignore
    for item in completed:
        if isinstance(item, tuple):
            index, result = item
            results[index] = result

    return results
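The semaphore pattern above is what caps in-flight requests at max_concurrency. The bound can be seen in a self-contained asyncio sketch that tracks peak concurrency (names are illustrative; no dspydantic objects involved):

```python
import asyncio

async def bounded_gather(items, worker, max_concurrency=4):
    """Run worker(item) for each item, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(guarded(i) for i in items))

async def main():
    peak = current = 0

    async def worker(i):
        nonlocal peak, current
        current += 1
        peak = max(peak, current)
        await asyncio.sleep(0.01)  # stand-in for an LLM call
        current -= 1
        return i * 2

    results = await bounded_gather(range(10), worker, max_concurrency=3)
    print(results, peak)  # peak never exceeds 3

asyncio.run(main())
```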

save

save(save_path)

Save Prompter state to disk.

Parameters:

Name Type Description Default
save_path str | Path

Path to save directory (will be created if doesn't exist).

required

Raises:

Type Description
ValueError

If model is not set or not optimized.

PersistenceError

If save fails.

Source code in src/dspydantic/prompter.py
def save(self, save_path: str | Path) -> None:
    """Save Prompter state to disk.

    Args:
        save_path: Path to save directory (will be created if doesn't exist).

    Raises:
        ValueError: If model is not set or not optimized.
        PersistenceError: If save fails.
    """
    if self.model is None:
        raise ValueError("model is required for saving")

    if not self.optimized_descriptions:
        raise ValueError("Prompter must be optimized before saving. Call optimize() first.")

    # Get model schema
    model_schema = self.model.model_json_schema()

    # Create state (model configuration not saved - user must configure DSPy separately)
    state = PrompterState(
        model_schema=model_schema,
        optimized_descriptions=self.optimized_descriptions,
        optimized_system_prompt=self.optimized_system_prompt,
        optimized_instruction_prompt=self.optimized_instruction_prompt,
        model_id="",  # Not used anymore, kept for backward compatibility
        model_config={},  # Not used anymore, kept for backward compatibility
        version=__version__,
        metadata={},
        optimized_demos=self.optimized_demos,
    )

    # Save
    save_prompter_state(state, save_path)

load classmethod

load(load_path, model=None, model_id=None, api_key=None, cache=False)

Load Prompter from disk.

Parameters:

Name Type Description Default
load_path str | Path

Path to saved prompter directory.

required
model type[BaseModel] | None

Optional Pydantic model class. If provided, will be used for extraction.

None
model_id str | None

LiteLLM model identifier. If provided, auto-configures DSPy.

None
api_key str | None

API key for the model provider.

None
cache bool | str

Enable caching. True uses default ".dspydantic_cache", or provide path.

False

Returns:

Type Description
Prompter

Loaded Prompter instance.

Raises:

Type Description
PersistenceError

If load fails or version is incompatible.

Example

>>> prompter = Prompter.load("./my_prompter", model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP
>>> result = prompter.run("John Doe, 30")  # doctest: +SKIP

Source code in src/dspydantic/prompter.py
@classmethod
def load(
    cls,
    load_path: str | Path,
    model: type[BaseModel] | None = None,
    model_id: str | None = None,
    api_key: str | None = None,
    cache: bool | str = False,
) -> Prompter:
    """Load Prompter from disk.

    Args:
        load_path: Path to saved prompter directory.
        model: Optional Pydantic model class. If provided, will be used for extraction.
        model_id: LiteLLM model identifier. If provided, auto-configures DSPy.
        api_key: API key for the model provider.
        cache: Enable caching. True uses default ".dspydantic_cache", or provide path.

    Returns:
        Loaded Prompter instance.

    Raises:
        PersistenceError: If load fails or version is incompatible.

    Example:
        >>> prompter = Prompter.load("./my_prompter", model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP
        >>> result = prompter.run("John Doe, 30")  # doctest: +SKIP
    """
    state = load_prompter_state(load_path)

    prompter = cls(
        model=model,
        model_id=model_id,
        api_key=api_key,
        cache=cache,
        optimized_descriptions=state.optimized_descriptions,
        optimized_system_prompt=state.optimized_system_prompt,
        optimized_instruction_prompt=state.optimized_instruction_prompt,
        optimized_demos=getattr(state, "optimized_demos", None),
    )

    # Store schema for reference
    prompter._saved_schema = state.model_schema

    return prompter

from_optimization_result classmethod

from_optimization_result(model, optimization_result)

Create Prompter from OptimizationResult.

Useful for converting existing PydanticOptimizer results to Prompter.

Parameters:

Name Type Description Default
model type[BaseModel]

Pydantic model class.

required
optimization_result OptimizationResult

Result from PydanticOptimizer.optimize().

required

Returns:

Type Description
Prompter

Prompter instance with optimized state.

Note

DSPy must be configured with dspy.configure(lm=dspy.LM(...)) before using the returned prompter.

Source code in src/dspydantic/prompter.py
@classmethod
def from_optimization_result(
    cls,
    model: type[BaseModel],
    optimization_result: OptimizationResult,
) -> Prompter:
    """Create Prompter from OptimizationResult.

    Useful for converting existing PydanticOptimizer results to Prompter.

    Args:
        model: Pydantic model class.
        optimization_result: Result from PydanticOptimizer.optimize().

    Returns:
        Prompter instance with optimized state.

    Note:
        DSPy must be configured with `dspy.configure(lm=dspy.LM(...))` before using
        the returned prompter.
    """
    return cls(
        model=model,
        optimized_descriptions=optimization_result.optimized_descriptions,
        optimized_system_prompt=optimization_result.optimized_system_prompt,
        optimized_instruction_prompt=optimization_result.optimized_instruction_prompt,
        optimized_demos=optimization_result.optimized_demos,
    )

Overview

The Prompter class combines optimization and extraction functionality in a single interface. Use it to optimize field descriptions and prompts, then extract structured data from text, images, or PDFs.

Basic Usage

from dspydantic import Prompter, Example
from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(description="User name")
    age: int = Field(description="User age")

# Simple setup with model_id (auto-configures DSPy)
prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")

# Extract directly (no optimization required)
data = prompter.run("Jane Smith, 25")

# Or with optimization for better accuracy
result = prompter.optimize(
    examples=[Example(text="John Doe, 30", expected_output={"name": "John Doe", "age": 30})]
)
data = prompter.run("Jane Smith, 25")

# Save and load
prompter.save("./my_prompter")
prompter = Prompter.load("./my_prompter", model=User, model_id="openai/gpt-4o-mini")

Production Features

# Enable caching to reduce API costs
prompter = Prompter(model=User, model_id="openai/gpt-4o-mini", cache=True)

# Batch extraction (parallel)
texts = ["Alice, 25", "Bob, 30", "Carol, 35"]
users = prompter.predict_batch(texts, max_workers=4)

# Async extraction
user = await prompter.apredict("John Doe, 30")

# Extraction with confidence score
result = prompter.predict_with_confidence("John Doe, 30")
print(f"Confidence: {result.confidence:.0%}")

# Multi-modal extraction
data = prompter.run(image_path="photo.png")
data = prompter.run(pdf_path="document.pdf")

See Also