Prompter¶
Unified class for optimizing and extracting with Pydantic models.
Prompter ¶
Prompter(model=None, model_id=None, api_key=None, cache=False, system_prompt=None, instruction_prompt=None, optimized_descriptions=None, optimized_system_prompt=None, optimized_instruction_prompt=None, optimized_demos=None)
Unified class for optimizing and extracting with Pydantic models.
This class combines optimization and extraction functionality in a single interface. It wraps PydanticOptimizer and adds extraction capabilities along with save/load.
Examples:

Simple usage with `model_id` (recommended):

```python
from dspydantic import Prompter
from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(description="User name")
    age: int = Field(description="User age")

# Create a prompter with model_id - auto-configures DSPy
prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")

# Extract directly (no optimization required)
data = prompter.run("John Doe, 30 years old")
print(data.name, data.age)  # John Doe 30
```

With optimization:

```python
from dspydantic import Example

examples = [
    Example(text="John Doe, 30", expected_output={"name": "John Doe", "age": 30})
]
result = prompter.optimize(examples=examples)

# Extract with optimized prompts
data = prompter.run("Jane Smith, 25")
```

Manual DSPy configuration:

```python
import dspy

lm = dspy.LM("openai/gpt-4o", api_key="your-key")
dspy.configure(lm=lm)
prompter = Prompter(model=User)  # Uses existing DSPy config
```
Initialize Prompter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `type[BaseModel] \| None` | Pydantic model class for extraction schema. | `None` |
| `model_id` | `str \| None` | LiteLLM model identifier (e.g., `"openai/gpt-4o-mini"`, `"anthropic/claude-3-sonnet"`). If provided, automatically configures DSPy. Supports all models via LiteLLM. | `None` |
| `api_key` | `str \| None` | API key for the model provider. If `None`, uses environment variables (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.). | `None` |
| `cache` | `bool \| str` | Enable caching. `True` uses the default `".dspydantic_cache"`, or provide a path string. | `False` |
| `system_prompt` | `str \| None` | Initial system prompt for extraction. | `None` |
| `instruction_prompt` | `str \| None` | Initial instruction prompt for extraction. | `None` |
| `optimized_descriptions` | `dict[str, str] \| None` | Pre-optimized field descriptions (for loading). | `None` |
| `optimized_system_prompt` | `str \| None` | Pre-optimized system prompt (for loading). | `None` |
| `optimized_instruction_prompt` | `str \| None` | Pre-optimized instruction prompt (for loading). | `None` |
| `optimized_demos` | `list[dict[str, Any]] \| None` | Pre-optimized few-shot examples (for loading). | `None` |
Example

```python
prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP
data = prompter.run("John Doe, 30")  # doctest: +SKIP
```
Source code in src/dspydantic/prompter.py
Functions¶
optimize ¶
optimize(examples, evaluate_fn=None, optimizer=None, train_split=0.8, num_threads=4, verbose=False, exclude_fields=None, include_fields=None, evaluator_config=None, sequential=False, parallel_fields=True, max_val_examples=None, skip_score_threshold=None, on_progress=None, **kwargs)
Optimize prompts and field descriptions.
Uses PydanticOptimizer internally to perform optimization.
If model is None and examples have string expected_output values, a model with a single "output" field will be automatically created.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `examples` | `list[Example]` | List of examples for optimization. | *required* |
| `evaluate_fn` | `Callable[[Example, dict[str, str], str \| None, str \| None], float] \| Callable[[Example, dict[str, Any], dict[str, str], str \| None, str \| None], float] \| LM \| str \| None` | Evaluation function or string metric. | `None` |
| `optimizer` | `str \| Any \| None` | Optimizer name or instance (auto-selects if `None`). | `None` |
| `train_split` | `float` | Training split fraction. | `0.8` |
| `num_threads` | `int` | Number of threads. | `4` |
| `verbose` | `bool` | Print progress. | `False` |
| `exclude_fields` | `list[str] \| None` | Field names to exclude from evaluation. | `None` |
| `include_fields` | `list[str] \| None` | Field names to include (only these are optimized/scored). | `None` |
| `evaluator_config` | `dict[str, Any] \| None` | Evaluator configuration dict. | `None` |
| `sequential` | `bool` | If `False` (default), use single-pass optimization (all fields together). If `True`, optimize each field independently (deepest-first). | `False` |
| `parallel_fields` | `bool` | If `True` (default), parallelize field optimization when `sequential=True`. Has no effect when `sequential=False`. | `True` |
| `max_val_examples` | `int \| None` | Optional cap on validation set size per field. | `None` |
| `skip_score_threshold` | `float \| None` | Optional threshold to skip high-scoring fields (sequential mode only). | `None` |
| `on_progress` | `Callable[[FieldOptimizationProgress], None] \| None` | Optional callback to receive `FieldOptimizationProgress` updates. Called automatically when `verbose=True`. | `None` |
| `**kwargs` | `Any` | Additional kwargs passed to `PydanticOptimizer`. | `{}` |
Returns:
| Type | Description |
|---|---|
| `OptimizationResult` | `OptimizationResult` with optimized descriptions and prompts. |
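For concreteness, a minimal sketch of an optimization run (the training data is illustrative, and an API key is assumed to be available in the environment):

```python
from dspydantic import Prompter, Example
from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(description="User name")
    age: int = Field(description="User age")

prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP

# Illustrative training data; more (and more varied) examples
# generally produce better-optimized prompts
examples = [
    Example(text="John Doe, 30", expected_output={"name": "John Doe", "age": 30}),
    Example(text="Jane Smith, 25", expected_output={"name": "Jane Smith", "age": 25}),
    Example(text="Bob Wilson, 40", expected_output={"name": "Bob Wilson", "age": 40}),
]
result = prompter.optimize(examples=examples, train_split=0.8, verbose=True)  # doctest: +SKIP
```

After optimization, subsequent `run()` calls use the optimized descriptions and prompts automatically.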
Source code in src/dspydantic/prompter.py
predict ¶
Extract structured data from input.
Works with or without prior optimization. If not optimized, uses the original field descriptions from the Pydantic model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str \| dict[str, str] \| None` | Input text (`str`) or dict for template formatting. | `None` |
| `image_path` | `str \| Path \| None` | Path to an image file. | `None` |
| `image_base64` | `str \| None` | Base64-encoded image string. | `None` |
| `pdf_path` | `str \| Path \| None` | Path to a PDF file. | `None` |
| `pdf_dpi` | `int` | DPI for PDF conversion. | `300` |
Returns:
| Type | Description |
|---|---|
| `BaseModel` | Pydantic model instance with extracted data. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If `model` is not set, no input is provided, or the LLM is not configured. |
| `ValidationError` | If the extracted data doesn't match the model schema. |
Example

```python
prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP
user = prompter.predict(text="John Doe, 30 years old")  # doctest: +SKIP
print(user.name, user.age)  # doctest: +SKIP
# John Doe 30
```
Source code in src/dspydantic/prompter.py
run ¶
Alias for predict() - extract structured data from input.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str \| dict[str, str] \| None` | Input text (`str`) or dict for template formatting. | `None` |
| `image_path` | `str \| Path \| None` | Path to an image file. | `None` |
| `image_base64` | `str \| None` | Base64-encoded image string. | `None` |
| `pdf_path` | `str \| Path \| None` | Path to a PDF file. | `None` |
| `pdf_dpi` | `int` | DPI for PDF conversion. | `300` |
Returns:
| Type | Description |
|---|---|
| `BaseModel` | Pydantic model instance with extracted data. |
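A short sketch of the string and dict input forms (the template text and `{record}` key below are illustrative assumptions, not part of the documented API):

```python
prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP

# Plain string input
data = prompter.run("John Doe, 30 years old")  # doctest: +SKIP

# Dict input for template formatting (illustrative: assumes the
# instruction prompt contains a matching {record} placeholder)
templated = Prompter(
    model=User,
    model_id="openai/gpt-4o-mini",
    instruction_prompt="Extract the user from this record: {record}",
)  # doctest: +SKIP
data = templated.run({"record": "Jane Smith, 25"})  # doctest: +SKIP
```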
Source code in src/dspydantic/prompter.py
predict_with_confidence ¶
Extract structured data with confidence score.
Uses a second LLM call to assess extraction confidence based on how well the input matches the extracted fields.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str \| dict[str, str] \| None` | Input text (`str`) or dict for template formatting. | `None` |
| `image_path` | `str \| Path \| None` | Path to an image file. | `None` |
| `image_base64` | `str \| None` | Base64-encoded image string. | `None` |
| `pdf_path` | `str \| Path \| None` | Path to a PDF file. | `None` |
| `pdf_dpi` | `int` | DPI for PDF conversion. | `300` |
Returns:
| Type | Description |
|---|---|
| `ExtractionResult` | `ExtractionResult` with data, confidence (0.0-1.0), and raw output. |
Example

```python
result = prompter.predict_with_confidence("John Doe, 30")  # doctest: +SKIP
print(f"{result.data.name}: {result.confidence:.0%} confident")  # doctest: +SKIP
# John Doe: 95% confident
```
Source code in src/dspydantic/prompter.py
predict_batch ¶
Extract structured data from multiple inputs in parallel.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `inputs` | `list[str \| dict[str, str]]` | List of input texts (`str`) or dicts for template formatting. | *required* |
| `max_workers` | `int` | Maximum number of parallel workers. | `4` |
| `on_error` | `str` | Error handling strategy: `"raise"` raises the first exception encountered; `"return"` returns exceptions in the results list. | `'raise'` |
Returns:
| Type | Description |
|---|---|
| `list[BaseModel \| Exception]` | List of extracted Pydantic model instances (or exceptions if `on_error="return"`). |
Example

```python
prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP
texts = ["John Doe, 30", "Jane Smith, 25", "Bob Wilson, 40"]
results = prompter.predict_batch(texts)  # doctest: +SKIP
for user in results:
    print(user.name, user.age)
```
Source code in src/dspydantic/prompter.py
apredict async ¶
Async version of predict() for concurrent extraction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str \| dict[str, str] \| None` | Input text (`str`) or dict for template formatting. | `None` |
| `image_path` | `str \| Path \| None` | Path to an image file. | `None` |
| `image_base64` | `str \| None` | Base64-encoded image string. | `None` |
| `pdf_path` | `str \| Path \| None` | Path to a PDF file. | `None` |
| `pdf_dpi` | `int` | DPI for PDF conversion. | `300` |
Returns:
| Type | Description |
|---|---|
| `BaseModel` | Pydantic model instance with extracted data. |
Example

```python
async def main():  # doctest: +SKIP
    prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")
    user = await prompter.apredict(text="John Doe, 30")
    print(user.name)
```
Source code in src/dspydantic/prompter.py
apredict_batch async ¶
Async batch extraction with controlled concurrency.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `inputs` | `list[str \| dict[str, str]]` | List of input texts (`str`) or dicts for template formatting. | *required* |
| `max_concurrency` | `int` | Maximum concurrent requests. | `4` |
| `on_error` | `str` | Error handling strategy (`"raise"` or `"return"`). | `'raise'` |
Returns:
| Type | Description |
|---|---|
| `list[BaseModel \| Exception]` | List of extracted Pydantic model instances. |
Example

```python
async def main():  # doctest: +SKIP
    prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")
    texts = ["John Doe, 30", "Jane Smith, 25"]
    results = await prompter.apredict_batch(texts)
```
Source code in src/dspydantic/prompter.py
save ¶
Save Prompter state to disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `save_path` | `str \| Path` | Path to the save directory (created if it doesn't exist). | *required* |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If `model` is not set or the prompter is not optimized. |
| `PersistenceError` | If the save fails. |
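A sketch of a typical save/load round trip (the path and examples are illustrative; note that `save()` requires an optimized prompter):

```python
result = prompter.optimize(examples=examples)  # doctest: +SKIP

# Persist optimized prompts and descriptions to disk
prompter.save("./my_prompter")  # doctest: +SKIP

# Later, restore into a fresh process
restored = Prompter.load("./my_prompter", model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP
data = restored.run("John Doe, 30")  # doctest: +SKIP
```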
Source code in src/dspydantic/prompter.py
load classmethod ¶
Load Prompter from disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `load_path` | `str \| Path` | Path to the saved prompter directory. | *required* |
| `model` | `type[BaseModel] \| None` | Optional Pydantic model class. If provided, it is used for extraction. | `None` |
| `model_id` | `str \| None` | LiteLLM model identifier. If provided, auto-configures DSPy. | `None` |
| `api_key` | `str \| None` | API key for the model provider. | `None` |
| `cache` | `bool \| str` | Enable caching. `True` uses the default `".dspydantic_cache"`, or provide a path. | `False` |
Returns:
| Type | Description |
|---|---|
| `Prompter` | Loaded `Prompter` instance. |
Raises:
| Type | Description |
|---|---|
| `PersistenceError` | If the load fails or the version is incompatible. |
Example

```python
prompter = Prompter.load("./my_prompter", model=User, model_id="openai/gpt-4o-mini")  # doctest: +SKIP
result = prompter.run("John Doe, 30")  # doctest: +SKIP
```
Source code in src/dspydantic/prompter.py
from_optimization_result classmethod ¶
Create Prompter from OptimizationResult.
Useful for converting existing PydanticOptimizer results to Prompter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `type[BaseModel]` | Pydantic model class. | *required* |
| `optimization_result` | `OptimizationResult` | Result from `PydanticOptimizer.optimize()`. | *required* |
Returns:
| Type | Description |
|---|---|
| `Prompter` | `Prompter` instance with optimized state. |
Note
DSPy must be configured with `dspy.configure(lm=dspy.LM(...))` before using the returned prompter.
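A minimal sketch of the conversion (`opt_result` stands in for an existing `PydanticOptimizer` result; the LM choice is illustrative):

```python
import dspy
from dspydantic import Prompter

# Configure DSPy before using the returned prompter
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # doctest: +SKIP

# opt_result: an OptimizationResult from PydanticOptimizer.optimize()
prompter = Prompter.from_optimization_result(model=User, optimization_result=opt_result)  # doctest: +SKIP
data = prompter.run("John Doe, 30")  # doctest: +SKIP
```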
Source code in src/dspydantic/prompter.py
Overview¶
The Prompter class combines optimization and extraction functionality in a single interface. Use it to optimize field descriptions and prompts, then extract structured data from text, images, or PDFs.
Basic Usage¶
```python
from dspydantic import Prompter, Example
from pydantic import BaseModel, Field

class User(BaseModel):
    name: str = Field(description="User name")
    age: int = Field(description="User age")

# Simple setup with model_id (auto-configures DSPy)
prompter = Prompter(model=User, model_id="openai/gpt-4o-mini")

# Extract directly (no optimization required)
data = prompter.run("Jane Smith, 25")

# Or optimize first for better accuracy
result = prompter.optimize(
    examples=[Example(text="John Doe, 30", expected_output={"name": "John Doe", "age": 30})]
)
data = prompter.run("Jane Smith, 25")

# Save and load
prompter.save("./my_prompter")
prompter = Prompter.load("./my_prompter", model=User, model_id="openai/gpt-4o-mini")
```
Production Features¶
```python
# Enable caching to reduce API costs
prompter = Prompter(model=User, model_id="openai/gpt-4o-mini", cache=True)

# Batch extraction (parallel)
texts = ["Alice, 25", "Bob, 30", "Carol, 35"]
users = prompter.predict_batch(texts, max_workers=4)

# Async extraction
user = await prompter.apredict("John Doe, 30")

# Extraction with confidence score
result = prompter.predict_with_confidence("John Doe, 30")
print(f"Confidence: {result.confidence:.0%}")

# Multi-modal extraction
data = prompter.run(image_path="photo.png")
data = prompter.run(pdf_path="document.pdf")
```