Configure Optimizations¶
This guide covers how to configure optimization parameters, choose the right DSPy optimizer, and understand API call costs.
Quick Reference¶
| Factor | Default | Recommended | Impact |
|---|---|---|---|
| Examples | - | 10-20 | Quality |
| Threads | 4 | 4-8 | Speed |
| Optimizer | Auto | Based on dataset | Quality/Cost |
| `sequential` | False | False for speed, True for quality | Single-pass vs field-by-field |
| `parallel_fields` | True | True for sequential mode | Parallelizes field optimization |
| `max_val_examples` | None | 3-5 to reduce API calls | Validation set size |
| `skip_score_threshold` | None | 0.95 for high-scoring fields | Skip well-optimized fields |
| `early_stopping_patience` | None | 2-3 in sequential mode | Stop when no improvement |
| `auto_generate_prompts` | False | True for quick start | Auto-create system/instruction prompts |
| `optimizer_kwargs` | None | `{"auto": None, "num_candidates": 3}` | Extra kwargs for optimizer constructor |
| `compile_kwargs` | None | `{"num_trials": 5}` for testing | Extra kwargs for DSPy compile |
| `include_fields` | None | As needed | Focus optimization |
| `exclude_fields` | None | As needed | Skip metadata in scoring |
Single-Pass vs Sequential Mode Optimization¶
By default, DSPydantic uses single-pass mode (sequential=False):
- All field descriptions and prompts are optimized together in one DSPy compile
- Reduced demo budgets (`max_bootstrapped_demos=1`) for speed
- Fastest approach: one DSPy compile instead of N+2
- Good when speed is prioritized
Use sequential=True for field-by-field optimization — slower but better quality:
- Phase 1: Optimize each field description independently, deepest-nested first. Each run has a minimal search space.
- Phase 2: Optimize system and instruction prompts with field descriptions fixed.
- With `parallel_fields=True` (default), all fields optimize simultaneously, giving ~N× speedup
```python
# Single-pass (default): fast, lower API costs
result = prompter.optimize(examples=examples)

# Sequential: field-by-field for better quality
result = prompter.optimize(examples=examples, sequential=True)

# Sequential + parallel: best of both (field-by-field quality with parallel speedup)
result = prompter.optimize(examples=examples, sequential=True, parallel_fields=True)

# Reduce API calls by capping validation examples
result = prompter.optimize(examples=examples, max_val_examples=5)
```
Field Inclusion and Exclusion¶
Restrict which fields are optimized and scored:
```python
# Only optimize specific fields (reduces time and API costs)
result = prompter.optimize(
    examples=examples,
    include_fields=["address", "total"],
)

# Exclude fields from scoring (still extracted)
result = prompter.optimize(
    examples=examples,
    exclude_fields=["metadata", "timestamp"],
)
```
See Field Inclusion & Exclusion for details.
Number of Examples¶
| Examples | Speed | Quality | API Calls |
|---|---|---|---|
| 5-10 | Fast | Good | ~50-100 |
| 10-20 | Medium | Better | ~100-200 |
| 20+ | Slower | Best | ~200-500+ |
Tips:
- Start with 5-10 for prototyping
- Use 10-20 for production
- Ensure diverse examples covering edge cases
DSPy Optimizers¶
DSPydantic uses DSPy optimizers under the hood. Understanding them helps you choose the right one.
Auto-Selection Logic¶
DSPydantic auto-selects based on dataset size:
| Examples | Auto-Selected Optimizer |
|---|---|
| 1-2 | MIPROv2 (zero-shot mode) |
| 3-19 | BootstrapFewShot |
| 20+ | BootstrapFewShotWithRandomSearch |
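The thresholds in the table above can be sketched as a plain-Python selection function (a hypothetical helper that illustrates the documented logic, not the library's actual implementation):

```python
def auto_select_optimizer(n_examples: int) -> str:
    """Sketch of the documented auto-selection thresholds."""
    if n_examples <= 2:
        return "MIPROv2 (zero-shot mode)"
    if n_examples <= 19:
        return "BootstrapFewShot"
    return "BootstrapFewShotWithRandomSearch"
```

In practice you trigger this simply by calling `prompter.optimize(examples=examples)` without passing an explicit `optimizer`.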
Optimizer Comparison¶
| Optimizer | Speed | Quality | API Calls | Best For |
|---|---|---|---|---|
| BootstrapFewShot | Fast | Good | ~N | Prototyping, small datasets |
| BootstrapFewShotWithRandomSearch | Medium | Better | ~N×10 | Production, reliable results |
| MIPROv2 (light) | Medium | Better | ~50 | Quick production |
| MIPROv2 (medium) | Slow | Best | ~200 | Balanced quality/cost |
| MIPROv2 (heavy) | Slowest | Best | ~500+ | Maximum quality |
| COPRO | Medium | Good | ~M×K | Debugging, understanding prompts |
| GEPA | Medium | Good | ~20-100 | Complex reasoning, interpretable |
| SIMBA | Medium | Better | Variable | Large datasets (500+), batch |
| BetterTogether | Slowest | Best | Sum of all | Maximum quality, production |
| Ensemble | - | Best | N per input | Reliability, variance reduction |
| BootstrapFinetune | Slow | Best | Variable | Permanent model improvements |
BootstrapFewShot¶
Purpose: Simple few-shot learning by sampling demonstrations from successful traces.
Best for: Small datasets (10-50 examples), quick prototyping.
How it works:
- Runs your program on training examples
- Collects traces of successful executions (based on metric)
- Selects best demonstrations to include in prompts
API calls: ~N calls for N examples
```python
from dspy.teleprompt import BootstrapFewShot

optimizer = BootstrapFewShot(max_bootstrapped_demos=4)
result = prompter.optimize(examples=examples, optimizer=optimizer)
```
BootstrapFewShotWithRandomSearch¶
Purpose: BootstrapFewShot with multiple random seeds to find better demonstrations.
Best for: Medium datasets (50-200 examples), reliable results.
How it works:
- Runs BootstrapFewShot multiple times with different seeds
- Evaluates each configuration on validation set
- Returns best configuration
API calls: ~N × num_candidate_programs
```python
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

optimizer = BootstrapFewShotWithRandomSearch(
    max_bootstrapped_demos=4,
    num_candidate_programs=10,  # More = more calls, better results
)
result = prompter.optimize(examples=examples, optimizer=optimizer)
```
MIPROv2 (Production Recommended)¶
Purpose: Multi-step instruction and prompt optimization. The most sophisticated general optimizer.
Best for: Production optimization (50-500+ examples), maximum quality.
How it works:
- Bootstrapping stage: Collects traces from running program
- Grounded proposal stage: Uses LM to propose better instructions
- Discrete search stage: Bayesian optimization to find best combination
API calls:
| Mode | Calls | When to Use |
|---|---|---|
| light | ~50 | Quick optimization |
| medium | ~200 | Balanced |
| heavy | ~500+ | Maximum quality |
```python
from dspy.teleprompt import MIPROv2

# Light mode - faster, fewer calls
optimizer = MIPROv2(auto="light", num_threads=8)

# Medium mode - balanced
optimizer = MIPROv2(auto="medium", num_threads=8)

# Heavy mode - best quality
optimizer = MIPROv2(auto="heavy", num_threads=8)

result = prompter.optimize(examples=examples, optimizer=optimizer)
```
COPRO (Coordinate Descent)¶
Purpose: Optimizes prompts by coordinate descent - changing one aspect at a time.
Best for: Understanding which prompt components matter most, debugging.
How it works:
- Starts with initial prompt
- Optimizes each "coordinate" (instruction, format) independently
- Combines best settings
API calls: ~M × K (M = coordinates, K = options each)
```python
from dspy.teleprompt import COPRO

optimizer = COPRO(verbose=True)
result = prompter.optimize(examples=examples, optimizer=optimizer)
```
GEPA (Reflective Prompt Evolution)¶
Purpose: Genetic-Pareto reflective prompt evolution. Iteratively refines prompts through self-reflection.
Best for: Complex reasoning tasks, interpretable improvements.
How it works:
- Evaluates current prompt on examples
- Reflects on failures and successes
- Proposes prompt modifications
- Iterates until convergence
API calls: ~20-100 depending on iterations
```python
from dspy.teleprompt import GEPA

optimizer = GEPA(num_iterations=10, verbose=True)
result = prompter.optimize(examples=examples, optimizer=optimizer)
```
SIMBA¶
Purpose: Stochastic Introspective Mini-Batch Ascent.
Best for: Large datasets (500+ examples), batch processing.
How it works:
- Creates meta-prompts that teach the LM about the task
- Scales well to large datasets
- Focuses on instruction quality
API calls: Variable, scales with dataset
```python
from dspy.teleprompt import SIMBA

optimizer = SIMBA()
result = prompter.optimize(examples=examples, optimizer=optimizer)
```
BetterTogether¶
Purpose: Combines multiple optimizers for best results.
Best for: Maximum quality needed, production deployments.
How it works:
- Runs multiple optimizers in sequence or parallel
- Uses results from one optimizer to inform the next
- Returns best combined result
API calls: Sum of all component optimizer calls
```python
from dspy.teleprompt import BetterTogether, BootstrapFewShot, MIPROv2

optimizer = BetterTogether(
    optimizers=[
        BootstrapFewShot(max_bootstrapped_demos=4),
        MIPROv2(auto="light"),
    ]
)
result = prompter.optimize(examples=examples, optimizer=optimizer)
```
Ensemble¶
Purpose: Combines multiple optimized programs at inference time.
Best for: Maximum reliability, reducing variance, production systems.
How it works:
- Runs input through multiple programs
- Aggregates outputs (voting, averaging, etc.)
- Returns consensus result
API calls at inference: N calls per input (one per ensemble member)
```python
from dspy import Ensemble

# After training multiple programs
ensemble = Ensemble(programs=[prog1, prog2, prog3], method="majority_vote")
result = ensemble(input_data)
```
BootstrapFinetune¶
Purpose: Finetune model weights instead of just prompts.
Best for: 100+ high-quality examples, permanent model improvements.
How it works:
- Generates training data from traces
- Finetunes underlying LM on that data
- Uses finetuned model for inference
API calls: Depends on training data size + finetuning
```python
from dspy.teleprompt import BootstrapFinetune

train_kwargs = {
    "use_peft": True,  # Enable LoRA
    "num_train_epochs": 1,
    "per_device_train_batch_size": 4,
    "learning_rate": 2e-4,
}
optimizer = BootstrapFinetune(train_kwargs=train_kwargs, num_threads=8)
result = prompter.optimize(examples=examples, optimizer=optimizer)
```
Requirements for LoRA: `pip install transformers accelerate trl peft`
Parallel Evaluation¶
Use multiple threads for faster optimization:
| Threads | Speed | Use Case |
|---|---|---|
| 1 | Baseline | Debugging |
| 2-4 | 2-3x faster | Development |
| 4-8 | 3-4x faster | Production |
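The speedup comes from running many evaluation calls concurrently, which works well because they are I/O-bound. A minimal self-contained sketch of the idea (a toy metric standing in for an LM-backed evaluation, not DSPydantic internals):

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(example: str) -> bool:
    # Stand-in for an LM-backed evaluation call (normally I/O-bound)
    return len(example) % 2 == 0  # toy metric

examples = ["alpha", "beta", "gamma", "deltas"]

# max_workers plays the role of num_threads: how many evaluations run at once
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(evaluate, examples))

accuracy = sum(scores) / len(scores)
```

More threads help until you hit your provider's rate limits, which is why 4-8 is the recommended range.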
API Call Tracking¶
After optimization, check usage:
```python
result = prompter.optimize(examples=examples)

print(f"API calls: {result.api_calls}")
print(f"Tokens used: {result.total_tokens:,}")
print(f"Baseline: {result.baseline_score:.0%}")
print(f"Optimized: {result.optimized_score:.0%}")
```
Common API Call Issues¶
Issue 1: Hidden Calls in Metrics¶
Metrics that use LLMs add calls per evaluation:
```python
import dspy

# BAD: This makes an LM call per evaluation!
def my_metric(example, pred, trace=None):
    judge = dspy.ChainOfThought(JudgeSignature)
    return judge(pred.output).score  # Hidden call!

# GOOD: Use simple comparison metrics when possible
def my_metric(example, pred, trace=None):
    return pred.output == example.expected_output
```
Issue 2: Uncached Repeated Calls¶
Same inputs without caching = repeated API calls:
```python
# Enable caching to avoid duplicate calls
prompter = Prompter(
    model=MyModel,
    model_id="openai/gpt-4o-mini",
    cache=True,  # Prevents duplicate API calls
)
```
Issue 3: Optimizer Training Calls¶
Optimizers make many calls during compilation:
```python
from dspy.teleprompt import MIPROv2

# MIPROv2 medium makes ~200 calls
optimizer = MIPROv2(auto="medium")

# Start with light for testing (~50 calls)
optimizer = MIPROv2(auto="light")
```
Reducing API Costs¶
- Start small: Use 5-10 examples initially
- Use caching: `cache=True` prevents duplicate calls
- Choose optimizer wisely: BootstrapFewShot for prototyping
- Use cheaper models: `gpt-4o-mini` for optimization, `gpt-4o` for production
- Start with light mode: `MIPROv2(auto="light")` before `"heavy"`
```python
from dspy.teleprompt import MIPROv2

prompter = Prompter(
    model=MyModel,
    model_id="openai/gpt-4o-mini",  # Cheaper model for optimization
    cache=True,
)

# Start light
result = prompter.optimize(
    examples=examples[:10],  # Fewer examples first
    optimizer=MIPROv2(auto="light"),
)

# If results are good, try more examples
result = prompter.optimize(
    examples=examples,
    optimizer=MIPROv2(auto="medium"),
)
```
Early Stopping¶
In sequential mode, stop optimizing when scores plateau:
```python
result = prompter.optimize(
    examples=examples,
    sequential=True,
    early_stopping_patience=2,  # Stop after 2 fields without improvement
)
```
Fields are optimized deepest-first. If early_stopping_patience consecutive fields show no improvement, the remaining fields are skipped. This can significantly reduce API costs when most fields already have good descriptions.
Auto-Generate Prompts¶
Automatically create system and instruction prompts from your model by passing `auto_generate_prompts=True` to `optimize()`.
This generates sensible defaults based on your model name and field names:
- System prompt: "You are an expert at extracting structured {ModelName} data from text. Be precise and faithful to the source text."
- Instruction prompt: "Extract the following fields from the given text: field1, field2, .... Return only values that are explicitly stated or clearly implied."
Existing prompts are preserved — auto-generation only fills in None values. The generated prompts are then optimized alongside field descriptions.
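The default templates above can be reproduced with a small stand-alone sketch (a hypothetical helper written for illustration; the library generates these internally):

```python
def default_prompts(model_name: str, field_names: list[str]) -> tuple[str, str]:
    """Build the documented default system and instruction prompts."""
    system = (
        f"You are an expert at extracting structured {model_name} data from text. "
        "Be precise and faithful to the source text."
    )
    instruction = (
        "Extract the following fields from the given text: "
        + ", ".join(field_names)
        + ". Return only values that are explicitly stated or clearly implied."
    )
    return system, instruction

system, instruction = default_prompts("Invoice", ["address", "total"])
```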
Contextual Optimization¶
DSPydantic automatically creates model-aware optimization signatures that give DSPy optimizers (especially MIPROv2) domain context without bloating token usage:

- Dynamic class name: The optimizer sees `OptimizeMedicalRecordFieldDescription` instead of a generic name — zero extra tokens
- Field name input: The optimizer knows which field it's improving (e.g., `patient_name`) — ~2-5 extra tokens per call
- Concise docstring: Includes the model name for domain context — ~25 tokens

This prevents MIPROv2's proposer from generating generic meta-instructions like "Given the fields field_description, produce optimized_field_description" and instead produces actual improved field descriptions.
Tie-Breaking: Prefer Simplicity¶
When multiple optimization candidates achieve the same score, DSPydantic prefers the shorter (simpler) option. This applies to field descriptions, system prompts, and instruction prompts. Shorter descriptions save tokens at inference time across every future extraction call.
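The tie-breaking rule can be illustrated with a small stand-alone sketch (hypothetical code, not the library's implementation): rank candidates by score first, and among equal scores prefer the shorter text:

```python
def pick_best(candidates: list[tuple[str, float]]) -> tuple[str, float]:
    # Highest score wins; ties are broken by preferring the shorter text
    return max(candidates, key=lambda c: (c[1], -len(c[0])))

best = pick_best([
    ("The customer's full legal name as written", 0.92),
    ("Customer name", 0.92),  # same score, shorter, so preferred
    ("Name of customer, if present", 0.88),
])
```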
Skip Optimization Phases¶
Skip specific optimization phases to keep certain parts fixed:
```python
# Only optimize prompts, keep field descriptions as-is
result = prompter.optimize(
    examples=examples,
    skip_field_description_optimization=True,
)

# Only optimize field descriptions, skip prompt optimization
result = prompter.optimize(
    examples=examples,
    skip_system_prompt_optimization=True,
    skip_instruction_prompt_optimization=True,
)
```
Custom Optimizer and Compile Arguments¶
Pass additional arguments to DSPy optimizers at construction time:
```python
result = prompter.optimize(
    examples=examples,
    optimizer="miprov2",
    optimizer_kwargs={
        "max_bootstrapped_demos": 8,
        "auto": "medium",
        "num_threads": 8,
    },
)
```
Pass extra arguments to the DSPy `compile()` call:

```python
# Limit MIPROv2 trials for faster iteration
result = prompter.optimize(
    examples=examples,
    compile_kwargs={"num_trials": 5, "minibatch": False},
)
```
This is particularly useful for controlling MIPROv2's trial count during testing or development.
Progress Tracking and Verbose Output¶
Monitor optimization progress in real-time with rich-formatted output showing optimized values:
Automatic Verbose Output¶
Enable verbose mode to see real-time progress with automatically formatted output:
```python
result = prompter.optimize(
    examples=examples,
    verbose=True,  # Enables rich formatted output with optimized values
)
```
Output shows:

- Header: Model name, field count, examples, optimization mode
- Progress: Field-by-field scores with improved/unchanged indicators
- Optimized Values: The actual optimized descriptions after each field
- Summary Table: Final scores, improvements, API calls, tokens
Custom Progress Callbacks¶
For custom progress handling, use the on_progress callback:
```python
from dspydantic import FieldOptimizationProgress

def my_callback(progress: FieldOptimizationProgress):
    if progress.phase == "fields":
        print(f"Field: {progress.field_path}")
        print(f"  Score: {progress.score_before:.0%} → {progress.score_after:.0%}")
        if progress.optimized_value:
            print(f"  Optimized to: {progress.optimized_value!r}")
    elif progress.phase == "complete":
        print(f"Optimization finished in {progress.elapsed_seconds:.1f}s")

result = prompter.optimize(
    examples=examples,
    on_progress=my_callback,
)
```
The FieldOptimizationProgress object contains:
- `phase`: Current phase (`"baseline"`, `"fields"`, `"system_prompt"`, `"instruction_prompt"`, `"complete"`)
- `score_before` / `score_after`: Scores before and after this step
- `field_path`: Current field name (for field phases only)
- `optimized_value`: The actual optimized text that was generated
- `elapsed_seconds`: Total time elapsed since start
- `improved`: Whether the score went up
Troubleshooting¶
Optimization is slow¶
- Reduce examples (start with 5-10)
- Use single-pass mode (default, `sequential=False`)
- Use `BootstrapFewShot` instead of random search
- Increase `num_threads`
- Use `MIPROv2(auto="light")` instead of `"heavy"`
- Limit trials with `compile_kwargs={"num_trials": 5}`
High API costs¶
- Use a cheaper model (`gpt-4o-mini`)
- Enable caching (`cache=True`)
- Start with fewer examples
- Use `early_stopping_patience` in sequential mode
- Use a simpler optimizer first
Poor optimization results¶
- Add more diverse examples
- Try `MIPROv2(auto="medium")` for better quality
- Use `sequential=True` for field-by-field optimization
- Use `auto_generate_prompts=True` to add system/instruction prompts
- Check that examples are correct
- Ensure examples cover edge cases
See Also¶
- Configure Models - Model configuration
- Your First Optimization - Complete workflow
- How Optimization Works - Deep dive
- DSPy Documentation - Official DSPy docs