
Configure Optimizations

This guide covers how to configure optimization parameters, choose the right DSPy optimizer, and understand API call costs.


Quick Reference

| Factor | Default | Recommended | Impact |
| --- | --- | --- | --- |
| Examples | - | 10-20 | Quality |
| Threads | 4 | 4-8 | Speed |
| Optimizer | Auto | Based on dataset | Quality/Cost |
| sequential | False | False for speed, True for quality | Single-pass vs field-by-field |
| parallel_fields | True | True for sequential mode | Parallelizes field optimization |
| max_val_examples | None | 3-5 to reduce API calls | Validation set size |
| skip_score_threshold | None | 0.95 for high-scoring fields | Skip well-optimized fields |
| early_stopping_patience | None | 2-3 in sequential mode | Stop when no improvement |
| auto_generate_prompts | False | True for quick start | Auto-create system/instruction prompts |
| optimizer_kwargs | None | {"auto": None, "num_candidates": 3} | Extra kwargs for optimizer constructor |
| compile_kwargs | None | {"num_trials": 5} for testing | Extra kwargs for DSPy compile |
| include_fields | None | As needed | Focus optimization |
| exclude_fields | None | As needed | Skip metadata in scoring |

Single-Pass vs Sequential Mode Optimization

By default, DSPydantic uses single-pass mode (sequential=False):

  1. All field descriptions and prompts are optimized together in one DSPy compile
  2. Reduced demo budgets (max_bootstrapped_demos=1) for speed
  3. Fastest approach: one DSPy compile instead of N+2
  4. Good when speed is prioritized

Use sequential=True for field-by-field optimization — slower but better quality:

  1. Phase 1: Optimize each field description independently, deepest-nested first. Each run has a minimal search space.
  2. Phase 2: Optimize system and instruction prompts with field descriptions fixed.
  3. With parallel_fields=True (default), all fields optimize simultaneously, giving a ~N× speedup.

# Single-pass (default): fast, lower API costs
result = prompter.optimize(examples=examples)

# Sequential: field-by-field for better quality
result = prompter.optimize(examples=examples, sequential=True)

# Sequential + parallel: best of both (field-by-field quality with parallel speedup)
result = prompter.optimize(examples=examples, sequential=True, parallel_fields=True)

# Reduce API calls by capping validation examples
result = prompter.optimize(examples=examples, max_val_examples=5)

Field Inclusion and Exclusion

Restrict which fields are optimized and scored:

# Only optimize specific fields (reduces time and API costs)
result = prompter.optimize(
    examples=examples,
    include_fields=["address", "total"],
)

# Exclude fields from scoring (still extracted)
result = prompter.optimize(
    examples=examples,
    exclude_fields=["metadata", "timestamp"],
)

See Field Inclusion & Exclusion for details.


Number of Examples

| Examples | Speed | Quality | API Calls |
| --- | --- | --- | --- |
| 5-10 | Fast | Good | ~50-100 |
| 10-20 | Medium | Better | ~100-200 |
| 20+ | Slower | Best | ~200-500+ |

Tips:

  • Start with 5-10 for prototyping
  • Use 10-20 for production
  • Ensure diverse examples covering edge cases

DSPy Optimizers

DSPydantic uses DSPy optimizers under the hood. Understanding them helps you choose the right one.

Auto-Selection Logic

DSPydantic auto-selects based on dataset size:

| Examples | Auto-Selected Optimizer |
| --- | --- |
| 1-2 | MIPROv2 (zero-shot mode) |
| 3-19 | BootstrapFewShot |
| 20+ | BootstrapFewShotWithRandomSearch |
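
The thresholds above can be sketched as a simple selection function (an illustration of the table, not DSPydantic's actual implementation):

```python
def auto_select_optimizer(num_examples: int) -> str:
    """Mirror the auto-selection table: pick an optimizer name by dataset size."""
    if num_examples <= 2:
        return "MIPROv2 (zero-shot)"
    if num_examples <= 19:
        return "BootstrapFewShot"
    return "BootstrapFewShotWithRandomSearch"
```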

Optimizer Comparison

| Optimizer | Speed | Quality | API Calls | Best For |
| --- | --- | --- | --- | --- |
| BootstrapFewShot | Fast | Good | ~N | Prototyping, small datasets |
| BootstrapFewShotWithRandomSearch | Medium | Better | ~N×10 | Production, reliable results |
| MIPROv2 (light) | Medium | Better | ~50 | Quick production |
| MIPROv2 (medium) | Slow | Best | ~200 | Balanced quality/cost |
| MIPROv2 (heavy) | Slowest | Best | ~500+ | Maximum quality |
| COPRO | Medium | Good | ~M×K | Debugging, understanding prompts |
| GEPA | Medium | Good | ~20-100 | Complex reasoning, interpretable |
| SIMBA | Medium | Better | Variable | Large datasets (500+), batch |
| BetterTogether | Slowest | Best | Sum of all | Maximum quality, production |
| Ensemble | - | Best | N per input | Reliability, variance reduction |
| BootstrapFinetune | Slow | Best | Variable | Permanent model improvements |
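
For budgeting, the approximate figures in the table can be folded into a rough back-of-envelope estimator (the multipliers are the table's rough values, not exact costs):

```python
def estimate_api_calls(optimizer: str, n_examples: int, num_candidates: int = 10) -> int:
    """Rough API-call estimate per the comparison table (~N, ~N×10, fixed budgets)."""
    if optimizer == "BootstrapFewShot":
        return n_examples                   # ~N
    if optimizer == "BootstrapFewShotWithRandomSearch":
        return n_examples * num_candidates  # ~N × num_candidate_programs
    budgets = {"MIPROv2-light": 50, "MIPROv2-medium": 200, "MIPROv2-heavy": 500}
    return budgets.get(optimizer, 0)

# e.g. 20 examples with random search ≈ 200 calls
```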

BootstrapFewShot

Purpose: Simple few-shot learning by sampling demonstrations from successful traces.

Best for: Small datasets (10-50 examples), quick prototyping.

How it works:

  1. Runs your program on training examples
  2. Collects traces of successful executions (based on metric)
  3. Selects best demonstrations to include in prompts

API calls: ~N calls for N examples

from dspy.teleprompt import BootstrapFewShot

optimizer = BootstrapFewShot(max_bootstrapped_demos=4)

result = prompter.optimize(examples=examples, optimizer=optimizer)

BootstrapFewShotWithRandomSearch

Purpose: BootstrapFewShot with multiple random seeds to find better demonstrations.

Best for: Medium datasets (50-200 examples), reliable results.

How it works:

  1. Runs BootstrapFewShot multiple times with different seeds
  2. Evaluates each configuration on validation set
  3. Returns best configuration

API calls: ~N × num_candidate_programs

from dspy.teleprompt import BootstrapFewShotWithRandomSearch

optimizer = BootstrapFewShotWithRandomSearch(
    max_bootstrapped_demos=4,
    num_candidate_programs=10,  # More = more calls, better results
)

result = prompter.optimize(examples=examples, optimizer=optimizer)

MIPROv2

Purpose: Multi-step instruction and prompt optimization. The most sophisticated general-purpose optimizer.

Best for: Production optimization (50-500+ examples), maximum quality.

How it works:

  1. Bootstrapping stage: Collects traces from running program
  2. Grounded proposal stage: Uses LM to propose better instructions
  3. Discrete search stage: Bayesian optimization to find best combination

API calls:

| Mode | Calls | When to Use |
| --- | --- | --- |
| light | ~50 | Quick optimization |
| medium | ~200 | Balanced |
| heavy | ~500+ | Maximum quality |

from dspy.teleprompt import MIPROv2

# Light mode - faster, fewer calls
optimizer = MIPROv2(auto="light", num_threads=8)

# Medium mode - balanced
optimizer = MIPROv2(auto="medium", num_threads=8)

# Heavy mode - best quality
optimizer = MIPROv2(auto="heavy", num_threads=8)

result = prompter.optimize(examples=examples, optimizer=optimizer)

COPRO (Coordinate Descent)

Purpose: Optimizes prompts by coordinate descent, changing one aspect at a time.

Best for: Understanding which prompt components matter most, debugging.

How it works:

  1. Starts with initial prompt
  2. Optimizes each "coordinate" (instruction, format) independently
  3. Combines best settings

API calls: ~M × K (M = coordinates, K = options each)

from dspy.teleprompt import COPRO

optimizer = COPRO(verbose=True)

result = prompter.optimize(examples=examples, optimizer=optimizer)

GEPA (Reflective Prompt Evolution)

Purpose: Iteratively refines prompts through reflection on execution feedback.

Best for: Complex reasoning tasks, interpretable improvements.

How it works:

  1. Evaluates current prompt on examples
  2. Reflects on failures and successes
  3. Proposes prompt modifications
  4. Iterates until convergence

API calls: ~20-100 depending on iterations

from dspy.teleprompt import GEPA

optimizer = GEPA(num_iterations=10, verbose=True)

result = prompter.optimize(examples=examples, optimizer=optimizer)

SIMBA

Purpose: Stochastic Introspective Mini-Batch Ascent. Improves prompts by analyzing performance on mini-batches of examples.

Best for: Large datasets (500+ examples), batch processing.

How it works:

  1. Creates meta-prompts that teach the LM about the task
  2. Scales well to large datasets
  3. Focuses on instruction quality

API calls: Variable, scales with dataset

from dspy.teleprompt import SIMBA

optimizer = SIMBA()

result = prompter.optimize(examples=examples, optimizer=optimizer)

BetterTogether

Purpose: Combines multiple optimizers for best results.

Best for: Maximum quality needed, production deployments.

How it works:

  1. Runs multiple optimizers in sequence or parallel
  2. Uses results from one optimizer to inform the next
  3. Returns best combined result

API calls: Sum of all component optimizer calls

from dspy.teleprompt import BetterTogether, BootstrapFewShot, MIPROv2

optimizer = BetterTogether(
    optimizers=[
        BootstrapFewShot(max_bootstrapped_demos=4),
        MIPROv2(auto="light"),
    ]
)

result = prompter.optimize(examples=examples, optimizer=optimizer)

Ensemble

Purpose: Combines multiple optimized programs at inference time.

Best for: Maximum reliability, reducing variance, production systems.

How it works:

  1. Runs input through multiple programs
  2. Aggregates outputs (voting, averaging, etc.)
  3. Returns consensus result

API calls at inference: N calls per input (one per ensemble member)

from dspy.teleprompt import Ensemble
import dspy

# After training multiple programs, combine them with a reduce function
ensemble = Ensemble(reduce_fn=dspy.majority)
program = ensemble.compile([prog1, prog2, prog3])
result = program(input_data)
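
The voting step itself is simple; a plain-Python majority vote (independent of DSPy's internals) looks like:

```python
from collections import Counter

def majority_vote(outputs: list[str]) -> str:
    """Return the most common output among ensemble members."""
    return Counter(outputs).most_common(1)[0][0]

majority_vote(["A", "B", "A"])  # "A" wins 2-1
```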

BootstrapFinetune

Purpose: Finetune model weights instead of just prompts.

Best for: 100+ high-quality examples, permanent model improvements.

How it works:

  1. Generates training data from traces
  2. Finetunes underlying LM on that data
  3. Uses finetuned model for inference

API calls: Depends on training data size + finetuning

from dspy.teleprompt import BootstrapFinetune

train_kwargs = {
    "use_peft": True,  # Enable LoRA
    "num_train_epochs": 1,
    "per_device_train_batch_size": 4,
    "learning_rate": 2e-4,
}

optimizer = BootstrapFinetune(train_kwargs=train_kwargs, num_threads=8)

result = prompter.optimize(examples=examples, optimizer=optimizer)

Requirements for LoRA: pip install transformers accelerate trl peft


Parallel Evaluation

Use multiple threads for faster optimization:

result = prompter.optimize(
    examples=examples,
    num_threads=4,  # Parallel evaluation
)

| Threads | Speed | Use Case |
| --- | --- | --- |
| 1 | Baseline | Debugging |
| 2-4 | 2-3x faster | Development |
| 4-8 | 3-4x faster | Production |

API Call Tracking

After optimization, check usage:

result = prompter.optimize(examples=examples)

print(f"API calls: {result.api_calls}")
print(f"Tokens used: {result.total_tokens:,}")
print(f"Baseline: {result.baseline_score:.0%}")
print(f"Optimized: {result.optimized_score:.0%}")

Common API Call Issues

Issue 1: Hidden Calls in Metrics

Metrics that use LLMs add calls per evaluation:

# BAD: This makes an LM call per evaluation!
def my_metric(example, pred, trace=None):
    judge = dspy.ChainOfThought(JudgeSignature)
    return judge(output=pred.output).score  # Hidden call!

# GOOD: Use simple comparison metrics when possible
def my_metric(example, pred, trace=None):
    return pred.output == example.expected_output

Issue 2: Uncached Repeated Calls

Same inputs without caching = repeated API calls:

# Enable caching to avoid duplicate calls
prompter = Prompter(
    model=MyModel,
    model_id="openai/gpt-4o-mini",
    cache=True,  # Prevents duplicate API calls
)

Issue 3: Optimizer Training Calls

Optimizers make many calls during compilation:

# MIPROv2 medium makes ~200 calls
optimizer = MIPROv2(auto="medium")

# Start with light for testing (~50 calls)
optimizer = MIPROv2(auto="light")

Reducing API Costs

  1. Start small: Use 5-10 examples initially
  2. Use caching: cache=True prevents duplicate calls
  3. Choose optimizer wisely: BootstrapFewShot for prototyping
  4. Use cheaper models: gpt-4o-mini for optimization, gpt-4o for production
  5. Start with light mode: MIPROv2(auto="light") before "heavy"
prompter = Prompter(
    model=MyModel,
    model_id="openai/gpt-4o-mini",  # Cheaper model for optimization
    cache=True,
)

# Start light
result = prompter.optimize(
    examples=examples[:10],  # Fewer examples first
    optimizer=MIPROv2(auto="light"),
)

# If results are good, try more examples
result = prompter.optimize(
    examples=examples,
    optimizer=MIPROv2(auto="medium"),
)

Early Stopping

In sequential mode, stop optimizing when scores plateau:

result = prompter.optimize(
    examples=examples,
    sequential=True,
    early_stopping_patience=2,  # Stop after 2 fields without improvement
)

Fields are optimized deepest-first. If early_stopping_patience consecutive fields show no improvement, the remaining fields are skipped. This can significantly reduce API costs when most fields already have good descriptions.
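
The patience logic can be sketched in plain Python (an illustration, not DSPydantic's internals; fields_to_optimize is a hypothetical helper):

```python
def fields_to_optimize(field_improvements: list[bool], patience: int) -> int:
    """Count how many fields get optimized before patience runs out.

    field_improvements[i] is True if optimizing field i improved its score.
    """
    no_improvement = 0
    for i, improved in enumerate(field_improvements):
        if improved:
            no_improvement = 0
        else:
            no_improvement += 1
            if no_improvement >= patience:
                return i + 1  # remaining fields are skipped
    return len(field_improvements)
```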


Auto-Generate Prompts

Automatically create system and instruction prompts from your model:

result = prompter.optimize(
    examples=examples,
    auto_generate_prompts=True,
)

This generates sensible defaults based on your model name and field names:

  • System prompt: "You are an expert at extracting structured {ModelName} data from text. Be precise and faithful to the source text."
  • Instruction prompt: "Extract the following fields from the given text: field1, field2, .... Return only values that are explicitly stated or clearly implied."

Existing prompts are preserved — auto-generation only fills in None values. The generated prompts are then optimized alongside field descriptions.
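
The defaults follow the templates quoted above; roughly (default_prompts is a hypothetical helper shown for illustration):

```python
def default_prompts(model_name: str, field_names: list[str]) -> tuple[str, str]:
    """Build default system/instruction prompts from a model name and its fields."""
    system = (
        f"You are an expert at extracting structured {model_name} data from text. "
        "Be precise and faithful to the source text."
    )
    instruction = (
        f"Extract the following fields from the given text: {', '.join(field_names)}. "
        "Return only values that are explicitly stated or clearly implied."
    )
    return system, instruction
```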


Contextual Optimization

DSPydantic automatically creates model-aware optimization signatures that give DSPy optimizers (especially MIPROv2) domain context without bloating token usage:

  • Dynamic class name: The optimizer sees OptimizeMedicalRecordFieldDescription instead of a generic name — zero extra tokens
  • Field name input: The optimizer knows which field it's improving (e.g., patient_name) — ~2-5 extra tokens per call
  • Concise docstring: Includes the model name for domain context — ~25 tokens

This prevents MIPROv2's proposer from generating generic meta-instructions like "Given the fields field_description, produce optimized_field_description" and steers it toward producing actual improved field descriptions.
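
The zero-token naming trick can be illustrated with Python's dynamic class creation (a sketch of the idea, not the library's code):

```python
def signature_name(model_name: str) -> str:
    """Derive the dynamic optimization-signature class name from the model name."""
    return f"Optimize{model_name}FieldDescription"

# A dynamically named class carries domain context in its name at zero token cost
OptSig = type(signature_name("MedicalRecord"), (object,), {})
```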

Tie-Breaking: Prefer Simplicity

When multiple optimization candidates achieve the same score, DSPydantic prefers the shorter (simpler) option. This applies to field descriptions, system prompts, and instruction prompts. Shorter descriptions save tokens at inference time across every future extraction call.
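
The rule amounts to sorting candidates by score descending, then length ascending; a minimal sketch (pick_best is a hypothetical helper):

```python
def pick_best(candidates: list[tuple[str, float]]) -> str:
    """Among (text, score) candidates, prefer the highest score, then the shortest text."""
    return min(candidates, key=lambda c: (-c[1], len(c[0])))[0]

# Equal scores: the shorter description wins
pick_best([("a long detailed description", 0.9), ("short description", 0.9)])
```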


Skip Optimization Phases

Skip specific optimization phases to keep certain parts fixed:

# Only optimize prompts, keep field descriptions as-is
result = prompter.optimize(
    examples=examples,
    skip_field_description_optimization=True,
)

# Only optimize field descriptions, skip prompt optimization
result = prompter.optimize(
    examples=examples,
    skip_system_prompt_optimization=True,
    skip_instruction_prompt_optimization=True,
)

Custom Optimizer and Compile Arguments

Pass additional arguments to DSPy optimizers at construction time:

result = prompter.optimize(
    examples=examples,
    optimizer="miprov2",
    optimizer_kwargs={
        "max_bootstrapped_demos": 8,
        "auto": "medium",
        "num_threads": 8,
    },
)

Pass extra arguments to the DSPy compile() call:

# Limit MIPROv2 trials for faster iteration
result = prompter.optimize(
    examples=examples,
    compile_kwargs={"num_trials": 5, "minibatch": False},
)

This is particularly useful for controlling MIPROv2's trial count during testing or development.


Progress Tracking and Verbose Output

Monitor optimization progress in real-time with rich-formatted output showing optimized values:

Automatic Verbose Output

Enable verbose mode to see real-time progress with automatically formatted output:

result = prompter.optimize(
    examples=examples,
    verbose=True,  # Enables rich formatted output with optimized values
)

Output shows:

  • Header: Model name, field count, examples, optimization mode
  • Progress: Field-by-field scores with improved/unchanged indicators
  • Optimized Values: The actual optimized descriptions after each field
  • Summary Table: Final scores, improvements, API calls, tokens

Custom Progress Callbacks

For custom progress handling, use the on_progress callback:

from dspydantic import FieldOptimizationProgress

def my_callback(progress: FieldOptimizationProgress):
    if progress.phase == "fields":
        print(f"Field: {progress.field_path}")
        print(f"  Score: {progress.score_before:.0%} → {progress.score_after:.0%}")
        if progress.optimized_value:
            print(f"  Optimized to: {progress.optimized_value!r}")
    elif progress.phase == "complete":
        print(f"Optimization finished in {progress.elapsed_seconds:.1f}s")

result = prompter.optimize(
    examples=examples,
    on_progress=my_callback,
)

The FieldOptimizationProgress object contains:

  • phase: Current phase ("baseline", "fields", "system_prompt", "instruction_prompt", "complete")
  • score_before / score_after: Scores before and after this step
  • field_path: Current field name (for field phases only)
  • optimized_value: The actual optimized text that was generated
  • elapsed_seconds: Total time elapsed since start
  • improved: Whether the score went up


Troubleshooting

Optimization is slow

  • Reduce examples (start with 5-10)
  • Use single-pass mode (default, sequential=False)
  • Use BootstrapFewShot instead of random search
  • Increase num_threads
  • Use MIPROv2(auto="light") instead of "heavy"
  • Limit trials with compile_kwargs={"num_trials": 5}

High API costs

  • Use cheaper model (gpt-4o-mini)
  • Enable caching (cache=True)
  • Start with fewer examples
  • Use early_stopping_patience in sequential mode
  • Use simpler optimizer first

Poor optimization results

  • Add more diverse examples
  • Try MIPROv2(auto="medium") for better quality
  • Use sequential=True for field-by-field optimization
  • Use auto_generate_prompts=True to add system/instruction prompts
  • Check that examples are correct
  • Ensure examples cover edge cases

See Also