
Configure Optimizations

This guide covers how to configure optimization parameters, choose the right DSPy optimizer, and understand API call costs.


Quick Reference

| Factor | Default | Recommended | Impact |
| --- | --- | --- | --- |
| Examples | - | 10-20 | Quality |
| Threads | 4 | 4-8 | Speed |
| Optimizer | Auto | Based on dataset | Quality/Cost |
| sequential | False | False for speed, True for quality | Single-pass vs field-by-field |
| parallel_fields | True | True for sequential mode | Parallelizes field optimization |
| max_val_examples | None | 3-5 to reduce API calls | Validation set size |
| skip_score_threshold | None | 0.95 for high-scoring fields | Skip well-optimized fields |
| early_stopping_patience | None | 2-3 in sequential mode | Stop when no improvement |
| auto_generate_prompts | False | True for quick start | Auto-create system/instruction prompts |
| optimizer_kwargs | None | {"auto": None, "num_candidates": 3} | Extra kwargs for optimizer constructor |
| compile_kwargs | None | {"num_trials": 5} for testing | Extra kwargs for DSPy compile |
| include_fields | None | As needed | Focus optimization |
| exclude_fields | None | As needed | Skip metadata in scoring |

Single-Pass vs Sequential Mode Optimization

By default, DSPydantic uses single-pass mode (sequential=False):

  1. All field descriptions and prompts are optimized together in one DSPy compile
  2. Reduced demo budgets (max_bootstrapped_demos=1) for speed
  3. Fastest approach: one DSPy compile instead of N+2
  4. Good when speed is prioritized

Use sequential=True for field-by-field optimization — slower but better quality:

  1. Phase 1: Optimize each field description independently, deepest-nested first. Each run has a minimal search space.
  2. Phase 2: Optimize system and instruction prompts with field descriptions fixed.
  3. With parallel_fields=True (default), all fields optimize simultaneously, giving a ~N× speedup.

# Single-pass (default): fast, lower API costs
result = prompter.optimize(examples=examples)

# Sequential: field-by-field for better quality
result = prompter.optimize(examples=examples, sequential=True)

# Sequential + parallel: best of both (field-by-field quality with parallel speedup)
result = prompter.optimize(examples=examples, sequential=True, parallel_fields=True)

# Reduce API calls by capping validation examples
result = prompter.optimize(examples=examples, max_val_examples=5)

Field Inclusion and Exclusion

Restrict which fields are optimized and scored:

# Only optimize specific fields (reduces time and API costs)
result = prompter.optimize(
    examples=examples,
    include_fields=["address", "total"],
)

# Exclude fields from scoring (still extracted)
result = prompter.optimize(
    examples=examples,
    exclude_fields=["metadata", "timestamp"],
)

See Field Inclusion & Exclusion for details.


Number of Examples

| Examples | Speed | Quality | API Calls |
| --- | --- | --- | --- |
| 5-10 | Fast | Good | ~50-100 |
| 10-20 | Medium | Better | ~100-200 |
| 20+ | Slower | Best | ~200-500+ |

Tips:

  • Start with 5-10 for prototyping
  • Use 10-20 for production
  • Ensure diverse examples covering edge cases

DSPy Optimizers

DSPydantic uses DSPy optimizers under the hood. Understanding them helps you choose the right one.

Auto-Selection Logic

DSPydantic auto-selects based on dataset size:

| Examples | Auto-Selected Optimizer |
| --- | --- |
| 1-2 | MIPROv2 (zero-shot mode) |
| 3-19 | BootstrapFewShot |
| 20+ | BootstrapFewShotWithRandomSearch |
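
The thresholds above can be sketched as a simple selection function (an illustration of the table, not DSPydantic's actual implementation):

```python
def auto_select_optimizer(num_examples: int) -> str:
    """Mirror the auto-selection table: pick an optimizer name by dataset size."""
    if num_examples <= 2:
        return "MIPROv2 (zero-shot)"
    if num_examples <= 19:
        return "BootstrapFewShot"
    return "BootstrapFewShotWithRandomSearch"
```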

Optimizer Comparison

| Optimizer | Speed | Quality | API Calls | Best For |
| --- | --- | --- | --- | --- |
| BootstrapFewShot | Fast | Good | ~N | Prototyping, small datasets |
| BootstrapFewShotWithRandomSearch | Medium | Better | ~N×10 | Production, reliable results |
| MIPROv2 (light) | Medium | Better | ~50 | Quick production |
| MIPROv2 (medium) | Slow | Best | ~200 | Balanced quality/cost |
| MIPROv2 (heavy) | Slowest | Best | ~500+ | Maximum quality |
| COPRO | Medium | Good | ~M×K | Debugging, understanding prompts |
| GEPA | Medium | Good | ~20-100 | Complex reasoning, interpretable |
| SIMBA | Medium | Better | Variable | Large datasets (500+), batch |
| BetterTogether | Slowest | Best | Sum of all | Maximum quality, production |
| Ensemble | - | Best | N per input | Reliability, variance reduction |
| BootstrapFinetune | Slow | Best | Variable | Permanent model improvements |
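
For budgeting, the approximate figures in the table can be folded into a rough back-of-envelope estimator (the multipliers are the table's rough values, not exact costs):

```python
def estimate_api_calls(optimizer: str, n_examples: int, num_candidates: int = 10) -> int:
    """Rough API-call estimate per the comparison table (~N, ~N×10, fixed budgets)."""
    if optimizer == "BootstrapFewShot":
        return n_examples                   # ~N
    if optimizer == "BootstrapFewShotWithRandomSearch":
        return n_examples * num_candidates  # ~N × num_candidate_programs
    budgets = {"MIPROv2-light": 50, "MIPROv2-medium": 200, "MIPROv2-heavy": 500}
    return budgets.get(optimizer, 0)

# e.g. 20 examples with random search ≈ 200 calls
```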

BootstrapFewShot

Purpose: Simple few-shot learning by sampling demonstrations from successful traces.

Best for: Small datasets (10-50 examples), quick prototyping.

How it works:

  1. Runs your program on training examples
  2. Collects traces of successful executions (based on metric)
  3. Selects best demonstrations to include in prompts

API calls: ~N calls for N examples

from dspy.teleprompt import BootstrapFewShot

optimizer = BootstrapFewShot(max_bootstrapped_demos=4)

result = prompter.optimize(examples=examples, optimizer=optimizer)

BootstrapFewShotWithRandomSearch

Purpose: BootstrapFewShot with multiple random seeds to find better demonstrations.

Best for: Medium datasets (50-200 examples), reliable results.

How it works:

  1. Runs BootstrapFewShot multiple times with different seeds
  2. Evaluates each configuration on validation set
  3. Returns best configuration

API calls: ~N × num_candidate_programs

from dspy.teleprompt import BootstrapFewShotWithRandomSearch

optimizer = BootstrapFewShotWithRandomSearch(
    max_bootstrapped_demos=4,
    num_candidate_programs=10,  # More = more calls, better results
)

result = prompter.optimize(examples=examples, optimizer=optimizer)

MIPROv2

Purpose: Multi-step instruction and prompt optimization. The most sophisticated general-purpose optimizer.

Best for: Production optimization (50-500+ examples), maximum quality.

How it works:

  1. Bootstrapping stage: Collects traces from running program
  2. Grounded proposal stage: Uses LM to propose better instructions
  3. Discrete search stage: Bayesian optimization to find best combination

API calls:

| Mode | Calls | When to Use |
| --- | --- | --- |
| light | ~50 | Quick optimization |
| medium | ~200 | Balanced |
| heavy | ~500+ | Maximum quality |

from dspy.teleprompt import MIPROv2

# Light mode - faster, fewer calls
optimizer = MIPROv2(auto="light", num_threads=8)

# Medium mode - balanced
optimizer = MIPROv2(auto="medium", num_threads=8)

# Heavy mode - best quality
optimizer = MIPROv2(auto="heavy", num_threads=8)

result = prompter.optimize(examples=examples, optimizer=optimizer)

COPRO (Coordinate Descent)

Purpose: Optimizes prompts by coordinate descent, changing one aspect at a time.

Best for: Understanding which prompt components matter most, debugging.

How it works:

  1. Starts with initial prompt
  2. Optimizes each "coordinate" (instruction, format) independently
  3. Combines best settings

API calls: ~M × K (M = coordinates, K = options each)

from dspy.teleprompt import COPRO

optimizer = COPRO(verbose=True)

result = prompter.optimize(examples=examples, optimizer=optimizer)

GEPA (Reflective Prompt Evolution)

Purpose: Iteratively refines prompts through reflection on execution feedback.

Best for: Complex reasoning tasks, interpretable improvements.

How it works:

  1. Evaluates current prompt on examples
  2. Reflects on failures and successes
  3. Proposes prompt modifications
  4. Iterates until convergence

API calls: ~20-100 depending on iterations

from dspy.teleprompt import GEPA

optimizer = GEPA(num_iterations=10, verbose=True)

result = prompter.optimize(examples=examples, optimizer=optimizer)

SIMBA

Purpose: Stochastic Introspective Mini-Batch Ascent. Improves prompts by analyzing performance on mini-batches of examples.

Best for: Large datasets (500+ examples), batch processing.

How it works:

  1. Creates meta-prompts that teach the LM about the task
  2. Scales well to large datasets
  3. Focuses on instruction quality

API calls: Variable, scales with dataset

from dspy.teleprompt import SIMBA

optimizer = SIMBA()

result = prompter.optimize(examples=examples, optimizer=optimizer)

BetterTogether

Purpose: Combines multiple optimizers for best results.

Best for: Maximum quality needed, production deployments.

How it works:

  1. Runs multiple optimizers in sequence or parallel
  2. Uses results from one optimizer to inform the next
  3. Returns best combined result

API calls: Sum of all component optimizer calls

from dspy.teleprompt import BetterTogether, BootstrapFewShot, MIPROv2

optimizer = BetterTogether(
    optimizers=[
        BootstrapFewShot(max_bootstrapped_demos=4),
        MIPROv2(auto="light"),
    ]
)

result = prompter.optimize(examples=examples, optimizer=optimizer)

Ensemble

Purpose: Combines multiple optimized programs at inference time.

Best for: Maximum reliability, reducing variance, production systems.

How it works:

  1. Runs input through multiple programs
  2. Aggregates outputs (voting, averaging, etc.)
  3. Returns consensus result

API calls at inference: N calls per input (one per ensemble member)

from dspy.teleprompt import Ensemble
import dspy

# After training multiple programs, combine them with a reduce function
ensemble = Ensemble(reduce_fn=dspy.majority)
program = ensemble.compile([prog1, prog2, prog3])
result = program(input_data)
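
The voting step itself is simple; a plain-Python majority vote (independent of DSPy's internals) looks like:

```python
from collections import Counter

def majority_vote(outputs: list[str]) -> str:
    """Return the most common output among ensemble members."""
    return Counter(outputs).most_common(1)[0][0]

majority_vote(["A", "B", "A"])  # "A" wins 2-1
```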

BootstrapFinetune

Purpose: Finetune model weights instead of just prompts.

Best for: 100+ high-quality examples, permanent model improvements.

How it works:

  1. Generates training data from traces
  2. Finetunes underlying LM on that data
  3. Uses finetuned model for inference

API calls: Depends on training data size + finetuning

from dspy.teleprompt import BootstrapFinetune

train_kwargs = {
    "use_peft": True,  # Enable LoRA
    "num_train_epochs": 1,
    "per_device_train_batch_size": 4,
    "learning_rate": 2e-4,
}

optimizer = BootstrapFinetune(train_kwargs=train_kwargs, num_threads=8)

result = prompter.optimize(examples=examples, optimizer=optimizer)

Requirements for LoRA: pip install transformers accelerate trl peft


Parallel Evaluation

Use multiple threads for faster optimization:

result = prompter.optimize(
    examples=examples,
    num_threads=4,  # Parallel evaluation
)

| Threads | Speed | Use Case |
| --- | --- | --- |
| 1 | Baseline | Debugging |
| 2-4 | 2-3x faster | Development |
| 4-8 | 3-4x faster | Production |

API Call Tracking

After optimization, check usage:

result = prompter.optimize(examples=examples)

print(f"API calls: {result.api_calls}")
print(f"Tokens used: {result.total_tokens:,}")
print(f"Baseline: {result.baseline_score:.0%}")
print(f"Optimized: {result.optimized_score:.0%}")

Common API Call Issues

Issue 1: Hidden Calls in Metrics

Metrics that use LLMs add calls per evaluation:

# BAD: This makes an LM call per evaluation!
def my_metric(example, pred, trace=None):
    judge = dspy.ChainOfThought(JudgeSignature)
    return judge(output=pred.output).score  # Hidden call!

# GOOD: Use simple comparison metrics when possible
def my_metric(example, pred, trace=None):
    return pred.output == example.expected_output

Issue 2: Uncached Repeated Calls

Same inputs without caching = repeated API calls:

# Enable caching to avoid duplicate calls
prompter = Prompter(
    model=MyModel,
    model_id="openai/gpt-4o-mini",
    cache=True,  # Prevents duplicate API calls
)

Issue 3: Optimizer Training Calls

Optimizers make many calls during compilation:

# MIPROv2 medium makes ~200 calls
optimizer = MIPROv2(auto="medium")

# Start with light for testing (~50 calls)
optimizer = MIPROv2(auto="light")

Reducing API Costs

  1. Start small: Use 5-10 examples initially
  2. Use caching: cache=True prevents duplicate calls
  3. Choose optimizer wisely: BootstrapFewShot for prototyping
  4. Use cheaper models: gpt-4o-mini for optimization, gpt-4o for production
  5. Start with light mode: MIPROv2(auto="light") before "heavy"
prompter = Prompter(
    model=MyModel,
    model_id="openai/gpt-4o-mini",  # Cheaper model for optimization
    cache=True,
)

# Start light
result = prompter.optimize(
    examples=examples[:10],  # Fewer examples first
    optimizer=MIPROv2(auto="light"),
)

# If results are good, try more examples
result = prompter.optimize(
    examples=examples,
    optimizer=MIPROv2(auto="medium"),
)

Early Stopping

In sequential mode, stop optimizing when scores plateau:

result = prompter.optimize(
    examples=examples,
    sequential=True,
    early_stopping_patience=2,  # Stop after 2 fields without improvement
)

Fields are optimized deepest-first. If early_stopping_patience consecutive fields show no improvement, the remaining fields are skipped. This can significantly reduce API costs when most fields already have good descriptions.
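
The patience logic can be sketched in plain Python (an illustration, not DSPydantic's internals; fields_to_optimize is a hypothetical helper):

```python
def fields_to_optimize(field_improvements: list[bool], patience: int) -> int:
    """Count how many fields get optimized before patience runs out.

    field_improvements[i] is True if optimizing field i improved its score.
    """
    no_improvement = 0
    for i, improved in enumerate(field_improvements):
        if improved:
            no_improvement = 0
        else:
            no_improvement += 1
            if no_improvement >= patience:
                return i + 1  # remaining fields are skipped
    return len(field_improvements)
```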


Auto-Generate Prompts

Automatically create system and instruction prompts from your model:

result = prompter.optimize(
    examples=examples,
    auto_generate_prompts=True,
)

This generates sensible defaults based on your model name and field names:

  • System prompt: "You are an expert at extracting structured {ModelName} data from text. Be precise and faithful to the source text."
  • Instruction prompt: "Extract the following fields from the given text: field1, field2, .... Return only values that are explicitly stated or clearly implied."

Existing prompts are preserved — auto-generation only fills in None values. The generated prompts are then optimized alongside field descriptions.
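
The defaults follow the templates quoted above; roughly (default_prompts is a hypothetical helper shown for illustration):

```python
def default_prompts(model_name: str, field_names: list[str]) -> tuple[str, str]:
    """Build default system/instruction prompts from a model name and its fields."""
    system = (
        f"You are an expert at extracting structured {model_name} data from text. "
        "Be precise and faithful to the source text."
    )
    instruction = (
        f"Extract the following fields from the given text: {', '.join(field_names)}. "
        "Return only values that are explicitly stated or clearly implied."
    )
    return system, instruction
```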


Contextual Optimization

DSPydantic automatically creates model-aware optimization signatures that give DSPy optimizers (especially MIPROv2) domain context without bloating token usage:

  • Dynamic class name: The optimizer sees OptimizeMedicalRecordFieldDescription instead of a generic name — zero extra tokens
  • Field name input: The optimizer knows which field it's improving (e.g., patient_name) — ~2-5 extra tokens per call
  • Concise docstring: Includes the model name for domain context — ~25 tokens

This prevents MIPROv2's proposer from generating generic meta-instructions like "Given the fields field_description, produce optimized_field_description" and steers it toward producing actual improved field descriptions.
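
The zero-token naming trick can be illustrated with Python's dynamic class creation (a sketch of the idea, not the library's code):

```python
def signature_name(model_name: str) -> str:
    """Derive the dynamic optimization-signature class name from the model name."""
    return f"Optimize{model_name}FieldDescription"

# A dynamically named class carries domain context in its name at zero token cost
OptSig = type(signature_name("MedicalRecord"), (object,), {})
```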

Tie-Breaking: Prefer Simplicity

When multiple optimization candidates achieve the same score, DSPydantic prefers the shorter (simpler) option. This applies to field descriptions, system prompts, and instruction prompts. Shorter descriptions save tokens at inference time across every future extraction call.
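
The rule amounts to sorting candidates by score descending, then length ascending; a minimal sketch (pick_best is a hypothetical helper):

```python
def pick_best(candidates: list[tuple[str, float]]) -> str:
    """Among (text, score) candidates, prefer the highest score, then the shortest text."""
    return min(candidates, key=lambda c: (-c[1], len(c[0])))[0]

# Equal scores: the shorter description wins
pick_best([("a long detailed description", 0.9), ("short description", 0.9)])
```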


Skip Optimization Phases

Skip specific optimization phases to keep certain parts fixed:

# Only optimize prompts, keep field descriptions as-is
result = prompter.optimize(
    examples=examples,
    skip_field_description_optimization=True,
)

# Only optimize field descriptions, skip prompt optimization
result = prompter.optimize(
    examples=examples,
    skip_system_prompt_optimization=True,
    skip_instruction_prompt_optimization=True,
)

Custom Optimizer and Compile Arguments

Pass additional arguments to DSPy optimizers at construction time:

result = prompter.optimize(
    examples=examples,
    optimizer="miprov2",
    optimizer_kwargs={
        "max_bootstrapped_demos": 8,
        "auto": "medium",
        "num_threads": 8,
    },
)

Pass extra arguments to the DSPy compile() call:

# Limit MIPROv2 trials for faster iteration
result = prompter.optimize(
    examples=examples,
    compile_kwargs={"num_trials": 5, "minibatch": False},
)

This is particularly useful for controlling MIPROv2's trial count during testing or development.


Progress Tracking and Verbose Output

Monitor optimization progress in real-time with rich-formatted output showing optimized values:

Automatic Verbose Output

Enable verbose mode to see real-time progress with automatically formatted output:

result = prompter.optimize(
    examples=examples,
    verbose=True,  # Enables rich formatted output with optimized values
)

Output shows:

  • Header: Model name, field count, examples, optimization mode
  • Progress: Field-by-field scores with improved/unchanged indicators
  • Optimized Values: The actual optimized descriptions after each field
  • Summary Table: Final scores, improvements, API calls, tokens

Custom Progress Callbacks

For custom progress handling, use the on_progress callback:

from dspydantic import FieldOptimizationProgress

def my_callback(progress: FieldOptimizationProgress):
    if progress.phase == "fields":
        print(f"Field: {progress.field_path}")
        print(f"  Score: {progress.score_before:.0%} → {progress.score_after:.0%}")
        if progress.optimized_value:
            print(f"  Optimized to: {progress.optimized_value!r}")
    elif progress.phase == "complete":
        print(f"Optimization finished in {progress.elapsed_seconds:.1f}s")

result = prompter.optimize(
    examples=examples,
    on_progress=my_callback,
)

The FieldOptimizationProgress object contains:

  • phase: Current phase ("baseline", "fields", "system_prompt", "instruction_prompt", "complete")
  • score_before / score_after: Scores before and after this step
  • field_path: Current field name (for field phases only)
  • optimized_value: The actual optimized text that was generated
  • elapsed_seconds: Total time elapsed since start
  • improved: Whether the score went up


Troubleshooting

Optimization is slow

  • Reduce examples (start with 5-10)
  • Use single-pass mode (default, sequential=False)
  • Use BootstrapFewShot instead of random search
  • Increase num_threads
  • Use MIPROv2(auto="light") instead of "heavy"
  • Limit trials with compile_kwargs={"num_trials": 5}

High API costs

  • Use cheaper model (gpt-4o-mini)
  • Enable caching (cache=True)
  • Start with fewer examples
  • Use early_stopping_patience in sequential mode
  • Use simpler optimizer first

Poor optimization results

  • Add more diverse examples
  • Try MIPROv2(auto="medium") for better quality
  • Use sequential=True for field-by-field optimization
  • Use auto_generate_prompts=True to add system/instruction prompts
  • Check that examples are correct
  • Ensure examples cover edge cases

See Also