# Configure Evaluators
This guide shows you how to configure evaluators for different fields and use cases. Evaluators guide the optimization process by measuring how well extracted data matches expected data.
## Problem
You need different evaluation strategies for different fields, or want to use pre-computed scores instead of running evaluations during optimization.
## Solution
Use `evaluator_config` to configure evaluators per field, or use `PredefinedScoreEvaluator` for pre-computed scores.
## Available Evaluator Types

| Evaluator | Alias | Use Case | Data Types | Speed |
|---|---|---|---|---|
| `exact` | `exact` | Precise values that must match exactly | Strings | Fast |
| `levenshtein` | `levenshtein` | Text with minor spelling/formatting differences | Strings | Fast |
| `text_similarity` | `text_similarity` | Text where meaning matters more than exact wording | Strings | Medium |
| `score_judge` | `score_judge` | Numeric scores needing quality assessment | Numbers | Slow |
| `label_model_grader` | `label_model_grader` | Classification labels needing context-aware evaluation | Labels/Categories | Slow |
| `python_code` | `python_code` | Custom evaluation logic for complex business rules | Any | Medium |
| `predefined_score` | `predefined_score` | Pre-computed scores (no evaluation needed) | Any | Fastest |
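The aliases above can be referenced in `evaluator_config` in two ways, both used in the examples later in this guide: as a bare string, or as a dict with `type` and `config` keys when the evaluator takes options. A minimal sketch, assuming `prompter` and `examples` are set up as in the full examples below:

```python
# Shorthand: an alias string selects an evaluator with its default options
result = prompter.optimize(
    examples=examples,
    evaluator_config={"default": "levenshtein"},
)

# Full form: a dict with "type" and "config" passes evaluator options
result = prompter.optimize(
    examples=examples,
    evaluator_config={
        "default": {"type": "exact", "config": {"case_sensitive": False}},
    },
)
```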
## Common Configuration Patterns

| Pattern | Configuration | Use Case |
|---|---|---|
| Most Fields Exact | `default: "exact"` | Most fields need exact matching |
| Text with Variations | `default: "levenshtein"` | Text fields may have typos |
| Semantic Matching | `default: "text_similarity"` | Meaning matters more than wording |
| Mixed Strategy | `default` + `field_overrides` | Different fields need different evaluators |
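For the mixed strategy in the last row, a `default` evaluator and `field_overrides` are combined in one `evaluator_config`. A skeletal sketch (the field name is illustrative); the per-field configuration section below shows a complete version passed to `prompter.optimize()`:

```python
# Sketch of a mixed strategy: a default evaluator plus per-field overrides
evaluator_config = {
    "default": "exact",  # used for every field without an override
    "field_overrides": {
        "description": {  # illustrative field name
            "type": "text_similarity",
            "config": {"threshold": 0.7},
        },
    },
}
```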
## Using Pre-defined Scores

If you already have scores, use `PredefinedScoreEvaluator`:
```python
from dspydantic import Prompter
from dspydantic.evaluators import PredefinedScoreEvaluator

# Pre-computed scores
scores = [0.95, 0.87, 0.92, 1.0, 0.78]
evaluator = PredefinedScoreEvaluator(config={"scores": scores})

# Configure DSPy first
import dspy

lm = dspy.LM("openai/gpt-4o", api_key="your-api-key")
dspy.configure(lm=lm)

prompter = Prompter(model=MyModel)
result = prompter.optimize(
    examples=examples,
    evaluate_fn=evaluator,
)
```
## Per-Field Evaluator Configuration

Configure different evaluators for different fields:
```python
from dspydantic import Prompter

# Configure DSPy first
import dspy

lm = dspy.LM("openai/gpt-4o", api_key="your-api-key")
dspy.configure(lm=lm)

prompter = Prompter(model=User)
result = prompter.optimize(
    examples=examples,
    evaluator_config={
        "default": {
            "type": "exact",
            "config": {"case_sensitive": False},
        },
        "field_overrides": {
            "name": {
                "type": "exact",
                "config": {"case_sensitive": True},  # Names must match exactly
            },
            "description": {
                "type": "text_similarity",
                "config": {
                    "model": "sentence-transformers/all-MiniLM-L6-v2",
                    "threshold": 0.7,
                },
            },
            "rating": {
                "type": "score_judge",
                "config": {
                    "criteria": "Rate the quality of this rating on a scale of 0-1",
                    "temperature": 0.0,
                },
            },
        },
    },
)
```
## Configuration Examples Table

| Field Type | Evaluator | Configuration | Reason |
|---|---|---|---|
| ID, SKU | `exact` | `case_sensitive: True` | Must match exactly |
| Name | `exact` | `case_sensitive: False` | Case variations OK |
| Description | `text_similarity` | `threshold: 0.7` | Meaning matters |
| Rating | `score_judge` | Custom criteria | Context-aware |
| Age | `python_code` | Custom function | Business rules |
## Custom Evaluator Class

Create a custom evaluator:
```python
from dspydantic import Prompter

# Configure DSPy first
import dspy

lm = dspy.LM("openai/gpt-4o", api_key="your-api-key")
dspy.configure(lm=lm)


class ThresholdEvaluator:
    """Custom evaluator that checks if values are within a threshold."""

    def __init__(self, config: dict) -> None:
        self.threshold = config.get("threshold", 0.1)

    def evaluate(
        self,
        extracted: float,
        expected: float,
        input_data: dict | None = None,
        field_path: str | None = None,
    ) -> float:
        """Check if extracted value is within threshold of expected."""
        diff = abs(extracted - expected)
        return 1.0 if diff <= self.threshold else max(0.0, 1.0 - (diff / expected))


prompter = Prompter(model=RatingModel)
result = prompter.optimize(
    examples=examples,
    evaluator_config={
        "default": {
            "class": ThresholdEvaluator,
            "config": {"threshold": 0.05},
        },
    },
)
```
## Python Code Evaluator

Use a callable for custom evaluation logic:
```python
from dspydantic import Prompter


def age_evaluator(extracted, expected, input_data=None, field_path=None):
    """Custom evaluation function for age field."""
    if field_path == "age":
        diff = abs(extracted - expected)
        if diff == 0:
            return 1.0
        elif diff <= 2:
            return 0.8
        else:
            return max(0.0, 1.0 - (diff / 10))
    # For other fields, use exact match
    return 1.0 if extracted == expected else 0.0


# Configure DSPy first
import dspy

lm = dspy.LM("openai/gpt-4o", api_key="your-api-key")
dspy.configure(lm=lm)

prompter = Prompter(model=SimpleUser)
result = prompter.optimize(
    examples=examples,
    evaluator_config={
        "default": "exact",
        "field_overrides": {
            "age": {
                "type": "python_code",
                "config": {
                    "function": age_evaluator,
                },
            },
        },
    },
)
```
## Tips

- Use `default` for most fields, and override specific fields as needed
- Pre-defined scores are fastest when you have ground truth
- Text similarity works well for semantic matching
- See When to Use Which for evaluator selection guidance
- See Reference: Evaluators for all options
## See Also
- When to Use Which - Choose the right evaluator
- Custom Evaluators - Create custom evaluation logic
- Understanding Evaluators - Deep dive into evaluators
- Reference: Evaluators - Complete API documentation