Choosing an Evaluator¶
Guide for selecting the right evaluator for your extraction task.
Quick Decision Tree¶
Are values expected to match exactly?
- Yes → Use exact
- No → Next question
Do you expect minor variations (typos, spelling)?
- Yes → Use levenshtein
- No → Next question
Do you care about meaning/semantic similarity?
- Yes → Use text_similarity
- No → Next question
Is evaluation complex/subjective?
- Yes → Use score_judge or custom evaluator
- No → Use exact
Evaluator Comparison¶
| Evaluator | Speed | Accuracy | Cost | Best For |
|---|---|---|---|---|
exact |
Fast | Perfect for exact | Free | IDs, SKUs, exact text |
levenshtein |
Fast | Good | Free | Names, minor variations |
text_similarity |
Medium | Good | Free* | Descriptions, semantic |
score_judge |
Slow | Excellent | $$$ | Complex evaluation |
| Custom | Varies | Excellent | Free | Domain logic |
*Requires embeddings model, can be free or paid depending on setup.
Field Type Guide¶
| Field Type | Evaluator | Example |
|---|---|---|
| ID / SKU | exact |
"INV-2024-001" |
| Name / Address | levenshtein |
"John" vs "Jon" |
exact |
"john@example.com" |
|
| Description | text_similarity |
Long text fields |
| Category | exact |
"urgent", "low" |
| Rating / Score | Custom | 4.5 vs 4.0 |
Trade-offs¶
Fast evaluators (exact, levenshtein):
- ✅ No API calls
- ✅ Deterministic
- ❌ Can't understand meaning
Semantic evaluators (text_similarity):
- ✅ Understand meaning
- ✅ Flexible
- ❌ Slower
- ❌ May need embedding model
LLM evaluators (score_judge):
- ✅ Very flexible
- ✅ Can handle complex logic
- ❌ Expensive (API calls)
- ❌ Non-deterministic
Common Patterns¶
Most Fields Exact, Some Semantic¶
{
"field_overrides": {
"id": {"type": "exact"},
"name": {"type": "levenshtein"},
"description": {"type": "text_similarity"},
}
}
Text Fields with Variations¶
{
"default": {"type": "levenshtein", "config": {"threshold": 0.85}},
"field_overrides": {
"id": {"type": "exact"},
}
}
Custom Business Logic¶
{
"field_overrides": {
"amount": {
"type": "python_code",
"function": lambda ext, exp, _, __: 1.0 if abs(float(ext) - float(exp)) < 0.01 else 0.0
}
}
}
Tips¶
- Start simple: use
exactfirst - Measure baseline accuracy before optimizing
- Only add complexity (semantic, LLM) if needed
- Test evaluators on your actual data
- Consider evaluation cost when choosing
See Also¶
- Configure Evaluators — Configuration guide
- Build a Custom Evaluator — Create custom logic