Evaluators

Evaluation system for measuring extraction quality.

Evaluator Overview

| Evaluator | Alias | When to Use | Data Types | Speed | Accuracy |
| --- | --- | --- | --- | --- | --- |
| StringCheckEvaluator | exact | Precise values that must match exactly (IDs, codes, exact strings) | Strings | Fast | Exact |
| LevenshteinEvaluator | levenshtein | Text with minor spelling or formatting differences | Strings | Fast | Fuzzy |
| TextSimilarityEvaluator | text_similarity | Text where meaning matters more than exact wording | Strings | Medium | Semantic |
| ScoreJudge | score_judge | Numeric scores or ratings needing quality assessment | Numbers | Slow | LLM-based |
| LabelModelGrader | label_model_grader | Classification labels needing context-aware evaluation | Labels/Categories | Slow | LLM-based |
| PythonCodeEvaluator | python_code | Custom evaluation logic for complex business rules | Any | Medium | Custom |
| PredefinedScoreEvaluator | predefined_score | Pre-computed scores (no evaluation needed) | Any | Fastest | Pre-computed |

Quick Selection Guide

  • Exact match needed? → Use exact (StringCheckEvaluator)
  • Minor variations OK? → Use levenshtein (LevenshteinEvaluator)
  • Semantic similarity? → Use text_similarity (TextSimilarityEvaluator)
  • Complex evaluation? → Use score_judge or label_model_grader
  • Custom logic? → Use python_code (PythonCodeEvaluator)
  • Already have scores? → Use predefined_score (PredefinedScoreEvaluator)

API Reference

BaseEvaluator

Bases: Protocol

Protocol for all evaluators.

All evaluators must implement the evaluate method that takes extracted and expected values and returns a score between 0.0 and 1.0.

Functions

evaluate

evaluate(extracted, expected, input_data=None, field_path=None)

Evaluate extracted value against expected value.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| extracted | Any | The extracted value to evaluate. | required |
| expected | Any | The expected value to compare against. | required |
| input_data | dict[str, Any] \| None | Optional input data dictionary for context. | None |
| field_path | str \| None | Optional field path (e.g., "name", "address.street") for context. | None |

Returns:

| Type | Description |
| --- | --- |
| float | Score between 0.0 and 1.0, where 1.0 is a perfect match. |

Source code in src/dspydantic/evaluators/config.py
def evaluate(
    self,
    extracted: Any,
    expected: Any,
    input_data: dict[str, Any] | None = None,
    field_path: str | None = None,
) -> float:
    """Evaluate extracted value against expected value.

    Args:
        extracted: The extracted value to evaluate.
        expected: The expected value to compare against.
        input_data: Optional input data dictionary for context.
        field_path: Optional field path (e.g., "name", "address.street") for context.

    Returns:
        Score between 0.0 and 1.0, where 1.0 is a perfect match.
    """
    ...

StringCheckEvaluator

StringCheckEvaluator(config)

Evaluator that performs exact string matching.

Best for IDs, codes, enums, and other values that must match exactly.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | dict[str, Any] | Configuration dictionary with options: case_sensitive (bool): whether comparison is case-sensitive (default: True); strip_whitespace (bool): whether to strip whitespace (default: True) | required |

Example

>>> evaluator = StringCheckEvaluator(config={})
>>> evaluator.evaluate("ABC123", "ABC123")
1.0
>>> evaluator.evaluate("abc123", "ABC123")  # Case mismatch
0.0
>>> evaluator.evaluate(" ABC123 ", "ABC123")  # Whitespace stripped
1.0

Case-insensitive matching:

>>> evaluator = StringCheckEvaluator(config={"case_sensitive": False})
>>> evaluator.evaluate("abc123", "ABC123")
1.0

Initialize StringCheckEvaluator.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | dict[str, Any] | Configuration dictionary with case_sensitive and strip_whitespace options. | required |
Source code in src/dspydantic/evaluators/string_check.py
def __init__(self, config: dict[str, Any]) -> None:
    """Initialize StringCheckEvaluator.

    Args:
        config: Configuration dictionary with case_sensitive and strip_whitespace options.
    """
    self.config = config
    self.case_sensitive = config.get("case_sensitive", True)
    self.strip_whitespace = config.get("strip_whitespace", True)

Functions

evaluate

evaluate(extracted, expected, input_data=None, field_path=None)

Evaluate using exact string matching.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| extracted | Any | Extracted value. | required |
| expected | Any | Expected value. | required |
| input_data | dict[str, Any] \| None | Optional input data (not used). | None |
| field_path | str \| None | Optional field path (not used). | None |

Returns:

| Type | Description |
| --- | --- |
| float | Score 1.0 if match, 0.0 otherwise. |

Source code in src/dspydantic/evaluators/string_check.py
def evaluate(
    self,
    extracted: Any,
    expected: Any,
    input_data: dict[str, Any] | None = None,
    field_path: str | None = None,
) -> float:
    """Evaluate using exact string matching.

    Args:
        extracted: Extracted value.
        expected: Expected value.
        input_data: Optional input data (not used).
        field_path: Optional field path (not used).

    Returns:
        Score 1.0 if match, 0.0 otherwise.
    """
    extracted_str = str(extracted)
    expected_str = str(expected)

    if self.strip_whitespace:
        extracted_str = extracted_str.strip()
        expected_str = expected_str.strip()

    if not self.case_sensitive:
        extracted_str = extracted_str.lower()
        expected_str = expected_str.lower()

    return 1.0 if extracted_str == expected_str else 0.0

LevenshteinEvaluator

LevenshteinEvaluator(config)

Evaluator that uses Levenshtein distance for fuzzy string matching.

Useful when extracted values may have minor typos or formatting differences compared to expected values.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | dict[str, Any] | Configuration dictionary with options: threshold (float): minimum similarity threshold (0-1, default: 0.0); values below the threshold return 0.0. | required |

Example

>>> evaluator = LevenshteinEvaluator(config={})
>>> evaluator.evaluate("John Doe", "John Doe")
1.0
>>> evaluator.evaluate("Jon Doe", "John Doe")  # Minor typo
0.875
>>> evaluator.evaluate("Jane Smith", "John Doe")  # Very different
0.25

With threshold:

>>> evaluator = LevenshteinEvaluator(config={"threshold": 0.8})
>>> evaluator.evaluate("Jon Doe", "John Doe")  # Above threshold
0.875
>>> evaluator.evaluate("Jane", "John")  # Below threshold, returns 0
0.0

Initialize LevenshteinEvaluator.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | dict[str, Any] | Configuration dictionary with threshold option. | required |
Source code in src/dspydantic/evaluators/levenshtein.py
def __init__(self, config: dict[str, Any]) -> None:
    """Initialize LevenshteinEvaluator.

    Args:
        config: Configuration dictionary with threshold option.
    """
    self.config = config
    self.threshold = config.get("threshold", 0.0)

Functions

evaluate

evaluate(extracted, expected, input_data=None, field_path=None)

Evaluate using Levenshtein distance.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| extracted | Any | Extracted value. | required |
| expected | Any | Expected value. | required |
| input_data | dict[str, Any] \| None | Optional input data (not used). | None |
| field_path | str \| None | Optional field path (not used). | None |

Returns:

| Type | Description |
| --- | --- |
| float | Similarity score between 0.0 and 1.0. |

Source code in src/dspydantic/evaluators/levenshtein.py
def evaluate(
    self,
    extracted: Any,
    expected: Any,
    input_data: dict[str, Any] | None = None,
    field_path: str | None = None,
) -> float:
    """Evaluate using Levenshtein distance.

    Args:
        extracted: Extracted value.
        expected: Expected value.
        input_data: Optional input data (not used).
        field_path: Optional field path (not used).

    Returns:
        Similarity score between 0.0 and 1.0.
    """
    extracted_str = str(extracted).strip()
    expected_str = str(expected).strip()

    if extracted_str == expected_str:
        return 1.0

    max_len = max(len(extracted_str), len(expected_str))
    if max_len == 0:
        return 1.0

    distance = self._levenshtein_distance(extracted_str, expected_str)
    similarity = 1.0 - (distance / max_len)

    return max(0.0, similarity) if similarity >= self.threshold else 0.0
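The `_levenshtein_distance` helper is not shown above. A standard dynamic-programming version, paired with the same normalization used by the evaluator, could look like this (a self-contained sketch, not the library's exact implementation):

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character edits (insert, delete, substitute) turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string; keep rows sized to the shorter one
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def similarity(extracted: str, expected: str) -> float:
    """1.0 minus the edit distance normalized by the longer length, as in evaluate() above."""
    max_len = max(len(extracted), len(expected))
    if max_len == 0:
        return 1.0  # two empty strings are a perfect match
    return 1.0 - levenshtein_distance(extracted, expected) / max_len


print(levenshtein_distance("kitten", "sitting"))  # 3
print(similarity("Jon Doe", "John Doe"))          # 0.875
```

This reproduces the docstring examples: "Jon Doe" vs "John Doe" is one insertion over a maximum length of 8, giving 1 - 1/8 = 0.875.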

TextSimilarityEvaluator

TextSimilarityEvaluator(config)

Evaluator that uses embeddings for semantic similarity.

Best for text where meaning matters more than exact wording. Uses embedding models to compute cosine similarity between extracted and expected values.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | dict[str, Any] | Configuration dictionary with options: model (str): embedding model (default: "sentence-transformers/all-MiniLM-L6-v2"); provider (str): "sentence-transformers" or "openai" (default: "sentence-transformers"); api_key (str): API key for the OpenAI provider; threshold (float): minimum similarity (0-1, default: 0.0) | required |

Raises:

| Type | Description |
| --- | --- |
| ImportError | If sentence-transformers is not installed when using that provider. |

Example

Requires: pip install sentence-transformers

>>> evaluator = TextSimilarityEvaluator(config={})  # doctest: +SKIP
>>> evaluator.evaluate("CEO", "Chief Executive Officer")  # doctest: +SKIP
0.82  # Semantically similar

With OpenAI embeddings:

>>> evaluator = TextSimilarityEvaluator(config={  # doctest: +SKIP
...     "provider": "openai",
...     "model": "text-embedding-ada-002",
...     "api_key": "your-key"
... })

Initialize TextSimilarityEvaluator.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | dict[str, Any] | Configuration dictionary with model, provider, api_key, threshold options. | required |
Source code in src/dspydantic/evaluators/text_similarity.py
def __init__(self, config: dict[str, Any]) -> None:
    """Initialize TextSimilarityEvaluator.

    Args:
        config: Configuration dictionary with model, provider, api_key, threshold options.
    """
    self.config = config
    self.model_name = config.get("model", "sentence-transformers/all-MiniLM-L6-v2")
    self.provider = config.get("provider", "sentence-transformers")
    self.api_key = config.get("api_key")
    self.threshold = config.get("threshold", 0.0)
    self._embedder = None

Functions

evaluate

evaluate(extracted, expected, input_data=None, field_path=None)

Evaluate using semantic similarity via embeddings.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| extracted | Any | Extracted value. | required |
| expected | Any | Expected value. | required |
| input_data | dict[str, Any] \| None | Optional input data (not used). | None |
| field_path | str \| None | Optional field path (not used). | None |

Returns:

| Type | Description |
| --- | --- |
| float | Similarity score between 0.0 and 1.0. |

Source code in src/dspydantic/evaluators/text_similarity.py
def evaluate(
    self,
    extracted: Any,
    expected: Any,
    input_data: dict[str, Any] | None = None,
    field_path: str | None = None,
) -> float:
    """Evaluate using semantic similarity via embeddings.

    Args:
        extracted: Extracted value.
        expected: Expected value.
        input_data: Optional input data (not used).
        field_path: Optional field path (not used).

    Returns:
        Similarity score between 0.0 and 1.0.
    """
    extracted_str = str(extracted)
    expected_str = str(expected)

    if extracted_str == expected_str:
        return 1.0

    try:
        embeddings = self._get_embeddings([extracted_str, expected_str])
        similarity = self._cosine_similarity(embeddings[0], embeddings[1])
        return max(0.0, similarity) if similarity >= self.threshold else 0.0
    except Exception:
        # Fallback to exact match if embeddings fail
        return 1.0 if extracted_str == expected_str else 0.0
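The `_cosine_similarity` helper is not shown above. For two embedding vectors it is the dot product divided by the product of their norms; a plain-Python sketch (illustrative, not the library's code):

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # degenerate (all-zero) embedding; treat as no similarity
    return dot / (norm_u * norm_v)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (identical direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

In practice the vectors come from the configured embedding model, so two paraphrases ("CEO", "Chief Executive Officer") land close together and score near 1.0.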

ScoreJudge

ScoreJudge(config)

Evaluator that uses an LLM to assign a numeric score.

Uses a language model to evaluate extraction quality when expected values are not available or when semantic judgment is needed.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | dict[str, Any] | Configuration dictionary with options: criteria (str): scoring criteria/prompt (default: "Rate the quality on a scale of 0-1"); lm (dspy.LM \| None): custom LM instance (default: uses dspy.settings.lm); temperature (float): LLM temperature (default: 0.0); system_prompt (str \| None): custom system prompt for the judge | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If no LM is available. |

Example

>>> import dspy  # doctest: +SKIP
>>> dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # doctest: +SKIP
>>> evaluator = ScoreJudge(config={  # doctest: +SKIP
...     "criteria": "Rate how well the extracted summary captures the key points"
... })
>>> evaluator.evaluate(  # doctest: +SKIP
...     extracted="Company reported strong Q3 earnings",
...     expected=None,  # No expected value - judge evaluates quality
...     input_data={"text": "Acme Corp announced record Q3 profits..."}
... )
0.85

Initialize ScoreJudge.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | dict[str, Any] | Configuration dictionary with criteria, lm, temperature, system_prompt options. | required |
Source code in src/dspydantic/evaluators/score_judge.py
def __init__(self, config: dict[str, Any]) -> None:
    """Initialize ScoreJudge.

    Args:
        config: Configuration dictionary with criteria, lm, temperature, system_prompt options.
    """
    self.config = config
    self.criteria = config.get("criteria", "Rate the quality on a scale of 0-1")
    self.lm = config.get("lm")
    self.temperature = config.get("temperature", 0.0)
    self.system_prompt = config.get("system_prompt")

Functions

evaluate

evaluate(extracted, expected, input_data=None, field_path=None)

Evaluate using LLM-based scoring.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| extracted | Any | Extracted value. | required |
| expected | Any | Expected value. | required |
| input_data | dict[str, Any] \| None | Optional input data for context. | None |
| field_path | str \| None | Optional field path for context. | None |

Returns:

| Type | Description |
| --- | --- |
| float | Score between 0.0 and 1.0. |

Source code in src/dspydantic/evaluators/score_judge.py
def evaluate(
    self,
    extracted: Any,
    expected: Any,
    input_data: dict[str, Any] | None = None,
    field_path: str | None = None,
) -> float:
    """Evaluate using LLM-based scoring.

    Args:
        extracted: Extracted value.
        expected: Expected value.
        input_data: Optional input data for context.
        field_path: Optional field path for context.

    Returns:
        Score between 0.0 and 1.0.
    """
    if self.lm is None:
        # Use default LM from dspy settings
        lm = dspy.settings.lm
        if lm is None:
            raise ValueError("No LM available for ScoreJudge")
    else:
        lm = self.lm

    # Build prompt
    prompt_parts = []
    if self.system_prompt:
        prompt_parts.append(f"System: {self.system_prompt}")

    prompt_parts.append(f"Criteria: {self.criteria}")
    prompt_parts.append(f"\nExpected value: {expected}")
    prompt_parts.append(f"Extracted value: {extracted}")

    if field_path:
        prompt_parts.append(f"\nField: {field_path}")

    if input_data:
        prompt_parts.append(f"\nInput context: {input_data}")

    prompt_parts.append(
        "\nRespond with a JSON object containing a 'score' field (float between 0.0 and 1.0) "
        "and optionally a 'reasoning' field explaining your evaluation."
    )

    prompt = "\n\n".join(prompt_parts)

    # Use DSPy's ChainOfThought
    signature = "prompt -> evaluation"
    judge = dspy.ChainOfThought(signature)
    result = judge(prompt=prompt)

    # Extract evaluation from result
    evaluation_text = str(result.evaluation) if hasattr(result, "evaluation") else str(result)

    # Try to parse JSON from evaluation
    try:
        evaluation = json.loads(evaluation_text)
        score = float(evaluation.get("score", 0.5))
    except (json.JSONDecodeError, ValueError, AttributeError):
        # Try to extract score from text using regex
        score_match = re.search(r'"score"\s*:\s*([0-9.]+)', evaluation_text)
        if score_match:
            try:
                score = float(score_match.group(1))
            except ValueError:
                score = 0.5
        else:
            # Fallback: try to find a number between 0 and 1
            score_match = re.search(r"\b(0\.\d+|1\.0|1)\b", evaluation_text)
            if score_match:
                try:
                    score = float(score_match.group(1))
                except ValueError:
                    score = 0.5
            else:
                score = 0.5

    return max(0.0, min(1.0, score))
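The parsing chain above (JSON first, then a "score" regex, then any 0-1 number, then a 0.5 default) can be exercised on its own. A self-contained sketch mirroring that fallback logic, without the LLM call or the final clamp:

```python
import json
import re


def parse_score(evaluation_text: str) -> float:
    """Recover a 0-1 score from judge output, mirroring the fallbacks above."""
    # 1. Prefer a well-formed JSON object with a 'score' field.
    try:
        return float(json.loads(evaluation_text).get("score", 0.5))
    except (json.JSONDecodeError, ValueError, AttributeError):
        pass
    # 2. Look for a '"score": <number>' fragment in otherwise invalid JSON.
    match = re.search(r'"score"\s*:\s*([0-9.]+)', evaluation_text)
    if match is None:
        # 3. Last resort: any bare number between 0 and 1 in the text.
        match = re.search(r"\b(0\.\d+|1\.0|1)\b", evaluation_text)
    if match is not None:
        try:
            return float(match.group(1))
        except ValueError:
            return 0.5
    # 4. Nothing usable: fall back to a neutral score.
    return 0.5


print(parse_score('{"score": 0.85, "reasoning": "captures key points"}'))  # 0.85
print(parse_score("I would rate this 0.7 overall"))                        # 0.7
print(parse_score("no score here"))                                        # 0.5
```

The caller still clamps the result into [0.0, 1.0], so malformed scores such as 1.5 cannot leak through.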

LabelModelGrader

LabelModelGrader(config)

Evaluator that uses an LLM to compare categorical labels.

Best for classification fields where labels may have semantic equivalence (e.g., "urgent" vs "high priority") that exact matching would miss.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | dict[str, Any] | Configuration dictionary with: allowed_labels (list[str]): valid categorical labels (required); lm (dspy.LM \| None): custom LM instance (default: uses dspy.settings.lm); exact_match_score (float): score for exact matches (default: 1.0); partial_match_score (float): score for partial matches (default: 0.5) | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If allowed_labels is not provided or empty. |

Example

>>> import dspy  # doctest: +SKIP
>>> dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # doctest: +SKIP
>>> evaluator = LabelModelGrader(config={  # doctest: +SKIP
...     "allowed_labels": ["positive", "neutral", "negative"]
... })
>>> evaluator.evaluate("positive", "positive")  # Exact match  # doctest: +SKIP
1.0
>>> evaluator.evaluate("good", "positive")  # Semantic match via LLM  # doctest: +SKIP
0.5

Initialize LabelModelGrader.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | dict[str, Any] | Configuration dictionary with allowed_labels, lm, exact_match_score, partial_match_score options. | required |
Source code in src/dspydantic/evaluators/label_model_grader.py
def __init__(self, config: dict[str, Any]) -> None:
    """Initialize LabelModelGrader.

    Args:
        config: Configuration dictionary with allowed_labels, lm, exact_match_score,
            partial_match_score options.
    """
    self.config = config
    self.allowed_labels = config.get("allowed_labels", [])
    if not self.allowed_labels:
        raise ValueError("allowed_labels must be provided for LabelModelGrader")
    self.lm = config.get("lm")
    self.exact_match_score = config.get("exact_match_score", 1.0)
    self.partial_match_score = config.get("partial_match_score", 0.5)

Functions

evaluate

evaluate(extracted, expected, input_data=None, field_path=None)

Evaluate using LLM-based label selection.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| extracted | Any | Extracted value. | required |
| expected | Any | Expected value (should be one of allowed_labels). | required |
| input_data | dict[str, Any] \| None | Optional input data for context. | None |
| field_path | str \| None | Optional field path for context. | None |

Returns:

| Type | Description |
| --- | --- |
| float | Score between 0.0 and 1.0. |

Source code in src/dspydantic/evaluators/label_model_grader.py
def evaluate(
    self,
    extracted: Any,
    expected: Any,
    input_data: dict[str, Any] | None = None,
    field_path: str | None = None,
) -> float:
    """Evaluate using LLM-based label selection.

    Args:
        extracted: Extracted value.
        expected: Expected value (should be one of allowed_labels).
        input_data: Optional input data for context.
        field_path: Optional field path for context.

    Returns:
        Score between 0.0 and 1.0.
    """
    if self.lm is None:
        # Use default LM from dspy settings
        lm = dspy.settings.lm
        if lm is None:
            raise ValueError("No LM available for LabelModelGrader")
    else:
        lm = self.lm

    # Convert to strings for comparison
    extracted_str = str(extracted).strip().lower()
    expected_str = str(expected).strip().lower()

    # Check for exact match first
    if extracted_str == expected_str:
        return self.exact_match_score

    # Check if expected is in allowed labels
    expected_lower = [label.lower() for label in self.allowed_labels]
    if expected_str not in expected_lower:
        # Expected label not in allowed labels - use LLM to determine match
        prompt_parts = []
        prompt_parts.append(
            f"Select the best matching label from: {', '.join(self.allowed_labels)}"
        )
        prompt_parts.append(f"\nExpected label: {expected}")
        prompt_parts.append(f"Extracted label: {extracted}")

        if field_path:
            prompt_parts.append(f"\nField: {field_path}")

        if input_data:
            prompt_parts.append(f"\nInput context: {input_data}")

        prompt_parts.append(
            "\nRespond with a JSON object containing a 'label' field (selected label) "
            "and optionally a 'reasoning' field."
        )

        prompt = "\n\n".join(prompt_parts)

        # Use DSPy's ChainOfThought
        signature = "prompt -> label_selection"
        grader = dspy.ChainOfThought(signature)
        result = grader(prompt=prompt)

        # Extract label from result
        label_text = str(result.label_selection) if hasattr(result, "label_selection") else str(result)

        # Try to parse JSON
        try:
            label_data = json.loads(label_text)
            selected_label = str(label_data.get("label", "")).strip().lower()
        except (json.JSONDecodeError, ValueError):
            # Try to find label in text
            selected_label = label_text.strip().lower()
            for label in self.allowed_labels:
                if label.lower() in selected_label:
                    selected_label = label.lower()
                    break

        # Compare selected label with expected
        if selected_label == expected_str:
            return self.exact_match_score
        elif expected_str in selected_label or selected_label in expected_str:
            return self.partial_match_score
        else:
            return 0.0
    else:
        # Expected is in allowed labels, check if extracted matches
        extracted_lower = extracted_str.lower()
        if extracted_lower in expected_lower:
            idx = expected_lower.index(extracted_lower)
            if self.allowed_labels[idx].lower() == expected_str:
                return self.exact_match_score

        # Check for partial match
        for label in self.allowed_labels:
            if expected_str in label.lower() or label.lower() in expected_str:
                if extracted_lower in label.lower() or label.lower() in extracted_lower:
                    return self.partial_match_score

        return 0.0
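Setting the LLM call aside, the grader's string comparisons boil down to a small rule set: an exact (case-insensitive, stripped) match earns exact_match_score, containment in either direction earns partial_match_score, and anything else earns 0.0. A standalone sketch of just that fallback (a simplification for illustration, not the library's full branching):

```python
def grade_labels(
    extracted: str,
    expected: str,
    exact_match_score: float = 1.0,
    partial_match_score: float = 0.5,
) -> float:
    """Exact match first, then substring containment either way, else 0.0."""
    extracted_norm = extracted.strip().lower()
    expected_norm = expected.strip().lower()
    if extracted_norm == expected_norm:
        return exact_match_score
    if expected_norm in extracted_norm or extracted_norm in expected_norm:
        return partial_match_score
    return 0.0


print(grade_labels("Positive", "positive"))       # 1.0 (case-insensitive exact match)
print(grade_labels("very positive", "positive"))  # 0.5 (containment)
print(grade_labels("negative", "positive"))       # 0.0 (no overlap)
```

The LLM path only kicks in when these cheap comparisons are inconclusive, e.g. "good" vs "positive", where the model maps the extracted text onto one of the allowed labels.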

PythonCodeEvaluator

PythonCodeEvaluator(config)

Evaluator that uses a callable for custom evaluation logic.

Use this when built-in evaluators don't match your requirements, such as domain-specific validation rules or complex business logic.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | dict[str, Any] | Configuration dictionary with: function (Callable): function that takes (extracted, expected, input_data, field_path) and returns a float score between 0.0 and 1.0. | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If 'function' is not provided or not callable. |
| RuntimeError | If the function raises an exception during evaluation. |

Example

>>> def age_evaluator(extracted, expected, input_data=None, field_path=None):
...     if extracted == expected:
...         return 1.0
...     diff = abs(int(extracted) - int(expected))
...     return max(0.0, 1.0 - (diff / 10))
>>> evaluator = PythonCodeEvaluator(config={"function": age_evaluator})
>>> evaluator.evaluate(30, 30)
1.0
>>> evaluator.evaluate(28, 30)  # Off by 2 years
0.8
>>> evaluator.evaluate(20, 30)  # Off by 10 years
0.0

Initialize PythonCodeEvaluator.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | dict[str, Any] | Configuration dictionary with 'function' key containing a callable. | required |
Source code in src/dspydantic/evaluators/python_code.py
def __init__(self, config: dict[str, Any]) -> None:
    """Initialize PythonCodeEvaluator.

    Args:
        config: Configuration dictionary with 'function' key containing a callable.
    """
    self.config = config
    self.function = config.get("function")

    if self.function is None:
        raise ValueError("'function' must be provided for PythonCodeEvaluator")

    if not callable(self.function):
        raise ValueError("'function' must be a callable")

Functions

evaluate

evaluate(extracted, expected, input_data=None, field_path=None)

Evaluate using the provided callable.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| extracted | Any | Extracted value. | required |
| expected | Any | Expected value. | required |
| input_data | dict[str, Any] \| None | Optional input data for context. | None |
| field_path | str \| None | Optional field path for context. | None |

Returns:

| Type | Description |
| --- | --- |
| float | Score between 0.0 and 1.0. |

Source code in src/dspydantic/evaluators/python_code.py
def evaluate(
    self,
    extracted: Any,
    expected: Any,
    input_data: dict[str, Any] | None = None,
    field_path: str | None = None,
) -> float:
    """Evaluate using the provided callable.

    Args:
        extracted: Extracted value.
        expected: Expected value.
        input_data: Optional input data for context.
        field_path: Optional field path for context.

    Returns:
        Score between 0.0 and 1.0.
    """
    try:
        score = float(
            self.function(extracted, expected, input_data=input_data, field_path=field_path)
        )
        return max(0.0, min(1.0, score))
    except Exception as e:
        raise RuntimeError(f"Error executing Python code evaluator function: {e}") from e

PredefinedScoreEvaluator

PredefinedScoreEvaluator(config=None)

Evaluator that uses pre-computed scores from a list.

This evaluator pops scores from a provided list in order as examples are evaluated. Useful when you already have ground truth scores and don't want to recompute them.

Supports:

  • Float scores (0.0-1.0): used directly
  • Bool values: True → 1.0, False → 0.0
  • Numbers: normalized to the 0.0-1.0 range (assumes max is 100 if not specified)

Thread-safe for parallel evaluation using thread-local storage.

Examples:

# Float scores
scores = [0.95, 0.87, 0.92, 1.0, 0.78]
evaluator = PredefinedScoreEvaluator(config={"scores": scores})

# Bool values
bool_scores = [True, False, True, True]
evaluator = PredefinedScoreEvaluator(config={"scores": bool_scores})

# Numbers (normalized)
numeric_scores = [95, 87, 92, 100]
evaluator = PredefinedScoreEvaluator(config={"scores": numeric_scores, "max_value": 100})

Initialize PredefinedScoreEvaluator.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| config | dict[str, Any] \| None | Configuration dictionary with: "scores": list of scores (float, bool, or numbers); "max_value": optional max value for normalization (default: 100) | None |
Source code in src/dspydantic/evaluators/predefined_score.py
def __init__(self, config: dict[str, Any] | None = None) -> None:
    """Initialize PredefinedScoreEvaluator.

    Args:
        config: Configuration dictionary with:
            - "scores": List of scores (float, bool, or numbers)
            - "max_value": Optional max value for normalization (default: 100)
    """
    config = config or {}
    self.scores = config.get("scores", [])
    self.max_value = config.get("max_value", 100.0)

    if not isinstance(self.scores, list):
        raise ValueError("scores must be a list")

    # Thread-local storage for tracking which score to use
    self._local = threading.local()

Functions

evaluate

evaluate(extracted, expected, input_data=None, field_path=None)

Evaluate using pre-defined score.

This method ignores extracted/expected values and returns the next pre-defined score from the list.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| extracted | Any | The extracted value (ignored). | required |
| expected | Any | The expected value (ignored). | required |
| input_data | dict[str, Any] \| None | Optional input data (ignored). | None |
| field_path | str \| None | Optional field path (ignored). | None |

Returns:

| Type | Description |
| --- | --- |
| float | Pre-defined score between 0.0 and 1.0. |

Source code in src/dspydantic/evaluators/predefined_score.py
def evaluate(
    self,
    extracted: Any,
    expected: Any,
    input_data: dict[str, Any] | None = None,
    field_path: str | None = None,
) -> float:
    """Evaluate using pre-defined score.

    This method ignores extracted/expected values and returns the next
    pre-defined score from the list.

    Args:
        extracted: The extracted value (ignored).
        expected: The expected value (ignored).
        input_data: Optional input data (ignored).
        field_path: Optional field path (ignored).

    Returns:
        Pre-defined score between 0.0 and 1.0.
    """
    return self._get_next_score()

See Also