Architecture¶
This page explains the internal architecture of DSPydantic: how its components interact and the design decisions behind them.
Overview¶
DSPydantic is built on top of DSPy and provides a high-level interface for optimizing the field descriptions of Pydantic models and the prompts used with them. The flow between its key components is:
flowchart LR
A[User Examples] --> B[Prompter]
B --> C[DSPy Optimizer]
C --> D[Generate Variations]
D --> E[Test Descriptions]
D --> F[Test Prompts]
E --> G[Evaluators]
F --> G
G --> H[Score Results]
H --> I[Select Best]
I --> J[Optimized Prompter]
J --> K[Extract Data]
Architecture Components¶
| Component | Responsibility | Key Functions |
|---|---|---|
| Prompter | Main interface | Optimize descriptions and prompts, extract data |
| DSPy Module | Optimization logic | Generate variations, handle prompts |
| DSPy Optimizer | Algorithm selection | Choose and run optimization algorithm |
| Evaluators | Performance measurement | Score variations, guide selection |
| Extractor | Field extraction | Extract descriptions, create optimized models |
Core Components¶
Prompter¶
The Prompter class provides a unified interface for both optimization and extraction. It:
- Optimizes field descriptions and prompts
- Extracts structured data from text, images, and PDFs
- Provides save/load functionality
- Manages the optimization process
- Returns optimization results
DSPy Module¶
The Prompter uses an internal DSPy module that:
- Takes field descriptions as input
- Generates optimized descriptions as output
- Handles system and instruction prompts
- Works with DSPy optimizers
Evaluators¶
The evaluator system provides flexible evaluation:
- Field-level evaluation
- Multiple evaluator types
- Configurable per field
- Extensible via custom evaluators
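As a rough illustration of per-field configuration, the sketch below maps field names to scoring functions and averages the per-field scores. The names `exact_match`, `token_overlap`, and `score_record` are hypothetical and are not DSPydantic's API; they only show the idea of choosing a different evaluator for each field.

```python
from typing import Any, Callable, Dict

# Hypothetical evaluator signature: (extracted, expected) -> score in [0, 1].
Evaluator = Callable[[Any, Any], float]


def exact_match(extracted: Any, expected: Any) -> float:
    """Score 1.0 only when the two values are equal."""
    return 1.0 if extracted == expected else 0.0


def token_overlap(extracted: Any, expected: Any) -> float:
    """Jaccard overlap of lowercase, whitespace-separated tokens."""
    a = set(str(extracted).lower().split())
    b = set(str(expected).lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def score_record(
    extracted: Dict[str, Any],
    expected: Dict[str, Any],
    per_field: Dict[str, Evaluator],
    default: Evaluator = exact_match,
) -> float:
    """Average the per-field scores, using `default` for unconfigured fields."""
    if not expected:
        return 1.0
    scores = [
        per_field.get(name, default)(extracted.get(name), value)
        for name, value in expected.items()
    ]
    return sum(scores) / len(scores)


expected = {"name": "Ada Lovelace", "year": 1843}
extracted = {"name": "ada lovelace", "year": 1843}

# "name" uses a lenient evaluator; "year" falls back to exact match.
score = score_record(extracted, expected, {"name": token_overlap})
```

With the lenient evaluator on `name`, the casing difference is not penalized; with exact matching on every field, the same record would score lower.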
Extractor¶
The extractor utilities handle:
- Extracting field descriptions from models
- Creating optimized model classes
- Applying optimized descriptions to schemas
- Handling nested models
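To make the description handling concrete, here is a minimal sketch that works on a plain JSON-schema dictionary. It assumes nested models are inlined as nested `properties`, whereas schemas generated by Pydantic typically reference nested models through `$defs`. The helpers `collect_descriptions` and `apply_descriptions` are hypothetical names, not functions from `extractor.py`.

```python
import copy
from typing import Any, Dict


def collect_descriptions(schema: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Map dotted field paths to descriptions, recursing into nested objects."""
    found: Dict[str, str] = {}
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if "description" in spec:
            found[path] = spec["description"]
        # Assumption: nested models appear inline as object schemas.
        found.update(collect_descriptions(spec, prefix=f"{path}."))
    return found


def apply_descriptions(schema: Dict[str, Any], optimized: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of `schema` with descriptions replaced by dotted path."""
    result = copy.deepcopy(schema)

    def visit(node: Dict[str, Any], prefix: str) -> None:
        for name, spec in node.get("properties", {}).items():
            path = f"{prefix}{name}"
            if path in optimized:
                spec["description"] = optimized[path]
            visit(spec, f"{path}.")

    visit(result, "")
    return result


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Name"},
        "address": {
            "type": "object",
            "description": "Address",
            "properties": {"city": {"type": "string", "description": "City"}},
        },
    },
}

descriptions = collect_descriptions(schema)
updated = apply_descriptions(schema, {"address.city": "City where the person lives"})
```

Working on a deep copy leaves the original schema untouched, so the unoptimized descriptions remain available for comparison.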
Data Flow¶
Optimization Flow¶
- User provides model and examples
- Prompter extracts field descriptions from model
- Creates DSPy module with descriptions
- DSPy optimizer tests variations of descriptions and prompts
- Evaluators score each variation
- Best variations are selected
- Results are returned
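The scoring and selection steps above can be sketched in plain Python. This is a simplified illustration, not DSPydantic's implementation: the candidate list is fixed by hand (in the library, a DSPy optimizer generates the variations), and `fake_extract` is a stand-in for the LLM-backed extraction. All names in the block are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Descriptions = Dict[str, str]


def select_best(
    candidates: List[Descriptions],
    extract: Callable[[Descriptions, str], Dict[str, str]],
    examples: List[Tuple[str, Dict[str, str]]],
) -> Tuple[Descriptions, float]:
    """Score every candidate on all examples and return the best one."""
    best, best_score = candidates[0], -1.0
    for candidate in candidates:
        total = 0.0
        for text, expected in examples:
            extracted = extract(candidate, text)
            # Fraction of expected fields that were extracted exactly.
            hits = sum(extracted.get(k) == v for k, v in expected.items())
            total += hits / len(expected)
        score = total / len(examples)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


def fake_extract(descriptions: Descriptions, text: str) -> Dict[str, str]:
    """Stand-in for an LLM call: works only if the description mentions 'city'."""
    if "city" in descriptions["location"].lower():
        return {"location": text.split(" in ")[-1]}
    return {"location": ""}


examples = [
    ("Ada lives in London", {"location": "London"}),
    ("Alan works in Manchester", {"location": "Manchester"}),
]
candidates = [{"location": "Where"}, {"location": "City the person is in"}]

best, best_score = select_best(candidates, fake_extract, examples)
```

The more specific description wins because it is the only candidate for which the stand-in extractor returns the expected values.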
Extraction Flow¶
- User calls prompter.run() with input (text/image/PDF)
- Prompter prepares input data
- Creates optimized model with descriptions
- Uses optimized prompts
- Calls LLM with structured output format
- Parses response into Pydantic model
- Returns extracted data
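The extraction steps can be outlined as follows. This sketch makes several assumptions: `fake_llm` replaces a real LLM client, the schema is a hand-written dictionary, and a simple required-field check stands in for validation into a Pydantic model. `run_extraction` is a hypothetical helper, not the body of `prompter.run()`.

```python
import json
from typing import Any, Callable, Dict


def run_extraction(
    text: str,
    schema: Dict[str, Any],
    system_prompt: str,
    call_llm: Callable[[str, str, Dict[str, Any]], str],
) -> Dict[str, Any]:
    """Prepare the input, call the model once, and parse its JSON reply."""
    user_prompt = f"Extract the requested fields from:\n{text.strip()}"
    raw = call_llm(system_prompt, user_prompt, schema)  # one call per extraction
    data = json.loads(raw)
    missing = [name for name in schema.get("required", []) if name not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data


def fake_llm(system_prompt: str, user_prompt: str, schema: Dict[str, Any]) -> str:
    """Stand-in for a real LLM client; returns a canned JSON reply."""
    return json.dumps({"name": "Ada Lovelace", "year": 1843})


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Full name of the person"},
        "year": {"type": "integer", "description": "Year of publication"},
    },
    "required": ["name", "year"],
}

result = run_extraction(
    "  Ada Lovelace published in 1843.  ",
    schema,
    "You extract structured data.",
    fake_llm,
)
```

Passing the LLM call in as a function keeps the flow testable without network access.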
Design Decisions¶
Why DSPy?¶
DSPy provides:
- Proven optimization algorithms
- Active research and development
- Good community support
- Flexible architecture
Why Pydantic?¶
Pydantic provides:
- Type validation
- JSON schema generation
- Structured output support
- Wide adoption
Why Unified Prompter?¶
The Prompter class provides:
- Simpler API for common use cases
- Production-ready save/load
- Consistent interface
- Backward compatibility
Why Evaluator System?¶
The evaluator system provides:
- Flexibility for different use cases
- Per-field customization
- Extensibility
- Clear separation of concerns
Module Structure¶
dspydantic/
├── __init__.py # Public API
├── prompter.py # Prompter class (optimize/run)
├── optimizer.py # Internal optimizer (used by Prompter)
├── module.py # DSPy module
├── extractor.py # Field extraction utilities
├── types.py # Core types
├── persistence.py # Save/load functionality
├── utils.py # Utility functions
└── evaluators/ # Evaluator system
    ├── __init__.py
    ├── config.py # Evaluator configuration
    ├── functions.py # Evaluation functions
    └── *.py # Individual evaluators
Extension Points¶
Custom Evaluators¶
Implement the BaseEvaluator protocol:
class MyEvaluator:
    def __init__(self, config: dict) -> None:
        self.config = config

    def evaluate(self, extracted, expected, input_data=None, field_path=None) -> float:
        # Your evaluation logic here
        return 1.0 if extracted == expected else 0.0
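As a slightly fuller illustration, the sketch below follows the same `__init__`/`evaluate` shape for numeric fields. The class name `ToleranceEvaluator` and the `tolerance` config key are assumptions made for this example, not documented DSPydantic options.

```python
from typing import Any, Optional


class ToleranceEvaluator:
    """Hypothetical evaluator: numbers within `tolerance` count as a match."""

    def __init__(self, config: dict) -> None:
        self.config = config
        # "tolerance" is an illustrative config key, not a documented option.
        self.tolerance = float(config.get("tolerance", 0.0))

    def evaluate(
        self,
        extracted: Any,
        expected: Any,
        input_data: Optional[dict] = None,
        field_path: Optional[str] = None,
    ) -> float:
        try:
            diff = abs(float(extracted) - float(expected))
        except (TypeError, ValueError):
            # Missing or non-numeric values cannot match.
            return 0.0
        return 1.0 if diff <= self.tolerance else 0.0


evaluator = ToleranceEvaluator({"tolerance": 0.5})
close = evaluator.evaluate("19.8", 20, field_path="price")
far = evaluator.evaluate(25, 20, field_path="price")
missing = evaluator.evaluate(None, 20, field_path="price")
```

Returning a float in the range 0 to 1 keeps the evaluator compatible with score averaging across fields.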
Custom Optimizers¶
Pass a DSPy optimizer instance to optimize():
from dspy.teleprompt import MIPROv2
from dspydantic import Prompter

# MyModel and examples are assumed to be defined elsewhere.
custom_optimizer = MIPROv2(...)
prompter = Prompter(model=MyModel, model_id="gpt-4o")
result = prompter.optimize(examples=examples, optimizer=custom_optimizer)
Performance Considerations¶
Optimization¶
- Uses multiple threads by default
- Can be configured with num_threads
- Caches intermediate results
- Parallelizes evaluation
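One way to picture threaded evaluation is a thread pool that scores examples concurrently. The sketch below uses only the standard library; `evaluate_parallel` and `score_one` are hypothetical names, and the `num_threads` parameter merely mirrors the option mentioned above rather than reproducing how DSPydantic or DSPy uses it internally.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (input text, expected value)


def evaluate_parallel(
    examples: Sequence[Example],
    score_one: Callable[[Example], float],
    num_threads: int = 4,
) -> float:
    """Score examples on a thread pool and return the mean score."""
    if not examples:
        return 0.0
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() preserves input order, so scores line up with examples.
        scores: List[float] = list(pool.map(score_one, examples))
    return sum(scores) / len(scores)


def score_one(example: Example) -> float:
    """Stand-in for an LLM-backed extraction followed by evaluation."""
    text, expected = example
    return 1.0 if expected in text else 0.0


examples = [
    ("Ada lives in London", "London"),
    ("Alan works in Manchester", "Leeds"),
    ("Grace sails from New York", "New York"),
    ("Edsger cycles in Eindhoven", "Eindhoven"),
]

mean_score = evaluate_parallel(examples, score_one, num_threads=2)
```

Threads suit this workload when each evaluation spends most of its time waiting on a network call to the model.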
Extraction¶
- Single API call per extraction
- No caching (stateless)
- Efficient schema generation
- Minimal overhead
Future Directions¶
Potential improvements:
- Caching optimization results
- Batch extraction support
- More evaluator types
- Better error handling
- Performance optimizations