Getting Started

Get from zero to optimized extraction in 5 minutes.


Prerequisites

pip install dspydantic

Set your API key:

export OPENAI_API_KEY="sk-..."

Step 1: Define Your Model

Create a Pydantic model describing what you want to extract:

from pydantic import BaseModel, Field
from typing import Literal

class JobPosting(BaseModel):
    """Extract structured data from job postings."""
    title: str = Field(description="Job title")
    company: str = Field(description="Company name")
    location: str = Field(description="Job location")
    salary_range: str | None = Field(description="Salary range if mentioned")
    experience_years: str | None = Field(description="Required years of experience")
    employment_type: Literal["full_time", "part_time", "contract", "internship"] = Field(
        description="Type of employment"
    )
    remote: bool = Field(description="Whether remote work is available")
    skills: list[str] = Field(description="Required skills or technologies")

Tips:

  • Field descriptions guide the LLM—be specific
  • Use Literal for categorical fields
  • Use | None for optional fields
  • Lists work for multi-value fields

Step 2: Create Examples

Provide examples of input text and expected output:

from dspydantic import Example

examples = [
    Example(
        text="""
        Senior Software Engineer at TechCorp

        Location: San Francisco, CA (Hybrid - 3 days onsite)
        Salary: $180,000 - $220,000

        We're looking for an experienced engineer with 5+ years of experience 
        in Python and cloud infrastructure. Strong background in AWS, Kubernetes, 
        and CI/CD pipelines required.

        Full-time position with competitive benefits.
        """,
        expected_output={
            "title": "Senior Software Engineer",
            "company": "TechCorp",
            "location": "San Francisco, CA",
            "salary_range": "$180,000 - $220,000",
            "experience_years": "5+ years",
            "employment_type": "full_time",
            "remote": True,
            "skills": ["Python", "AWS", "Kubernetes", "CI/CD"]
        }
    ),
    Example(
        text="""
        Data Analyst Intern - FinanceHub

        NYC Office, No Remote

        3-month internship for current students. Must know SQL and Excel.
        Experience with Tableau is a plus.
        """,
        expected_output={
            "title": "Data Analyst Intern",
            "company": "FinanceHub",
            "location": "NYC Office",
            "salary_range": None,
            "experience_years": None,
            "employment_type": "internship",
            "remote": False,
            "skills": ["SQL", "Excel", "Tableau"]
        }
    ),
    Example(
        text="""
        Contract DevOps Engineer

        RemoteFirst Inc. | 100% Remote | $85-95/hr

        6-month contract. Looking for someone with 3 years experience in 
        Terraform, Docker, and GitHub Actions. Azure certification preferred.
        """,
        expected_output={
            "title": "Contract DevOps Engineer",
            "company": "RemoteFirst Inc.",
            "location": "100% Remote",
            "salary_range": "$85-95/hr",
            "experience_years": "3 years",
            "employment_type": "contract",
            "remote": True,
            "skills": ["Terraform", "Docker", "GitHub Actions", "Azure"]
        }
    ),
]

How many examples?

  • 5-10: Good for simple models
  • 10-20: Recommended for most cases
  • 20+: For complex schemas or edge cases
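Beyond raw count, it can help to check which cases the examples actually exercise. The sketch below tallies category values and null fields across the expected outputs. It is a dependency-free illustration: the `coverage_report` helper and the abbreviated `expected_outputs` list are hypothetical stand-ins for the `expected_output` dicts from Step 2, not part of dspydantic.

```python
from collections import Counter

# Abbreviated copies of the expected_output dicts from Step 2 (illustrative).
expected_outputs = [
    {"employment_type": "full_time", "salary_range": "$180,000 - $220,000",
     "experience_years": "5+ years", "remote": True},
    {"employment_type": "internship", "salary_range": None,
     "experience_years": None, "remote": False},
    {"employment_type": "contract", "salary_range": "$85-95/hr",
     "experience_years": "3 years", "remote": True},
]

# Categories copied by hand from the Literal in the Step 1 model.
ALLOWED_TYPES = ("full_time", "part_time", "contract", "internship")


def coverage_report(outputs):
    """Hypothetical helper: summarize which cases the examples exercise."""
    type_counts = Counter(o["employment_type"] for o in outputs)
    # Count how often each field appears with a None value.
    null_counts = Counter(
        field for o in outputs for field, value in o.items() if value is None
    )
    missing_types = [t for t in ALLOWED_TYPES if type_counts[t] == 0]
    return type_counts, null_counts, missing_types


type_counts, null_counts, missing_types = coverage_report(expected_outputs)
print("employment_type counts:", dict(type_counts))
print("fields seen as None:", dict(null_counts))
print("categories with no example:", missing_types)
```

For the three examples in Step 2, this reports that no example covers part_time, which suggests one kind of posting worth adding.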

Step 3: Optimize

from dspydantic import Prompter

prompter = Prompter(
    model=JobPosting,
    model_id="openai/gpt-4o-mini",
)

result = prompter.optimize(examples=examples, verbose=True)

Optimization takes 1-5 minutes depending on example count. With verbose=True, you'll see real-time progress, including:

  • Rich-formatted headers showing your model and optimization mode
  • Field-by-field optimization progress with improved/unchanged indicators
  • The actual optimized descriptions and prompts
  • Final summary table with scores and API costs

See Configure Optimizations for options like sequential, early_stopping_patience, auto_generate_prompts, compile_kwargs, max_val_examples, include_fields, exclude_fields, and custom progress callbacks.


Step 4: Check Results

print(f"Before: {result.baseline_score:.0%}")
print(f"After:  {result.optimized_score:.0%}")
print(f"API calls: {result.api_calls}")
print(f"Tokens: {result.total_tokens:,}")

Typical output:

Before: 72%
After:  91%
API calls: 47
Tokens: 28,450
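When reading these numbers, it can help to separate the absolute gain from the share of errors removed, and to estimate the average cost of a call. The arithmetic below uses the typical figures shown above; the variable names are illustrative and are not attributes of the dspydantic result object.

```python
# Figures copied from the typical output above.
baseline_score = 0.72
optimized_score = 0.91
api_calls = 47
total_tokens = 28_450

# Absolute gain in percentage points.
gain_points = (optimized_score - baseline_score) * 100

# Share of the baseline's errors that optimization removed.
baseline_error = 1 - baseline_score
optimized_error = 1 - optimized_score
error_reduction = (baseline_error - optimized_error) / baseline_error

# Average tokens consumed per API call.
tokens_per_call = total_tokens / api_calls

print(f"Gain: {gain_points:.0f} points")
print(f"Error reduction: {error_reduction:.0%}")
print(f"Tokens per call: {tokens_per_call:.0f}")
```

With these figures, the gain is 19 points, about 68% of the baseline errors are removed, and each call averages roughly 605 tokens.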

View optimized descriptions:

for field, desc in result.optimized_descriptions.items():
    print(f"{field}: {desc}")

Step 5: Extract

Use your optimized prompter:

job = prompter.run("""
    ML Engineer - AI Startup

    Boston, MA or Remote
    $150K-200K base + equity

    Join our team building next-gen recommendation systems. 
    Need 4+ years with PyTorch, transformers, and production ML.
    Full-time. Start immediately.
""")

print(job)
# JobPosting(
#     title='ML Engineer',
#     company='AI Startup',
#     location='Boston, MA or Remote',
#     salary_range='$150K-200K base + equity',
#     experience_years='4+ years',
#     employment_type='full_time',
#     remote=True,
#     skills=['PyTorch', 'transformers', 'production ML']
# )

Step 6: Save for Production

# Save the optimized prompter
prompter.save("./job_parser")

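# Later, in production (a separate process, so imports are needed again;
# the JobPosting class from Step 1 must also be importable here):
from dspydantic import Prompter

prompter = Prompter.load(
    "./job_parser",
    model=JobPosting,
    model_id="openai/gpt-4o-mini"
)

# new_posting_text holds the raw text of the posting to parse
job = prompter.run(new_posting_text)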

Quick Reference

Method                                Purpose
Prompter(model, model_id)             Create prompter
prompter.optimize(examples)           Optimize with examples
prompter.run(text)                    Extract from text
prompter.predict_batch(texts)         Batch extraction
prompter.save(path)                   Save optimized state
Prompter.load(path, model, model_id)  Load saved prompter

Next Steps

Topic                  Guide
Different input types  Modalities
Images and PDFs        Modalities - Images/PDFs
Customize evaluation   Configure Evaluators
Complex schemas        Nested Models
Production deployment  Save and Load
Integration patterns   Integration Patterns

Troubleshooting

Low accuracy after optimization?

  • Add more diverse examples
  • Check that examples are correct
  • Try a more capable model (for example, gpt-4o instead of gpt-4o-mini)
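Checking that examples are correct can be partly automated before running optimization. The following dependency-free sketch flags expected outputs with missing or misspelled keys and with category values outside the allowed set. The `find_example_problems` helper and the field lists (copied by hand from the Step 1 model) are illustrative, not part of dspydantic; with Pydantic v2 installed, `JobPosting.model_validate(expected_output)` would perform a stricter check.

```python
# Field names and categories copied by hand from the JobPosting model in Step 1.
FIELDS = {
    "title", "company", "location", "salary_range",
    "experience_years", "employment_type", "remote", "skills",
}
EMPLOYMENT_TYPES = {"full_time", "part_time", "contract", "internship"}


def find_example_problems(expected_output):
    """Hypothetical helper: return a list of human-readable problems."""
    problems = []
    keys = set(expected_output)
    missing = FIELDS - keys
    unexpected = keys - FIELDS
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if unexpected:
        problems.append(f"unexpected fields: {sorted(unexpected)}")
    kind = expected_output.get("employment_type")
    if kind is not None and kind not in EMPLOYMENT_TYPES:
        problems.append(f"invalid employment_type: {kind!r}")
    return problems


# A deliberately flawed expected output: misspelled key and wrong category.
flawed = {
    "title": "Data Analyst Intern",
    "company": "FinanceHub",
    "location": "NYC Office",
    "salary": None,               # should be "salary_range"
    "experience_years": None,
    "employment_type": "intern",  # should be "internship"
    "remote": False,
    "skills": ["SQL", "Excel", "Tableau"],
}

for problem in find_example_problems(flawed):
    print(problem)
```

Running the helper over every expected output before optimizing catches labeling slips that would otherwise be scored as model errors.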

Optimization takes too long?

  • Reduce example count for initial testing
  • Use gpt-4o-mini for faster iterations
  • Use compile_kwargs={"num_trials": 5} to limit MIPROv2 trials
  • Use early_stopping_patience=2 in sequential mode
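For the first suggestion, one way to reduce the example count is to optimize on a small, reproducible sample and rerun on the full set once the setup works. The sketch below uses only the standard library; `sample_examples` is a hypothetical helper, and the strings stand in for the Example objects built in Step 2.

```python
import random


def sample_examples(examples, k, seed=0):
    """Hypothetical helper: pick up to k examples, reproducibly."""
    if k >= len(examples):
        return list(examples)
    # A seeded generator returns the same subset on every run, so scores
    # from successive quick runs stay comparable.
    return random.Random(seed).sample(examples, k)


# Placeholder items; in practice, pass the list of Example objects.
all_examples = [f"example_{i}" for i in range(20)]
quick_set = sample_examples(all_examples, k=5)
print(len(quick_set), "of", len(all_examples), "examples selected")
```

The resulting list could then be passed as examples= to prompter.optimize, as in Step 3.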

API key issues?

# Set key explicitly
import dspy
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini", api_key="sk-..."))

See Configure Models for more options.
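Before changing code, it may be worth confirming that the key is visible to the Python process at all, since a variable exported in one shell is not inherited by another. A minimal check using only the standard library follows; `describe_api_key` is a hypothetical helper, and the "sk-" prefix test is only a heuristic based on the placeholder shown in Prerequisites.

```python
import os


def describe_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Hypothetical helper: report whether the key is visible, without printing it."""
    value = env.get(name, "")
    if not value.strip():
        return f"{name} is not set in this process"
    if not value.startswith("sk-"):
        return f"{name} is set but does not start with 'sk-'"
    # Report only the length so the secret never lands in logs.
    return f"{name} is set ({len(value)} characters)"


print(describe_api_key())
```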