Extract Free-form Text
Learn how to extract unstructured text output without defining a Pydantic model.
When You'd Use This
Not every extraction task needs a rigid schema. Sometimes you want:
- Sentiment analysis — just a sentiment label or score, not a structured object
- Text summarization — a free-form summary of a document
- Open-ended Q&A — answers that vary in structure
- Classification — assigning one of several text categories
Instead of defining a Pydantic model with typed fields, you pass model=None to the Prompter and optimize a single text output.
Step 1: Create Examples
Prepare examples with string expected_output:
from dspydantic import Example

examples = [
    Example(
        text="The movie was absolutely brilliant. Amazing performances and a gripping story.",
        expected_output="positive"
    ),
    Example(
        text="Terrible acting, boring plot, waste of time.",
        expected_output="negative"
    ),
    Example(
        text="It was okay. Some good parts, some slow scenes.",
        expected_output="neutral"
    ),
    Example(
        text="A masterpiece! Every scene was perfect. Highly recommend it.",
        expected_output="positive"
    ),
    Example(
        text="Not what I expected. Didn't really enjoy it.",
        expected_output="negative"
    ),
]
Tips:
- Every example must have expected_output as a string, not a dict
- Use multiple examples (5-20 recommended) to teach the model your desired output format
- Examples should be representative of real data you'll process
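The tips above can be checked mechanically before any optimization run. The sketch below is a hypothetical pre-flight helper, not part of dspydantic. It works on plain `(text, expected_output)` tuples rather than `Example` objects so that it runs without the library installed, and the function names and default thresholds (taken from the 5-20 recommendation) are assumptions for illustration.

```python
from collections import Counter


def check_examples(pairs, min_count=5, max_count=20):
    """Return a list of human-readable problems found in (text, expected_output) pairs.

    Hypothetical helper: the name and thresholds are illustrative only.
    """
    problems = []
    if not min_count <= len(pairs) <= max_count:
        problems.append(
            f"{len(pairs)} examples; {min_count}-{max_count} are recommended"
        )
    for i, (text, expected) in enumerate(pairs):
        if not isinstance(expected, str):
            problems.append(
                f"example {i}: expected_output is {type(expected).__name__}, not str"
            )
        if not isinstance(text, str) or not text.strip():
            problems.append(f"example {i}: text is empty")
    return problems


def label_counts(pairs):
    """Count how often each string label appears, to spot under-represented classes."""
    return Counter(e for _, e in pairs if isinstance(e, str))


pairs = [
    ("The movie was absolutely brilliant.", "positive"),
    ("Terrible acting, boring plot.", "negative"),
    ("It was okay.", {"sentiment": "neutral"}),  # wrong: dict instead of str
]
for problem in check_examples(pairs):
    print(problem)
print(label_counts(pairs))
```

Running a check like this on your own data before building the `Example` list catches dict-valued outputs and lopsided label distributions early.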
Step 2: Configure and Optimize
import dspy
from dspydantic import Prompter
# Configure language model (see Configure a Language Model tutorial)
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini", api_key="your-api-key"))
# Create prompter with model=None
prompter = Prompter(model=None)
# Optimize with your examples
result = prompter.optimize(examples=examples)
print(f"Baseline accuracy: {result.baseline_score:.0%}")
print(f"Optimized accuracy: {result.optimized_score:.0%}")
Output example:
The optimizer improved the field description and system/instruction prompts to better guide the model toward your desired output format.
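To make the two scores concrete: for single-label outputs, an accuracy figure is commonly the share of examples whose prediction matches the expected string. The sketch below computes that in plain Python. It is an illustration only. Which evaluator dspydantic applies by default is not stated here, the case-insensitive comparison is an assumption, and `baseline_predict` and `tuned_predict` are hypothetical keyword-based stand-ins for a model call, not real model behavior.

```python
def exact_match_accuracy(predict, pairs):
    """Fraction of (text, expected) pairs where predict(text) equals expected.

    The comparison ignores surrounding whitespace and letter case, which is
    an assumption made for this sketch.
    """
    if not pairs:
        raise ValueError("need at least one example")
    hits = sum(
        predict(text).strip().lower() == expected.strip().lower()
        for text, expected in pairs
    )
    return hits / len(pairs)


# Hypothetical stand-ins for a model before and after optimization.
def baseline_predict(text):
    return "positive"  # always guesses the same label


def tuned_predict(text):
    lowered = text.lower()
    if "terrible" in lowered or "didn't" in lowered:
        return "negative"
    if "okay" in lowered:
        return "neutral"
    return "positive"


pairs = [
    ("The movie was absolutely brilliant.", "positive"),
    ("Terrible acting, boring plot, waste of time.", "negative"),
    ("It was okay. Some good parts, some slow scenes.", "neutral"),
    ("A masterpiece! Highly recommend it.", "positive"),
]
print(f"Baseline accuracy: {exact_match_accuracy(baseline_predict, pairs):.0%}")
print(f"Tuned accuracy: {exact_match_accuracy(tuned_predict, pairs):.0%}")
```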
Step 3: Extract
Use your optimized prompter to extract from new text:
data = prompter.run("""
I loved this film. The director did an amazing job,
and the cinematography was stunning.
""")
print(data.output) # "positive"
Access the result with .output — that's the string you optimized for.
With Images and PDFs
The same pattern works with images and PDFs. Just use the appropriate input format:
Images
examples = [
    Example(image_path="digit_5.png", expected_output="5"),
    Example(image_path="digit_3.png", expected_output="3"),
    Example(image_path="digit_7.png", expected_output="7"),
]
prompter = Prompter(model=None)
result = prompter.optimize(examples=examples)
digit = prompter.run(image_path="new_digit.png")
print(digit.output)  # e.g., "5"
PDFs
examples = [
    Example(pdf_path="invoice_001.pdf", expected_output="INV-2024-001"),
    Example(pdf_path="invoice_002.pdf", expected_output="INV-2024-002"),
]
prompter = Prompter(model=None)
result = prompter.optimize(examples=examples)
invoice = prompter.run(pdf_path="new_invoice.pdf")
print(invoice.output)  # e.g., "INV-2024-003"
How It Works
Under the hood, DSPydantic creates a minimal internal schema with a single field "output" (a string field). When you optimize, it:
- Optimizes the field description for "output" to clarify what you want
- Optimizes the system and instruction prompts
- Tests against your examples to measure accuracy
When you call prompter.run(...), the model generates text and DSPydantic extracts the string result.
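As a rough mental model of the single-field schema, the sketch below uses a standard-library dataclass and a hand-written JSON reply. It is not DSPydantic's internal code: the class name `FreeTextOutput`, the description text, the `parse_reply` helper, and the JSON reply format are all assumptions for illustration.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class FreeTextOutput:
    """Stand-in for a minimal schema with one string field named "output"."""

    output: str = field(
        metadata={"description": "One of: positive, negative, neutral."}
    )


def parse_reply(raw_reply: str) -> FreeTextOutput:
    """Turn a JSON reply such as {"output": "positive"} into the schema object."""
    payload = json.loads(raw_reply)
    value = payload["output"]
    if not isinstance(value, str):
        raise TypeError("the 'output' field must be a string")
    return FreeTextOutput(output=value)


# The field description is the kind of text an optimizer would rewrite.
description = fields(FreeTextOutput)[0].metadata["description"]
print(description)

result = parse_reply('{"output": "positive"}')
print(result.output)
```

The point of the sketch is that the only schema-level lever is one description string, which is why the wording of that description matters so much for free-form output.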
What Gets Optimized
| What | Impact |
|---|---|
| "output" field description | High: describes the desired output format |
| System/instruction prompts | Medium: guides the overall extraction behavior |
Tips
- Use consistent output formats across examples (e.g., always "positive", "negative", "neutral" for sentiment)
- When output should be numeric or structured, consider using a Pydantic model instead (see Extract Structured Data)
- For complex multi-step reasoning, use Prompt Templates for dynamic prompts
- To save your optimized prompter for production, see Save and Load a Prompter
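The first tip is easy to violate by accident, for example by mixing "positive" and "Positive" across examples. A minimal sketch of a consistency check follows. `find_inconsistent_labels` is a hypothetical helper, not a dspydantic function, and it only catches differences in case and surrounding whitespace.

```python
from collections import defaultdict


def find_inconsistent_labels(expected_outputs):
    """Group labels that differ only by case or surrounding whitespace.

    Returns a dict mapping each normalized label to the set of spellings
    seen, keeping only labels that appear in more than one spelling.
    """
    variants = defaultdict(set)
    for label in expected_outputs:
        variants[label.strip().lower()].add(label)
    return {key: seen for key, seen in variants.items() if len(seen) > 1}


labels = ["positive", "negative", "Positive", "neutral", "negative "]
for normalized, spellings in find_inconsistent_labels(labels).items():
    print(f"{normalized!r} appears as {sorted(spellings)}")
```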
Next Steps
| Topic | Guide |
|---|---|
| Structured data with types | Extract Structured Data |
| Dynamic prompts with placeholders | Optimize with Prompt Templates |
| Images and PDFs in detail | Use Images and PDFs |
| Production deployment | Save and Load a Prompter |
| Customize evaluation | Configure Evaluators |