
DSPy Tutorial: Structured Output with Adapters

·11 mins

Part of the DSPy series. This tutorial shows how to generate structured outputs that parse reliably. If you haven’t read the classification tutorial, start there for the DSPy fundamentals.

The markdown wrapper problem #

When you prompt an LLM to generate SQL, JSON, or code, it often wraps the output in markdown formatting such as fenced ```sql code blocks. The response looks like this:

```sql
SELECT * FROM customers WHERE city = 'New York'
```

Or adds explanatory text before and after:

Here's the SQL query you requested:

```sql
SELECT * FROM customers WHERE city = 'New York'
```

This query selects all customers from New York.

You want just the SQL string. No wrappers, no commentary, no parsing. When you’re processing hundreds or thousands of queries, you can’t afford to write unreliable regex patterns to strip formatting. And as soon as the model changes its output style, your regex patterns will break.

DSPy’s adapters solve this by enforcing output format at the framework level. You define a Pydantic model with the exact fields you need. DSPy instructs the model to return JSON, validates the structure and hands you a typed Python object. No string parsing, no markdown stripping, no retry loops for malformed outputs.

In this tutorial, I build a SQL query generator that takes natural language questions and produces clean, structured queries. I’ll show how Pydantic models integrate with DSPy signatures, how to measure parsing success and where validation fits in the optimization loop.


The SQL generation task #

Problem: Convert natural language questions into SQL queries for an e-commerce database.

Schema:

  • customers (customer_id, name, email, city)
  • products (product_id, name, price, stock_quantity, category)
  • orders (order_id, customer_id, product_id, quantity, total_amount, order_date, discount)

Example:

  • Input: “Show me all customers from New York”
  • Output: SELECT * FROM customers WHERE city = 'New York'

This is more complex than classification. Instead of picking from a fixed set of categories, the model needs to generate arbitrary SQL that matches the schema and answers the question correctly.

The code is organized as follows:

dspy-02-structured-output/
├── data/
│   ├── train.json         # 20 labeled examples
│   └── test.json          # 10 holdout examples
├── sql_generator.py       # Pydantic model + signature
├── optimize.py            # Runs BootstrapFewShot optimizer
└── eval.py                # Measures parsing success

Step 1: Define the Pydantic model #

A Pydantic model specifies the exact structure of the output. DSPy’s JSONAdapter uses this to instruct the LLM and validate responses.

from pydantic import BaseModel, Field

class SQLQuery(BaseModel):
    """Structured SQL query with explanation."""

    query: str = Field(
        description="The SQL query without markdown formatting"
    )
    explanation: str = Field(
        description="Brief explanation of what the query does"
    )

Key design choices:

  • Two fields: query holds the raw SQL string, explanation provides context. This gives both the executable query and human-readable reasoning.
  • Field descriptions: Pydantic’s Field adds descriptions that DSPy includes in the prompt. This guides the model on what each field should contain.
  • No markdown: The description explicitly states “without markdown formatting”. While adapters handle format enforcement, clear descriptions help the model understand expectations.

Including the explanation field is helpful for debugging and transparency. When a generated query looks suspicious, the explanation helps understand the model’s reasoning. In production, you might log explanations for audit trails or show them to users for transparency.
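
The field descriptions are not just documentation: they land in the model's JSON schema, which is the structure DSPy surfaces to the LLM. A quick sketch (repeating the model above so the snippet is self-contained):

```python
from pydantic import BaseModel, Field

class SQLQuery(BaseModel):
    """Structured SQL query with explanation."""

    query: str = Field(
        description="The SQL query without markdown formatting"
    )
    explanation: str = Field(
        description="Brief explanation of what the query does"
    )

# The JSON schema carries the field descriptions that guide the model.
schema = SQLQuery.model_json_schema()
print(schema["properties"]["query"]["description"])
# The SQL query without markdown formatting
```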


Step 2: Define the signature with schema context #

import dspy

class GenerateSQL(dspy.Signature):
    """Generate a SQL query for an e-commerce database.
    
    The database has three tables:
    - customers (customer_id, name, email, city)
    - products (product_id, name, price, stock_quantity, category)
    - orders (order_id, customer_id, product_id, quantity, 
              total_amount, order_date, discount)
    """

    question: str = dspy.InputField(
        desc="Natural language question about the data"
    )
    result: SQLQuery = dspy.OutputField(
        desc="SQL query and explanation"
    )

What this does:

  • Schema in the docstring: The model needs to know what tables and columns exist. Including the schema in the signature’s docstring makes it part of every prompt automatically. (This could also be filled in dynamically from SQLAlchemy metadata.)
  • SQLQuery as output type: DSPy sees that result is a Pydantic model and coordinates with JSONAdapter to enforce the structure.
  • Type safety: The output is a SQLQuery instance, not a string or dict. You access result.query and result.explanation with full type hints.

Comparison to string prompting:

With string-based prompts, you’d manually format the schema, hope the model returns parseable JSON and write error handling for when it doesn’t. With DSPy, the schema lives in the signature and the adapter handles format enforcement.


Step 3: Configure DSPy with JSONAdapter #

The JSONAdapter is what makes Pydantic integration work. It tells DSPy to request JSON from the model and parse responses back into Python types.

from dspy.adapters.json_adapter import JSONAdapter
from dotenv import load_dotenv

load_dotenv()

def configure_dspy():
    lm = dspy.LM(
        model="anthropic/claude-3-haiku-20240307",
        max_retries=2,
        timeout=60,
    )
    dspy.configure(lm=lm, adapter=JSONAdapter())

What JSONAdapter does:

  1. Prompt construction: Adds instructions for the model to output JSON matching the Pydantic schema.
  2. Field inclusion: Includes Pydantic field descriptions in the prompt so the model knows what each field represents.
  3. Response parsing: Extracts JSON from the model’s response and validates it against the Pydantic model.
  4. Error handling: Raises clear exceptions if the JSON is invalid or doesn’t match the schema.

Behind the scenes:

graph TD
    A["generator(question=...)"] --> B["Signature + Pydantic model define expected structure"]
    B --> C["JSONAdapter builds prompt: schema instructions, field descriptions, JSON format requirement"]
    C --> D["LLM generates JSON response"]
    D --> E["JSONAdapter parses JSON"]
    E --> F{"Valid Pydantic model?"}
    F -->|Yes| G["Return Prediction with SQLQuery instance"]
    F -->|No| H["Raise validation error"]

The adapter is the bridge between DSPy’s signature system and the LLM’s JSON generation capabilities. Without it, you’d need to manually craft JSON format instructions and write parsing logic for every signature.
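
The parse-and-validate step is essentially Pydantic's JSON validation. A minimal sketch of what the adapter does with the raw model response (model redefined here so the snippet is self-contained):

```python
from pydantic import BaseModel, ValidationError

class SQLQuery(BaseModel):
    query: str
    explanation: str

# A well-formed response parses straight into a typed instance.
raw = '{"query": "SELECT * FROM customers", "explanation": "Selects every customer row."}'
parsed = SQLQuery.model_validate_json(raw)
print(parsed.query)  # SELECT * FROM customers

# A response that violates the schema raises a clear error instead of
# silently producing a half-filled dict.
try:
    SQLQuery.model_validate_json('{"query": "SELECT 1"}')  # missing explanation
except ValidationError:
    print("schema violation")
```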


Step 4: Create the baseline generator #

Create the generator with dspy.Predict:

generator = dspy.Predict(GenerateSQL)

Using the generator:

response = generator(question="Show me all customers from New York")
print(response.result.query)
# SELECT * FROM customers WHERE city = 'New York'

print(response.result.explanation)
# This query selects all columns from the customers table and 
# filters for customers located in New York using the city field.

The output is a typed SQLQuery instance with no markdown formatting. Contrast with string prompting:

# String-based approach
prompt = f"""Generate SQL for: {question}
Schema: customers (customer_id, name, email, city)
Return JSON with 'query' and 'explanation' fields. 
Do not include any markdown formatting."""

response = litellm.completion(model="...", messages=[...])
content = response.choices[0].message.content

# Now you need to:
# 1. Strip potential markdown: ```json ... ```
# 2. Parse JSON and handle errors
# 3. Validate both fields exist
# 4. Handle edge cases (extra fields, wrong types)

import json
import re

# Strip markdown
cleaned = re.sub(r'```json\s*|\s*```', '', content)
try:
    data = json.loads(cleaned)
    query = data['query']
    explanation = data['explanation']
except (json.JSONDecodeError, KeyError):
    # Retry logic or error handling
    pass

DSPy eliminates all of this. The adapter handles format enforcement and Pydantic validates the structure.


Step 5: Prepare the dataset #

data/train.json (20 examples):

[
  {
    "question": "Show me all customers from New York",
    "query": "SELECT * FROM customers WHERE city = 'New York'",
    "explanation": "This query selects all columns from the customers table and filters for customers located in New York using the city field."
  },
  {
    "question": "What are the total sales for each product?",
    "query": "SELECT product_id, SUM(total_amount) AS total_sales FROM orders GROUP BY product_id",
    "explanation": "This query sums total_amount for each order and groups by product_id to get total sales per product."
  }
]

Loading the data:

import json
from pathlib import Path
from typing import Literal
import dspy
from sql_generator import SQLQuery

def load_dataset(split: Literal["train", "test"]) -> list[dspy.Example]:
    data_dir = Path(__file__).parent / "data"
    file_path = data_dir / f"{split}.json"

    with open(file_path) as f:
        data = json.load(f)

    return [
        dspy.Example(
            question=item["question"],
            result=SQLQuery(
                query=item["query"],
                explanation=item["explanation"],
            ),
        ).with_inputs("question")
        for item in data
    ]

The result field is a SQLQuery instance matching the signature’s output type, allowing DSPy to compare generated outputs against expected outputs structurally.


Step 6: Define the metric #

The metric measures whether the generated output successfully parses as a valid SQLQuery instance with non-empty fields.

def parsing_metric(example: dspy.Example, pred, trace=None):
    """
    Metric that returns 1.0 if the output successfully parses as
    SQLQuery, 0.0 otherwise.
    
    This validates that:
    - The output is valid JSON
    - It matches the Pydantic schema
    - Both query and explanation fields are present
    """
    try:
        # Check if pred.result is a valid SQLQuery instance
        if not isinstance(pred.result, SQLQuery):
            return 0.0
        
        # Verify both fields are non-empty strings
        if not pred.result.query or not pred.result.explanation:
            return 0.0
        
        return 1.0
    except Exception:
        return 0.0

What this measures:

This is a parsing success metric. It answers: “Did the model produce output that matches my required structure?”

What this doesn’t measure:

  • SQL correctness: Whether the query is syntactically valid SQL.
  • Logical correctness: Whether the query actually answers the question.

Why start with parsing?

Before you can validate SQL syntax or execute queries, you need to ensure the output parses. This metric establishes the baseline: can the model follow the structure at all?

Adding stricter validation:

You can extend this metric to check SQL syntax (using sqlparse) or execution (using an in-memory DuckDB). Since DSPy expects a single metric function, you combine multiple checks in one function, either strictly (all checks must pass) or with weighted scoring (partial credit for each level). For this tutorial, I stick with parsing-only since it’s fast, doesn’t require external libraries and validates the most important property: structured output that matches the Pydantic schema.
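
As one way to add the syntax layer without pulling in sqlparse or DuckDB, you can ask the standard library's SQLite to plan the query against the tutorial schema. This is a sketch, not the tutorial's actual code; `sql_is_valid` and `strict_metric` are names I'm introducing here:

```python
import sqlite3

# Minimal copy of the tutorial schema, used only for validation.
SCHEMA = """
CREATE TABLE customers (customer_id INTEGER, name TEXT, email TEXT, city TEXT);
CREATE TABLE products (product_id INTEGER, name TEXT, price REAL,
                       stock_quantity INTEGER, category TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, product_id INTEGER,
                     quantity INTEGER, total_amount REAL, order_date TEXT,
                     discount REAL);
"""

def sql_is_valid(query: str) -> bool:
    """Ask SQLite to plan the query; fails on bad syntax or unknown columns."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.execute(f"EXPLAIN {query}")  # prepares the statement without running it on data
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

def strict_metric(example, pred, trace=None):
    """Parsing check first, then syntax check; all checks must pass."""
    result = getattr(pred, "result", None)
    if result is None or not result.query or not result.explanation:
        return 0.0
    return 1.0 if sql_is_valid(result.query) else 0.0
```

Because the check runs against the schema, it also catches hallucinated columns, not just malformed SQL.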


Step 7: Run the baseline evaluation #

Measure baseline performance before optimizing:

import dspy
from dspy.evaluate import Evaluate
from optimize import load_dataset, parsing_metric
from sql_generator import GenerateSQL, configure_dspy

configure_dspy()
testset = load_dataset("test")

evaluator = Evaluate(
    devset=testset,
    metric=parsing_metric,
    display_progress=True,
)

baseline_generator = dspy.Predict(GenerateSQL)
baseline_score = evaluator(baseline_generator)

print(f"Baseline Parsing Success: {baseline_score:.1%}")

Output:

Average Metric: 7.00 / 10 (70.0%): 100%|██████████| 10/10
Baseline Parsing Success: 70.0%

The baseline achieves 70% parsing success without few-shot examples. While JSONAdapter enforces structure, the model still struggles with consistent formatting on some queries. Few-shot examples help improve this.

Note: I used claude-3-haiku (an older, weaker model) for this tutorial. Sonnet and Haiku 4.5 achieve 100% on this simple task.


Step 8: Optimize and evaluate #

The optimizer works the same way as in the previous tutorial, just on a more complex task. Loading the compiled program and comparing baseline and optimized performance shows:

optimized_generator = dspy.Predict(GenerateSQL)  # at this point it's not yet optimized
optimized_generator.load("sql_generator_compiled.json")  # now it's optimized
optimized_score = evaluator(optimized_generator)

Output:

Average Metric: 8.00 / 10 (80.0%): 100%|██████████| 10/10
Baseline:  70.0%
Optimized: 80.0%
Change:    +10.0%

Few-shot examples improved parsing success from 70% to 80%, demonstrating more consistent structured output generation.
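
The compiled file loaded above comes out of optimize.py. That script isn't shown in full here, but assuming it follows the previous tutorial's BootstrapFewShot recipe, the compile step would look roughly like this (it needs a configured LM, so treat it as a sketch rather than a runnable snippet):

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

from optimize import load_dataset, parsing_metric
from sql_generator import GenerateSQL, configure_dspy

configure_dspy()
trainset = load_dataset("train")

# Bootstrap few-shot demos by running the base generator on the trainset
# and keeping traces that score 1.0 on the parsing metric.
optimizer = BootstrapFewShot(metric=parsing_metric, max_bootstrapped_demos=4)
optimized = optimizer.compile(dspy.Predict(GenerateSQL), trainset=trainset)
optimized.save("sql_generator_compiled.json")
```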


Understanding the compiled output #

The compiled JSON contains selected few-shot demonstrations with full SQLQuery objects:

{
  // ...
  "demos": [
    {
      "augmented": true,
      "question": "Show me all customers from New York",
      "result": {
        "query": "SELECT * FROM customers WHERE city = 'New York'",
        "explanation": "This query selects all columns from..."
      }
    }
  ],
  "signature": {
    "instructions": "Generate a SQL query for an e-commerce database...",
    "fields": [
      {
        "prefix": "Question:",
        "description": "Natural language question about the data"
      },
      {
        "prefix": "Result:",
        "description": "SQL query and explanation"
      }
    ]
  }
}

In production, you load this JSON and use the module like any other DSPy module. The demonstrations are baked in, improving consistency without requiring the training data to be shared.


Why Pydantic over plain JSON? #

You could use plain Python dicts instead of Pydantic models. Here’s why Pydantic is better:

| Aspect | Plain Dict | Pydantic Model |
| --- | --- | --- |
| Type safety | No type hints | Full type hints with IDE support |
| Validation | Manual checking | Automatic validation |
| Field docs | Comments only | Field descriptions in schema |
| Nested structures | Nested dicts | Nested models with validation |
| Defaults | Manual handling | Built-in default values |

Example with nested structure:

from pydantic import BaseModel

class Column(BaseModel):
    name: str
    table: str

class SQLQueryWithMetadata(BaseModel):
    query: str
    explanation: str
    columns_used: list[Column]

This lets you generate structured metadata alongside the SQL query. DSPy + Pydantic handle the entire nested structure automatically.
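
Pydantic coerces nested dicts into the typed submodels automatically, which is what makes the nested case work end to end. A small usage sketch:

```python
from pydantic import BaseModel

class Column(BaseModel):
    name: str
    table: str

class SQLQueryWithMetadata(BaseModel):
    query: str
    explanation: str
    columns_used: list[Column]

# Plain dicts in columns_used are validated and coerced into Column instances.
result = SQLQueryWithMetadata(
    query="SELECT name FROM customers",
    explanation="Selects every customer name.",
    columns_used=[{"name": "name", "table": "customers"}],
)
print(type(result.columns_used[0]).__name__)  # Column
```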


Validation strategies: Where does validation happen? #

There are three places you can validate outputs:

1. Adapter level (automatic)

  • What: JSONAdapter validates JSON structure and Pydantic schema.
  • When: Every LLM call.
  • Pros: Automatic, no code needed.
  • Cons: Only validates structure, not content correctness.

2. Metric level (during optimization)

  • What: Your metric function checks parsing, syntax or logic. Can combine multiple checks into one metric function.
  • When: During optimization and evaluation.
  • Pros: Guides optimizer toward better outputs.
  • Cons: Only runs during optimization, not in production.

3. Application level (production)

  • What: Post-processing checks before executing queries.
  • When: Every production request.
  • Pros: Catches issues before they affect users.
  • Cons: Adds latency, requires error handling.

Recommended approach:

Use all three layers:

  • Let the adapter handle structural validation automatically.
  • Use metrics to guide optimization toward quality outputs.
  • Add application-level checks for critical operations (executing SQL, making API calls).

# Production example with validation
import logging

import sqlparse  # third-party: pip install sqlparse

logger = logging.getLogger(__name__)

def generate_and_validate(question: str) -> SQLQuery:
    response = generator(question=question)
    
    # Application-level validation
    try:
        # Check SQL syntax
        parsed = sqlparse.parse(response.result.query)
        if not parsed:
            raise ValueError("Invalid SQL syntax")
        
        # Check for dangerous operations
        query_upper = response.result.query.upper()
        if any(keyword in query_upper for keyword in 
               ['DROP', 'DELETE', 'TRUNCATE']):
            raise ValueError("Dangerous SQL operation detected")
        
        return response.result
    except Exception as e:
        # Log error, use fallback or retry
        logger.error("SQL validation failed: %s", e)
        raise

This layered approach gives you both development-time optimization and production-time safety.


Key takeaways #

Pydantic models enforce structure. Instead of parsing strings and hoping the format is correct, define a typed model and let DSPy handle format enforcement.

JSONAdapter eliminates parsing code. No regex to strip markdown, no manual JSON parsing, no retry loops.

Validation happens in layers. Start with a parsing metric to ensure output matches your structure, then layer on domain-specific checks like SQL syntax or logical correctness. Each layer serves a purpose: adapter validates structure, metrics guide optimization, and application-level checks provide production safety.

Pydantic scales to complexity. Nested models, optional fields, defaults and custom validation all work seamlessly with DSPy.