The Bottom Line
Getting reliable, structured data from LLMs has evolved from a frustrating prompt-engineering exercise into a solved problem—if you know the right techniques.
Without constraints, asking an LLM for JSON fails 30-70% of the time on complex schemas. With native structured outputs, you get 100% schema compliance. The difference? Understanding when to use guaranteed constraints versus best-effort approaches.
Why This Matters
Modern applications need LLMs to power APIs, populate databases, and integrate with existing systems. A JSON parsing failure at 2 AM cascades into customer-facing outages.
The core problem: LLMs generate text token-by-token based on probability distributions. They're trained to produce helpful, conversational responses—not machine-parseable data structures.
Two fundamental approaches exist:
| Approach | Guarantee | Best For |
|---|---|---|
| Guaranteed constraints | 100% schema compliance | Production systems |
| Best-effort constraints | ~70-85% compliance | Prototyping, models without native support |
Guaranteed constraints modify token generation itself, so invalid tokens are mathematically impossible. Best-effort approaches guide the model through prompting and hope it complies.
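To make the distinction concrete, here is an illustrative sketch of what guaranteed constraints do at the token level. The `allowed_token_ids` argument is a hypothetical stand-in for the schema compiler that real constrained-decoding engines provide; this is not any provider's actual implementation.

```python
import torch

def constrained_sample(logits: torch.Tensor, allowed_token_ids: list[int]) -> int:
    """Sample the next token, but only from tokens the schema permits.

    allowed_token_ids is hypothetical here; real systems derive it by compiling
    the JSON Schema or grammar into a token-level automaton.
    """
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0  # schema-valid tokens keep their original logits
    probs = torch.softmax(logits + mask, dim=-1)
    return int(torch.multinomial(probs, num_samples=1).item())
```

Best-effort approaches skip this masking step entirely, which is why their compliance rates stay below 100%.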
Provider-Native Structured Outputs
All three major AI providers now offer built-in structured output features. Here's how they compare:
Comparison Table
| Feature | OpenAI | Anthropic | Gemini |
|---|---|---|---|
| Release date | Aug 2024 | Nov 2025 | Nov 2025 (enhanced) |
| Union types (anyOf) | No | No | Yes |
| Recursive schemas | No | No | Yes |
| Numeric constraints | No | No | Yes |
| Property ordering | No | No | Yes (Gemini 2.5+) |
| Streaming support | Yes | Yes | Yes |
OpenAI: Most Mature
OpenAI reports 100% schema compliance versus ~35% with prompting alone. It works through constrained decoding: the API masks out tokens that would violate your schema.
```python
from openai import OpenAI
from pydantic import BaseModel

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed
```

Limitations: No anyOf/oneOf, no recursive schemas, no numeric constraints. All fields must be required.
Anthropic Claude: Newer but Capable
Released in November 2025 as a public beta. Claude enforces the schema through compiled grammar artifacts.
```python
from anthropic import Anthropic

client = Anthropic()

response = client.beta.messages.create(
    model="claude-sonnet-4-5",  # model name and schema fields are illustrative
    betas=["structured-outputs-2025-11-13"],
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the contact details: Reach Jane Doe at jane@example.com or (555) 010-1234."}
    ],
    output_format={
        "type": "json_schema",
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
                "phone": {"type": "string"}
            },
            "required": ["name", "email", "phone"]
        }
    }
)
```

Note: Requires the beta header (anthropic-beta: structured-outputs-2025-11-13).
Google Gemini: Most Flexible
Gemini has the most advanced JSON Schema support, with features the others lack: anyOf for union types, $ref for recursive schemas, and minimum/maximum for numeric constraints.
```python
from google import genai
from pydantic import BaseModel

class Recipe(BaseModel):
    recipe_name: str
    ingredients: list[str]
    instructions: list[str]

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Give me a recipe for chocolate chip cookies.",
    config={
        "response_mime_type": "application/json",
        "response_json_schema": Recipe.model_json_schema(),
    },
)
```
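To illustrate the Gemini-specific features mentioned above (anyOf unions and numeric bounds), here is a hedged sketch passing a raw JSON Schema instead of a Pydantic model; the field names and prompt are made up for the example.

```python
from google import genai

client = genai.Client()

# Illustrative schema using anyOf and minimum/maximum, which the article
# notes OpenAI and Anthropic do not currently accept.
payment_schema = {
    "type": "object",
    "properties": {
        "amount": {"type": "number", "minimum": 0, "maximum": 10000},
        "method": {
            "anyOf": [
                {"type": "string", "enum": ["card", "bank_transfer"]},
                {"type": "null"},
            ]
        },
    },
    "required": ["amount", "method"],
}

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Extract the payment details: paid $42.50 by card.",
    config={
        "response_mime_type": "application/json",
        "response_json_schema": payment_schema,
    },
)
```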
7 Techniques for Constraining Output
1. JSON Mode vs Structured Outputs
Don't confuse them:
| Mode | Guarantees |
|---|---|
| JSON mode | Valid JSON syntax only |
| Structured outputs | Valid JSON AND exact schema compliance |
```python
# JSON mode: guarantees syntactically valid JSON only
response_format={"type": "json_object"}

# Structured outputs: guarantees valid JSON that matches your schema
response_format={"type": "json_schema", "json_schema": {...}}
```

2. Schema Design That Works
Well-designed schemas dramatically improve reliability:
```python
from pydantic import BaseModel, Field
from typing import List, Optional
from enum import Enum

class SentimentLevel(str, Enum):
    positive = "positive"
    negative = "negative"
    neutral = "neutral"

class ProductReview(BaseModel):
    """Structured extraction of a single product review."""

    product_name: str = Field(
        description="The product being reviewed, exactly as named in the text"
    )
    sentiment: SentimentLevel = Field(
        description="Overall sentiment expressed by the reviewer"
    )
    rating_inferred: Optional[int] = Field(
        default=None,
        description="Star rating (1-5) if stated or clearly implied, otherwise null"
    )
    key_points: List[str] = Field(
        description="Main points raised by the reviewer",
        max_length=5
    )
```

Key principles:
- Use descriptive field names
- Add `description` attributes to clarify ambiguous fields
- Use enums or `Literal` types to constrain categories (see the short sketch below)
- Keep nesting shallow; deeply nested schemas have higher failure rates
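For a categorical field like `sentiment`, a `Literal` type is a lighter-weight alternative to a full Enum; a minimal sketch (the class name is made up for illustration):

```python
from typing import List, Literal
from pydantic import BaseModel, Field

class ProductReviewLite(BaseModel):
    product_name: str
    # Literal constrains the allowed values without defining a separate Enum class
    sentiment: Literal["positive", "negative", "neutral"]
    key_points: List[str] = Field(max_length=5)
```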
3. Regex Constraints
For simple patterns like emails, dates, classifications:
```python
import outlines

# Any Hugging Face model ID works; this one is just an example
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Generation is guaranteed to match the regex exactly
classifier = outlines.generate.regex(model, r"(positive|negative|neutral)")

sentiment = classifier("Review: 'Great battery life, terrible camera.' Overall sentiment:")
# e.g. "negative"
```

4. Grammar-Based Constraints
For nested structures, recursion, and code generation:
```python
# A Lark-style EBNF grammar for a minimal JSON subset (illustrative);
# grammar-constrained generators such as outlines.generate.cfg accept this format.
grammar = """
?start: value
value: object | ESCAPED_STRING | SIGNED_NUMBER
object: "{" [pair ("," pair)*] "}"
pair: ESCAPED_STRING ":" value
%import common.ESCAPED_STRING
%import common.SIGNED_NUMBER
%import common.WS
%ignore WS
"""
```

5. Few-Shot Examples
When you can't use constrained decoding:
Extract product information as JSON.
Example 1:
Input: "iPhone 15 Pro - $999, 256GB storage, Space Black"
Output: {"name": "iPhone 15 Pro", "price": 999, "storage": "256GB", "color": "Space Black"}
Example 2:
Input: "Samsung Galaxy S24 Ultra priced at $1199 with 512GB"
Output: {"name": "Samsung Galaxy S24 Ultra", "price": 1199, "storage": "512GB", "color": null}
Now extract:
Input: "OnePlus 12 - 256GB Flowy Emerald edition for $799"
Output:

Best practices: Use 2-5 examples. Cover edge cases. Place the most important example last.
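To show how a prompt like this is wired up when constrained decoding is unavailable, here is a hedged sketch pairing it with plain JSON mode; the model name is a placeholder, and the prompt is abbreviated to one example.

```python
import json
from openai import OpenAI

client = OpenAI()

# Abbreviated version of the few-shot prompt shown above
few_shot_prompt = (
    "Extract product information as JSON.\n\n"
    'Example 1:\nInput: "iPhone 15 Pro - $999, 256GB storage, Space Black"\n'
    'Output: {"name": "iPhone 15 Pro", "price": 999, "storage": "256GB", "color": "Space Black"}\n\n'
    "Now extract:\n"
    'Input: "OnePlus 12 - 256GB Flowy Emerald edition for $799"\n'
    "Output:"
)

# JSON mode guarantees valid syntax; the few-shot examples carry the implied schema
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": few_shot_prompt}],
    response_format={"type": "json_object"},
)
product = json.loads(response.choices[0].message.content)
```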
6. Explicit Format Instructions
Clear, specific instructions significantly improve compliance:
```python
system_prompt = """You are a data extraction system.
Respond with a single JSON object and nothing else.
The object must contain exactly these keys:
  "name" (string), "price" (number), "in_stock" (boolean).
Begin your response with { and end it with }."""
```

7. Why Negative Constraints Often Fail
"Don't do X" instructions can make unwanted behavior more likely (the "pink elephant problem"):
```python
# Less reliable: negative phrasing draws attention to the unwanted behavior
prompt = "Summarize this article. Don't use bullet points. Don't exceed 100 words. Don't speculate."

# More reliable: state what you want instead
prompt = "Summarize this article as one flowing paragraph of at most 100 words, using only facts stated in the text."
```

Tools That Make This Practical
| Tool | Best For | Monthly Downloads |
|---|---|---|
| Instructor | API models, multi-provider | 3M+ |
| Outlines | Local models, guaranteed compliance | - |
| LangChain | When already in LangChain ecosystem | - |
| Guidance | Token-level control, research | - |
Instructor Example
```python
import instructor
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

# One interface across OpenAI, Anthropic, Gemini, and other providers
client = instructor.from_provider("openai/gpt-4o-mini")

user = client.chat.completions.create(
    response_model=User,
    max_retries=3,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
```

Outlines Example
```python
import outlines
from pydantic import BaseModel

class Character(BaseModel):
    name: str
    age: int
    armor: str

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")
generator = outlines.generate.json(model, Character)

character = generator("Create a character for a fantasy game.")
# character is already a validated Character instance
```

When Things Go Wrong
The Truncation Trap
The model hits max_tokens before completing JSON:
```python
def safe_extract(prompt):
    response = client.chat.completions.create(...)

    # A truncated response is silently broken JSON - fail loudly instead
    if response.choices[0].finish_reason == "length":
        raise ValueError("Response truncated: raise max_tokens or shorten the output")

    return response.choices[0].message.content
```

Schema Compliance ≠ Content Accuracy
Structured outputs guarantee format, not truth. Always validate content:
```python
from pydantic import BaseModel, field_validator

class CompanyInfo(BaseModel):
    company_name: str
    founded_year: int

    # Structured outputs guarantee an int here, not a plausible one
    @field_validator("founded_year")
    @classmethod
    def check_founded_year(cls, v):
        if v < 1600 or v > 2025:
            raise ValueError("founded_year is outside the plausible range")
        return v
```

Graceful Degradation Pattern
```python
import json
from pydantic import ValidationError

def extract_with_fallback(text: str) -> dict:
    # Tier 1: strict schema via structured outputs
    try:
        return call_with_strict_schema(text).model_dump()
    except ValidationError:
        pass

    # Tier 2: relaxed schema (optional fields, looser types)
    try:
        return call_with_relaxed_schema(text).model_dump()
    except ValidationError:
        pass

    # Tier 3: plain JSON mode, parsed manually
    try:
        return json.loads(call_with_json_mode(text))
    except json.JSONDecodeError:
        pass

    # Last resort: hand the raw text to a human instead of failing silently
    return {"raw_text": text, "needs_manual_review": True}
```

Key Takeaways
For API-based applications:
- Use native structured outputs (OpenAI, Anthropic, or Gemini)
- Combine with Instructor for automatic retries
- Always check `finish_reason` for truncation
For local model deployment:
- Use Outlines for guaranteed compliance
- Consider grammar constraints for code generation
For all cases:
- Design schemas simply with clear field descriptions
- Implement retry logic with exponential backoff (see the sketch after this list)
- Build graceful degradation paths
- Test edge cases before production
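A minimal sketch of the retry-with-exponential-backoff pattern from the list above; `call_llm` and the timing values are placeholders, not a specific library API.

```python
import random
import time

from pydantic import ValidationError

def extract_with_retries(text: str, max_attempts: int = 4):
    """Retry a schema-validated extraction with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            # Placeholder: any call that raises ValidationError on non-compliant output
            return call_llm(text)
        except ValidationError:
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus jitter to avoid synchronized retries
            time.sleep(2 ** attempt + random.random())
```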
The tools exist. "The LLM returned invalid JSON" is no longer an acceptable production failure mode.
Building AI-powered systems and want help with structured outputs? Book a strategy call and let's discuss your architecture.


