Forcing Structured JSON Output

One of the most powerful features of Large Language Models is their ability to read messy, unstructured text and turn it into clean, structured data. In this tutorial, we will learn how to force an Ollama model to output data in a strict JSON format using a schema.

This is incredibly useful if you are building an application or API and need the AI to return data that your code can easily read and process.

📝 Step 1: The Unstructured Prompt

First, let's import the library and define the text we want the AI to analyze. Imagine we scraped this paragraph from a resume or a LinkedIn profile.

import ollama

# The messy, unstructured text we want to extract data from
prompt = """
"John Doe is a 25 year old software developer. His email is john@gmail.com. He knows Python, JavaScript and FastAPI. He has 3 years of experience."
"""

🏗️ Step 2: Defining the Format Schema

Next, we need to tell the model exactly how we want the output formatted. We do this by creating a dictionary that acts as a blueprint (a JSON Schema).

# The format blueprint we want the LLM to follow
format_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
        "skills": {
            "type": "array",
            "items": {"type": "string"}
        },
        "experience": {"type": "integer"}
    },
    "required": ["name", "age", "email", "skills", "experience"]
}

💡 What is happening here? The schema tells the model to return an object with these five keys, all of which are listed as required, and it constrains the type of each value (e.g., age must be an integer, skills must be a list of strings).
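To make the blueprint idea concrete, here is a minimal sketch of a checker that tests whether a Python dictionary fits the schema. It is an illustration only: the helper names `matches_type` and `find_schema_problems` are hypothetical, it covers just the handful of schema keywords used in this tutorial, and it is not how Ollama enforces the format internally.

```python
# Map the JSON Schema type names used in this tutorial to Python types.
JSON_TYPES = {"string": str, "integer": int, "array": list, "object": dict}

format_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
        "skills": {"type": "array", "items": {"type": "string"}},
        "experience": {"type": "integer"},
    },
    "required": ["name", "age", "email", "skills", "experience"],
}


def matches_type(value, type_name):
    """Check one value against one schema type name (hypothetical helper)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        return False
    return isinstance(value, JSON_TYPES[type_name])


def find_schema_problems(data, schema):
    """Return a list of problems; an empty list means the data fits."""
    problems = []
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required key: {key}")
    for key, rule in schema.get("properties", {}).items():
        if key not in data:
            continue
        if not matches_type(data[key], rule["type"]):
            problems.append(f"{key} should be {rule['type']}")
        elif rule["type"] == "array" and "items" in rule:
            item_type = rule["items"]["type"]
            if not all(matches_type(item, item_type) for item in data[key]):
                problems.append(f"{key} should contain only {item_type} items")
    return problems


good = {
    "name": "John Doe",
    "age": 25,
    "email": "john@gmail.com",
    "skills": ["Python", "JavaScript", "FastAPI"],
    "experience": 3,
}
bad = {"name": "John Doe", "age": "25", "skills": ["Python", 3], "experience": 3}

print(find_schema_problems(good, format_schema))
print(find_schema_problems(bad, format_schema))
```

A check like this can be handy in tests or when you want a readable error message before the data reaches the rest of your application.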

🚀 Step 3: Generating the Response

Finally, we pass both the prompt and our format_schema to the generate function.

# Pass the prompt and the format schema to the model
response = ollama.generate(
    model="llama3.2:latest",
    prompt=prompt,
    format=format_schema
)

# Print the final structured data
print(response['response'])

📌 Expected Output

{
    "name": "John Doe",
    "age": 25,
    "email": "john@gmail.com",
    "skills": [
        "Python",
        "JavaScript",
        "FastAPI"
    ],
    "experience": 3
}

Magic! The model skips the conversational filler and returns JSON that follows the schema, so you can parse it straight into a Python dictionary or save it to a database.
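Parsing the result into a dictionary could look like the sketch below. To keep it runnable without a model, `raw_json` is a stand-in for `response['response']` and holds the expected output shown above. The helper `parse_model_output` is a hypothetical name, and the `None` fallback is just one possible way to handle text that turns out not to be valid JSON.

```python
import json

# Stand-in for response['response'] so the sketch runs without calling a model.
raw_json = (
    '{ "name": "John Doe", "age": 25, "email": "john@gmail.com", '
    '"skills": [ "Python", "JavaScript", "FastAPI" ], "experience": 3 }'
)


def parse_model_output(raw):
    """Parse the model's JSON text, returning None if it is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


person = parse_model_output(raw_json)

# The values are now regular Python objects: str, int, and list.
print(person["name"])
print(person["age"] + 1)
print(", ".join(person["skills"]))
```

In your own script you would pass `response['response']` to the helper instead of the hard-coded string.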

⚠️ Important Note on Missing Data

Watch out for "Hallucinations": If the text you provide is missing a required field (for example, if the prompt didn't mention an email address), the model might invent "dummy data" (like fake@email.com) to satisfy the schema.

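🛠️ The Fix: To reduce invented values, always add a strict instruction to your prompt like: "If a specific piece of data is not found in the text, output null for that field. Do not invent data." Keep in mind that the schema above requires email to be a string and age to be an integer, so the model may not be able to return null unless the schema also allows it (for example, by declaring the type as ["string", "null"]).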
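Here is one way the fix above could be wired up, as a rough sketch. The helpers `build_prompt` and `allow_nulls` are hypothetical names, and the wording of the extraction instruction is only an example. A list of types such as `["string", "null"]` is standard JSON Schema, but whether your Ollama version accepts it in the `format` argument is an assumption you should verify before relying on it.

```python
import copy

NO_GUESSING_RULE = (
    "If a specific piece of data is not found in the text, "
    "output null for that field. Do not invent data."
)

format_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"},
        "skills": {"type": "array", "items": {"type": "string"}},
        "experience": {"type": "integer"},
    },
    "required": ["name", "age", "email", "skills", "experience"],
}


def build_prompt(text):
    """Wrap the raw text with an extraction instruction and the no-guessing rule."""
    return (
        "Extract the person's details from the text below.\n"
        f"{NO_GUESSING_RULE}\n\n"
        f"Text:\n{text}"
    )


def allow_nulls(schema):
    """Return a copy of the schema where every top-level property may also be null."""
    nullable = copy.deepcopy(schema)
    for rule in nullable["properties"].values():
        current = rule["type"]
        types = current if isinstance(current, list) else [current]
        if "null" not in types:
            rule["type"] = types + ["null"]
    return nullable


nullable_schema = allow_nulls(format_schema)
prompt = build_prompt("John Doe is a 25 year old software developer.")

print(nullable_schema["properties"]["email"])
print(prompt)
```

You would then pass `prompt` and `nullable_schema` to `ollama.generate` in place of the originals, and treat any `None` values in the parsed result as "not found in the text".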
