🐍 Ollama in Python: The Generate Function
In this tutorial, we will learn how to use the Ollama Python Library to interact with Large Language Models directly from your Python scripts. We will focus specifically on the generate function.
The generate function does not store or maintain the history of previous conversations. It generates a response based solely on the single prompt you provide.
⚙️ Installing the Ollama Library
Before writing any code, install the official Python package. Open your terminal or command prompt and run:
pip install ollama
💬 Basic Generate Function
Let's import the library and ask a simple question.
import ollama
response = ollama.generate(model="llama3.2:latest", prompt="What is the capital of India?")
print(response)
📌 Understanding the Output
When you run the code above, it does not return just the final text. It returns a dictionary-style object containing the generated text along with metadata about the generation process, such as the model name, the creation timestamp, and timing statistics.
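As a rough illustration, the returned object behaves like a dictionary with entries along these lines (the field names below match the Ollama response format, but treat the exact set as version-dependent; this is a hand-written stand-in, not real output):

```python
# Illustrative stand-in for the object ollama.generate returns.
# Real responses include keys like these; the exact set may vary by version.
sample_response = {
    "model": "llama3.2:latest",
    "created_at": "2024-05-01T12:00:00Z",
    "response": "The capital of India is New Delhi.",
    "done": True,
    "total_duration": 1_500_000_000,  # time taken, in nanoseconds
    "eval_count": 12,                 # number of tokens generated
}

# Listing the keys shows what is available to extract
for key in sample_response:
    print(key)
```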
🎯 Extracting Specific Data
To get only the response text or other specific parameters, you can access them directly from the object:
# To get just the text response
print(response['response'])
# To get other metadata parameters
print(response['created_at'])
print(response['model'])
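If you prefer not to risk a KeyError when a key is absent, you can wrap the lookup in a small helper. This is a minimal sketch, assuming the response supports dict-style .get access as shown above (the helper name extract_text is my own, not part of the library):

```python
def extract_text(response):
    """Return the generated text, or an empty string if the key is missing."""
    # .get avoids a KeyError if 'response' is absent
    return response.get("response", "")

# Works with any dict-style response object
print(extract_text({"response": "New Delhi", "model": "llama3.2:latest"}))
print(extract_text({}))  # missing key: prints an empty line instead of crashing
```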
🧠 Using Thinking Models
If you want to use a model with advanced reasoning capabilities (like qwen3:8b or DeepSeek), you simply change the model name in your code.
response = ollama.generate(model="qwen3:8b", prompt="What is the capital of India?")
print(response['response'])
Note: A thinking model takes noticeably longer to respond. Because it is a "thinking" model, it first reasons about the problem internally before returning the final answer to you.
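Some reasoning models (DeepSeek-R1-style models, for example) embed their internal reasoning inside &lt;think&gt;...&lt;/think&gt; tags in the response text. If your model does this (an assumption; check your model's actual output), you can strip those tags to keep only the final answer. The helper below is a sketch of that cleanup:

```python
import re

def strip_thinking(text):
    """Remove <think>...</think> blocks that some reasoning models emit."""
    # DOTALL lets the pattern match reasoning that spans multiple lines
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

# Simulated model output containing a reasoning block (not real API output)
raw = "<think>The user asks about India's capital.</think>New Delhi is the capital."
print(strip_thinking(raw))  # → New Delhi is the capital.
```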
🌊 Real-Time Responses (Streaming)
Waiting for a long response can be tedious. To receive the response in real time (printing word by word, like ChatGPT), pass the stream=True parameter.
response = ollama.generate(
    model="llama3.2:latest",
    prompt="Tell me about the culture of India",
    stream=True
)

# Iterate over the stream to fetch each new chunk as it arrives
for chunk in response:
    print(chunk['response'], end="", flush=True)
This loop fetches chunks of text as they are generated and prints them on the same line, without inserting line breaks between them.
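If you also want the complete text after streaming finishes, accumulate the chunks as you print them. Below is a minimal sketch using a hand-built list of chunk dictionaries to stand in for the real stream (the collect_stream helper is my own, not part of the library):

```python
def collect_stream(chunks):
    """Join the 'response' field of each streamed chunk into one string."""
    parts = []
    for chunk in chunks:
        parts.append(chunk["response"])  # each chunk carries a text fragment
    return "".join(parts)

# Simulated stream: each element mimics a chunk yielded when stream=True
fake_stream = [{"response": "India "}, {"response": "has "}, {"response": "a rich culture."}]
print(collect_stream(fake_stream))  # → India has a rich culture.
```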
🚀 Conclusion
The generate function is a powerful and simple way to get one-off answers from local LLMs using Python. You can also pass different types of parameters in the request to customize the behavior. Visit the official Ollama documentation for more advanced details!