🐍 Ollama in Python: The Generate Function
In this tutorial, we will learn how to use the Ollama Python Library to interact with Large Language Models directly from your Python scripts. We will focus specifically on the generate function.
The generate function does not store or maintain the history of previous conversations. It generates a response based solely on the single prompt you provide.
⚙️ Installing the Ollama Library
Before writing any code, install the official Python package. Open your terminal or command prompt and run:
pip install ollama
💬 Basic Generate Function
Let's import the library and ask a simple question.
import ollama
response = ollama.generate(model="llama3.2:latest", prompt="What is the capital of India?")
print(response)
📌 Understanding the Output
When you run the code above, it does not return just the final text. It returns a dictionary-style object containing the generated text along with metadata about the generation process, such as the model name, the creation timestamp, and timing statistics.
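As a rough illustration, the returned object behaves like a dictionary with entries along these lines (the field names below match the Ollama response format, but treat the exact set as version-dependent; this is a hand-written stand-in, not real output):

```python
# Illustrative stand-in for the object ollama.generate returns.
# Real responses include keys like these; the exact set may vary by version.
sample_response = {
    "model": "llama3.2:latest",
    "created_at": "2024-05-01T12:00:00Z",
    "response": "The capital of India is New Delhi.",
    "done": True,
    "total_duration": 1_500_000_000,  # time taken, in nanoseconds
    "eval_count": 12,                 # number of tokens generated
}

# Listing the keys shows what is available to extract
for key in sample_response:
    print(key)
```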
🎯 Extracting Specific Data
To get only the response text or other specific parameters, you can access them directly from the object:
# To get just the text response
print(response['response'])
# To get other metadata parameters
print(response['created_at'])
print(response['model'])
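If you prefer not to risk a KeyError when a key is absent, you can wrap the lookup in a small helper. This is a minimal sketch, assuming the response supports dict-style .get access as shown above (the helper name extract_text is my own, not part of the library):

```python
def extract_text(response):
    """Return the generated text, or an empty string if the key is missing."""
    # .get avoids a KeyError if 'response' is absent
    return response.get("response", "")

# Works with any dict-style response object
print(extract_text({"response": "New Delhi", "model": "llama3.2:latest"}))
print(extract_text({}))  # missing key: prints an empty line instead of crashing
```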
🧠 Using Thinking Models
If you want to use a model with advanced reasoning capabilities (like qwen3:8b or DeepSeek), you simply change the model name in your code.
response = ollama.generate(model="qwen3:8b", prompt="What is the capital of India?")
print(response['response'])
Note: A thinking model takes noticeably longer to respond. Because it is a "thinking" model, it first reasons about the problem internally before returning the final answer to you.
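Some reasoning models (DeepSeek-R1-style models, for example) embed their internal reasoning inside &lt;think&gt;...&lt;/think&gt; tags in the response text. If your model does this (an assumption; check your model's actual output), you can strip those tags to keep only the final answer. The helper below is a sketch of that cleanup:

```python
import re

def strip_thinking(text):
    """Remove <think>...</think> blocks that some reasoning models emit."""
    # DOTALL lets the pattern match reasoning that spans multiple lines
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

# Simulated model output containing a reasoning block (not real API output)
raw = "<think>The user asks about India's capital.</think>New Delhi is the capital."
print(strip_thinking(raw))  # → New Delhi is the capital.
```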
🌊 Real-Time Responses (Streaming)
Waiting for a long response can be tedious. To receive the response in real time (printing word by word, like ChatGPT), pass the stream=True parameter.
response = ollama.generate(
    model="llama3.2:latest",
    prompt="Tell me about the culture of India",
    stream=True
)

# Iterate over the stream to fetch each new chunk as it arrives
for chunk in response:
    print(chunk['response'], end="", flush=True)
This loop fetches chunks of text as they are generated and prints them on the same line, without inserting line breaks between them.
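If you also want the complete text after streaming finishes, accumulate the chunks as you print them. Below is a minimal sketch using a hand-built list of chunk dictionaries to stand in for the real stream (the collect_stream helper is my own, not part of the library):

```python
def collect_stream(chunks):
    """Join the 'response' field of each streamed chunk into one string."""
    parts = []
    for chunk in chunks:
        parts.append(chunk["response"])  # each chunk carries a text fragment
    return "".join(parts)

# Simulated stream: each element mimics a chunk yielded when stream=True
fake_stream = [{"response": "India "}, {"response": "has "}, {"response": "a rich culture."}]
print(collect_stream(fake_stream))  # → India has a rich culture.
```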
🚀 Conclusion
The generate function is a powerful and simple way to get one-off answers from local LLMs using Python. You can also pass different types of parameters in the request to customize the behavior. Visit the official Ollama documentation for more advanced details!