🖼️ Ollama Python API: How to Pass Images to Vision Models
Ollama isn't limited to text: with vision models such as gemma3 or llava, it can also "see" images. In this tutorial, we will learn how to pass images to an Ollama model using the Python API.
C:/image.png). Because the Ollama API communicates over HTTP, the image travels as text: you encode its bytes into a format called a Base64 string before sending it.
⚙️ Step 1: Converting an Image to Base64
First, we need to read the image file from our computer and encode it into a Base64 string. Python's built-in base64 module handles this.
import base64
# 1. Define the path to your image
img_path = 'image1.png'
# 2. Open the image in "read binary" ('rb') mode
with open(img_path, 'rb') as f:
    image_bytes = f.read()
# 3. Convert the binary data to a Base64 string
image_64 = base64.b64encode(image_bytes).decode('utf-8')
print("Image successfully converted!")
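The same steps can be wrapped in a small reusable function. The sketch below is one way to do it; the name `image_to_base64` is hypothetical (it is not part of Ollama or the standard library), and the demo writes a few placeholder bytes to a temporary file instead of a real image so that it runs anywhere.

```python
import base64
import os
import tempfile


def image_to_base64(path):
    """Read a file in binary mode and return its contents as a Base64 str.

    Hypothetical helper for illustration; not part of the ollama package.
    """
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


# Demo with placeholder bytes (not a real image) written to a temp file.
sample_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(sample_bytes)
    tmp_path = tmp.name

encoded = image_to_base64(tmp_path)
os.remove(tmp_path)

# Decoding the string gives back the original bytes exactly.
round_trip_ok = base64.b64decode(encoded) == sample_bytes
print(round_trip_ok)
print(len(sample_bytes), "bytes ->", len(encoded), "Base64 characters")
```

Base64 turns every 3 bytes into 4 text characters, so the encoded string is roughly a third larger than the file on disk.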
👁️🗨️ Step 2: Passing a Single Image to the Model
Now that we have our image as a text string (image_64), we can pass it to the generate function using the images parameter.
import ollama
# Call the model (make sure you have pulled a vision model like gemma3:4b)
response = ollama.generate(
    model="gemma3:4b",
    prompt="Describe this image in a short paragraph.",
    images=[image_64]  # Notice this is a list!
)
print(response['response'])
💡 How it works: We pass the Base64 string inside a Python list [image_64] to the images parameter. The model reads the image and then answers the prompt based on what it sees.
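To make the "image as text" idea concrete, here is a rough sketch of the kind of JSON body that travels over HTTP. It assumes the field names of Ollama's `/api/generate` REST endpoint (`model`, `prompt`, `images`, `stream`); the helper name `build_generate_payload` and the placeholder bytes are hypothetical, and nothing is sent to a server.

```python
import base64
import json


def build_generate_payload(model, prompt, images_b64):
    """Build a JSON request body with Base64 images (hypothetical helper).

    Assumption: field names follow Ollama's /api/generate REST endpoint.
    """
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "images": list(images_b64),  # always a list, even for one image
        "stream": False,
    })


# Placeholder bytes stand in for a real image file's contents.
fake_image_b64 = base64.b64encode(b"not a real image").decode("utf-8")

payload = build_generate_payload(
    "gemma3:4b",
    "Describe this image in a short paragraph.",
    [fake_image_b64],
)
parsed = json.loads(payload)
print(type(parsed["images"]).__name__, len(parsed["images"]))
```

JSON has no type for raw bytes (`json.dumps` raises a `TypeError` on a `bytes` value), which is why the image has to be a Base64 string by the time the request is built.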
📚 Step 3: Passing Multiple Images at Once
What if you want the model to compare two or three different images? You can simply loop through a list of file paths, convert all of them to Base64, and pass the entire list to Ollama.
import base64
import ollama
# 1. List of image paths
images_path = ["image1.png", "image2.png", "image3.png"]
images_b64 = [] # This will hold our final base64 strings
# 2. Loop through each path and convert it
for path in images_path:
    with open(path, 'rb') as f:
        temp_bytes = f.read()
    images_b64.append(base64.b64encode(temp_bytes).decode('utf-8'))
# 3. Pass the entire list of images to the model
response = ollama.generate(
    model="gemma3:4b",
    prompt="Compare all these images in a short paragraph.",
    images=images_b64  # Passing the list of 3 images
)
print(response['response'])
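When several files are involved, it can help to check that every path exists before encoding anything. The following is a minimal sketch of that idea; `encode_images` is a hypothetical helper name, and the demo uses small placeholder files created in a temporary directory, not real images.

```python
import base64
import os
import tempfile
from pathlib import Path


def encode_images(paths):
    """Return a list of Base64 strings, one per path, in the same order.

    Hypothetical helper; raises FileNotFoundError if any path is missing.
    """
    missing = [str(p) for p in paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing image files: {missing}")
    return [
        base64.b64encode(Path(p).read_bytes()).decode("utf-8")
        for p in paths
    ]


# Demo: create two small placeholder files (not real images).
tmp_dir = tempfile.mkdtemp()
demo_paths = []
for i, content in enumerate([b"first", b"second"]):
    p = os.path.join(tmp_dir, f"image{i + 1}.png")
    with open(p, "wb") as f:
        f.write(content)
    demo_paths.append(p)

encoded_list = encode_images(demo_paths)
print(len(encoded_list))

# A missing file is reported before any request would be made.
try:
    encode_images(demo_paths + [os.path.join(tmp_dir, "nope.png")])
    missing_detected = False
except FileNotFoundError:
    missing_detected = True
print(missing_detected)
```

The returned list keeps the order of the input paths, so it can be passed directly as the `images` argument in the call shown above.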
🚀 Conclusion
By converting images to Base64, you unlock the ability to build powerful Python applications that can analyze photos, read charts, or compare visuals locally using Ollama. Just remember: always handle your files in binary mode ('rb') before encoding!