🖼️ Ollama Python API: How to Pass Images to Vision Models

Ollama doesn't just process text; it can also "see" images using Vision Models like gemma3 or llava. In this tutorial, we will learn how to pass images to an Ollama model using the Python API.

⚠️ Crucial Note on File Paths: When using the Ollama command line (CLI), you can simply type a file path. With the Python API, however, you cannot just pass a local file path (like C:/image.png). Because the Ollama API communicates over the web (HTTP), you must first convert your image into a text format called a Base64 string before sending it.

⚙️ Step 1: Converting an Image to Base64

First, we need to read the image file from our computer and encode it into a Base64 string. Python has a built-in library for this.

import base64

# 1. Define the path to your image
img_path = 'image1.png'

# 2. Open the image in "read binary" ('rb') mode
with open(img_path, 'rb') as f:
    image_bytes = f.read()

# 3. Convert the binary data to a Base64 string
image_64 = base64.b64encode(image_bytes).decode('utf-8')

print("Image successfully converted!")

👁️‍🗨️ Step 2: Passing a Single Image to the Model

Now that we have our image as a text string (image_64), we can pass it to the generate function using the images parameter.

import ollama

# Call the model (make sure you have pulled a vision model like gemma3:4b)
response = ollama.generate(
    model="gemma3:4b", 
    prompt="Describe this image in a short paragraph.",
    images=[image_64]  # Notice this is a list!
)

print(response['response'])

📌 Expected Output

The image shows a vibrant sunset over a calm ocean. The sky is painted with bright hues of orange, pink, and purple. In the foreground, there is a silhouette of a palm tree leaning gently toward the water...

💡 How it works: We pass the Base64 string inside a Python list [image_64] to the images parameter. The model reads the image and then answers the prompt based on what it sees.
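Under the hood, the Python client simply POSTs JSON to Ollama's local HTTP endpoint (by default http://localhost:11434/api/generate). As a rough sketch of what that request body looks like, using placeholder bytes instead of a real file:

```python
import base64
import json

# Placeholder bytes standing in for open('image1.png', 'rb').read()
image_bytes = b'\x89PNG...'
image_64 = base64.b64encode(image_bytes).decode('utf-8')

# Simplified JSON body sent to http://localhost:11434/api/generate
payload = {
    "model": "gemma3:4b",
    "prompt": "Describe this image in a short paragraph.",
    "images": [image_64],  # Base64 strings go here, never raw file paths
}
body = json.dumps(payload)
```

This is why Base64 is required: JSON cannot carry raw binary data, so the image must travel as text.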


📚 Step 3: Passing Multiple Images at Once

What if you want the model to compare two or three different images? You can simply loop through a list of file paths, convert all of them to Base64, and pass the entire list to Ollama.

import base64
import ollama

# 1. List of image paths
images_path = ["image1.png", "image2.png", "image3.png"]
images_b64 = [] # This will hold our final base64 strings

# 2. Loop through each path and convert it
for path in images_path:
    with open(path, 'rb') as f:
        temp_bytes = f.read()
        images_b64.append(base64.b64encode(temp_bytes).decode('utf-8')) 

# 3. Pass the entire list of images to the model
response = ollama.generate(
    model="gemma3:4b", 
    prompt="Compare all these images in a short paragraph.",
    images=images_b64  # Passing the list of 3 images
)

print(response['response'])

📌 Expected Output

The first image depicts a sunny day at the beach, while the second image shows a bustling city street at night. The third image is a close-up of a cup of coffee. While the first two showcase wide, expansive environments with contrasting times of day, the third image is highly focused on a single object.

🚀 Conclusion

By converting images to Base64, you unlock the ability to build powerful Python applications that can analyze photos, read charts, or compare visuals locally using Ollama. Just remember: always handle your files in binary mode ('rb') before encoding!
