Remove Duplicate Images with Python Using Image Hashing





Remove Duplicate Images using Python 🐍

Do you have a folder full of images and want to delete the duplicates? Here's a simple Python script that scans all the images in a folder and keeps only the unique ones based on their visual hash. This is especially useful for meme folders, downloaded wallpapers, or social media collections.

Features:

  • Uses imagehash and Pillow to identify duplicates visually (not just by filename or size).
  • Skips corrupted or unreadable image files.
  • Prints duplicate image pairs for reference.
  • Automatically creates an output folder and saves only unique images there.

Python Code:


import os
import shutil
from PIL import Image, UnidentifiedImageError
import imagehash

# Input and output folder paths
input_folder = r'images_output'  # Images Input Folder
output_folder = r'images_output' # Images Output Folder (you can change this to a different path)

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Dictionary to store image hashes and their paths
image_hashes = {}

# Iterate through files in the folder
for filename in os.listdir(input_folder):
    if filename.lower().endswith(('png', 'jpg', 'jpeg', 'gif', 'bmp', 'webp')):
        img_path = os.path.join(input_folder, filename)
        try:
            with Image.open(img_path) as img:
                img.load()  # Force load image to catch truncation errors
                # Calculate image hash
                img_hash = imagehash.average_hash(img)

                # Check if hash already exists
                if img_hash in image_hashes:
                    print(f"Duplicate found: {img_path} and {image_hashes[img_hash]}")
                else:
                    # Copy unique image to the output folder
                    shutil.copy(img_path, output_folder)
                    image_hashes[img_hash] = img_path
        except (OSError, UnidentifiedImageError) as e:
            print(f"Skipping file due to error: {img_path} — {e}")

print("✅ Unique images have been saved to the output folder.")

How to Use:

  1. Install the required libraries (if not already installed):
  2. pip install Pillow imagehash
  3. Put all your images inside a folder (e.g., images_output).
  4. Run the script in your Python environment (VS Code, IDLE, or any terminal).
  5. All unique images will be copied to the same folder (or change output path to keep them separate).

Notes:

  • It uses average_hash from imagehash, which is good for detecting visually similar images even with minor edits.
  • You can replace it with phash, whash, or dhash depending on your needs.
  • If you want to move (not copy) files, change shutil.copy to shutil.move.

💡 Pro Tip: This can be used to clean meme folders before uploading them to Instagram, YouTube Shorts, etc.


🔁 Like this tip? Let me know in the comments!

Previous Post Next Post

Contact Form