How to Automate Alt Text With AI and Python

Written By Ed Roberts

A so-called SEO expert. 

Alt text is an HTML attribute that describes an image for people using screen readers, and for anyone whose browser cannot display the image.

Including alt text in images can improve accessibility and ensure that everyone can access the content on a website.

However, manually adding alt text to images can be a time-consuming task, especially when dealing with a large number of images.

Fortunately, combining AI and Python provides a solution to easily automate alt text generation for images.

For this tutorial, we’re going to use Microsoft’s GIT (GenerativeImage2Text) transformer to take an image URL and describe the content of that image.

Of course, processing a single image at a time is highly inefficient, so instead this script will take a CSV with a list of image URLs and run through all of them, leaving you with a spreadsheet containing the image and the newly generated alt text.

As always, if you just want the final script without going through all the fluff, you can access the Colab file.

Step 1: Install the necessary packages

The first thing we’ll need to do is install and import all the packages we’ll be using for the script. I’ll quickly walk you through each one and what they’re used for.

!pip install transformers
from transformers import AutoProcessor, AutoModelForCausalLM
import requests
import torch
from tqdm import tqdm
from PIL import Image
import pandas as pd

First off, we will be installing transformers (not the robot in disguise kind), which is a library that provides access to pre-trained transformer models for natural language processing, computer vision, and multimodal tasks such as image captioning.

From this library, we will be importing AutoProcessor and AutoModelForCausalLM which we will be using to load the pre-trained transformer model we’re using for generating captions.

We then have requests, a Python package that, as the name suggests, provides functions for making HTTP requests. We will be using this to download images from URLs.

Next up is torch aka the PyTorch package, which provides support for machine learning and neural networks.

Then there is tqdm, a package that provides a progress bar so that we can monitor the progress of the loop that generates alt text for each image (super useful when working with a large number of image URLs).

Then for handling the images themselves we have PIL, the import name for Pillow (the maintained fork of the Python Imaging Library), which as you might be able to guess is a package for working with images in Python.

Finally, we have good old pandas. If you’re not familiar with it, pandas (conventionally imported as pd) is a package for working with data in Python. We will be using it to read and write the CSV file containing the image URLs and generated alt text.

Now we’ve got them installed, let’s move on to the next step.

Step 2: Load the image URLs and pre-trained models

So for this next step, we’re going to load a CSV with image URLs to create our data frame. To do this in Google Colab, click on the file icon in the upper left corner and then hit the upload button.

Screenshot showing the files sidebar and upload icon in the upper left of Google Colab

We’re now going to turn that CSV into a data frame with pandas to allow us to easily work with image data in Python. You’ll want to copy and paste the file path in place of /content/images.csv

df = pd.read_csv("/content/images.csv")
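
The loop later in this tutorial reads each URL from the first column of the CSV, so it helps to know what shape the file should have. Below is a minimal sketch, using only the standard library, of a sample file and a quick check that each row looks like an http(s) URL. The sample URLs, the image_url header, and the looks_like_url helper are illustrative assumptions, not part of the original script.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical sample of what images.csv might contain: a header row
# followed by one image URL per line in the first column.
SAMPLE_CSV = """image_url
https://example.com/images/dog.jpg
https://example.com/images/cat.png
not-a-url
"""

def looks_like_url(value):
    """Return True if value parses as an http(s) URL with a host."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
header, data = rows[0], rows[1:]
urls = [row[0] for row in data if row]
valid = [u for u in urls if looks_like_url(u)]
invalid = [u for u in urls if not looks_like_url(u)]

print(f"{len(valid)} usable URLs, {len(invalid)} to fix: {invalid}")
```

Catching malformed rows before the run starts is cheaper than finding them afterwards as failed captions.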

We’re also going to create an empty list called alt_text that we will be using to store the alt text later on.

alt_text = []

Next, we’re going to load the pre-trained transformer model we’ll be using for generating captions using the AutoProcessor and AutoModelForCausalLM classes from the Transformers package.

The model I’ve opted for is called “git-large-coco” which is GIT (GenerativeImage2Text), large-sized and fine-tuned on COCO.

There are plenty of other GIT pre-trained models available, so feel free to test others to find which works best for you.

git_processor = AutoProcessor.from_pretrained("microsoft/git-large-coco")
git_model = AutoModelForCausalLM.from_pretrained("microsoft/git-large-coco")

The script also then checks if a CUDA-enabled GPU is available for running the model, and if so, sets the device to “cuda” for faster processing. Otherwise, the device is set to “cpu”.

device = "cuda" if torch.cuda.is_available() else "cpu"
git_model.to(device)

Step 3: Define the caption generator function

For this next step, we’re going to create the function that processes a single image and generates the alt text.

Breaking it down, generate_caption() takes a processor, a model, and an image as input, and returns a generated caption for the image. The function uses the processor to turn the image into pixel tensors, has the model generate caption tokens from them, and decodes those tokens back into text.

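def generate_caption(processor, model, image):
    # Convert the PIL image into the pixel tensor the model expects
    inputs = processor(images=image, return_tensors="pt").to(device)
    # Generate caption token IDs, capped at a total length of 50 tokens
    generated_ids = model.generate(pixel_values=inputs.pixel_values, max_length=50)
    # Decode the token IDs back into text and take the first (only) caption
    generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

    return generated_caption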

Now of course generating alt text for a single image isn’t going to make your life easier, so let’s put the newly created function to work on multiple images.

Step 4: Create a loop to generate alt text for multiple images

Finally, we will create a loop that iterates over each image URL in the DataFrame, downloads the image using requests, generates alt text using the generate_caption function, and appends the resulting alt text to a list.

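for i in tqdm(range(len(df))):
    # Read the URL from the first column of the current row
    url = str(df.iloc[i, 0])
    try:
        # Download the image; the timeout stops one slow URL from stalling the run
        response = requests.get(url, stream=True, timeout=30)
        response.raise_for_status()
        # Convert to RGB so palette, greyscale, and transparent images are handled
        image = Image.open(response.raw).convert("RGB")
        caption = generate_caption(git_processor, git_model, image)
        alt_text.append(caption)
    except Exception:
        # Record a placeholder so the list stays aligned with the DataFrame rows
        alt_text.append("NaN")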

I’ve included tqdm here, which adds a progress bar so you can easily track the progress of the script looping through the images.

UPDATE: I’ve incorporated a try/except block to handle image errors, so the script keeps running when a download fails or it comes across an unexpected image format. Those rows get “NaN” in place of alt text.
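
The try/except pattern above can be sketched on its own, without the model, to show how failures get recorded without stopping the run. In this sketch caption_all, fake_caption, and the URLs are hypothetical stand-ins; in the real script the callable would download the image and call generate_caption.

```python
def caption_all(urls, caption_fn, placeholder="NaN"):
    """Apply caption_fn to each URL, recording placeholder on any failure."""
    captions, failures = [], []
    for url in urls:
        try:
            captions.append(caption_fn(url))
        except Exception as exc:  # keep going, but remember what went wrong
            captions.append(placeholder)
            failures.append((url, type(exc).__name__))
    return captions, failures

def fake_caption(url):
    """Stand-in for download + generate_caption; fails on non-image URLs."""
    if not url.endswith((".jpg", ".png")):
        raise ValueError("unsupported image format")
    return f"caption for {url.rsplit('/', 1)[-1]}"

demo_urls = ["https://example.com/a.jpg", "https://example.com/b.pdf"]
captions, failures = caption_all(demo_urls, fake_caption)
print(captions)
print(failures)
```

Keeping a separate list of failed URLs is one possible design choice; it makes it easier to see which rows need a manual look while the captions list stays the same length as the input.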

Last of all, the alt text list is added as a new column in the DataFrame, and the updated DataFrame is written to a new CSV file called alt-text.csv.

df['alt_text'] = alt_text

df.to_csv("alt-text.csv", index=False)

You can then download this CSV from the files sidebar.
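
Before using the results, it can be worth checking how many rows fell back to the placeholder. Here is a minimal sketch using only the standard library, assuming the output has the alt_text column written above and that failures were recorded as the string "NaN". The image_url column name and the sample rows are made up for illustration.

```python
import csv
import io

# Made-up sample standing in for the contents of alt-text.csv.
SAMPLE_OUTPUT = """image_url,alt_text
https://example.com/a.jpg,a dog running on a beach
https://example.com/b.pdf,NaN
https://example.com/c.png,a red bicycle leaning against a wall
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_OUTPUT)))
# Rows where captioning failed and alt text needs writing by hand
needs_review = [r["image_url"] for r in rows if r["alt_text"] == "NaN"]
# Length of the longest caption, as a rough check on verbosity
longest = max(len(r["alt_text"]) for r in rows)

print(f"{len(needs_review)} of {len(rows)} images need manual alt text")
print(f"longest caption: {longest} characters")
```

To run this against the real file, the StringIO sample would be swapped for open("alt-text.csv", newline="").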

Overall, this code provides a useful tool for generating alt text for images in a dataset, which can help make the images accessible to people with visual impairments.

If you’re interested in learning how to automate other SEO tasks using Python, check out my guides to automating keyword research using the Reddit API and how to extract sitemap URLs using Python.