The problem

Every now and then, I like to make websites to help out a few student organizations that I’m a part of. One constant pain point I’ve encountered while doing this is adding and updating images of organization members. The pictures that are sent over to me tend to:

  • have different aspect ratios (1000x500 and 400x800).
  • be in inconsistent formats (.jpg, .png, and .gif sometimes somehow).
  • have random filenames (2D6F9FB8-BB1B-464A-B77C-38AC65D602DD.jpg).
  • be phone screenshots, forcing me to crop out everything around the actual picture.
  • be unoptimized in terms of size (4 MB for something that’ll show up as 150x150).

Usually, organizations try to solve this problem by enforcing certain rules. For example, they might ask all of their members to send me only photos with specific dimensions and formats.

Ultimately though, there are limits to enforcing rules like this:

  • some members have trouble editing their pictures, especially in a non-tech organization.
  • given a large enough number of people, it is virtually guaranteed that some of them will still fail to follow the rules properly.
  • certain rules, such as requiring photos to be size-optimized, are just not realistic to enforce.
  • having everyone in the organization edit their own photos is ultimately not a good use of their time.

The solution

To solve this problem, I started looking for ways to easily deal with images in large numbers. After making websites for a few different organizations, I found that the following workflow works relatively well:

Retrieving the images (and naming them)

In most student organizations that I have been a part of, members usually submit their photos through Google Forms. The responses can then be accessed as a Google Sheet, which looks something like this:

Name           Title           Photo
Béla Bartók    Composer        link to photo stored in Google Drive
Ada Lovelace   Mathematician   link to photo stored in Google Drive

From here, I can usually just export the sheet as a .csv file and write a quick script, such as the Ruby one below, to fetch everything:

require "csv"

CSV.read("members.csv").each do |row|
  name, title, photo_url = row

  # The links in the sheet point to Google Drive pages (.../open?id=<file id>).
  # Rewriting "/open?" into "/uc?export=download&" turns them into direct links
  # to the actual picture file.
  downloadable_url = photo_url.gsub("/open?", "/uc?export=download&")

  # Use curl to download all the photos.
  # All photos are named `firstname_lastname`
  `curl -L '#{downloadable_url}' > #{name.downcase.gsub(" ", "_")}`
end

Sidenote: If you are using a framework such as Jekyll or Hugo, which let you generate pages from .yaml files, you can also extend the script to output the .yaml content you need.
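As a rough sketch, the same .csv can be turned into a data file directly. The column names below match the sheet above, but the photo path and the output filename (members.yaml) are placeholders for whatever your site's theme expects:

require "csv"
require "yaml"

members = CSV.read("members.csv", headers: true).map do |row|
  {
    "name"  => row["Name"],
    "title" => row["Title"],
    # Assumes the processed photos end up under assets/members/ as .webp files.
    "photo" => "assets/members/#{row["Name"].downcase.gsub(" ", "_")}.webp",
  }
end

# For Jekyll this could be written to _data/members.yaml; for Hugo, data/members.yaml.
File.write("members.yaml", members.to_yaml)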

Batch processing the images

Having retrieved the images, we can now start processing them. There are many tools available for processing images, but I usually use mogrify, which is part of the ImageMagick suite.

I stick with CLI-based tools for batch processing images as they’re easier to use in a script. While there are websites and GUI-based applications that do the same job, I’ve found that they usually:

  • require more manual interactions.
  • are locked behind paywalls / have usage limits.
  • have a tendency to rename the output images.

Below are some examples of using mogrify:

Standardizing the format

To standardize the image format, simply run:

mogrify -format webp *

Standardizing the aspect ratio

If the photos you have are of different aspect ratios, you can crop them using:

mogrify -gravity Center -crop 500x750+0+0 *

Resizing images

If you want to change the size of the photos while keeping the same aspect ratio, you can use resize:

mogrify -gravity Center -resize 500x750 *

Optimizing space usage of images

If you are going to display the images at a small size, lowering the quality can save you some space without a noticeable difference:

mogrify -quality 80 *

Real-life usage

Once you get used to the flags, your mogrify usage might look closer to this:

mogrify -auto-orient \
    -strip \
    -gravity Center \
    -geometry 600x800^ \
    -crop 600x800+0+0 \
    -gaussian-blur 0.05 \
    -quality 85 \
    -format webp *

Now, this is quite a jump from the previous examples, so let’s unpack this:

  • -auto-orient fixes the orientation of the photo. For example, when you take a picture on your phone with the screen rotated, the phone typically writes the image to storage as-is and records the orientation as metadata. This lets the write complete more quickly, since the phone does not have to rotate the image before saving it. However, it can also cause mogrify and other software to behave unexpectedly if not handled properly (e.g. pictures get cropped into a 3:4 ratio when you wanted 4:3).
  • -strip removes all metadata from your pictures. This is useful for privacy reasons and for removing pesky metadata such as the image orientation.
  • -geometry 600x800^, -crop 600x800+0+0, and -gravity Center are used together so that all our pictures are cropped at the center to 600x800. Unfortunately, explaining why this combination of flags produces that effect is a rabbit hole that probably deserves its own post, so we will skip further explanation for now.
  • -gaussian-blur 0.05 and -quality 85 are used together to lower the space requirement of our pictures while keeping the quality difference as subtle as possible. -format webp also helps here, since WebP is a more space-efficient format than .jpg or .png.

Putting it all together

With all these components, we can put together a script to populate or update a website’s images in batches. For even less toil, the script can be included in a CI/CD pipeline, although that would require something more robust than the hacky script shown above: curl might fail, images might be wildly non-standard, and interpolating untrusted input into calls to binaries like curl or mogrify could allow arbitrary command execution.
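As a rough illustration, a combined version of the steps above might look like the sketch below. The filenames, directory layout, and target dimensions are assumptions carried over from the earlier examples, and passing curl its arguments separately (rather than interpolating them into a shell string) sidesteps the command-injection concern:

require "csv"
require "fileutils"

FileUtils.mkdir_p("photos")

CSV.read("members.csv", headers: true).each do |row|
  name, _title, photo_url = row.fields

  # Rewrite the Google Drive link into a direct-download link, as before.
  downloadable_url = photo_url.gsub("/open?", "/uc?export=download&")

  # Passing arguments separately keeps untrusted input out of the shell.
  system("curl", "-L", downloadable_url, "-o", "photos/#{name.downcase.gsub(" ", "_")}")
end

# Normalize every downloaded photo in a single mogrify pass.
Dir.chdir("photos") do
  system("mogrify -auto-orient -strip -gravity Center -geometry 600x800^ " \
         "-crop 600x800+0+0 -gaussian-blur 0.05 -quality 85 -format webp *")
end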

Admittedly, this post has focused on a relatively niche use case, but I hope it gives some useful insight into how you could batch process images for your own purposes.

For example, maybe you want to optimize the space usage of all the .jpg images on your website. In that case, you can try running something like this from the root of your website repository:

find . -name '*.jpg' -exec \
    mogrify -strip -format webp -gaussian-blur 0.05 -quality 85 {} +

Note that -format writes the converted .webp files alongside the originals, so the old .jpg files still need to be removed afterwards.