The problem

Every now and then, I like to make a few websites as a way to help a few student organizations that I’m a part of.

One constant pain point that I’ve encountered while doing this is adding and updating images of organization members. The pictures that are sent over to me tend to:

  • have different aspect ratios (e.g. 1000x500 and 400x800).
  • be in inconsistent formats (.jpg, .png, and sometimes, somehow, .gif).
  • have random filenames (2D6F9FB8-BB1B-464A-B77C-38AC65D602DD.jpg).
  • be phone screenshots, forcing me to crop out the surroundings of the picture.
  • be unoptimized in terms of size (4 MB for something that’ll show up as 150x150).

Usually, organizations would try to enforce certain rules to solve this problem. For example, a few student organizations used to ask all their members to send me only photos with specific dimensions and formats, in an attempt to make my life easier.

Ultimately though, there are limits to enforcing rules like this:

  • some members have trouble editing their pictures, especially in a non-tech organization.
  • given a large number of people, some will usually still fail to meet the guideline given.
  • certain things, such as an optimized file size, are just not realistic to enforce.
  • spending time fussing over this is usually not a good use of members' time.

Due to this, I usually tell people not to worry about what they send me, since I can write simple scripts to batch process their images. That said, I still really appreciate people’s thoughtfulness in trying to make things easier for me.

The solution

After making websites for a few different organizations, I found that this workflow works relatively well:

Retrieving the images (and naming them)

In most student organizations that I have been a part of, members would usually submit their photos using Google Forms. The responses can then be accessed as a Google Sheet, which looks something like:

Name           Title           Photo
Béla Bartók    Composer        link to photo stored in Google Drive
Ada Lovelace   Mathematician   link to photo stored in Google Drive

From here, I can usually just export the sheet as a .csv file and write a quick script, such as the Ruby script below, to download the photos:

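require "csv"

# `headers: true` skips the header row (Name, Title, Photo) and lets us look
# up each column by name.
CSV.foreach("members.csv", headers: true) do |row|
  name = row["Name"]
  photo_url = row["Photo"]

  # The links in Google Sheets point to a Google Drive page containing the
  # image. We need to substitute part of the URL so that it becomes a link to
  # the actual picture.
  downloadable_url = photo_url.sub("/open?", "/uc?export=download&")

  # All photos are named `firstname_lastname.jpg`.
  filename = name.downcase.gsub(" ", "_") + ".jpg"

  # Use curl to download the photo. Passing each argument separately keeps the
  # URL and the filename from being interpreted by a shell.
  system("curl", "-L", "-o", filename, downloadable_url)
end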
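To make the two string transformations in the script easier to check on their own, here is a minimal sketch that pulls them out into helper methods. The method names and the `FILE_ID` placeholder are hypothetical, and the URL shape assumes the `/open?` style of link that the script rewrites.

```ruby
# Hypothetical helpers that isolate the string handling from the download step.

# Turn a Google Drive "open" link into a direct-download link.
def downloadable_url(photo_url)
  photo_url.sub("/open?", "/uc?export=download&")
end

# Build a `firstname_lastname.jpg` filename from a member's name.
# Surrounding whitespace is trimmed and runs of spaces become one underscore.
def photo_filename(name)
  name.strip.downcase.gsub(/\s+/, "_") + ".jpg"
end

# FILE_ID is a placeholder, not a real Drive file.
puts downloadable_url("https://drive.google.com/open?id=FILE_ID")
puts photo_filename("Ada Lovelace")
```

Keeping these as small methods makes it easy to try them on a few rows of the sheet before downloading anything.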

Sidenote: If you are using frameworks such as Jekyll or Hugo, which allow you to generate webpages by writing .yaml files, you can add some print statements to the script to output the .yaml file content you need too.
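As a rough sketch of that idea, the snippet below prints a YAML list of members using Ruby's standard `yaml` library. The key names (`name`, `title`, `image`) and the `/images/members/` path are assumptions for illustration; they would need to match whatever your site's templates expect.

```ruby
require "yaml"

# Hypothetical member rows; in the real script these would come from the CSV.
members = [
  { "name" => "Béla Bartók", "title" => "Composer" },
  { "name" => "Ada Lovelace", "title" => "Mathematician" },
]

# Add the image path that the download step is assumed to have produced.
# The "/images/members/" prefix is a made-up location for this example.
def with_image(member)
  filename = member["name"].downcase.gsub(" ", "_") + ".jpg"
  member.merge("image" => "/images/members/#{filename}")
end

# Print the YAML document so it can be redirected into a data file.
puts members.map { |member| with_image(member) }.to_yaml
```

Redirecting the output to a file (for example a data file in a Jekyll or Hugo project) keeps the member list and the downloaded photos in sync.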

Batch processing the images

Having retrieved the images, we can now start processing them. There are many tools available for processing images, but I usually use mogrify, a command-line tool that ships with ImageMagick.

I stick with CLI-based tools for batch processing images as they’re easier to use in a script. While there are websites and GUI-based applications that do the same job, I found that they usually:

  • require more manual interactions.
  • are locked behind paywalls / have usage limits.
  • have a tendency to rename the output images.

Below are some examples of using mogrify:

Standardizing the format

To standardize the image format, simply run:

mogrify -format jpg *

Standardizing the aspect ratio

If the photos you have are of different aspect ratios, you can crop them using:

mogrify -gravity Center -crop 500x750+0+0 *

Resizing images

If you want to change the size of the photos while keeping the same aspect ratio, you can use -resize, which scales each image to fit within the given dimensions:

mogrify -gravity Center -resize 500x750 *

Optimizing space usage of images

If you are going to display the images at a small size, lowering the quality of the images can save you some space without a noticeable difference:

mogrify -quality 80 *

Real-life usage

Once you get used to the flags, your mogrify usage might look closer to this:

mogrify -auto-orient \
    -strip \
    -geometry 600x800^ \
    -crop 600x800+0+0 \
    -gravity Center \
    -gaussian-blur 0.05 \
    -quality 85 \
    -format jpg *

Now, this is quite a jump from the previous examples, so let’s unpack this:

  • -auto-orient fixes the orientation of the photo. For example, when you take a picture with your phone rotated, it would typically write the image directly to your storage with the orientation information added as metadata. This allows the write to complete quicker since the phone does not have to rotate the image prior to writing it. However, this might also cause mogrify and other software to behave unexpectedly if not handled properly (e.g. pictures are cropped into a 3:4 ratio when you want 4:3).
  • -strip removes all metadata from your pictures. This is useful for privacy reasons and for removing pesky metadata such as the image orientation.
  • -geometry 600x800^, -crop 600x800+0+0, and -gravity Center are used together so all our pictures are cropped at the center to a 600x800 size. Unfortunately, explaining why this combination of flags produces such an effect is a rabbit hole that probably deserves its own post, so we will skip further explanations for now.
  • -gaussian-blur 0.05 and -quality 85 are used together so we can lower the quality and space requirement of our pictures while making the quality differences as subtle as possible. -format jpg also helps with this, since JPEG is a lossy image format.
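Without going all the way down that rabbit hole, the arithmetic behind the resize-then-crop combination can be roughly sketched as follows. This assumes that the `^` suffix treats 600x800 as a minimum size (the image is scaled, keeping its aspect ratio, until it covers the box) and that the centered crop then trims the overflow. The method name is hypothetical, and ImageMagick's exact rounding may differ.

```ruby
# Hypothetical helper: given a source size and a target box, return the size
# after a fill-style resize and the offsets of a centered crop.
def fill_and_center_crop(src_w, src_h, box_w, box_h)
  # Scale by the larger ratio so that both dimensions cover the box.
  scale = [box_w.to_f / src_w, box_h.to_f / src_h].max
  resized_w = (src_w * scale).round
  resized_h = (src_h * scale).round

  # A centered crop trims the overflow equally from both sides.
  {
    resized: [resized_w, resized_h],
    crop_offset: [(resized_w - box_w) / 2, (resized_h - box_h) / 2],
  }
end

# The two example sizes from the start of the post.
p fill_and_center_crop(1000, 500, 600, 800)
p fill_and_center_crop(400, 800, 600, 800)
```

Under these assumptions, the 1000x500 picture becomes 1600x800 after the resize and loses 500 pixels from each side, while the 400x800 picture becomes 600x1200 and loses 200 pixels from the top and bottom.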

Putting it all together

With all these components, we can put together a script to populate or update a website’s images in batches. For even less toil, we can include the script in a CI/CD pipeline, although this would need more robust scripts than the hacky one shown above to handle edge cases. For example, curl might fail, images might be really non-standard, and calling binaries like curl or mogrify with untrusted input might allow arbitrary code execution.
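As one small step in that direction, a script could route every external command through a wrapper that avoids the shell and stops on failure. The sketch below is only an illustration: `run!` is a hypothetical name, and the Ruby interpreter stands in for curl or mogrify so that the example runs anywhere.

```ruby
require "rbconfig"

# Hypothetical wrapper: run a command without a shell and fail loudly.
def run!(*command)
  # With the program and its arguments passed separately, Ruby executes the
  # program directly, so no shell interprets the arguments.
  succeeded = system(*command)

  # `system` returns false on a non-zero exit and nil if the command could
  # not be started at all; both are treated as failures here.
  raise "Command failed: #{command.join(' ')}" unless succeeded
end

# Stand-in for a command that succeeds.
run!(RbConfig.ruby, "-e", "exit 0")

# Stand-in for a command that fails.
begin
  run!(RbConfig.ruby, "-e", "exit 1")
rescue RuntimeError => error
  puts "Caught: #{error.message}"
end
```

In a download script this might be called as `run!("curl", "-L", "--fail", "-o", filename, url)`, where curl's `--fail` flag makes HTTP errors produce a non-zero exit status. It does not address every edge case mentioned above, such as validating that the downloaded file is really an image.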

Admittedly, this post might have focused too much on a relatively niche use case. I hope it gives some useful insights into how you could batch process images for your own purposes though.

For example, maybe you want to optimize the space usage of all the .jpg images on your website. In that case, you can try running something like this command in the root of your website repository:

find . -name '*.jpg' -exec \
    mogrify -strip -interlace Plane -gaussian-blur 0.05 -quality 85% {} +
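To see whether a command like this actually pays off, it can help to measure the total size of the images before and after running it. The sketch below is one hypothetical way to do that in Ruby; `total_jpg_bytes` is a made-up helper name, and unlike `find`, `Dir.glob` skips hidden directories by default.

```ruby
# Hypothetical helper: total size in bytes of every .jpg under a directory.
def total_jpg_bytes(root)
  Dir.glob(File.join(root, "**", "*.jpg")).sum { |path| File.size(path) }
end

before = total_jpg_bytes(".")

# ... run the find/mogrify command here ...

after = total_jpg_bytes(".")
puts "Saved #{before - after} bytes"
```

Since mogrify overwrites files in place, it is worth committing or backing up the originals before trying different quality settings.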