The problem
Every now and then, I like to build websites as a way to help a few student organizations that I’m a part of. One constant pain point that I’ve encountered while doing this is adding and updating images of organization members. The pictures that are sent over to me tend to:
- have different aspect ratios (1000x500 and 400x800).
- be in inconsistent formats (.jpg, .png, and sometimes, somehow, .gif).
- have random filenames (2D6F9FB8-BB1B-464A-B77C-38AC65D602DD.jpg).
- be phone screenshots, forcing me to crop out the surroundings of the picture.
- be unoptimized in terms of size (4 MB for something that’ll show up as 150x150).
Usually, organizations would try to enforce certain rules to solve this problem. For example, they might ask all their members to send me only photos with specific dimensions and formats.
Ultimately though, there are limits to enforcing rules like this:
- some members have trouble editing their pictures, especially in a non-tech organization.
- given a large enough number of people, it is virtually guaranteed that some percentage of people will still fail to properly follow the rules.
- certain rules, such as requiring photos to have optimized size, are just not realistic to enforce.
- having everyone in the organization edit their own photos is ultimately not a good use of their time.
The solution
To solve this problem, I started looking for ways to easily deal with images in large numbers. After making websites for a few different organizations, I found that this workflow works relatively well:
Retrieving the images (and naming them)
In most student organizations that I have been a part of, members would usually submit their photos using Google Forms. The data can then be accessed using Google Sheets, which looks something like:
Name | Title | Photo |
---|---|---|
Béla Bartók | Composer | link to photo stored in Google Drive |
Ada Lovelace | Mathematician | link to photo stored in Google Drive |
… | … | … |
From here, I can usually just download the sheet as a .csv file and write a quick script, such as the Ruby script below, to fetch the photos:
require "csv"

# Assumes the exported sheet keeps its header row (Name, Title, Photo).
CSV.foreach("members.csv", headers: true) do |row|
  name = row["Name"]
  photo_url = row["Photo"]
  # The links in Google Sheets point to Google Drive pages containing the images.
  # We need to substitute part of each link so that it points to the actual picture.
  downloadable_url = photo_url.gsub("/open?", "/uc?export=download&")
  # Use curl to download each photo, named `firstname_lastname`.
  # Passing the arguments separately keeps names and URLs away from the shell.
  system("curl", "-L", "-o", name.downcase.gsub(" ", "_"), downloadable_url)
end
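To illustrate the substitution in the script above, here is a minimal sketch. `FILE_ID` is a placeholder rather than a real Drive file, the helper name is made up for this example, and the exact shape of the links in your own sheet may differ:

```ruby
# Hypothetical helper wrapping the same substitution the script performs.
def to_downloadable_url(photo_url)
  photo_url.gsub("/open?", "/uc?export=download&")
end

# FILE_ID is a placeholder, not a real Drive file.
puts to_downloadable_url("https://drive.google.com/open?id=FILE_ID")
# prints https://drive.google.com/uc?export=download&id=FILE_ID
```

Links that do not contain `/open?` pass through unchanged, so it is worth spot-checking a few rows before downloading everything.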
Sidenote: If you are using frameworks such as Jekyll or Hugo, which allow you to generate pages by writing .yaml files, you can add some prints to the script to output the .yaml file content you need too.
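As a rough sketch of that idea, the snippet below turns the sheet into a YAML list. The helper name, the `name`/`title`/`photo` keys, and the `.webp` extension are all assumptions made for illustration; use whatever keys your site's templates actually expect.

```ruby
require "csv"
require "yaml"

# Hypothetical helper: converts the exported sheet into a YAML list of members.
# Assumes a header row with "Name" and "Title" columns.
def members_yaml(csv_text)
  members = CSV.parse(csv_text, headers: true).map do |row|
    slug = row["Name"].downcase.gsub(" ", "_")
    # The photo file name is an assumption: it matches the download naming
    # scheme plus the extension the images end up with after conversion.
    { "name" => row["Name"], "title" => row["Title"], "photo" => "#{slug}.webp" }
  end
  members.to_yaml
end

puts members_yaml(<<~SHEET)
  Name,Title,Photo
  Ada Lovelace,Mathematician,https://example.com/photo
SHEET
```

The output can then be redirected into a data file and read by the site generator.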
Batch processing the images
Having retrieved the images, we can now start processing them. There are many tools available for processing images, but I usually use Mogrify.
I stick with CLI-based tools for batch processing images as they’re easier to use in a script. While there are websites and GUI-based applications that do the same job, I found that they usually:
- require more manual interactions.
- are locked behind paywalls / have usage limits.
- have a tendency to rename the output images.
Below are some examples of using mogrify:
Standardizing the format
To standardize the image format, simply run:
mogrify -format webp *
Standardizing the aspect ratio
If the photos you have are of different aspect ratios, you can crop them using:
mogrify -gravity Center -crop 500x750+0+0 *
Resizing images
If you want to change the size of the photos while keeping the same aspect ratio, you can use resize:
mogrify -gravity Center -resize 500x750 *
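As a rough illustration of what "keeping the same aspect ratio" means here, the sketch below computes the size an image ends up with when it is scaled to fit inside a 500x750 box. The helper name `fit_within` is made up for this post, and this is only a model of the arithmetic: ImageMagick's own rounding may differ slightly.

```ruby
# Hypothetical helper: scale (width x height) to fit inside a box
# without changing the aspect ratio.
def fit_within(width, height, max_width, max_height)
  # The smaller of the two scale factors keeps both sides inside the box.
  scale = [max_width.to_f / width, max_height.to_f / height].min
  [(width * scale).round, (height * scale).round]
end

p fit_within(1000, 500, 500, 750) # => [500, 250]
p fit_within(400, 800, 500, 750)  # => [375, 750]
```

Note that neither result is exactly 500x750, which is why cropping is needed when every photo must have identical dimensions.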
Optimizing space usage of images
If you are going to be displaying the images at a small size, lowering the quality of the images can save you some space without noticeable differences:
mogrify -quality 80 *
Real-life usage
Once you get used to the flags, your mogrify usage might look closer to this:
mogrify -auto-orient \
-strip \
-geometry 600x800^ \
-crop 600x800+0+0 \
-gravity Center \
-gaussian-blur 0.05 \
-quality 85 \
-format webp *
Now, this is quite a jump from the previous examples, so let’s unpack this:
- -auto-orient fixes the orientation of the photo. For example, when you take a picture with your phone rotated, the phone typically writes the image to storage as-is and records the orientation as metadata. This allows the write to complete more quickly, since the phone does not have to rotate the image before writing it. However, it might also cause mogrify and other software to behave unexpectedly if not handled properly (e.g. pictures are cropped into a 3:4 ratio when you want 4:3).
- -strip removes all metadata from your pictures. This is useful for privacy reasons and for removing pesky metadata such as the image orientation.
- -geometry 600x800^, -crop 600x800+0+0, and -gravity Center are used together so that all our pictures are cropped at the center to a 600x800 size. Unfortunately, explaining why this combination of flags produces such an effect is a rabbit hole that probably deserves its own post, so we will skip further explanations for now.
- -gaussian-blur 0.05 and -quality 85 are used together so we can lower the space requirement of our pictures while making the quality differences as subtle as possible. -format webp also helps with this, as it typically produces smaller files than .jpg or .png.
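For a rough feel of the numbers involved in the resize-then-crop step, here is a simplified model of the arithmetic. It is a sketch under my own assumptions, not ImageMagick's actual implementation: the helper name is made up, and real rounding may differ by a pixel.

```ruby
# Hypothetical helper: models `-geometry 600x800^` followed by a
# center-gravity `-crop 600x800+0+0`.
def fill_and_center_crop(width, height, target_w, target_h)
  # `^` scales the image until it covers the target box, so the larger
  # of the two scale factors wins.
  scale = [target_w.to_f / width, target_h.to_f / height].max
  resized_w = (width * scale).round
  resized_h = (height * scale).round
  # Center gravity leaves equal margins on both sides of the cropped region.
  {
    resized: [resized_w, resized_h],
    offset: [(resized_w - target_w) / 2, (resized_h - target_h) / 2]
  }
end

p fill_and_center_crop(1000, 500, 600, 800)
p fill_and_center_crop(400, 800, 600, 800)
```

In this model, a 1000x500 picture is first scaled to 1600x800 and then loses 500 pixels on each side, while a 400x800 picture is scaled to 600x1200 and loses 200 pixels at the top and bottom.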
Putting it all together
With all these components, we can put together a script to populate or update a website’s images in batches. For even less toil, we can include the script in a CI/CD pipeline, although this would need more robust scripts than the hacky one shown above to handle edge cases. For example, curl might fail, images might be really non-standard, and calling binaries like curl or mogrify with unsanitized input might allow arbitrary code execution to happen.
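As a starting point for that hardening, here is a minimal sketch of two helpers. Both names are hypothetical, and the choices made here (ASCII-only file names, failing loudly on any curl error) are assumptions rather than requirements:

```ruby
# Hypothetical helper: reduce a member name to a predictable, shell-safe
# file name by dropping accents and anything that is not a letter or digit.
def safe_slug(name)
  slug = name.unicode_normalize(:nfkd)
             .gsub(/\p{Mn}/, "")        # remove combining accent marks
             .downcase
             .gsub(/[^a-z0-9]+/, "_")   # collapse everything else into "_"
             .gsub(/\A_+|_+\z/, "")     # trim leading/trailing underscores
  raise ArgumentError, "cannot derive a file name from #{name.inspect}" if slug.empty?
  slug
end

# Hypothetical helper: download a file and stop if curl reports a failure.
# Arguments are passed as a list, so nothing is interpreted by a shell.
def download(url, dest)
  ok = system("curl", "--fail", "--silent", "--show-error",
              "--location", "--output", dest, url)
  raise "download failed for #{url}" unless ok
end

puts safe_slug("Béla Bartók") # prints bela_bartok
```

A pipeline could then call `download(url, safe_slug(name))` for each row and abort on the first error instead of silently producing broken images.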
Admittedly, this post might have been focusing too much on a relatively niche use case. I hope it gives some useful insights into how you could batch process images for your own purposes though.
For example, maybe you want to optimize the space usage of all the .jpg images on your website. In that case, you can try running something like this command in the root of your website repository:
find . -name '*.jpg' -exec \
  mogrify -strip -format webp -gaussian-blur 0.05 -quality 85 {} +