Backstory

Not too long ago, a sudden urge hit me: I want a fast website. Not the normal kind of fast though, but the needlessly speedy kind of fast.

This post summarizes my journey toward that goal, detailing what I have learned and done to reach my website’s current speed.

DNS

It’s always DNS!

In trying to speed up my site, I decided to look for improvements in the same order in which a browser accesses a website, just to make sure that I did not miss any intermediary step that could be a bottleneck.

Naturally, I started looking into my DNS setup, wanting to make sure that my DNS records were configured properly. For good measure, I also ran dig from a few different locations in the world to see if anything was out of order.

Everything seemed fine at first glance: all my records looked good and dig ran well regardless of location. But then, I noticed something peculiar with the A records for nicholas.sh:

; <<>> DiG 9.18.0 <<>> nicholas.sh
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14731
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;nicholas.sh.           IN  A

;; ANSWER SECTION:
nicholas.sh.        495 IN  A   185.199.111.153
nicholas.sh.        495 IN  A   185.199.108.153
nicholas.sh.        495 IN  A   185.199.109.153
nicholas.sh.        495 IN  A   185.199.110.153

;; Query time: 29 msec
;; SERVER: 75.75.75.75#53(75.75.75.75) (UDP)
;; WHEN: Fri Apr 01 18:42:00 CDT 2022
;; MSG SIZE  rcvd: 104

I had originally put in these IP addresses after following GitHub’s official documentation on using a custom domain with GitHub Pages, which I use to host my website. Upon scrutinizing these records, I realized that all the IPs listed are for servers in California.

This made me wonder whether all the traffic to my website was being sent to California. So, I took a deeper look into this.

CDN

I had previously heard that GitHub Pages uses Fastly as a CDN. Under that assumption, the idea that my website was being served only from California seemed improbable. However, I wanted to make sure; there was always a chance that my memory of GitHub using Fastly was wrong. So, I ran curl -I nicholas.sh:

HTTP/1.1 301 Moved Permanently
Server: GitHub.com
Content-Type: text/html
Location: https://nicholas.sh/
X-GitHub-Request-Id: 66DA:71F8:57E267:81AD4E:62478F1D
Fastly-Original-Body-Size: 162
Content-Length: 162
Accept-Ranges: bytes
Date: Fri, 01 Apr 2022 23:47:41 GMT
Via: 1.1 varnish
Age: 0
Connection: keep-alive
X-Served-By: cache-pwk4964-PWK
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1648856861.131057,VS0,VE25
Vary: Accept-Encoding
X-Fastly-Request-ID: 6dcd98ab1570e75823d072ca297cffa4eb93aab3

Looking at this result, I could see that:

  • X-Fastly-Request-ID confirms my assumption that GitHub Pages uses Fastly as its CDN.
  • X-Served-By: cache-pwk4964-PWK implies that my request was served by a server near my location at the University of Illinois.1

When repeating the curl command from devices located elsewhere, I received similar HTTP headers telling me that each request was served by a nearby server. Everything seemed fine: all traffic to my website was being served by servers close to the user.

However, I was still left with one question: how were packets addressed to these California IPs being handled by nearby servers? I ran traceroute from different locations to dig more into this. Below is a result from Amsterdam:

                                 Loss   Snt   Last   Avg  Best  Wrst StDev
1.|-- ???                       100.0     4    0.0   0.0   0.0   0.0   0.0
2.|-- 10.82.4.38                 0.0%     4    5.0   9.2   0.4  29.9  13.9
3.|-- 138.197.250.104            0.0%     4    0.9   1.0   0.9   1.4   0.2
4.|-- 138.197.250.76             0.0%     4    7.2   2.2   0.5   7.2   3.3
5.|-- 80.249.212.183             0.0%     4    0.8   0.9   0.7   1.2   0.2
6.|-- 185.199.111.153            0.0%     4    0.6   0.8   0.6   1.2   0.3

Somehow, packets were suddenly “teleporting” to 185.199.111.153 in California. As it turns out, 80.249.212.183 actually belongs to an IXP (Internet Exchange Point) located in Amsterdam.

All this time, Fastly had been load balancing my traffic using Anycast. I was not familiar with BGP at the time, and had assumed that load balancing was only done by returning different A or AAAA records for DNS queries made from different locations.

I ended up being unable to optimize the DNS lookup and load balancing part of accessing my website any further. However, I learned a lot about networking from this experience, and found some great resources along the way—such as how to build a CDN in 5 hours.

Getting rid of website bloat

After finally getting out of the rabbit hole that is BGP, I started auditing the content of my site, looking for possible improvements. Immediately, I discovered a lot of unnecessary bloat that I could get rid of.

Unused content

Looking at my CSS files, I saw quite a few no-op rules that did not affect any part of my website, most of them leftovers that used to style parts of the website that no longer exist. I removed them by running PurgeCSS.2
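
For reference, a one-off run of the PurgeCSS CLI looks roughly like this (the paths here are placeholders rather than my actual directory layout):

npx purgecss --css public/css/*.css --content "public/**/*.html" --output public/css/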

I also purged some bloat in my HTML and JavaScript files. However, this was mostly done manually, as the bloat was not unused content that could be detected programmatically. Instead, it was mostly a matter of making decisions such as “is it worth adding X KB of JavaScript to add Y feature?”

Image formatting

On a similar note to getting rid of relatively underused JavaScript, a lot of the images I was serving were larger than they needed to be. I was able to save a pretty significant amount of space by converting all my images to the .webp format and downsizing them appropriately with ImageMagick’s mogrify.
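
The conversion and downsizing boil down to a single mogrify invocation per directory; something along these lines, with the dimensions and quality being whatever the layout actually calls for:

mogrify -format webp -resize "1200x1200>" -quality 80 *.png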

Interestingly, I saved the most space from properly formatting my favicon.

Minification

After making many tough decisions about whether certain content was worth serving, the easy part came: deciding whether no-op whitespace was worth serving. Unsurprisingly, the answer was no. The same goes for JavaScript variable and function names that are unnecessarily long. I got rid of all of these by improving the minification step in my build process.

I already had a step in my build process that dealt with minification. However, I found on closer inspection that some parts were not getting minified properly. For example, inline JavaScript was not being processed correctly. This was solved by simply refactoring it into a dedicated .js file, which also happens to help with caching (more on that in later sections).

Compression

To further reduce the size of files transferred over the network to the browser, I also looked into enabling a content encoding such as gzip, which compresses response bodies. This is enabled by sending the appropriate header in HTTP responses. Thankfully, it was already enabled by default for me on GitHub Pages. This is especially fortunate because I later found out that GitHub Pages does not support changing HTTP headers, meaning it would have been impossible for me to enable gzip encoding had it not been on by default.
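
A quick way to confirm that compression is on is to request the page with an Accept-Encoding header and look for Content-Encoding in the response:

curl -s -D - -o /dev/null -H "Accept-Encoding: gzip" https://nicholas.sh/ | grep -i content-encoding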

Bundling

To capitalize on compression even further, I added a step to my build process that bundles as many of my CSS and JavaScript files together as possible. If a set of files is always served together on every page, they get bundled into one file. Merging multiple smaller files into one larger file gives the result more repetitive patterns, which lets gzip get even more mileage out of compressing it.

Caching

After continuously debloating my website, I hit the point of diminishing returns, where I was only able to save ~1 KB each time I tried to optimize something. These ~1 KB improvements were even more negligible once other factors such as gzip compression were taken into account; my website was transferring only ~150 B less over the network even when I had reduced the size of certain files by ~1 KB.

Naturally, I headed for greener pastures: improving caching.

CSS and JavaScript refactor

Originally, I had a decent amount of inline CSS and JavaScript in my website. This was mostly due to a few misunderstandings. For instance, I had thought that putting inline JavaScript in the footer of HTML documents was the only way to use JavaScript without blocking DOM content from loading. As it turns out, using the defer attribute on a <script> tag achieves the same effect.
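
That is, a script included like the following downloads in parallel with the page and only runs after the document has been parsed (the file name is just a placeholder):

<script defer src="/js/main.js"></script>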

After resolving a few misunderstandings such as the one above, I refactored almost all of my CSS and JavaScript into separate files. Instead of receiving the same uncacheable inline CSS and JavaScript on every page as before, the browser can now tell when a given page reuses a particular CSS or JavaScript file and serve the cached copy it already has.

Browser cache

Since many of the files on my website rarely change, I wanted to let the browser cache them for longer by setting the appropriate HTTP header. Unfortunately, as previously mentioned, GitHub Pages does not allow its users to modify the headers in HTTP responses. It also only allows the browser to cache content for 10 minutes at most.

Because of this, I started looking into other caching mechanisms that I could use.

Turbo

My first attempt at improving caching was simply to use the Turbo3 library, which maintains its own cache. Although some of its caching behavior can be difficult to handle appropriately, the library works well for the most part. However, I ended up not using it because of its size. While Turbo is not ridiculously large, it would still easily be the largest file on my website if I were to include it. As such, I decided to look for alternatives.

Service Worker cache

After doing some research, I learned about Service Workers, which seemed promising. A Service Worker would allow me to:

  • run JavaScript on a thread separate from the main thread that a webpage usually runs on.
  • store files in the Service Worker cache for however long I like.
  • intercept network calls issued to fetch assets and serve them locally from the Service Worker cache.
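
Using a Service Worker starts with registering it from the page. As a rough sketch, and assuming the worker lives at /sw.js (the path is just a placeholder), the registration looks something like this:

// Register the Service Worker once the page has loaded, if the browser supports it.
if ('serviceWorker' in navigator) {
  window.addEventListener('load', () => {
    navigator.serviceWorker
      .register('/sw.js')
      .catch((err) => console.error('Service Worker registration failed:', err));
  });
}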

Simple Service Worker

For my first try, I wrote a simple Service Worker that boiled down to the following logic (sketched in code after the list):

  • if the browser is requesting an asset that is not in the cache, fetch it from the network, store it in the cache forever, and serve it locally from the cache.
  • if the browser is requesting an asset that is in the cache, serve it locally from the cache.
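
Here is a minimal sketch of that cache-first logic; the cache name and error handling are simplified, so treat it as an illustration of the idea rather than my exact code:

const CACHE_NAME = 'static-assets';

// Cache-first: serve from the cache when possible; otherwise fetch, store, and serve.
self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.open(CACHE_NAME).then((cache) =>
      cache.match(event.request).then((cached) => {
        if (cached) {
          return cached; // cache hit: no network call at all
        }
        return fetch(event.request).then((response) => {
          // Store a copy of the response before handing it to the page.
          cache.put(event.request, response.clone());
          return response;
        });
      })
    )
  );
});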

This worked amazingly well. The Service Worker was able to serve any asset it had previously fetched practically instantly. Not having to issue a network call for cached assets further improved the speed of the website, in addition to allowing it to load without an internet connection. However, I still had to deal with the issue of updating content that the Service Worker might have stored in the cache with no expiration time.

Background fetch Service Worker

My first attempt at doing so was to always issue a network request for assets in the background (sketched in code after the list):

  • if the browser is requesting an asset that is not in the cache, fetch it from the network, store it in the cache forever, and serve it locally from the cache.
  • if the browser is requesting an asset that is in the cache, serve it locally from the cache. However, also start asynchronous work in the background to fetch a fresher version of the asset and update the cache.
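
In code, the only change from the earlier sketch is the background refresh; roughly something like this, again simplified and reusing the CACHE_NAME constant from before:

self.addEventListener('fetch', (event) => {
  // Always start a network fetch that refreshes the cache in the background.
  const refresh = caches.open(CACHE_NAME).then((cache) =>
    fetch(event.request).then((response) => {
      cache.put(event.request, response.clone());
      return response;
    })
  );

  // Keep the worker alive until the background refresh settles.
  event.waitUntil(refresh.catch(() => {}));

  // Serve the cached copy immediately if there is one; otherwise wait for the network.
  event.respondWith(
    caches.match(event.request).then((cached) => cached || refresh)
  );
});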

This version of the Service Worker works reasonably well. Content is still served as fast as before, and when the website is updated, browsers are now guaranteed to eventually receive the update. It is probably reasonable to use this version of the Service Worker on most websites.

However, I was still unsatisfied with having to issue unnecessary network calls. Since my website is static, the content of the Service Worker cache is guaranteed to remain valid until I push a new version. So, I wrote a new Service Worker that is specifically optimized for a static page.

Final Service Worker

An important feature of Service Workers is that the worker script itself will never be cached by the browser. Whenever JavaScript is run to register a Service Worker, the browser checks whether the content of the Service Worker file is still the same. If it is, the registration becomes a no-op. Otherwise, the browser installs the new Service Worker.

I took advantage of the behavior described above by adding another step to my build process. Whenever a build of my website is triggered, the value of a constant named VERSION in my Service Worker code is set to the current time. This allows the browser to detect immediately when a new version of my website has been deployed.

Below is the new behavior of my Service Worker, sketched in code after the list:

  • if the browser is requesting an asset that is not in the cache, fetch it from the network, store it in the cache forever, and serve it locally from the cache.
  • if the browser is requesting an asset that is in the cache, serve it locally from the cache.
  • on installation of a new Service Worker, invalidate the content of all existing caches.
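
Here is a condensed sketch of this final version. The VERSION constant doubles as the cache name, so a new deploy automatically invalidates every cache created by older workers. The __BUILD_TIME__ placeholder and the skipWaiting call are assumptions about how my build and update flow work, not a verbatim copy:

// VERSION is rewritten by the build process on every deploy.
const VERSION = '__BUILD_TIME__';
const CACHE_NAME = `static-${VERSION}`;

// Take over from the previous worker as soon as the new one is installed.
self.addEventListener('install', () => self.skipWaiting());

// A new VERSION means a new worker; drop every cache left behind by older versions.
self.addEventListener('activate', (event) => {
  event.waitUntil(
    caches.keys().then((keys) =>
      Promise.all(keys.filter((key) => key !== CACHE_NAME).map((key) => caches.delete(key)))
    )
  );
});

self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.open(CACHE_NAME).then((cache) =>
      cache.match(event.request).then((cached) => {
        if (cached) {
          return cached;
        }
        // Bypass the browser's HTTP cache so a fresh deploy never repopulates
        // the new cache with stale content (see the note below).
        return fetch(event.request, { cache: 'no-store' }).then((response) => {
          cache.put(event.request, response.clone());
          return response;
        });
      })
    )
  );
});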

This new Service Worker gives two nice guarantees:

  • as long as the website has not been updated, it will only ever issue one successful network call to fetch a given asset.
  • whenever the website is updated, the change will be detected immediately on the first navigation within the website, and stale caches will be flushed.

It is worth noting that any fetch from the network done by this version of the Service Worker should be set to ignore browser caches. If not, it is possible that even after the old Service Worker cache is invalidated when a new worker is installed, the new Service Worker cache will be populated with stale content stored in the browser cache. Since no new network calls are issued for content that has already been fetched, the new Service Worker would then be stuck with outdated content until an even newer version of the Service Worker is deployed.

At the time of writing, I have found no other ways to improve this caching behavior.

Prefetching

After implementing what I thought of as the optimal caching behavior for a static website, I added my final optimization: prefetching. As with the Service Worker, I wrote my own custom JavaScript to implement prefetching. The behavior of the prefetching script is described below and sketched in code after the list:

  1. register an event handler so the prefetching will happen only when the browser is idle.
  2. when the browser is idle, look at all the links that exist in the current viewport.
  3. if a given link in the viewport has not been prefetched and does not lead to an external website, prefetch it.
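
Below is a sketch of that script. The helper names are made up for illustration, and it leans on requestIdleCallback for the idle check and on the Service Worker to actually store whatever gets prefetched:

const prefetched = new Set();

// True if the element is at least partially inside the current viewport.
function inViewport(el) {
  const rect = el.getBoundingClientRect();
  return rect.bottom > 0 && rect.right > 0 &&
         rect.top < window.innerHeight && rect.left < window.innerWidth;
}

function prefetchVisibleLinks() {
  for (const link of document.querySelectorAll('a[href]')) {
    const url = new URL(link.href, location.href);
    // Skip external links and anything already prefetched.
    if (url.origin !== location.origin || prefetched.has(url.href)) continue;
    if (!inViewport(link)) continue;
    prefetched.add(url.href);
    // The fetch goes through the Service Worker, which stores the page in its cache.
    fetch(url.href).catch(() => prefetched.delete(url.href));
  }
}

// Only do this work once the page has loaded and the browser is idle.
window.addEventListener('load', () => {
  if ('requestIdleCallback' in window) {
    requestIdleCallback(prefetchVisibleLinks);
  }
});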

While the behavior is relatively simple, the speedup from this optimization has been massive. The combination of prefetching and the Service Worker means that most navigations on the site involve zero network calls, allowing pages to load instantaneously.

Putting it all together

After trying out a lot of different optimization techniques, I got the biggest speedup from implementing my own custom prefetching script and Service Worker. Combined, they allow most pages to load instantly. Moreover, adding them to a new website is relatively simple and low-cost: all it takes is two JavaScript files with a total size of ~1 KB and one <script> tag in the <head> of each page.

That said, prefetching and Service Workers are by no means a silver bullet. This solution will not work, for example, for people who browse with JavaScript disabled. Additionally, different websites have different bottlenecks. As such, trying out many different techniques is probably a good idea, especially once diminishing returns are taken into account.


  1. PWK is the airport code for Chicago Executive Airport, previously named Palwaukee Municipal Airport. ↩︎

  2. Unlike how most people use PurgeCSS, I did not integrate it into my build process. Since I rarely add new CSS rules, I chose to simply run PurgeCSS manually once in a blue moon in order to avoid adding an npm dependency to my otherwise pure Go build process. ↩︎

  3. Some people might be more familiar with an older version of Turbo called Turbolinks. ↩︎