File this one under sysadmin tricks for journalists.
Tuesday the JFK files — a gaggle of PDFs (that's what a bunch of PDFs are called) — were supposed to land. Supposedly in the afternoon. And maybe they would land at https://www.archives.gov/research/jfk/jfkbulkdownload.
The goal: Get an alert when that URL updates.
The enemy here is refreshing a URL by hand until you see the change you wish to see on the page. In this little article I'll show you how to use shell scripts to defeat that enemy and make your life easier. Note that this won't help you if you're on a Windows machine; I'm on a Macintosh, and this procedure is aimed at Macs and maybe Linux boxes.
There are a few ways to do this, and I'll go through two slightly different ones. Both assume you're on a Macintosh and that you're within earshot of the computer.
Method 1: Look for changes in the HTML document
This approach will alert you if anything in an HTML document changes. It's a little bit of a sledgehammer and can produce false positives, but this is the essence of it: Save a representation of the contents of the HTML document once, then every ten seconds re-download the document and see if anything's changed.
How you do it
- Download the web page with curl.
- Hash the output (i.e. algorithmically convert it) with md5 and save that as your "base" copy (the snippet after this list shows that step on its own).
- Then do that again and again and compare the md5 hashes; if any subsequent hash you take doesn't match the base, you know something on the web page has changed.
- When you ascertain something's changed, speak out an alert (in this case it’s 'oh boy oh boy oh boy').
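If it helps to see those first two steps in isolation, this is all the "base" copy is: download the page with curl, hash it with md5, and stash the hash in a file called base. (The filename is just the one the full one-liner below uses.)

curl --silent https://www.archives.gov/research/jfk/jfkbulkdownload | md5 > base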
The code you use to monitor a web page for changes
This code will do all the steps above, monitoring a URL and alerting you when something changes. It's good for situations where you're expecting a change in the near future; the alert only works if you're in the same room as your computer, though it could probably be adjusted for other situations.
To use this code, copy it and edit the URL so it pulls from the URL you want to monitor, then paste it into your terminal program and press return.
curl --silent https://www.archives.gov/research/jfk/jfkbulkdownload | md5 > base; watch -n 10 'curl --silent https://www.archives.gov/research/jfk/jfkbulkdownload | md5 > new; if ! cmp base new > /dev/null; then say "oh boy oh boy oh boy"; fi'
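If the one-liner is hard to read or tweak, here's the same logic written out as a plain loop, as a sketch using the same ten-second interval and the same base/new filenames. It also sidesteps watch, which doesn't ship with macOS by default. Press control-C to stop it.

# Save the base copy once, then re-check every ten seconds.
curl --silent https://www.archives.gov/research/jfk/jfkbulkdownload | md5 > base
while true; do
  curl --silent https://www.archives.gov/research/jfk/jfkbulkdownload | md5 > new
  # cmp is silent when the hashes match; any difference means the page changed.
  if ! cmp base new > /dev/null; then
    say "oh boy oh boy oh boy"
  fi
  sleep 10
done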
As they say, there is another way.
Method 2: Look for changes in the HTTP request headers
Every response on the web has two parts: The actual document, and the meta information (i.e. "headers") attached to that document.
For example, this is what the headers on the JFK files page look like:
HTTP/2 200
content-type: text/html; charset=utf-8
content-length: 34220
date: Wed, 19 Mar 2025 22:20:58 GMT
content-language: en
last-modified: Wed, 19 Mar 2025 19:48:23 GMT
x-content-type-options: nosniff
etag: W/"1742413703-0-gzip"
v-ttl: 77245
cache-control: public, max-age=60, s-maxage=86400
v-cache-ttl: 77245
strict-transport-security: max-age=31536000; includeSubDomains; preload
x-frame-options: SAMEORIGIN
accept-ranges: bytes
vary: Cookie,Accept-Encoding
x-cache: Miss from cloudfront
via: 1.1 801c4cdd177872a11b03f54a2b3b464e.cloudfront.net (CloudFront)
x-amz-cf-pop: SFO53-C1
x-amz-cf-id: 2d4HH0RiR3c8Y2nd553ocDfimjWP-bJexT_6uYtKHZRgitblsRVeHw==
Most of that is gobbledegook. But there is one piece we care about: That last-modified timestamp. Not every website will reliably use this, but if your target is one that does, here's how to monitor and alert for changes in the last-modified timestamp.
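To peek at just that header yourself, ask curl for the headers only and grep for the line you care about; this is the same curl --head plus grep pairing the monitoring one-liner below is built from.

curl --silent --head https://www.archives.gov/research/jfk/jfkbulkdownload | grep modified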
The how-to is pretty much the same as above, and the code is similar but not identical.
curl --silent --head https://www.archives.gov/research/jfk/jfkbulkdownload | grep modified | md5 > base; watch -n 10 'curl --silent --head https://www.archives.gov/research/jfk/jfkbulkdownload | grep modified | md5 > new; if ! cmp base new > /dev/null; then say "oh boy oh boy oh boy"; fi'
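Because the last-modified line is a single short string, you can also skip the hashing and compare the strings directly. Here's a sketch of that variation as a plain loop; the base and new variable names are mine, and if you're on one of those Linux boxes without say, swap in a terminal bell (printf '\a') or whatever notifier you like.

# Grab the last-modified line once as the baseline.
base=$(curl --silent --head https://www.archives.gov/research/jfk/jfkbulkdownload | grep modified)
while true; do
  new=$(curl --silent --head https://www.archives.gov/research/jfk/jfkbulkdownload | grep modified)
  # Any difference in the last-modified line means the page was republished.
  if [ "$new" != "$base" ]; then
    say "oh boy oh boy oh boy"
  fi
  sleep 10
done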
You can email me at joe.murphy@gmail.com.