Sep 17, 2024

Winning (?) The Spinoff's treasure hunt from the terminal

It just so happened that the other day, while on one of my routine procrastination sessions (ahem, news-perusing breaks), I found out The Spinoff was doing a treasure hunt in honor of their 10th anniversary.

The challenge: find nine cards (easter eggs, but not really hidden) in stories published this week and be in it to win...

... the chance to enter a draw for free stuff??? SIGN ME UP.

But would it really be me if I didn’t decide to do it by scraping the website somehow? I had a whole lunch break coming up after all…

Because of ~reasons~ (mostly being nosy and knowing my way around Chrome's developer tools), I have some insight into the inner workings of the API backing The Spinoff. With a quick inspection of a story’s HTML and the JSON feed, I knew I could easily find out which stories the treasure hunt icons were “hiding” in.
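To give you a rough idea of the poking around involved (consider this a sketch, not API documentation; the endpoint and field names here are the same ones my script further down relies on, so treat them as my own assumptions):

# peek at the kinds of content blocks inside the most recent story
curl -s "https://thespinoff.co.nz/api/posts?limit=1&withcontent=true" |
jq '.[0].content[].type' # list the type of each content item in that story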

And the concept turned out simple enough that I figured I could do it exclusively from the command line. That was a win in and of itself, and very likely why I'm writing all this up. Normally, after having an idea like this, I'd over-engineer it to the point of paralysis.

After some tinkering with commands, I ended up with a little shell script that I ran daily to grab the right URLs. I'd then go to the browser, plug them in, scroll down maniacally, and COLLECT MY HARD-EARNED CARDS.

Only two CLI tools were really needed, but I ended up using four different open-source pieces of software:

curl, which 99.9% of (unix-based) computers should have out of the box.

jq, which, if you don’t already know about it, you should.

I also threw in a couple of calls to csvkit’s in2csv and xsv to help tidy the final output in my terminal, but the real work is done by curl, which downloads the data, and jq, the JSON parser.
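If you haven't met those last two, here's a toy example of that final stretch of the pipeline, with entirely made-up data standing in for the real output:

# made-up input: two matches that happen to share a URL
echo '[{"url":"thespinoff.co.nz/example-story"},{"url":"thespinoff.co.nz/example-story"}]' |
in2csv -f json | # JSON array of objects -> CSV
xsv select url | # keep only the url column
xsv frequency |  # count how often each value appears
xsv table        # pretty-print as an aligned table

That should print a small field/value/count table showing the one URL appearing twice.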

Here's a peek at the secret sauce:

#!/bin/sh

today=$(date -u +%Y-%m-%d) # get today's date in the YYYY-MM-DD format
base_url="https://thespinoff.co.nz/api/posts"
offset=0
limit=$1 # the number of stories to ask the API for, passed in as the script's first argument
withcontent="true" # makes the API return the content of stories

curl -G "$base_url" \
  --data-urlencode "offset=$offset" \
  --data-urlencode "limit=$limit" \
  --data-urlencode "withcontent=$withcontent" \
  --data-urlencode "before=${today}T23:59:59" | # a pipe sends the output to the next command
jq '
  .[] | # loop over each story in the data
  select( .content[] | # find the story content and loop over each item (like a paragraph)
  select((.type | contains("interactives")) and (.data.intType | contains("treasure") ))
) | 
   {id: .id, slug: .slug, date: .date, title: .title, category: ._category}' |
jq -s '
  .[] |
  {
    title: .title,
    url: ("thespinoff.co.nz/" + .category + "/" + ( .date | split("T")[0] | strptime("%Y-%m-%d") | strftime("%d-%m-%Y") ) + "/" + .slug )
}' | # reshape the filtered output and reconstruct each story's URL
jq -s '.' | # collect the stream of objects back into a single JSON array
in2csv -f json | # transform JSON to CSV format
xsv select url | 
xsv frequency | # count how many treasure cards each URL contains
xsv table
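For the record, running it is about as glamorous as this (the file name is whatever you saved the script as; treasure.sh is just what I'm calling it here), with the only argument being the limit passed to the API:

sh treasure.sh 200 # ask the API for the 200 most recent stories and filter from there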

Did I win the draw? Definitely not, even though I'm likely one of the very few people who actually know that 10 cards were placed in Spinoff stories last week (scandal), and could tell you exactly where they were (I keep receipts). Here's the breakdown by date:

Date         Story count   Card count
2024-09-09   1             2
2024-09-10   3             4
2024-09-11   1             1
2024-09-12   1             1
2024-09-13   2             2

Was it worth the time? I probably could’ve eaten lunch at an appropriate time instead.

But do I feel like the smartest person on earth for unpacking a nested JSON structure with a long-ass jq query? Hell yeah. And that's a win in my book.

ps: Happy 10th to The Spinoff! Earlier this year, I was finally convinced to fork out a few dollars a month to support the great work they do. You should do it too.