Jan 16, 2025

My docker suite for scrapers

A while ago I got really ambitious with a scraping project, so I needed some good tooling around it to make deploying scrapers and scheduling runs manageable. Enter scrapyd and scrapydweb.

These two have been lifesavers. I have them running in my home lab, and can now easily deploy any scrapy project (my favorite scraping lib) and schedule runs.
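To give a sense of how lightweight the daemon side is, here is roughly what a run looks like against scrapyd's HTTP API. The project and spider names below are placeholders, and 6800 is scrapyd's default port:

```bash
# Queue a spider run via scrapyd's schedule.json endpoint.
# "myproject" and "myspider" are placeholders for your own names.
curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider

# See pending/running/finished jobs for that project.
curl "http://localhost:6800/listjobs.json?project=myproject"
```

Scrapydweb wraps this API in a UI, so in practice I rarely hit it by hand.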

I found two base docker images by a company in Mexico (zentekmx) that I forked to upgrade versions. Here they are:

docker-scrapyd (the daemon that manages and runs your spiders)

docker-scrapydweb (a web interface to manage the daemon, deployments and scheduling)

I cannot recommend this scrapy suite enough. Check out how I orchestrate them in my home lab with docker compose in this repo.
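If you just want the shape of it, a minimal compose file looks something like the sketch below. The image names, container paths and ports are assumptions (the repo has the real setup); the general idea is one scrapyd service and one scrapydweb service pointed at it by service name:

```yaml
# Minimal sketch of the scrapyd + scrapydweb pairing.
# Image names are placeholders for your own forks; paths inside the
# containers are assumptions, so check the repo for the actual values.
services:
  scrapyd:
    image: yourname/scrapyd:latest     # placeholder for the forked docker-scrapyd image
    ports:
      - "6800:6800"                    # scrapyd's default HTTP API port
    volumes:
      - scrapyd-data:/var/lib/scrapyd  # persist eggs, logs and job data

  scrapydweb:
    image: yourname/scrapydweb:latest  # placeholder for the forked docker-scrapydweb image
    ports:
      - "5000:5000"                    # scrapydweb's default UI port
    volumes:
      # scrapydweb reads its SCRAPYD_SERVERS list from its settings file;
      # point it at the scrapyd service by its compose service name.
      # The in-container path here is an assumption.
      - ./scrapydweb_settings_v10.py:/code/scrapydweb_settings_v10.py
    depends_on:
      - scrapyd

volumes:
  scrapyd-data:
```

With something like this up, deploying is just pushing an egg to scrapyd and everything else happens from the scrapydweb UI.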