Jan 16, 2025
My docker suite for scrapers
A while ago I got really ambitious with a scraping project, so I needed some good tooling around it to make deploying scrapers and scheduling runs manageable. Enter scrapyd and scrapydweb.
These two have been lifesavers. I have them running in my home lab, and can now easily deploy any scrapy project (my favorite scraping lib) and schedule runs.
I found two base docker images by a company in Mexico (zentekmx) that I forked to upgrade versions. Here they are:
docker-scrapyd (the daemon that manages and runs your spiders)
docker-scrapydweb (a web interface to manage the daemon, deployment, and scheduling)
I cannot recommend this scrapy suite enough. Check out how I orchestrate them in my home lab with docker compose in this repo.
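If you just want the general shape of it, here is a minimal sketch of wiring the two together with compose. The image names, volume path, and comments about configuration are placeholders rather than the exact ones from my repo; what is standard is that scrapyd listens on port 6800 by default, scrapydweb serves its UI on port 5000, and scrapydweb only needs to be told where the scrapyd daemon lives.

```yaml
# Hypothetical compose file: image names and volume paths are placeholders,
# not the exact ones from my repo.
services:
  scrapyd:
    image: my-fork/scrapyd          # built from the docker-scrapyd fork
    ports:
      - "6800:6800"                 # scrapyd's default HTTP port
    volumes:
      - scrapyd-data:/scrapyd       # persist deployed eggs, logs and job data (path is a guess)
    restart: unless-stopped

  scrapydweb:
    image: my-fork/scrapydweb       # built from the docker-scrapydweb fork
    ports:
      - "5000:5000"                 # scrapydweb's default UI port
    depends_on:
      - scrapyd
    # scrapydweb needs to know where the daemon lives; inside the compose
    # network the scrapyd container is reachable as "scrapyd:6800". How you
    # pass that in (settings file or env var) depends on the image.
    restart: unless-stopped

volumes:
  scrapyd-data:
```

With something like that in place, docker compose up -d brings both services up, the scrapydweb UI lands on port 5000, and projects get deployed to the daemon on port 6800.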