Maxim Mamykin
Project 2025 Python · TypeScript · SaaS

Scraping five platforms, notifying hundreds, on a single VPS.

A real-time rental aggregator for the Dutch market, built around async scraping, Redis pub/sub, and a multi-channel notification pipeline. Here's how the architecture works and what I'd change.

System architecture diagram showing scraper, API, and notification pipeline

The Dutch rental market in 2025 is a speed game. Listings on Pararius or Kamernet expire within minutes, supply is fragmented across five-plus platforms, and international students arrive with no network and no priority. I built Rental Bird to play that game: scrape every major Dutch rental site, match new listings against user preferences, and push sub-minute notifications over Discord, Telegram, or email.

The stack, end to end

Four Docker containers sit behind a single docker-compose file: a React + TypeScript frontend served by Nginx, a FastAPI backend running on Uvicorn, an async scraper with Celery workers, and a Redis instance that ties the last two together. The database is SQLite via SQLAlchemy. One file, zero ops overhead.

Data flow diagram: scraper to Redis to Celery workers to notification channels

Architecture tradeoffs

The interesting decisions aren't the framework choices. They're the constraints I accepted to keep the system small enough for one person to run on a single VPS.

SQLite over Postgres. A single-file database means zero connection pooling, zero replication, and deployment is a bind mount. The cost is write contention: the scraper and the API share the same file, and under heavy load SQLite's single-writer lock becomes the bottleneck. For the current user count (a few hundred), it holds. The day it doesn't, I swap the SQLAlchemy connection string and everything else stays the same.
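The post doesn't show how the contention is softened before that swap becomes necessary; WAL mode and a busy timeout are the standard first mitigations for a single file shared by a scraper and an API. A sketch using the stdlib sqlite3 module (the PRAGMAs here are my suggestion, not a confirmed part of Rental Bird):

```python
import sqlite3

def connect(path: str = "rentalbird.db") -> sqlite3.Connection:
    # timeout makes a blocked writer wait up to 5s for the
    # single-writer lock instead of failing immediately.
    conn = sqlite3.connect(path, timeout=5.0)
    # WAL mode lets readers proceed while one writer holds the lock,
    # which is exactly the scraper-writes-while-API-reads pattern.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn
```

When the swap to Postgres does come, only the connection string changes; SQLAlchemy models and queries stay as they are.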

Redis as glue, not state. Redis handles two jobs: pub/sub for real-time notification fanout, and backend rate limiting via slowapi. I deliberately avoid using it as a cache layer or session store. If Redis restarts, the only thing lost is in-flight messages. The scraper re-publishes on the next cycle anyway. That simplifies recovery to "restart the container."
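A minimal sketch of the glue role, assuming a redis-py client and an illustrative channel name (neither is confirmed by the post); only the message-building part runs here, with the publish call shown as a comment:

```python
import json

CHANNEL = "listings:new"  # channel name is illustrative

def listing_message(url: str, title: str, price: int) -> str:
    # The message carries only what the workers need to match
    # preferences; the durable record already lives in SQLite.
    return json.dumps({"url": url, "title": title, "price": price})

# Publisher (scraper side), with a hypothetical redis-py client `r`:
#   r.publish(CHANNEL, listing_message(url, title, price))
# If Redis restarts, nothing durable is lost: the next scrape
# cycle re-publishes anything still unmatched.
```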

Celery for fan-out, not for the scraper. The scraper itself is pure asyncio: asyncio.gather() with per-domain semaphores. Celery only enters the picture after a listing is persisted, to match users and dispatch notifications across three channels. Keeping the scraper out of Celery avoids the overhead of serializing DataFrames through a broker and keeps the hot loop tight.
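A minimal sketch of that hot loop, assuming a per-domain limit of 3 and a placeholder fetch body (not Rental Bird's real values):

```python
import asyncio
from urllib.parse import urlparse

# One semaphore per domain keeps concurrency polite toward each
# platform while asyncio.gather still drives every fetch in one loop.
_semaphores: dict[str, asyncio.Semaphore] = {}

def _sem(url: str, limit: int = 3) -> asyncio.Semaphore:
    domain = urlparse(url).netloc
    if domain not in _semaphores:
        _semaphores[domain] = asyncio.Semaphore(limit)
    return _semaphores[domain]

async def fetch(url: str) -> str:
    await asyncio.sleep(0)  # stand-in for the real aiohttp request
    return f"<html for {url}>"

async def polite_fetch(url: str) -> str:
    async with _sem(url):  # at most `limit` in-flight per domain
        return await fetch(url)

async def scrape(urls: list[str]) -> list[str]:
    return await asyncio.gather(*(polite_fetch(u) for u in urls))
```

Because the semaphore is keyed on the domain, a burst of Pararius URLs queues behind its own limit without slowing fetches to Kamernet.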

The architecture bet is that a single SQLite file, a single Redis instance, and one async loop can serve hundreds of users. So far, it does.

Where the difficulty actually lives

The scraping code itself is the easy part. Every platform yields with enough patience. The hard problems are operational.

De-duplication across five sites with inconsistent data shapes: is this the same flat listed on Pararius and Huurzone, or two different units in the same building? I solve that with URL uniqueness as the primary key and title + price + location as a secondary fingerprint.

Rate discipline against upstream APIs: rotating user agents, respecting per-domain concurrency limits, and backing off on 429s rather than burning IPs.

Notification reliability: Discord rate-limits DMs at 50 per second, Telegram has its own caps, and SMTP connections can stall. Each channel gets its own Celery task with independent retry logic.
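The exact normalisation behind the secondary fingerprint isn't described, so this is a sketch of the idea using hashlib; the crude lowercase-and-strip cleanup is an assumption:

```python
import hashlib

def fingerprint(title: str, price: int, location: str) -> str:
    # Secondary dedup key for when the same unit appears on two
    # platforms under different URLs. Normalisation is deliberately
    # crude: strip whitespace, lowercase, join with a delimiter.
    raw = f"{title.strip().lower()}|{price}|{location.strip().lower()}"
    return hashlib.sha256(raw.encode()).hexdigest()
```

A listing is inserted only if both its URL and its fingerprint are unseen; the fingerprint catches cross-platform duplicates that URL uniqueness alone cannot.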

Dashboard showing live listing feed

Discord notification with listing embed

What I'm taking from this

Building Rental Bird taught me that the gap between "script that works on my machine" and "service that works for paying users" is almost entirely operational: error recovery, rate limiting, state management, and the discipline to keep the moving parts few enough that one person can reason about the failure modes. The notification pipeline is uninteresting code; the actual product is making it deliver every time, within a minute, against rate-limited upstreams on a €5/month VPS.