System Design: Price Tracking Platform for E-commerce
Context
Design a system that tracks prices across many e-commerce sites, builds historical price series, visualizes trends, and notifies users of price drops. Assume the platform must handle millions of products and operate within legal and robots.txt constraints.
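As a rough illustration of the robots.txt constraint, the sketch below checks a URL against a robots.txt body before it is crawled, using Python's standard `urllib.robotparser`. The robots.txt content, the `price-tracker-bot` user agent, and the `shop.example.com` URLs are assumptions made up for this example; a real crawler would fetch and cache each domain's robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; in practice it is fetched per domain and cached.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Crawl-delay: 5
"""

USER_AGENT = "price-tracker-bot"  # assumed crawler name, not from the brief


def build_policy(robots_body: str) -> RobotFileParser:
    """Parse a robots.txt body into a reusable policy object."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True if the policy allows our user agent to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


policy = build_policy(ROBOTS_TXT)
allowed_product = may_fetch(policy, "https://shop.example.com/products/123")
allowed_cart = may_fetch(policy, "https://shop.example.com/cart/checkout")
delay_seconds = policy.crawl_delay(USER_AGENT)  # None when no Crawl-delay is set
```

The scheduler could consult `may_fetch` before enqueueing a URL and use `delay_seconds` as a lower bound for that domain's request spacing.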
Requirements
- High-level architecture covering:
  - Crawl/scrape ingestion: schedulers, politeness, anti-bot handling, retries.
  - Parsers and normalization: extracting price, currency, availability; standardizing units.
  - Deduplication and product identity resolution across sites.
  - Storage schema for price histories and current state.
  - Change detection logic for meaningful price deltas.
  - Alerting rules, user preferences, and notification fanout.
  - APIs for search, product pages, price history, and subscriptions.
- Address non-functionals:
  - Scale (millions of products/offers), freshness SLAs, failure recovery, per-domain rate limits, legal/robots compliance, and cost efficiency.
- Include technology choices for databases, queues, caching, and a backfill/replay strategy.
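One possible reading of "meaningful price deltas" is sketched below: a new observation counts as a change only if it moves by at least a minimum absolute amount and a minimum percentage relative to the last stored price. The function name `is_meaningful_change` and both default thresholds are assumptions for illustration, not values given in the requirements.

```python
from decimal import Decimal
from typing import Optional


def is_meaningful_change(
    previous: Optional[Decimal],
    current: Decimal,
    min_abs: Decimal = Decimal("0.01"),  # assumed: ignore sub-cent noise
    min_pct: Decimal = Decimal("1.0"),   # assumed: ignore moves under 1%
) -> bool:
    """Decide whether a newly observed price should be stored as a change event."""
    if previous is None:
        return True  # first observation for this offer is always recorded
    delta = abs(current - previous)
    if delta < min_abs:
        return False  # unchanged or rounding-level difference
    if previous == 0:
        return True  # any real move away from a zero price is meaningful
    pct = delta / previous * Decimal(100)
    return pct >= min_pct


first_seen = is_meaningful_change(None, Decimal("10.00"))
small_move = is_meaningful_change(Decimal("100.00"), Decimal("99.50"))
real_drop = is_meaningful_change(Decimal("100.00"), Decimal("95.00"))
```

Using `Decimal` rather than floats avoids spurious deltas from binary rounding. A production design would likely make the thresholds configurable per category or per user, and would treat availability changes as separate events.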
Assumptions
- “Product” is a canonical item (e.g., iPhone 14 128GB). “Offer” is a site/seller-specific listing for a product.
- Some sites expose structured data (JSON-LD/Schema.org), others require HTML parsing; a small fraction require JavaScript rendering.
- Users can follow offers or products and configure alert thresholds.
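The product/offer distinction and the follow-with-threshold assumption above could be modeled roughly as follows. All class and field names (`Product`, `Offer`, `Subscription`, `target_price`, and so on) are hypothetical and chosen only for this sketch; the brief does not prescribe a schema.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Product:
    """Canonical item, independent of any site or seller."""
    product_id: str
    title: str


@dataclass(frozen=True)
class Offer:
    """Site/seller-specific listing that points at one canonical product."""
    offer_id: str
    product_id: str
    site: str
    price: Decimal
    currency: str


@dataclass(frozen=True)
class Subscription:
    """A user follows either one offer or a whole product, with an optional threshold."""
    user_id: str
    product_id: Optional[str] = None
    offer_id: Optional[str] = None
    target_price: Optional[Decimal] = None  # alert when price <= this value

    def matches(self, offer: Offer) -> bool:
        # An offer-level follow takes precedence over a product-level follow.
        if self.offer_id is not None:
            followed = self.offer_id == offer.offer_id
        else:
            followed = self.product_id is not None and self.product_id == offer.product_id
        if not followed:
            return False
        return self.target_price is None or offer.price <= self.target_price


phone = Product(product_id="p1", title="iPhone 14 128GB")
offer_a = Offer("o1", phone.product_id, "site-a.example", Decimal("699.00"), "USD")
offer_b = Offer("o2", phone.product_id, "site-b.example", Decimal("749.00"), "USD")
sub = Subscription(user_id="u1", product_id=phone.product_id, target_price=Decimal("700.00"))
```

Here `sub.matches(offer_a)` holds because the offer belongs to the followed product and is at or below the threshold, while `offer_b` is above it. Comparing prices across currencies would need a normalization step that this sketch leaves out.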