System Design: Cross‑Site Product Price Tracking
Context
Design the backend for a price-history tool that tracks product prices across many e-commerce sites. The system must comply with robots.txt, scale to tens of millions of URLs, stay cost-aware, and provide user-facing search and alerting.
Assume roughly the following scale (tune as needed during the interview); a back-of-envelope throughput check follows the list:
- 10–50 million product URLs across 500–2,000 merchant domains
- Freshness target: 6–24 hours for popular products; 2–7 days for the long tail
- Concurrent fetches: O(10k–50k) globally, respecting per-domain limits
- Price changes per day: ~5–15% of offers
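A quick back-of-envelope check on these numbers; the 20/80 split between popular and long-tail URLs below is an illustrative assumption, not part of the prompt:

```python
# Sustained fetch rate implied by the freshness targets above.
# Assumed split: 20% of 50M URLs are "popular" (12 h refresh),
# the rest are long tail (4-day refresh). Both figures are illustrative.
POPULAR_URLS = 10_000_000            # 20% of 50M
LONG_TAIL_URLS = 40_000_000          # remaining 80%

popular_rps = POPULAR_URLS / (12 * 3600)       # ~231 fetches/s
long_tail_rps = LONG_TAIL_URLS / (4 * 86400)   # ~116 fetches/s
total = popular_rps + long_tail_rps            # ~347 fetches/s on average
print(f"popular {popular_rps:.0f}/s, long tail {long_tail_rps:.0f}/s, total {total:.0f}/s")
```

By Little's law, ~350 fetches/s at a 2–5 s average fetch time keeps only ~700–1,750 connections busy at once; the O(10k–50k) concurrency budget is therefore headroom for retries, slow merchants, and per-domain queuing.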
Requirements
Functional
- Ingest product URLs from users and internal seed lists.
- Schedule respectful crawls: honor robots.txt, per-domain rate limits, backoff, and time windows (politeness sketch after this list).
- Extract and normalize prices and currencies; handle tax/shipping signals when available (parsing sketch below).
- Deduplicate products across sellers into a canonical product catalog (identity-keying sketch below).
- Store price histories and page snapshots.
- Compute trends (e.g., % change, moving averages) and user alerts (thresholds, drops); an alert sketch follows the list.
- Expose search over products/offers and support user subscriptions/watchlists.
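A minimal sketch of the politeness requirement, assuming a per-domain token bucket checked after a robots.txt lookup; the user agent, rates, and lack of caching are illustrative simplifications:

```python
import time
from dataclasses import dataclass, field
from urllib import robotparser

USER_AGENT = "price-tracker-bot"   # illustrative bot name

@dataclass
class DomainBudget:
    """Token bucket enforcing a per-domain fetch rate (defaults are illustrative)."""
    rate_per_sec: float = 0.5      # at most one fetch every 2 s
    burst: float = 2.0
    tokens: float = 2.0
    last_refill: float = field(default_factory=time.monotonic)

    def try_acquire(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last_refill) * self.rate_per_sec)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

def allowed_and_ready(url: str, domain: str, budgets: dict[str, DomainBudget]) -> bool:
    """robots.txt permission first, then the domain's rate budget."""
    rp = robotparser.RobotFileParser(f"https://{domain}/robots.txt")
    rp.read()                      # in production, cache this per domain, not per URL
    if not rp.can_fetch(USER_AGENT, url):
        return False
    return budgets.setdefault(domain, DomainBudget()).try_acquire()
```

A robots.txt Crawl-delay (available via rp.crawl_delay(USER_AGENT)) can seed rate_per_sec, and 429/503 responses should shrink it further with exponential backoff.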
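Price extraction is mostly string normalization; a minimal sketch, assuming prices arrive as raw display strings and deliberately simplifying locale handling:

```python
import re
from decimal import Decimal

# Illustrative symbol map; a real system would use merchant metadata and a currency library.
CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP", "¥": "JPY"}

def parse_price(raw: str, default_currency: str = "USD") -> tuple[Decimal, str]:
    """Parse '$1,299.00' or '1.299,00 €' into (Decimal('1299.00'), currency)."""
    currency = next((code for sym, code in CURRENCY_SYMBOLS.items() if sym in raw),
                    default_currency)
    digits = re.sub(r"[^\d.,]", "", raw)
    # Heuristic: the rightmost separator is the decimal point. Zero-decimal
    # currencies with grouping ("¥1,299") defeat this; real parsers use
    # per-locale rules.
    if digits.rfind(",") > digits.rfind("."):
        digits = digits.replace(".", "").replace(",", ".")
    else:
        digits = digits.replace(",", "")
    return Decimal(digits), currency

assert parse_price("$1,299.00") == (Decimal("1299.00"), "USD")
assert parse_price("1.299,00 €") == (Decimal("1299.00"), "EUR")
```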
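For cross-seller deduplication, one common pattern (assumed here, not prescribed by the prompt) is a tiered identity key: trust exact identifiers such as GTIN/UPC when present, and fall back to a normalized brand-plus-title hash:

```python
import hashlib
import re
from typing import Optional

def canonical_key(gtin: Optional[str], brand: str, title: str) -> str:
    """Tiered product identity: GTIN wins; otherwise hash normalized text."""
    if gtin:
        return f"gtin:{gtin.strip()}"
    norm = re.sub(r"[^a-z0-9 ]", "", f"{brand} {title}".lower())
    norm = " ".join(sorted(norm.split()))   # order-insensitive token set
    return "fuzzy:" + hashlib.sha1(norm.encode()).hexdigest()
```

The fuzzy tier both over- and under-merges on its own, so in practice it feeds a similarity-matching or human-review queue rather than merging records directly.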
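Trend and alert evaluation can be a small pure function over the stored series; a sketch assuming one price point per day and a user-defined percentage drop threshold (field names are illustrative):

```python
from statistics import mean

def should_alert(history: list[float], threshold_pct: float, window: int = 7) -> bool:
    """Alert when the latest price sits threshold_pct below the trailing average."""
    if len(history) < window + 1:
        return False
    baseline = mean(history[-(window + 1):-1])   # the `window` points before the latest
    return history[-1] <= baseline * (1 - threshold_pct / 100)

# Example: an ~12% drop against the 7-day average triggers a 10% alert.
prices = [100, 101, 99, 100, 102, 100, 100, 88]
assert should_alert(prices, threshold_pct=10)
```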
Non-Functional
- High availability and graceful degradation when merchants change layouts or block access.
- Multi-region deployment for latency/resiliency.
- Cost controls across compute, bandwidth, and storage.
Discussion Prompts
- Data model choices and storage technologies for products, offers, and time series (a conceptual sketch follows this list).
- Crawl orchestration at scale (frontier, politeness, prioritization, parsing); a frontier sketch follows.
- Anti-bot posture and CAPTCHA handling within legal/ethical boundaries.
- Backfill and re-crawl strategies (freshness SLAs, change detection, sitemaps); a change-detection sketch follows.
- Multi-region architecture and data replication.
- Cost-control levers and trade-offs.
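One conceptual data model, sketched as plain dataclasses to stay technology-neutral; entity and field names are illustrative assumptions:

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

@dataclass
class Product:                # canonical catalog entry, one per deduped product
    product_id: str
    canonical_key: str        # GTIN or fuzzy key from the dedup step
    brand: str
    title: str

@dataclass
class Offer:                  # one seller's listing of a product at one URL
    offer_id: str
    product_id: str           # references Product
    merchant_domain: str
    url: str

@dataclass
class PricePoint:             # append-only series, keyed (offer_id, observed_at)
    offer_id: str             # references Offer
    observed_at: datetime
    price: Decimal
    currency: str
    in_stock: bool
```

Price points are append-only and read mostly as (offer_id, time-range) scans, which is why a time-series or wide-column store is a natural fit; products and offers are low-churn and relational.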
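For crawl orchestration, a toy frontier: a min-heap ordered by ready time with priority as the tiebreaker. A production frontier would shard by domain and persist its state; this single-process sketch is an assumption for illustration:

```python
import heapq
import time
from typing import Optional

class Frontier:
    """Min-heap of (ready_time, -priority, url); earliest-ready URL pops first."""
    def __init__(self) -> None:
        self._heap: list[tuple[float, float, str]] = []

    def push(self, url: str, ready_time: float, priority: float) -> None:
        heapq.heappush(self._heap, (ready_time, -priority, url))

    def pop_ready(self) -> Optional[str]:
        """Next URL whose ready_time has passed (higher priority wins ties)."""
        if self._heap and self._heap[0][0] <= time.time():
            return heapq.heappop(self._heap)[2]
        return None
```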
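For re-crawl strategy, a common pattern (an assumption here, not part of the prompt) is to fingerprint only the fields that matter and adapt each offer's re-crawl interval to how often they actually change:

```python
import hashlib

MIN_INTERVAL_H, MAX_INTERVAL_H = 6.0, 168.0   # clamp to 6 h .. 7 days (illustrative)

def fingerprint(price: str, currency: str, in_stock: bool) -> str:
    """Hash only price-relevant fields so cosmetic page changes don't count."""
    return hashlib.sha256(f"{price}|{currency}|{in_stock}".encode()).hexdigest()

def next_interval_hours(current_h: float, changed: bool) -> float:
    """Halve the interval when the fingerprint changed; grow it 1.5x when it didn't."""
    nxt = current_h / 2 if changed else current_h * 1.5
    return max(MIN_INTERVAL_H, min(MAX_INTERVAL_H, nxt))
```

This keeps hot offers near the 6–24 hour freshness target while letting the long tail drift toward its 2–7 day budget.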
Deliverables
- High-level architecture, key components, and APIs.
- Data-modeling rationale and schemas (conceptual is fine).
- Scheduling, deduplication, trend/alert logic, and operational strategies.