PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep
|Home/System Design/Meta

Design a price tracking system

Last updated: Jun 15, 2026

Quick Overview

Meta software-engineer system-design technical-screen question: design a price tracking system for e-commerce that crawls product prices respectfully, stores historical prices, detects changes, and alerts users on price drops. It tests web-crawl orchestration and politeness, multi-strategy parsing and currency normalization, Offer-vs-Product identity resolution, time-series storage, change detection and alerting, plus freshness SLOs, multi-region deployment, and cost controls.

  • hard
  • Meta
  • System Design
  • Software Engineer

Design a price tracking system

Company: Meta

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

##### Question Design a price tracking system for e-commerce sites (similar to price-history tools such as CamelCamelCamel or Keepa). The system ingests product URLs, crawls prices respectfully over time, captures historical prices, visualizes trends, and notifies users on price drops or back-in-stock events. Walk through the high-level architecture and then go deep on the components the interviewer probes. 1. **Scope, scale, and SLOs.** State your assumptions: number of tracked URLs/offers (e.g. 10M–100M offers across thousands of domains), number of users, and freshness targets. Define freshness SLA tiers (e.g. hot/popular items refreshed within ~1 hour, the long tail within 24 hours) and an extraction-accuracy and false-alert budget. 2. **Crawl ingestion and orchestration.** Design the scheduler/frontier that decides what to fetch next, per-domain rate limiting and politeness (token/leaky bucket, robots crawl-delay), a fetcher pool (mostly static HTTP with a small budgeted headless-browser pool for JS-rendered pages), and how you scale fetchers with backpressure off a message bus. 3. **Parsing, extraction, and normalization.** Extract price, currency, availability, and shipping from pages using a multi-strategy parser (structured data / JSON-LD / Schema.org first, then site-specific CSS/XPath templates, then heuristic/ML fallback). Normalize currency (convert to a reference currency with daily FX, keep native + normalized), timezones (UTC), and tax/shipping fields. Validate prices (positive, in a sane range, ignore obvious personalization/geo walls). 4. **URL canonicalization and deduplication.** Canonicalize URLs (strip tracking params, sort query params, normalize host/casing, resolve redirects) and dedupe content via a hash of the salient DOM/price block so you skip redundant history writes. 5. **Product identity resolution.** Distinguish an Offer (site+seller+variant listing) from a Product (canonical entity). Cluster offers to products using strong signals (UPC/EAN/GTIN, MPN, brand+model), medium signals (normalized title tokens, spec overlap, image embeddings), and weak signals; use blocking + pairwise scoring with accept/review/reject thresholds and a human-in-the-loop for ambiguous cases. 6. **Storage schema and database choices.** Choose fit-for-purpose stores: OLTP for offer/product metadata and subscriptions; a time-series/wide-column store (plus a cold object-store tier) for price histories; a search index for product search/faceting; a cache for hot endpoints; object storage for raw HTML snapshots. Lay out the core schemas (product, offer, offer_state, price_history, user/watchlist/alert, fx_rate). 7. **Change detection.** Write a history record only on a meaningful change in [price, availability, currency, shipping] (plus a periodic heartbeat), suppress noise (rounding thresholds, require stability across consecutive crawls for A/B-testing sites), and compute derived metrics like rolling 30/90-day min/max. 8. **Trends and alerting.** Support absolute-threshold, relative-drop, historical-low, and back-in-stock rules. Apply debounce/hysteresis to avoid flapping, per-user/per-offer cooldowns, and fan out notifications (email/push/signed webhooks with retry + DLQ). 9. **Search, subscriptions, and APIs.** Expose product search, price-history, and watchlist/alert APIs (REST/GraphQL) with caching/ETag, rate limiting, and auth. 10. **Backfill and replay.** Seed via sitemaps/merchant feeds/affiliate APIs with per-domain depth caps; keep raw pages in object storage and Kafka with retention so you can re-parse history when the parser is updated (pin a parser_version) and re-consume from offsets idempotently. 11. **Failure recovery and reliability.** Idempotent fetch jobs (deterministic job_id), retries with exponential backoff + jitter, dead-letter queues for persistent failures (paywalls/CAPTCHAs), and canary/golden-page regression tests for parsers. 12. **Anti-bot, CAPTCHA, and legal/robots compliance.** Stay compliance-first: honor robots.txt and crawl-delay, identify your crawler with contact info, prefer official/affiliate APIs, back off on 403/429, and do NOT bypass auth/paywalls or solve CAPTCHAs without explicit permission. 13. **Multi-region deployment.** Place compute pools (and edge fetchers) close to merchants to cut latency and block risk, replicate/shard the control-plane frontier metadata, mirror Kafka and object storage cross-region, and handle data residency (e.g. EU user data in EU). 14. **Cost controls.** Use spot/preemptible fetchers that autoscale on queue depth, strictly budget headless rendering, conditional GETs (If-Modified-Since/ETag) and compression, write-on-change + downsampling/tiering of old history to Parquet, per-merchant cost dashboards, and kill switches for runaway domains. Throughout, discuss data-model trade-offs, the price-change math (e.g. percent drop = (new − old)/old), and the operational concerns (capacity planning, observability, freshness SLOs).

Quick Answer: Meta software-engineer system-design technical-screen question: design a price tracking system for e-commerce that crawls product prices respectfully, stores historical prices, detects changes, and alerts users on price drops. It tests web-crawl orchestration and politeness, multi-strategy parsing and currency normalization, Offer-vs-Product identity resolution, time-series storage, change detection and alerting, plus freshness SLOs, multi-region deployment, and cost controls.

Related Interview Questions

  • Design Top-K, Crawler, and Chess Systems - Meta (hard)
  • Design Search And Web Crawling Systems - Meta (medium)
  • Design an Instagram-Style Social Feed - Meta (medium)
  • Design an Online Game Leaderboard - Meta (hard)
  • Design an On-Demand Delivery Platform - Meta (medium)
Meta logo
Meta
Sep 6, 2025, 12:00 AM
Software Engineer
Technical Screen
System Design
7
0
Question

Design a price tracking system for e-commerce sites (similar to price-history tools such as CamelCamelCamel or Keepa). The system ingests product URLs, crawls prices respectfully over time, captures historical prices, visualizes trends, and notifies users on price drops or back-in-stock events. Walk through the high-level architecture and then go deep on the components the interviewer probes.

  1. Scope, scale, and SLOs. State your assumptions: number of tracked URLs/offers (e.g. 10M–100M offers across thousands of domains), number of users, and freshness targets. Define freshness SLA tiers (e.g. hot/popular items refreshed within ~1 hour, the long tail within 24 hours) and an extraction-accuracy and false-alert budget.
  2. Crawl ingestion and orchestration. Design the scheduler/frontier that decides what to fetch next, per-domain rate limiting and politeness (token/leaky bucket, robots crawl-delay), a fetcher pool (mostly static HTTP with a small budgeted headless-browser pool for JS-rendered pages), and how you scale fetchers with backpressure off a message bus.
  3. Parsing, extraction, and normalization. Extract price, currency, availability, and shipping from pages using a multi-strategy parser (structured data / JSON-LD / Schema.org first, then site-specific CSS/XPath templates, then heuristic/ML fallback). Normalize currency (convert to a reference currency with daily FX, keep native + normalized), timezones (UTC), and tax/shipping fields. Validate prices (positive, in a sane range, ignore obvious personalization/geo walls).
  4. URL canonicalization and deduplication. Canonicalize URLs (strip tracking params, sort query params, normalize host/casing, resolve redirects) and dedupe content via a hash of the salient DOM/price block so you skip redundant history writes.
  5. Product identity resolution. Distinguish an Offer (site+seller+variant listing) from a Product (canonical entity). Cluster offers to products using strong signals (UPC/EAN/GTIN, MPN, brand+model), medium signals (normalized title tokens, spec overlap, image embeddings), and weak signals; use blocking + pairwise scoring with accept/review/reject thresholds and a human-in-the-loop for ambiguous cases.
  6. Storage schema and database choices. Choose fit-for-purpose stores: OLTP for offer/product metadata and subscriptions; a time-series/wide-column store (plus a cold object-store tier) for price histories; a search index for product search/faceting; a cache for hot endpoints; object storage for raw HTML snapshots. Lay out the core schemas (product, offer, offer_state, price_history, user/watchlist/alert, fx_rate).
  7. Change detection. Write a history record only on a meaningful change in [price, availability, currency, shipping] (plus a periodic heartbeat), suppress noise (rounding thresholds, require stability across consecutive crawls for A/B-testing sites), and compute derived metrics like rolling 30/90-day min/max.
  8. Trends and alerting. Support absolute-threshold, relative-drop, historical-low, and back-in-stock rules. Apply debounce/hysteresis to avoid flapping, per-user/per-offer cooldowns, and fan out notifications (email/push/signed webhooks with retry + DLQ).
  9. Search, subscriptions, and APIs. Expose product search, price-history, and watchlist/alert APIs (REST/GraphQL) with caching/ETag, rate limiting, and auth.
  10. Backfill and replay. Seed via sitemaps/merchant feeds/affiliate APIs with per-domain depth caps; keep raw pages in object storage and Kafka with retention so you can re-parse history when the parser is updated (pin a parser_version) and re-consume from offsets idempotently.
  11. Failure recovery and reliability. Idempotent fetch jobs (deterministic job_id), retries with exponential backoff + jitter, dead-letter queues for persistent failures (paywalls/CAPTCHAs), and canary/golden-page regression tests for parsers.
  12. Anti-bot, CAPTCHA, and legal/robots compliance. Stay compliance-first: honor robots.txt and crawl-delay, identify your crawler with contact info, prefer official/affiliate APIs, back off on 403/429, and do NOT bypass auth/paywalls or solve CAPTCHAs without explicit permission.
  13. Multi-region deployment. Place compute pools (and edge fetchers) close to merchants to cut latency and block risk, replicate/shard the control-plane frontier metadata, mirror Kafka and object storage cross-region, and handle data residency (e.g. EU user data in EU).
  14. Cost controls. Use spot/preemptible fetchers that autoscale on queue depth, strictly budget headless rendering, conditional GETs (If-Modified-Since/ETag) and compression, write-on-change + downsampling/tiering of old history to Parquet, per-merchant cost dashboards, and kill switches for runaway domains.

Throughout, discuss data-model trade-offs, the price-change math (e.g. percent drop = (new − old)/old), and the operational concerns (capacity planning, observability, freshness SLOs).

Solution

Show

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More System Design•More Meta•More Software Engineer•Meta Software Engineer•Meta System Design•Software Engineer System Design
PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.