Design product catalog service
Company: Instacart
Role: Software Engineer
Category: System Design
Difficulty: hard
Interview Round: Onsite
##### Question
Design a product catalog service for a large e-commerce / grocery platform. The catalog must support a hierarchical category tree implemented with a self-referential (self-join) `categories` table, products that can belong to multiple categories, and product variants/SKUs with flexible attributes. Walk through the full design, covering each of the following:
1. **High-level APIs.** Define the read/write endpoints: product CRUD; category CRUD with hierarchy operations (`createCategory`, `moveCategory`, `listChildren`, `getBreadcrumbs`); variant/SKU CRUD; product-to-category assignment; browse products within a category (with the option to include descendants); attribute/facet search and filter; product detail fetch; and asynchronous bulk import with job tracking. Show example request/response payloads, pagination, and filtering.
2. **Relational schema (with the category self-join).** Specify tables, key fields, constraints, and indexes: a `categories` table that self-references via `parent_id` (plus `name`, `slug`/`path`, `sort_order`, `depth`); a `products` table; a `product_categories` many-to-many join; `product_variants` (size/color, price); and a flexible attribute model (JSONB or EAV) for filterable attributes. Discuss the indexes required for fast tree traversal and product listing.
3. **Subtree query strategies and trade-offs.** Explain how to efficiently query a full category subtree and breadcrumbs: adjacency list with recursive CTEs vs. materialized path (e.g., Postgres `ltree`) vs. closure tables (and why nested sets are usually avoided). Discuss the read/write trade-offs and when to pick each.
4. **Caching, consistency, and the search index.** Cover object/CDN/search caching with versioned keys and invalidation; the consistency model (transactional writes in the DB, eventual consistency for caches and the denormalized search index kept in sync via outbox/CDC); optimistic locking and idempotency.
5. **Scaling.** Show how the design scales to ~10M products at ~1k QPS reads (read-heavy): read replicas and multi-region reads, partitioning, denormalization of hot attributes, bulk-import write patterns, and pagination (keyset, not OFFSET).
6. **Data integrity, versioning, and rollout.** Cover cycle prevention, referential constraints, API versioning, and a zero-downtime (expand/contract) plan for schema changes.
Quick Answer: Instacart software-engineer onsite system design: design a product catalog service for a large e-commerce/grocery platform. Covers a self-referential category hierarchy, CRUD/browse/search APIs, the relational schema and indexes, subtree query strategies (adjacency CTE vs. materialized path vs. closure table), caching, consistency via outbox/CDC, and scaling to ~10M products at ~1k QPS.