Design an Inventory Management System with Real-Time Stock and Ordering
Problem
You are designing an inventory management service for a retail/marketplace platform (think grocery delivery) with multiple merchants and multiple physical locations — retail stores, dark stores, and warehouses. The same physical stock is sold through two channels simultaneously: online ordering (web/app) and in-store point-of-sale. Inventory counts must stay accurate across both channels, and changes in availability must propagate in near real time to shoppers browsing the app and to partner systems (merchant dashboards, third-party integrations).
Design a system that supports:
- Stock tracking: track on-hand, reserved, and available quantities for every (SKU, location) pair, across many merchants.
- Ordering flows: reserve / commit / release semantics, so that an item in a cart that is checking out is held, then either deducted on fulfillment or returned to inventory on cancel or timeout, without ever overselling.
- Real-time updates: push availability changes to clients and partner integrations as they happen.
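The reserve / commit / release semantics can be sketched for a single (SKU, location) row in memory. This sketch makes several assumptions. It defines available as `on_hand - reserved - safety_stock`, which is one common definition; the problem leaves the formula open. It treats the caller-supplied key as an idempotency key. It takes an injectable clock so TTL expiry is testable. All class names are hypothetical. There is no locking: in a real service each check-and-update would have to be one atomic operation in the authoritative store.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


class InsufficientStock(Exception):
    """Raised when a reservation would push available below zero."""


class HoldNotActive(Exception):
    """Raised when committing a hold that was already released or expired."""


@dataclass
class StockRow:
    # One (merchant_id, sku_id, location_id) row; field names are illustrative.
    on_hand: int
    reserved: int = 0
    safety_stock: int = 0

    @property
    def available(self) -> int:
        # Assumed definition: stock that can still be promised to a new cart.
        return self.on_hand - self.reserved - self.safety_stock


@dataclass
class Reservation:
    qty: int
    expires_at: float
    state: str = "HELD"  # HELD -> COMMITTED or RELEASED


class ReservationBook:
    """Single-threaded model of reserve / commit / release for one row."""

    def __init__(self, row: StockRow, ttl_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.row = row
        self.ttl_s = ttl_s
        self.clock = clock
        self.holds: Dict[str, Reservation] = {}  # idempotency key -> reservation

    def reserve(self, key: str, qty: int) -> Reservation:
        if key in self.holds:                  # client retry: replay the prior result
            if self.holds[key].qty != qty:
                raise ValueError("idempotency key reused with a different qty")
            return self.holds[key]
        if qty <= 0:
            raise ValueError("qty must be positive")
        if self.row.available < qty:           # the check that prevents oversell
            raise InsufficientStock(f"requested {qty}, available {self.row.available}")
        self.row.reserved += qty
        hold = Reservation(qty, self.clock() + self.ttl_s)
        self.holds[key] = hold
        return hold

    def commit(self, key: str) -> None:
        hold = self.holds[key]
        if hold.state == "HELD" and hold.expires_at <= self.clock():
            self.release(key)                  # lazily expire a hold the sweeper missed
        if hold.state == "COMMITTED":
            return                             # idempotent retry
        if hold.state == "RELEASED":
            raise HoldNotActive(key)
        self.row.reserved -= hold.qty          # the hold becomes a real deduction
        self.row.on_hand -= hold.qty
        hold.state = "COMMITTED"

    def release(self, key: str) -> None:
        hold = self.holds[key]
        if hold.state == "RELEASED":
            return                             # idempotent retry
        if hold.state == "COMMITTED":
            raise ValueError("committed stock returns via a restock, not a release")
        self.row.reserved -= hold.qty          # stock goes back to available
        hold.state = "RELEASED"

    def sweep_expired(self) -> int:
        """Release every HELD reservation whose TTL has elapsed; return the count."""
        now = self.clock()
        expired = [k for k, h in self.holds.items()
                   if h.state == "HELD" and h.expires_at <= now]
        for k in expired:
            self.release(k)
        return len(expired)
```

Here, committing a hold that has outlived its TTL counts as an expiry rather than a late success. Whether to honor late commits is a product decision tied to the reservation-TTL clarifying question.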
Your design should cover key components and their responsibilities, the core data model, external and internal APIs (with idempotency and error handling), scalability for reads/writes including hot SKUs, and consistency/concurrency control. State your assumptions and justify trade-offs.
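One frequently discussed mitigation for the hot-SKU part of the scalability requirement is to split a hot row's promisable stock into several sub-buckets. Each bucket is reserved independently, so concurrent writers contend on different rows. The in-memory sketch below shows only the allocation logic, and `BucketedStock` is a hypothetical name. In a real store each bucket would be its own row or key, updated with a conditional decrement. The sketch also makes the accuracy cost visible: a request can fail even when the buckets together hold enough stock.

```python
import random
from typing import List, Optional


class BucketedStock:
    """One hot (SKU, location) row split into N independently reserved buckets."""

    def __init__(self, total: int, n_buckets: int,
                 rng: Optional[random.Random] = None) -> None:
        base, extra = divmod(total, n_buckets)
        # Spread stock as evenly as possible, e.g. 10 over 4 buckets -> [3, 3, 2, 2].
        self.buckets: List[int] = [base + (1 if i < extra else 0) for i in range(n_buckets)]
        self.rng = rng or random.Random()

    def reserve(self, qty: int) -> Optional[int]:
        """Take qty from a single bucket; return its index, or None if none fits."""
        n = len(self.buckets)
        start = self.rng.randrange(n)   # random start spreads concurrent callers
        for i in range(n):              # fall back to the other buckets in turn
            b = (start + i) % n
            if self.buckets[b] >= qty:  # per-bucket conditional decrement
                self.buckets[b] -= qty
                return b
        return None

    def available(self) -> int:
        # What shoppers see: the sum across buckets.
        return sum(self.buckets)


stock = BucketedStock(total=10, n_buckets=4, rng=random.Random(0))
fragmented = BucketedStock(total=4, n_buckets=4, rng=random.Random(1))
```

In this scheme, the worst case is a multi-unit request rejected while up to N × (qty − 1) units sit stranded across buckets. A design using it would likely need to rebalance buckets, or fall back to the unsplit row, as stock runs low.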
Constraints & Assumptions
State your own assumptions; a reasonable baseline is:
- Tenancy & scope: every inventory record is scoped by (merchant_id, sku_id, location_id).
- Scale: ~tens of thousands of merchants, millions of SKU×location rows; availability reads on the order of $10^4$–$10^5$ QPS globally; stock-changing writes (sales, restocks, reservations) roughly $10^3$–$10^4$ QPS.
- Latency targets: availability read p95 < 50 ms; reservation p95 < 150 ms per item.
- Channels: online orders and in-store POS both mutate the same stock; POS feeds and supplier receipts arrive as event streams that may be delayed, duplicated, or out of order.
- Correctness priority: never oversell physical stock; derived/read views may be eventually consistent and slightly stale.
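POS and supplier feeds may be duplicated or reordered. A consumer that applies incremental deltas can handle this by recording each processed event id and skipping repeats. Because addition commutes, applying every delta exactly once gives the same total in any arrival order. The following is a minimal sketch. It assumes each event carries a producer-assigned unique `event_id`, and `StockDelta` and `DeltaIngestor` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

Key = Tuple[str, str, str]  # (merchant_id, sku_id, location_id)


@dataclass(frozen=True)
class StockDelta:
    event_id: str   # producer-assigned unique id (assumed to exist in the feed)
    key: Key
    delta: int      # e.g. -1 for a POS sale, +24 for a supplier receipt


class DeltaIngestor:
    """Idempotent consumer for an at-least-once, possibly reordered delta feed."""

    def __init__(self) -> None:
        self.on_hand: Dict[Key, int] = {}
        self.applied: Set[str] = set()

    def apply(self, ev: StockDelta) -> bool:
        # In a real store the dedupe record and the stock update must commit in
        # the same transaction; otherwise a crash between them can re-apply or
        # lose the event.
        if ev.event_id in self.applied:
            return False  # duplicate delivery: ignore
        self.applied.add(ev.event_id)
        self.on_hand[ev.key] = self.on_hand.get(ev.key, 0) + ev.delta
        return True


k = ("m1", "sku42", "store7")
events = [StockDelta("e1", k, 24), StockDelta("e2", k, -1), StockDelta("e3", k, -2)]
in_order = DeltaIngestor()
for e in events:
    in_order.apply(e)
shuffled = DeltaIngestor()  # same events, reordered and with duplicates
for e in [events[2], events[0], events[2], events[1], events[0]]:
    shuffled.apply(e)
```

This approach only works for deltas. Absolute counts, such as a stock take, do not commute with deltas. They need per-key sequence numbers or timestamps to decide which deltas a count already includes, and this sketch omits that. In practice the `applied` set would also need a bounded retention window.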
Clarifying Questions to Ask
- Is the displayed "available" count allowed to be slightly stale (eventually consistent), or must a shopper always see an exact real-time number?
- Can an order be fulfilled across multiple locations, or is each cart reserved against a single fulfillment location?
- What is the reservation TTL, i.e. how long do we hold stock for a cart in checkout before auto-releasing it?
- Do we need lot/batch/expiry tracking (e.g. perishables, FEFO allocation), or is a single fungible quantity per SKU×location sufficient?
- Do POS and supplier feeds carry absolute counts or incremental adjustments (deltas), and what are their delivery guarantees (at-least-once? ordered?)?
- What partner/integration SLAs exist for pushing availability changes, and how many concurrent real-time subscribers must we support?
What a Strong Answer Covers
The interviewer is evaluating breadth and depth across these dimensions — not a single "right" architecture:
- The core invariant and state model: a clear definition of on_hand / reserved / available / safety_stock and how every flow preserves $\text{available} \ge 0$.
- Concurrency control that provably prevents oversell: conditional atomic updates, optimistic versioning or serializable transactions, and a reasoned argument about the last-unit race.
- Reservation lifecycle: reserve → commit → release/expire, with TTLs, a sweeper/expiry mechanism, and idempotency on every mutating call.
- Read/write split & data model: an authoritative OLTP store vs. read-optimized materialized views/caches, an append-only ledger for audit and rebuild, and the keys/relationships between them.
- APIs: external availability and reservation endpoints and internal ingest/event contracts, with idempotency keys, conditional (version) requests, and precise insufficiency errors.
- Scalability: a sharding/partitioning strategy that keeps a key's state co-located, cache and search-index design, and an explicit hot-SKU mitigation.
- Consistency boundaries: where strong consistency is required vs. where eventual consistency is acceptable, and how versioning makes stale or out-of-order propagation safe.
- Reliability & observability: failure handling (at-least-once events + idempotent consumers, anti-entropy rebuild), plus metrics/alerts for oversell, reservation backlog, and event lag.
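Python's standard `sqlite3` module can illustrate the conditional atomic update from the concurrency-control item, with SQLite standing in for whatever OLTP store the design chooses. The `inventory` table layout, its `version` column, and the helper names are assumptions for illustration. The key idea is that the availability predicate lives in the `UPDATE`'s `WHERE` clause. The check and the increment therefore form one atomic statement, and `rowcount` tells the caller whether the reservation succeeded.

```python
import sqlite3

# Autocommit mode: each statement below runs as its own atomic transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE inventory (
        merchant_id  TEXT    NOT NULL,
        sku_id       TEXT    NOT NULL,
        location_id  TEXT    NOT NULL,
        on_hand      INTEGER NOT NULL,
        reserved     INTEGER NOT NULL DEFAULT 0 CHECK (reserved >= 0),
        safety_stock INTEGER NOT NULL DEFAULT 0,
        version      INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (merchant_id, sku_id, location_id)
    )
""")


def try_reserve(conn: sqlite3.Connection, merchant_id: str, sku_id: str,
                location_id: str, qty: int) -> bool:
    """Reserve qty only if enough stock is available; True on success."""
    # Predicate and increment are a single statement, so two requests racing
    # for the last unit cannot both observe it as available.
    cur = conn.execute(
        """
        UPDATE inventory
           SET reserved = reserved + ?, version = version + 1
         WHERE merchant_id = ? AND sku_id = ? AND location_id = ?
           AND on_hand - reserved - safety_stock >= ?
        """,
        (qty, merchant_id, sku_id, location_id, qty),
    )
    return cur.rowcount == 1  # 0 rows: insufficient stock (or unknown key)


def available(conn: sqlite3.Connection, merchant_id: str, sku_id: str,
              location_id: str) -> int:
    row = conn.execute(
        "SELECT on_hand - reserved - safety_stock FROM inventory "
        "WHERE merchant_id = ? AND sku_id = ? AND location_id = ?",
        (merchant_id, sku_id, location_id),
    ).fetchone()
    return row[0]


# Last-unit race, shown sequentially: one unit left, two carts ask for it.
conn.execute("INSERT INTO inventory (merchant_id, sku_id, location_id, on_hand) "
             "VALUES ('m1', 'sku42', 'store7', 1)")
first = try_reserve(conn, "m1", "sku42", "store7", 1)
second = try_reserve(conn, "m1", "sku42", "store7", 1)
```

SQLite serializes writers, which makes the guarantee easy to see here. PostgreSQL's default READ COMMITTED mode behaves differently but reaches the same result. There, a concurrent `UPDATE` that blocks on the same row re-evaluates its `WHERE` clause against the newly committed row version, so the same statement shape still lets only one of two racing requests take the last unit. A zero `rowcount` maps naturally onto a precise insufficiency error once the caller reads back the current `available`.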
Follow-up Questions
- A SKU goes viral and one (SKU, location) row receives thousands of concurrent reservation attempts per second. Walk through your hot-key mitigation end to end, and quantify the accuracy you trade away.
- A store loses connectivity and operates POS offline for an hour, then reconnects with a backlog of sales. How do you reconcile, and what do you show shoppers in the meantime?
- Two reservation events and a restock event for the same key arrive at the search index out of order. How does the consumer end up at the correct final state?
- The authoritative inventory snapshot is corrupted by a bad bulk import. How do you detect it and rebuild from the ledger, and what is the blast radius?
- Extend the design to support substitutable SKUs (e.g. a 12-pack standing in for two 6-packs) so a reservation can fall back to an alternative. What changes in the data model and the reservation path?
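One common approach to the out-of-order follow-up (a sketch, not the only valid answer) is to make every change event carry two fields: the authoritative row's `version` and the full post-change `available` value rather than a delta. The consumer can then discard anything that is not newer than what it already holds. `AvailabilityChanged` and `SearchIndexProjection` are hypothetical names, and versions are assumed to increase monotonically per key.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (merchant_id, sku_id, location_id)


@dataclass(frozen=True)
class AvailabilityChanged:
    key: Key
    version: int     # authoritative row version after the change (assumed monotonic)
    available: int   # full post-change value, not a delta


class SearchIndexProjection:
    """Read-model consumer that tolerates duplicates and reordering."""

    def __init__(self) -> None:
        self.docs: Dict[Key, Tuple[int, int]] = {}  # key -> (version, available)

    def apply(self, ev: AvailabilityChanged) -> bool:
        current = self.docs.get(ev.key)
        if current is not None and current[0] >= ev.version:
            return False  # stale or duplicate: a newer state is already indexed
        self.docs[ev.key] = (ev.version, ev.available)
        return True


# Authoritative order: reserve (v1, 9 left), reserve (v2, 8 left), restock (v3, 20).
k = ("m1", "sku42", "store7")
projection = SearchIndexProjection()
for ev in [AvailabilityChanged(k, 3, 20), AvailabilityChanged(k, 1, 9),
           AvailabilityChanged(k, 2, 8)]:
    projection.apply(ev)
```

The consumer drops stale events rather than merging them, so it converges on the version-3 state regardless of arrival order. In a real search index, the compare-and-write would itself need to be atomic, for example through a conditional or externally versioned write if the index supports one.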