Design a Perishable-Goods Inventory and Location Tracking System
Company: Amazon
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Technical Screen
## Design a Perishable-Goods Inventory and Location Tracking System
Design a backend service that tracks **perishable items** across a retail network and reports where each item currently lives.
The physical setup is as follows. The company runs many **stores**. Each store stocks **perishable items** (items with an expiration date). When an item reaches (or nears) its expiration date, it is removed from the store shelf and moved into a designated **storage area** (a back-of-house or off-site holding location for expired/near-expired stock pending disposal, return, or markdown). There are many stores, a large catalog of perishable items, and many storage areas.
The system must answer queries such as:
- Given an **item ID**, where is this item right now — which store (or storage area) holds it, and what is its status (on-shelf vs. expired/moved)?
- For a given store, which perishable items does it currently hold, and which of its items have expired and into which storage area were they moved?
The interviewer is explicitly interested in the **high-level design** — the architecture, the API surface, the choice of data store, and how the system scales. You are **not** asked to write code or hand-design every table column; focus on components, data flow, the read and write paths, and trade-offs.
```hint Where to start
This is fundamentally a *location index* over a stream of state changes. Model the core write event as a "location/status change" for an item, and the core read as a point lookup by item ID. Get the read and write paths straight before debating SQL vs. NoSQL.
```
```hint The scary follow-up
The interviewer will push on "millions of items, scanning the whole table every few minutes to find what expired will crush the DB." Don't scan. Think about how to *avoid* the full scan: index/partition by expiration time, or process expirations as a queue of "items expiring in the next 5 minutes."
```
### Constraints & Assumptions
State your own numbers, but reasonable defaults for this scenario:
- Catalog on the order of **millions to low tens of millions** of perishable item records (active + recently moved).
- **Thousands of stores** and **hundreds of storage areas**.
- An item's location/status changes a bounded number of times in its lifecycle (received → on-shelf → expired/moved → disposed), so writes are **moderate**; reads (locate-by-ID, per-store listing) **dominate**.
- Locate-by-ID should be **low-latency** (single-digit to low tens of ms at the read path) and **strongly consistent enough** that a just-recorded move is visible to a follow-up lookup.
- Expiration sweeps run frequently (e.g. every few minutes) and must not require a full-table scan.
- The query API is read-heavy and must tolerate **up to millions of requests** (spiky), behind a load balancer.
### Clarifying Questions to Ask
- What exactly is an "item" — a unique physical unit (one per ID, like a serial/lot+unit) or a SKU with a quantity? This decides whether location is a single value or an aggregate over many units.
- What's the source of truth for a move — does a store POS/scanner emit an event when an item is shelved/expired/moved, or does the system itself decide expiration from a stored `expires_at`?
- What are the actual read SLAs and QPS, and what's the read/write ratio? Is locate-by-ID or per-store listing the hotter query?
- Do we need full **history** ("where has this item been over time") or only the **current** location? History changes the data model (event log + current-state view) vs. a single mutable row.
- How fresh must a query be — is it acceptable for a lookup to lag a move by a second or two, or must a move be immediately visible (read-your-writes)?
### Part 1 — Architecture, API, and data model
Sketch the end-to-end architecture: clients/web → API → data store, plus how location/status changes get recorded. Define the core read API (the `GET` lookups) and the underlying data model that supports them. The interviewer specifically asked how the `GET` endpoint is designed and accessed.
```hint API shape
Two read endpoints carry the product: locate-by-item (`GET /items/{itemId}/location`) and list-by-store (`GET /stores/{storeId}/items?status=expired`). The write is an event/state-change ingest (`POST` a move/expire event).
```
```hint Data model
You need (a) a *current-state* record keyed by `item_id` for O(1) locate-by-ID, and (b) a secondary access path by `store_id` and by `status`/`expires_at`. Decide whether current-state is a single mutable row or the head of an append-only event log projected into a current view.
```
#### What This Part Should Cover
```premium-lock What This Part Should Cover
```
### Part 2 — Data store choice and consistency
Choose the data store. The interviewer frames it as **relational (RDS / PostgreSQL / MySQL) vs. NoSQL (DynamoDB)** and asks *why*. When a candidate answers "consistency / ACID," the interviewer pushes back: **"Is consistency the only reason to pick relational?"** — i.e. justify the choice on more than one dimension (access patterns, secondary indexes, scaling model, operational cost), not just ACID.
```hint Justify on access patterns, not just ACID
The dominant queries are a key lookup by `item_id` and a range/filter by `store_id` and `expires_at`. Both a well-indexed relational table and a key-value store with a partition key + GSI can serve these — so argue from secondary-index ergonomics, horizontal-scale ceiling, and operational cost, not from "ACID" alone.
```
#### What This Part Should Cover
```premium-lock What This Part Should Cover
```
### Part 3 — Expiration sweeps and scaling
Address the interviewer's pointed scaling concern: with **millions of items**, naively scanning the entire table every few minutes to find what just expired will crush the database. Design the expiration pipeline so it never does a full scan, and describe how the read path scales to millions of requests.
```hint Don't scan — index time, or queue it
Two complementary moves: (1) make "what expires soon" an index/partition lookup by `expires_at` (a bounded range query, not a scan), and (2) push the work onto a time-ordered queue/scheduler so each item is *handled once* near its expiry instead of being re-discovered every sweep.
```
```hint Scaling the read path
Reads dominate, so add a cache in front of locate-by-ID (item location is read far more than it changes), put the API behind a load balancer with health checks, and scale horizontally on request volume.
```
#### What This Part Should Cover
```premium-lock What This Part Should Cover
```
### What a Strong Answer Covers
```premium-lock What a Strong Answer Covers
```
### Follow-up Questions
- How would you handle an item that is mis-scanned or whose move event arrives out of order or twice (idempotency / event ordering)?
- If product asks for full location *history* per item (audit trail), how does your data model change, and what's the storage/retention cost?
- A storage area fills up; how does the system route newly expired items, and how would you surface "storage area near capacity" alerts?
- How do you keep the locate-by-ID cache correct when a move happens — write-through, invalidate-on-write, or short TTL — and what's the staleness trade-off of each?
Quick Answer: This system design question evaluates the ability to architect a location-tracking and inventory service for perishable goods across a distributed retail network. It tests knowledge of data modeling, scalable read/write paths, and trade-offs between storage technologies for high-throughput state-change streams.