Design a fulfillment truck routing and inventory system
Company: Axon
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Technical Screen
## Scenario
You are designing the software platform for a large e-commerce **fulfillment network** that plans **truck routes** and manages **container/inventory movement** within and across facilities.
A **container** here is a physical unit (tote or pallet) that holds inventory items and physically moves between **dock doors**, **staging areas**, and **storage locations**. The system spans multiple warehouses and must support:
- **Inbound trucks** delivering containers to a fulfillment center.
- **Outbound trucks** picking up containers for delivery to other facilities or last-mile hubs.
- **Operators** who scan containers at each physical transition point, generating a high-volume stream of events.
- **Dispatchers** who reserve dock-door time slots, plan/replan routes, and monitor container status and delays.
Design an end-to-end system that covers the following **functional requirements**:
1. **Shipments/loads** — create and manage inbound/outbound loads, each with a list of containers and planned arrival/departure windows.
2. **Dock-door scheduling** — assign trucks to dock doors and time slots; detect and resolve scheduling conflicts.
3. **Route planning** — given a set of stops (facilities/docks) with time windows, produce a feasible route/sequence, and support **re-planning** when conditions change (delays, cancellations, missing containers).
4. **Container state tracking** — track each container's location (facility + zone), lifecycle status (e.g. created, staged, loaded, in-transit, received), and contents summary, end-to-end.
5. **Inventory correctness** — prevent double-allocation of a container/inventory; reconcile when scans disagree or events arrive late or out of order.
6. **APIs/UI** — dispatchers view routes, the door schedule, and container statuses; operators submit scans.
Your design should produce: a high-level architecture (services, data stores, messaging), a core data model, key APIs/events, a consistency strategy (idempotency, ordering, deduplication), a routing/optimization approach (exact vs. heuristic and what you optimize for), and an observability/failure-handling plan.
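For the dock-door scheduling requirement, conflict detection reduces to an interval-overlap check per door. The following is a minimal in-memory sketch, assuming half-open `[start, end)` slots. The `Reservation` and `DoorSchedule` names are hypothetical, and a real system would need to enforce the same rule atomically in its data store (for example inside a transaction or via a constraint) rather than in application memory.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Reservation:
    door_id: str
    truck_id: str
    start: datetime  # inclusive
    end: datetime    # exclusive


@dataclass
class DoorSchedule:
    """Hypothetical in-memory schedule: one list of reservations per door."""
    by_door: dict = field(default_factory=dict)

    def conflicts(self, candidate):
        # Two half-open intervals overlap iff each starts before the other ends.
        return [
            r for r in self.by_door.get(candidate.door_id, [])
            if r.start < candidate.end and candidate.start < r.end
        ]

    def reserve(self, candidate):
        """Book the slot if it is free; return (ok, conflicting reservations)."""
        if candidate.end <= candidate.start:
            raise ValueError("reservation must have positive duration")
        clashes = self.conflicts(candidate)
        if clashes:
            return False, clashes  # surface the conflict to the dispatcher
        self.by_door.setdefault(candidate.door_id, []).append(candidate)
        return True, []
```

Returning the clashing reservations, rather than a bare failure, gives the dispatcher UI what it needs to propose another door or time slot. Back-to-back slots on the same door do not conflict because the end bound is exclusive.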
```hint Decompose first
This is really two coupled subsystems: (A) **container/inventory tracking** driven by a high-volume scan event stream, and (B) **transport planning** (dock scheduling + vehicle routing). Tracking is write-heavy and event-shaped; within transport planning, scheduling is transactional and constraint-heavy, while routing is an optimization problem. Pick boundaries that let each scale and fail independently, and state an MVP before going deep.
```
```hint The routing problem is a known shape
"Stops + time windows + capacity + a travel-time matrix, minimizing some cost" is a classic, well-studied routing problem class — and the general version is computationally hard. So decide deliberately: is a provably optimal route realistic at the scale the prompt gives you, or do you need to trade optimality for a feasible answer within a time budget? Also pin down what *one* thing you're primarily optimizing for before you pick a method.
```
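To make the optimality-vs-time-budget trade-off concrete, here is one possible heuristic, sketched under stated assumptions: it optimizes for on-time arrival by always driving next to the reachable stop whose window closes soonest. The function name `plan_route`, the minute-based time units, and the dictionary-shaped travel matrix are all hypothetical, and service time at each stop and truck capacity are ignored. Because it is greedy, it can return `None` even when some other ordering would be feasible.

```python
def plan_route(start, stops, travel, depart_at=0):
    """Greedy heuristic: visit next the reachable stop whose window closes soonest.

    stops:  {name: (earliest, latest)} arrival windows, in minutes.
    travel: {(a, b): minutes}, a hypothetical travel-time matrix.
    Returns (sequence, arrival_times), or None if the greedy order misses a window.
    """
    here, now = start, depart_at
    remaining = dict(stops)
    sequence, arrivals = [], {}
    while remaining:
        best = None
        for name, (earliest, latest) in remaining.items():
            # Arriving early means waiting until the window opens.
            arrive = max(now + travel[(here, name)], earliest)
            if arrive <= latest and (best is None or latest < best[0]):
                best = (latest, name, arrive)
        if best is None:
            return None  # no remaining stop can be reached inside its window
        _, here, now = best
        sequence.append(here)
        arrivals[here] = now
        del remaining[here]
    return sequence, arrivals
```

A sketch like this runs in time quadratic in the number of stops, so it fits comfortably inside a re-planning time budget. Whether its answers are good enough, or whether an exact or improvement-based method is worth the extra compute, is the judgement call the hint asks for.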
```hint Scan events are the hard correctness problem
Scanners go offline, buffer, replay, and report wall-clock times that drift, so the same event can arrive more than once and events for one container can arrive out of order. Add concurrent writes to the same container and you have three distinct failure modes: duplicates, reordering, and concurrent writes. Ask yourself what *independent* mechanism handles each one, and what should happen to a scan that doesn't fit the container's current state: dropping it and forcing it through are both wrong.
```
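As an illustration of handling duplicates, reordering, and ill-fitting scans with separate mechanisms, the sketch below deduplicates by event ID, refuses to regress a container's lifecycle when a late scan arrives, and quarantines scans that skip a step instead of dropping or forcing them. It assumes a strictly linear lifecycle and a scanner-supplied unique event ID. The `ContainerProjector` class and its return labels are hypothetical, and it is single-threaded, so concurrent writes are out of scope here.

```python
LIFECYCLE = ["created", "staged", "loaded", "in_transit", "received"]
RANK = {status: i for i, status in enumerate(LIFECYCLE)}


class ContainerProjector:
    """Hypothetical single-threaded processor for a stream of scan events."""

    def __init__(self):
        self.seen_ids = set()   # dedup: event IDs already processed
        self.status = {}        # container_id -> current lifecycle status
        self.history = []       # append-only record of accepted scans
        self.quarantine = []    # scans held for reconciliation

    def handle(self, event_id, container_id, status):
        if event_id in self.seen_ids:
            return "duplicate"                  # replayed scan: safe no-op
        self.seen_ids.add(event_id)
        current = self.status.get(container_id)
        cur_rank = RANK[current] if current is not None else -1
        new_rank = RANK[status]
        record = (event_id, container_id, status)
        if new_rank <= cur_rank:
            self.history.append(record)
            return "late"                       # keep for audit, do not regress state
        if new_rank > cur_rank + 1:
            self.quarantine.append(record)
            return "quarantined"                # skipped a step: neither drop nor force
        self.history.append(record)
        self.status[container_id] = status
        return "applied"
```

Ordering here is decided by lifecycle position rather than by the scanner's wall-clock timestamp, which sidesteps clock drift at the cost of assuming containers never move backward through the lifecycle.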
```hint Separate plan from truth, and append-only for audit
The routing service computes a *plan*; it must not own shipment truth — persist versioned `RoutePlan`s so you can compare plan vs. actual. For auditability, never mutate or delete events: keep an **append-only event log** and materialize current state from it (event sourcing / CQRS read models). This is also what lets you replay buffered scans safely.
```
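The append-only log and its materialized read model can be sketched as follows. This is a minimal in-memory illustration, not a storage design: `ScanEvent`, `EventLog`, and `materialize` are hypothetical names, and the log assigns each event a sequence number on append so that replay order is unambiguous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a recorded event is never mutated
class ScanEvent:
    seq: int             # position assigned by the log on append
    container_id: str
    facility: str
    zone: str
    status: str


class EventLog:
    """Hypothetical append-only log: exposes append and read, no update or delete."""

    def __init__(self):
        self._events = []

    def append(self, container_id, facility, zone, status):
        event = ScanEvent(len(self._events), container_id, facility, zone, status)
        self._events.append(event)
        return event

    def read(self, from_seq=0):
        # Return an immutable snapshot so callers cannot alter the log.
        return tuple(self._events[from_seq:])


def materialize(events):
    """Fold events, in log order, into a current-state read model."""
    state = {}
    for e in events:
        state[e.container_id] = {
            "facility": e.facility,
            "zone": e.zone,
            "status": e.status,
            "as_of_seq": e.seq,
        }
    return state
```

Because `materialize` is a pure function of the log, a read model can be discarded and rebuilt at any time, and the `as_of_seq` field lets a UI show how fresh each container's state is.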
### Constraints & Assumptions
State your own assumptions where the prompt is silent. Reasonable starting numbers:
- **Scale**: tens to low-hundreds of facilities; thousands of dock doors total; on the order of hundreds to low-thousands of scan events per second at peak across the network (bursty per facility).
- **Routing size**: a single load/route typically has a small number of stops (single digits to ~15); the network has many concurrent routes.
- **Latency**: scan-to-visible should be near real time (seconds, not minutes); door-reservation confirmation is synchronous to the dispatcher.
- **Reliability**: network partitions occur; scan devices buffer and replay; a facility should keep operating (scanning, last-known state) during a partition.
- **Availability**: door scheduling and scan ingest are operationally critical (target high availability, e.g. 99.9% or better); analytics/global views can be eventually consistent.
- **Multi-tenancy**: multiple warehouses with role-based access control (dispatcher, dock operator, viewer).
### Clarifying Questions to Ask
- Do we **own** truck/driver dispatch and the travel-time/traffic data, or do we integrate with an external TMS for tendering and ETAs?
- Is container **contents/SKU detail** owned here, or by an existing WMS/OMS — i.e. do we store full contents or just a summary and a reference?
- What is the source of truth for **inventory availability** and order allocation — does this system reserve inventory, or only report container movement?
- How accurate are scanner **timestamps**, and do scanners supply a unique event ID, or must we synthesize one?
- For routing, what is the **single primary objective** the business cares about (on-time delivery vs. cost/distance vs. truck count)? And what hard constraints exist (driver hours, container capacity)?
- What is the expected **scale today vs. 2–3 years out** (facilities, events/sec, stops per route)?
### What a Strong Answer Covers
These are the dimensions an interviewer weighs — not a checklist to recite. Depth and correct trade-offs matter more than breadth.
- **Scoping discipline**: a defensible MVP and explicit out-of-scope items before diving into any one subsystem.
- **Decomposition**: sensible ownership boundaries between the coupled subsystems, with a justified choice of messaging vs. synchronous calls (and which paths must stay synchronous).
- **Data modelling**: the core entities and their relationships, including how event history is represented and how container contents relate to an external system of record.
- **Consistency under bad inputs**: a coherent story for duplicates, ordering, concurrent writes, and what happens to events that don't fit current state — with the reasoning for each, not just the term.
- **Dock scheduling correctness**: how overlapping reservations per door are prevented, where the consistency boundary sits, and how conflicts are surfaced/resolved.
- **Routing judgement**: recognizing the problem class, making the optimality-vs-tractability call appropriate to the stated scale, committing to a primary objective, and a credible re-planning strategy when conditions change mid-route.
- **Scalability & availability**: how the high-volume path scales horizontally, how reads stay fast, and how a facility behaves during a partition.
- **Operability**: the metrics that would tell you the system is healthy, and concrete failure modes paired with mitigations.
- **Interview process**: a brief, structured requirements pass, then depth — not an over-long requirements monologue.
### Follow-up Questions
- A burst causes scan-ingest lag to spike to minutes. How do you detect it, keep the UI honest about staleness, and prevent the backlog from cascading into wrong container states?
- Two operators scan the same container into two different outbound loads within the same second. Walk through exactly how your design prevents the double-allocation and what the operators see.
- A truck is already en route when an upstream facility cancels half its load. How does re-planning decide what to freeze vs. re-optimize, and how is the new plan reconciled with what's physically already on the truck?
- How would you extend the design to add **inventory reservation/allocation** (not just movement tracking) without creating a distributed-transaction bottleneck between this system and the WMS/OMS?
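For the double-allocation follow-up, one common approach is optimistic concurrency: each container row carries a version, and an assignment succeeds only if the version the caller read is still current and the container is unallocated. The sketch below is an assumption-laden illustration, with a hypothetical `ContainerStore` whose lock stands in for a database's atomic conditional update.

```python
import threading


class AllocationConflict(Exception):
    """Raised when a container changed or was allocated since the caller's read."""


class ContainerStore:
    """Hypothetical store; the lock stands in for an atomic conditional update."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}  # container_id -> {"version": int, "load_id": str or None}

    def create(self, container_id):
        with self._lock:
            self._rows[container_id] = {"version": 0, "load_id": None}

    def get(self, container_id):
        with self._lock:
            return dict(self._rows[container_id])  # copy, so callers cannot mutate

    def assign(self, container_id, load_id, expected_version):
        # Compare-and-set: succeed only if nobody wrote since the caller's read.
        with self._lock:
            row = self._rows[container_id]
            if row["version"] != expected_version or row["load_id"] is not None:
                raise AllocationConflict(
                    f"{container_id} changed or is already allocated "
                    f"(load={row['load_id']})"
                )
            row["load_id"] = load_id
            row["version"] += 1
```

If two operators read version 0 and both try to assign, exactly one write wins. The other receives `AllocationConflict`, which the scanner UI can turn into a message naming the load that already holds the container.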
Quick Answer: This system design question evaluates the ability to architect scalable, event-driven fulfillment systems, covering container and inventory state tracking, high-throughput event processing, transactional dock-door scheduling, and vehicle routing optimization.