Uber Software Engineer Interview Prep Guide
Everything Uber actually asks Software Engineer candidates — concept walkthroughs, worked examples, and the real interview questions, drawn from candidate reports. Free to read.
Last updated

Technical Screen
Coding & Algorithms
-
Binary Search And Feasibility Optimization — covered in depth under Onsite below.
-
Graph Algorithms, Dependency Resolution And Connectivity — covered in depth under Take-home Project below.
-
Grid, Matrix And Spatial Algorithms — covered in depth under Take-home Project below.
-
Tree Traversal And Hierarchical Structures — covered in depth under Take-home Project below.
-
Arrays, Sliding Windows, DP And Stack Patterns — covered in depth under Take-home Project below.
System Design
-
Search, Autocomplete And Restaurant Discovery — covered in depth under Onsite below.
-
Real-Time Distributed Geospatial And Event Systems — covered in depth under Onsite below.
-
Cart Service And Pricing Engine Design — covered in depth under Onsite below.
Behavioral & Leadership
- Technical Leadership, Project Impact And Tradeoffs — covered in depth under Take-home Project below.
Onsite
Coding & Algorithms

What's being tested
This tests binary search correctness: loop invariants, boundary updates, duplicate handling, and safe midpoint calculation. It also tests binary-search-on-answer, where you minimize a feasible value k using a monotonic predicate, plus basic subset-sum dynamic programming for combinatorial feasibility.
Patterns & templates
-
Classic binary search —
`binary_search`(nums,target) inO(log n)time,O(1)space; usemid = lo + (hi - lo) // 2. -
First occurrence / lower bound — find smallest index with
nums[i] >= target; use half-open interval[lo, hi)to reduce off-by-one errors. -
Last occurrence / upper bound — find largest index with
nums[i] <= target; implement via`upper_bound`(target) - 1 and validate bounds afterward. -
Recursive binary search — same invariant as iterative version, but
O(log n)stack space; always define base case before computingmid. -
Binary search on answer — search
kover[1, max(workloads)]; predicatecan_finish(k)must be monotonic and usually costsO(n). -
Minimum feasible rate — if
hours(k) = sum(ceil(x / k)), then largerknever increases hours; total complexityO(n log M). -
Subset sum DP —
`can_sum`(nums,target) via boolean arraydp[target+1], update descending;O(n * target)time,O(target)space.
Common pitfalls
Pitfall: Updating
lo = midorhi = midwithout progress can infinite-loop; uselo = mid + 1when discardingmid.
Pitfall: Returning
midimmediately for duplicates fails first/last occurrence variants; keep searching after recording a candidate.
Pitfall: Treating subset sum as greedy is wrong; use DP unless constraints clearly allow meet-in-the-middle or bitset optimization.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions
-
Graph Algorithms, Dependency Resolution And Connectivity — covered in depth under Take-home Project below.
-
Grid, Matrix And Spatial Algorithms — covered in depth under Take-home Project below.
-
Tree Traversal And Hierarchical Structures — covered in depth under Take-home Project below.
-
Arrays, Sliding Windows, DP And Stack Patterns — covered in depth under Take-home Project below.
-
Parsing And Expression Evaluation — covered in depth under Take-home Project below.
![]()
What's being tested
This tests in-memory data structure design for maintaining top- items under frequent updates, deletes, ties, and churn. Interviewers expect clear tradeoffs between heap, balanced tree, bucketed linked list, and hash map approaches, with precise complexity and edge-case handling.
Patterns & templates
-
Hash map + min-heap —
O(log K)updates for approximate top- candidates; lazy-delete stale heap entries after frequency changes. -
Hash map + balanced tree — store
(freq, recency, key)inTreeSet/SortedDict; update by remove-then-reinsert inO(log n). -
Bucketed doubly linked list — map frequency to bucket node; move keys between adjacent buckets in amortized
O(1)for increment/decrement. -
AllOne-style structure —
key -> bucket, bucket hasfreqand key set/list; supportsinc,dec,getMaxKey,getMinKey. -
Top-K query strategy — scan buckets from max frequency downward until
Kkeys collected;O(K + number_of_buckets_visited). -
Tie-breaking by recency — maintain monotonic timestamp/counter; compare
(freq DESC, lastUpdated DESC, key)and update tie fields consistently. -
Concurrency template — start with one lock for correctness; discuss striped locks or actor/sharded ownership only after API semantics are clear.
Common pitfalls
Pitfall: Updating frequency in-place inside a heap or tree without removing the old ordering entry breaks ordering invariants.
Pitfall: Claiming
O(1)top- because increments areO(1); returning arbitrary max-frequency keys may still require traversal.
Pitfall: Ignoring delete/decrement semantics under zero counts causes memory leaks and stale keys during high-churn workloads.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions

What's being tested
Interval scheduling problems test whether you can model time ranges precisely, sort/merge bookings, and search for overlaps or gaps efficiently. Uber-style variants often add calendar-system constraints: cancellations, multiple participants, meeting rooms, time zones, and scalable aggregation with MapReduce/Spark.
Patterns & templates
-
Half-open intervals
[start, end)simplify overlap checks: two intervals conflict whena.start < b.end && b.start < a.end. -
Sort-and-scan merge runs in
O(n log n)time; sort bystart, merge adjacent/overlapping intervals, then inspect gaps. -
Two-pointer availability search finds earliest common slot across two sorted calendars in
O(n + m); advance the interval ending earlier. -
Min-heap room allocation uses
heapqby earliest end time;O(n log k)for assigning or finding an available room. -
Binary search over sorted bookings checks one room’s availability in
O(log n)plus neighbor overlap checks viabisect_left. -
Sweep line events convert intervals to
(time, +1/-1)deltas; sort events to compute concurrent meetings, free capacity, or aggregate load. -
Distributed aggregation maps intervals to time buckets or event deltas, reduces by key, and handles skew with salting or hierarchical combining.
Common pitfalls
-
Pitfall: Treating intervals as closed
[start, end]causes false conflicts when one meeting ends exactly as another starts. -
Pitfall: Ignoring cancellations or updates leads to stale availability; model bookings with status/version, not just append-only intervals.
-
Pitfall: Mixing local times across users breaks correctness; normalize to UTC internally and only localize at input/output boundaries.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions
System Design
What's being tested
Uber-style restaurant search tests whether you can design a low-latency retrieval system that combines text search, prefix autocomplete, geospatial filtering, freshness, ranking, and operational safety. For a Software Engineer, the interviewer is probing architecture judgment: what should be precomputed, what should be evaluated at query time, how to shard and cache, and how to keep results correct when restaurants open, close, move, or change menus. Uber cares because search is on the critical path of marketplace conversion: stale or slow results directly hurt order volume, courier efficiency, and user trust. A strong answer balances p99 latency, index freshness, ranking quality, and system simplicity without drifting into deep ML modeling or product strategy.
Core knowledge
-
Autocomplete is usually served from a trie, compressed trie, DAWG, or finite-state transducer rather than scanning store names. Store top- candidate IDs per prefix node for fast lookup: query time is roughly , while memory grows with total characters and stored suggestions.
-
Full-text search is built around an inverted index mapping tokens to posting lists of document IDs. Systems like
Lucene,Elasticsearch, andSolrhandle tokenization, stemming, typo tolerance, field boosts, and scoring. Classic relevance often starts with BM25, where term frequency, inverse document frequency, and field length normalization determine text relevance. -
Geospatial search should avoid computing distance to every restaurant. Use geohash,
S2, orH3cells to map latitude/longitude into spatial buckets, retrieve nearby cells covering a radius, then apply exact Haversine distance filtering: -
Candidate generation and ranking are separate stages. Candidate generation retrieves a few hundred or thousand plausible restaurants using text, geo, cuisine, and availability filters. Ranking then scores them using fast features such as text match, distance, ETA, rating, delivery fee, open status, and historical popularity.
-
Hard filters must be applied before ranking when correctness matters. A restaurant outside delivery range, closed, unavailable, or not serving the searched item should not appear just because it has a high relevance score. A common pattern is: retrieve broad candidates, apply eligibility filters, rank eligible candidates, then hydrate response details.
-
Index freshness is a first-class design concern. Static fields like restaurant name, cuisine, address, and menu descriptions can live in the search index. Volatile fields like open/closed status, inventory, courier-constrained availability, surge fees, and delivery ETA are often fetched from authoritative services at query time or stored in a fast side cache.
-
Sharding strategy depends on the access pattern. Text-heavy systems often shard by document ID or term; local marketplace search benefits from geo-aware sharding so most queries hit a small number of nearby partitions. Replication supports read throughput and availability, while scatter-gather fanout must be bounded to protect
p99. -
Caching is extremely valuable for autocomplete and popular local queries. Use
Redisor in-process caches for normalized prefixes like"sus"in a city cell, and cache result IDs separately from hydrated restaurant details. Short TTLs reduce stale closed-store results; event-driven invalidation is useful for high-impact changes. -
Query normalization improves recall: lowercase, trim whitespace, remove punctuation, handle accents, synonyms, common misspellings, and cuisine aliases such as
"bbq"versus"barbecue". For autocomplete, preserve prefix semantics; for full search, tokenization and synonym expansion can be more aggressive. -
Latency budgets should be explicit. For example: API gateway 5 ms, query parsing 5 ms, index lookup 30 ms, availability/ETA fanout 40 ms, ranking 10 ms, hydration 20 ms, leaving buffer for network variance. Designing for
p99 < 200 msrequires bounded fanout, timeouts, fallbacks, and partial degradation. -
Monitoring should cover both system health and result correctness. Track
p50,p95,p99, timeout rate, cache hit rate, index lag, zero-result rate, stale-result reports, shard hot spots, and dependency failures. Search-specific alerts catch failures that generic CPU or memory metrics miss. -
Failure handling should degrade gracefully. If personalization or ETA service times out, return text-and-geo-ranked restaurants with approximate delivery time rather than failing the request. If a shard is unavailable, return partial results with enough redundancy from replicas, but avoid silently showing ineligible restaurants.
Worked example
For Design Uber Eats-style search function, a strong candidate starts by clarifying scope: “Are we searching restaurants only, menu items too, or both? Do we need autocomplete? What are the expected QPS, latency target, geographic scale, and freshness requirements for open status and menus?” Then they declare assumptions, for example: 50M restaurants/items globally, local queries dominate, p99 under 200 ms, and open/closed status must be accurate within seconds.
The answer can be organized around four pillars: data model, indexing, query serving, and operability. The data model includes restaurant documents with name, cuisines, location, service area, menu item tokens, ratings, and stable IDs, while volatile attributes are kept out of the main index where possible. The indexing pillar uses a Lucene-style inverted index for text plus H3 or S2 cells for geo filtering, with replicas per region for low-latency reads.
The query-serving path is: normalize query, identify user location cell, retrieve candidates from nearby geo partitions and text postings, apply hard filters such as delivery radius and open status, rank candidates, hydrate restaurant cards, and return paginated results. A key tradeoff to flag is freshness versus latency: putting open status directly in the search index makes queries faster but risks stale results; checking an availability service at query time is more correct but adds fanout and tail latency. A good close is: “If I had more time, I’d discuss typo tolerance, multi-region failover, index rebuilds, and how to monitor zero-result and stale-result rates.”
A second angle
For Design a search autocomplete system, the same retrieval principles apply, but the latency and data-structure constraints are sharper. The user expects suggestions after each keystroke, often within 50 ms, so a trie/FST with precomputed top suggestions per prefix is more appropriate than a full search query for every character. Ranking can use locality and popularity, but it must be cheap: store per-city or per-cell top suggestions and rerank lightly using user location. Freshness matters less for static names, but closed or unavailable restaurants should be filtered before display or marked unavailable. The main design tension becomes memory versus relevance: storing top- suggestions at every prefix is fast but expensive, while computing top- dynamically saves memory but hurts latency.
Common pitfalls
Pitfall: Designing autocomplete as
SELECT * FROM stores WHERE name LIKE 'prefix%'.
That may work for thousands of stores, but it fails at Uber scale due to full scans, poor prefix ranking, and weak latency under concurrent keystrokes. A better answer mentions trie/FST structures, prefix caches, top- precomputation, and city- or geo-scoped indexes.
Pitfall: Treating search as only a text-ranking problem.
Restaurant discovery is also constrained by distance, delivery radius, open status, menu availability, and marketplace state. A high-scoring text match is useless if the restaurant cannot deliver to the user, so eligibility filters and freshness checks must be part of the serving path.
Pitfall: Jumping into components without defining the query flow.
Saying “use Elasticsearch, Redis, and Kafka” is not a design. Interviewers expect the request lifecycle: parse and normalize, retrieve candidates, apply geo and availability filters, rank, hydrate, cache, observe, and degrade under dependency failures.
Connections
Interviewers often pivot from this topic into geospatial indexing, distributed caching, search ranking, event-driven freshness, or rate limiting for high-QPS APIs. They may also ask for a smaller coding variant, such as implementing store autocomplete with a trie or sorted array plus binary search.
Further reading
-
Introduction to Information Retrieval — Manning, Raghavan, Schütze — foundational treatment of inverted indexes, scoring, and retrieval evaluation.
-
The Google S2 Geometry Library — practical reference for cell-based geospatial indexing on a sphere.
-
The Architecture of Open Source Applications: Elasticsearch — useful background on distributed search engine architecture and tradeoffs.
Practice questions
What's being tested
Uber system design interviews for Software Engineers probe whether you can build real-time distributed systems that preserve correctness under load, failures, and geographic skew. The interviewer is looking for practical judgment: when to use strong consistency versus eventual consistency, how to model moving entities, how to ingest and act on high-volume events, and how to debug `p99` latency or stale state. For Uber-like systems, small design mistakes can cause duplicate dispatches, unfair queue ordering, bad ETAs, or unavailable matching, so you must reason about APIs, data models, consistency boundaries, partitioning, and observability together.
Core knowledge
-
Geospatial indexing turns latitude/longitude into queryable cells. Common choices include
`S2`,`H3`, and`Geohash`; smaller cells improve precision but increase fanout. A typical nearby-driver query expands rings around a rider cell until enough candidates are found or a radius limit is reached. -
Location streaming is usually modeled as append-heavy events plus a latest-state view. Drivers emit updates every few seconds; the system writes raw events to
`Kafka`or similar, then updates a low-latency store such as`Redis`,`DynamoDB`, or`Cassandra`keyed by driver ID and geocell. -
Freshness windows matter more than perfect history for matching. A driver location older than, say, 10–30 seconds may be excluded or down-ranked. Store
last_seen_at, location accuracy, heading, speed, and availability state; never match solely on the last coordinate without checking staleness. -
Partitioning strategy determines scalability and hot-spot behavior. Partitioning by driver ID balances writes but makes geospatial reads expensive; partitioning by geocell enables local queries but creates hot partitions at airports or events. Hybrid designs maintain driver-primary records plus geocell membership indexes.
-
Consistency boundaries should be explicit. Matching a rider to a driver needs a compare-and-set or lease-like claim to prevent double assignment; location feeds and ETAs can be eventually consistent. A common pattern is “eventual discovery, strongly consistent commitment.”
-
Distributed queues need deterministic ordering rules and state transitions. For a pickup-area driver queue, define eligible states, enqueue time, priority adjustments, expiration, re-entry rules, and atomic dequeue. Use a durable ordered structure plus a reconciliation job for missed heartbeats or broken leases.
-
Consensus protocols such as Raft and Paxos solve replicated state-machine agreement under node failures. They require quorum agreement, commonly majority quorum , so a 5-node cluster tolerates 2 failures while preserving linearizability.
-
Raft is usually easier to explain operationally: leader election, log replication, commit index, terms, and membership changes. Paxos is more general but harder to implement correctly. In interviews, emphasize both safety under partitions and the availability tradeoff imposed by quorum loss.
-
Idempotency is required for event and API retries. Client-generated request IDs,
`Idempotency-Key`headers, or event IDs prevent duplicate ride requests, duplicate queue entries, and double payments. At-least-once delivery is common; handlers must tolerate replay. -
Backpressure and degradation keep real-time systems alive under spikes. If location ingestion overwhelms downstream matching, you can sample updates, coalesce by driver, drop stale intermediate positions, or prioritize active-trip drivers. Never let analytics-style event processing block dispatch-critical paths.
-
Observability should follow the user-facing workflow. Track
`p50`,`p95`, and`p99`latency for location update ingestion, candidate retrieval, match commit, queue enqueue/dequeue, and API calls. Add counters for stale drivers, duplicate-claim failures, queue skips, and geocell hot spots. -
API contracts should encode state, not just actions. For ride-hailing, examples include
`POST /rides`,`POST /drivers/{id}/location`,`POST /drivers/{id}/availability`, and`POST /matches/{id}/accept`. Include idempotency keys, version fields, timestamps, and clear error semantics.
Worked example
For Design a pickup-area driver queue, a strong candidate starts by clarifying the rules: is the queue per airport terminal or pickup zone, are drivers ordered strictly by arrival time, can priority tiers exist, what counts as leaving the area, and how quickly must the queue update? Then they declare assumptions: one queue per pickup-area cell, thousands of drivers per large airport, location updates every few seconds, and dispatch requires no duplicate assignment. The answer can be organized around four pillars: state model, event ingestion, queue operations, and failure handling.
For the state model, define driver states such as `OFFLINE`, `AVAILABLE_OUTSIDE_ZONE`, `QUEUED`, `OFFERED`, `ASSIGNED`, and `EXPIRED`, with legal transitions enforced server-side. For ingestion, location updates determine zone entry/exit, but enqueue should happen through a controlled transition so GPS jitter does not repeatedly add or remove a driver. For queue operations, use a durable ordered store keyed by `pickup_zone_id`, with ordering by eligible_since, tie-broken by driver_id or a monotonic sequence; use atomic claim semantics when dispatching. For failure handling, add leases for `OFFERED` drivers, idempotent enqueue requests, heartbeat expiration, and a reconciliation process that removes drivers whose latest location or availability is invalid.
One explicit tradeoff to flag is strict fairness versus availability. A single strongly consistent queue per airport gives clean ordering but can become a hot shard; sharding by terminal or lane improves throughput but may require a merge policy that slightly weakens global FIFO. A good close is: “If I had more time, I’d drill into the exact storage choice, simulate hot-airport load, and add dashboards for queue length, wait time, dequeue latency, and duplicate-claim attempts.”
A second angle
For Compare Paxos and Raft, the same distributed-systems instincts apply, but the framing shifts from designing a user-facing workflow to explaining correctness guarantees. Instead of geocells and driver state, focus on replicated logs, leader election, quorum writes, and what happens during network partitions. The key connection is that a pickup queue, match assignment service, or configuration store may internally rely on consensus to provide linearizable operations. A strong answer does not just say “Raft is simpler”; it explains that Raft decomposes consensus into leader election, log replication, and safety constraints, while Paxos separates proposer/acceptor/learner roles and is often harder to reason about in production implementations.
Common pitfalls
Pitfall: Treating location updates as the source of truth for every decision.
A tempting answer is to say “store all driver locations in `Redis` and query nearby drivers,” then stop there. That misses the hardest part: matching and queue dequeue require atomic state transitions, while location streams can be stale, duplicated, or out of order. A better answer separates raw events, latest-location cache, durable driver state, and strongly consistent assignment or queue commitment.
Pitfall: Hand-waving consistency with “eventual consistency is fine.”
Eventual consistency is fine for maps, ETAs, and candidate discovery, but not for double-assigning a driver or preserving airport queue order. Name the operation that must be linearizable or protected by compare-and-set, then explain where you intentionally accept stale reads to improve latency and availability.
Pitfall: Jumping into components before defining invariants.
Listing `Kafka`, `Redis`, `Cassandra`, and `Kubernetes` sounds system-design-like but can feel shallow. Start with invariants such as “one driver cannot be assigned to two rides,” “a queued driver cannot skip ahead without a rule,” and “stale locations are excluded,” then choose storage, APIs, and partitioning to enforce those invariants.
Connections
Interviewers may pivot from here into matching algorithms, load shedding, cache invalidation, leader election, idempotent API design, or observability-driven debugging. A URL shortener uses fewer real-time geospatial ideas, but it tests the same fundamentals: API contracts, key generation, caching, durability, hot-key handling, and availability tradeoffs.
Further reading
-
In Search of an Understandable Consensus Algorithm — Raft paper — best practical foundation for leader-based replicated logs and quorum safety.
-
Paxos Made Simple — Leslie Lamport — concise explanation of the core Paxos safety argument.
-
Google S2 Geometry — useful reference for cell-based geospatial indexing and hierarchical spatial decomposition.
Practice questions
What's being tested
These prompts test backend system design for a high-correctness, user-facing service: modeling carts, coordinating state across devices, preventing double checkout, and producing a price the user can trust. Uber cares because cart and pricing sit on the critical path of order conversion; stale menu data, race conditions, or inconsistent discounts directly cause failed checkouts, refunds, and customer support load. Interviewers are probing whether you can separate cart state, menu/catalog state, pricing computation, and checkout orchestration while being explicit about consistency, idempotency, and scalability tradeoffs. For class-design variants, they are also checking whether you can build extensible domain objects and composable pricing rules without hard-coding every promotion or fee.
Core knowledge
-
Domain model boundaries matter. A cart should store user-selected intent:
cart_id,user_id,merchant_id, item IDs, option/customization IDs, quantities, delivery mode, and version. It should not permanently trust mutable fields like menu price, availability, tax, or ETA without revalidation at pricing and checkout time. -
Single-merchant invariant is common for food delivery carts. Enforce it at the cart service layer: adding an item from a different
merchant_ideither rejects, prompts “start new cart,” or archives the existing active cart. This simplifies fulfillment, pricing, promotions, and checkout consistency. -
Cart lifecycle states should be explicit:
ACTIVE,PRICED,CHECKING_OUT,CONVERTED,ABANDONED,EXPIRED. A one-time checkout guarantee usually requires a state transition likeACTIVE -> CHECKING_OUT -> CONVERTED, guarded by a compare-and-set oncart_versionor a database transaction. -
Optimistic concurrency control fits multi-device carts well. Include
versionorupdated_atin every mutation request; applyUPDATE carts SET ... version = version + 1 WHERE cart_id = ? AND version = ?. If zero rows update, return409 Conflictand let the client refetch/merge. -
Idempotency keys are mandatory for checkout and useful for cart mutation retries. For
POST /checkout, accept anIdempotency-Keyscoped touser_idandcart_id; persist the request hash and final response. Retries after network timeouts should return the same order result, not create duplicate orders. -
Storage design should match access patterns.
Rediscan serve low-latency active cart reads with TTLs, whilePostgres,MySQL, or another transactional store remains the source of truth for checkout-critical state. Index by(user_id, state)for “load active cart” and bycart_idfor direct mutations. -
Consistency choices should be deliberate. Cart edits can tolerate read-your-writes within a region and occasional conflict resolution, but checkout requires strong correctness for inventory/availability, final price, payment authorization, and order creation. Use transactions or a saga with compensating actions where cross-service atomicity is impossible.
-
Pricing should be deterministic and explainable. A useful receipt model is:
Store the pricing input snapshot and itemized output so customer support and reconciliation can reproduce what the user saw. -
Pricing engine extensibility usually comes from composable rules. Define interfaces like
PricingRule.apply(PricingContext, PriceBreakdown)for promotions, membership benefits, delivery fees, small-order fees, taxes, and surge. Rules should be ordered and side-effect-free when possible, because discounts may depend on subtotal before or after fees. -
Menu and availability revalidation is a key edge case. Items can become unavailable, options can be removed, merchant hours can change, or prices can update between add-to-cart and checkout. Return structured errors like
ITEM_UNAVAILABLE,PRICE_CHANGED, orMERCHANT_CLOSED, and preserve user intent where possible. -
Scalability is less about cart size and more about request volume and fanout. A single cart usually has fewer than 100 line items, so simple
O(n)recomputation is fine. The hard part is minimizing synchronous calls to menu, promotion, tax, delivery-fee, and membership services on hot paths while keeping prices fresh. -
Caching must respect correctness boundaries. Cache menu item snapshots, promotion eligibility, and computed quotes for short TTLs such as 30–120 seconds, but invalidate or recompute on checkout. Never rely on a cached quote as final unless it is versioned, unexpired, and all dependent inputs still match.
Worked example
For Design cart management lifecycle service, a strong candidate starts by clarifying: is this for Uber Eats-style carts, can a user have multiple active carts, do carts span merchants, and what correctness guarantee is required at checkout? A good initial assumption is: one active cart per user per marketplace, one merchant per cart, multiple devices per user, and checkout must be exactly-once from the user’s perspective. Organize the answer around four pillars: API design (GET /cart, POST /items, PATCH /items/{id}, DELETE /items/{id}, POST /checkout), data model with cart, cart_item, cart_version, and state transitions, concurrency control using optimistic version checks, and checkout orchestration using idempotency plus final validation.
When discussing storage, you can propose Postgres for durable state and Redis for active-cart cache, but emphasize that checkout reads from or verifies against the durable record. For multi-device behavior, describe what happens when device A updates quantity while device B deletes the same item: one write succeeds, the other gets 409 Conflict with the latest cart. For one-time checkout, show the guarded transition: UPDATE carts SET state='CHECKING_OUT' WHERE cart_id=? AND state='ACTIVE' AND version=?; only the winner proceeds. A concrete tradeoff to flag is pessimistic locking versus optimistic concurrency: locks simplify reasoning but hurt latency and availability under mobile retries, while versioning handles the common low-contention case cleanly. Close by saying that, with more time, you would cover region failover, audit history for customer support, and how stale menu or price changes are surfaced to users.
A second angle
For Design cart and pricing engine classes, the same concepts appear, but the focus shifts from distributed service boundaries to object-oriented domain modeling. Instead of starting with storage and APIs, start with classes like Cart, CartItem, Customization, Money, PricingContext, PriceBreakdown, ReceiptLine, and PricingRule. The interviewer wants to see that item customization affects both validity and price: extra cheese, size upgrades, modifiers, and substitution choices should be represented as structured option selections, not free-form strings. The pricing engine should support rules such as PromotionRule, MembershipDiscountRule, DeliveryFeeRule, SurgeFeeRule, and TaxRule without editing a giant if/else block every time a new fee is introduced. The same correctness concerns still apply: deterministic ordering, rounding rules, idempotent recomputation, and an itemized receipt that explains the final total.
Common pitfalls
Pitfall: Treating the cart as just a key-value blob.
A tempting answer is “store the cart JSON in Redis and update it on every add/remove.” That misses indexing, lifecycle, auditability, conflict handling, and checkout correctness. A stronger answer can still use a JSON column or cache, but it defines durable entities, state transitions, versioning, and source-of-truth semantics.
Pitfall: Over-indexing on microservices before correctness.
Some candidates immediately split cart, pricing, promotions, tax, menu, inventory, and checkout into separate services without explaining transaction boundaries. That sounds scalable but can create inconsistent user-visible behavior. Land better by saying which calls are synchronous on checkout, which can be cached during browsing, and how failures degrade: retry, reject checkout, or show a stale-price warning.
Pitfall: Hard-coding pricing logic.
For class design, a common wrong answer is a calculateTotal() method with nested conditionals for surge, coupons, Uber One, taxes, and delivery fees. That becomes untestable and brittle. Prefer composable rules, explicit rule ordering, Money types to avoid floating-point errors, and golden tests for receipts.
Connections
Interviewers may pivot from here into distributed transactions, idempotent API design, cache invalidation, rate limiting, or order/payment workflow design. They may also ask for object-oriented follow-ups around strategy pattern, chain of responsibility, and how to test pricing rules with deterministic fixtures.
Further reading
-
Designing Data-Intensive Applications — strong background on transactions, replication, consistency, and data modeling tradeoffs relevant to cart and checkout systems.
-
Stripe API Idempotent Requests — practical reference for idempotency-key behavior on retry-prone payment and checkout APIs.
-
Martin Fowler, Money — concise explanation of why monetary values deserve a dedicated type instead of floating-point arithmetic.
Practice questions
Behavioral & Leadership
- Technical Leadership, Project Impact And Tradeoffs — covered in depth under Take-home Project below.
Take-home Project
Coding & Algorithms

What's being tested
These problems test graph modeling: converting prerequisites, build dependencies, grids, or flights into nodes and edges with the right direction and constraints. Interviewers are probing whether you can choose between topological sort, DFS cycle detection, Union-Find, and bounded shortest path instead of forcing one graph template everywhere.
Patterns & templates
-
Topological sort with
indegree+queue—O(V + E)time; if processed count< V, a directed cycle blocks completion. -
DFS cycle detection using
visiting/visitedstates —O(V + E); back-edge tovisitingmeans a dependency cycle exists. -
Build order dependency direction matters — for prerequisite
a -> b, decrementindegree[b]only afterais emitted. -
Union-Find / DSU for connectivity —
find,union, path compression, union by rank; near-constant amortized time, . -
Grid-to-graph mapping — convert
(r, c)toid = r * cols + c; union only valid land neighbors to count islands. -
Cheapest flight with K stops — use Bellman-Ford-style relaxation for
K + 1edges, or priority queue with(cost, node, stops)state. -
State includes constraints — shortest path with stop limits is not plain
Dijkstra; reaching a city cheaper but with more stops may be unusable.
Common pitfalls
Pitfall: Treating undirected connectivity logic as valid for directed dependency graphs; cycles and reachability have different meanings by edge direction.
Pitfall: Mutating the same distance array during bounded flight relaxation; use a copy per layer to avoid using too many edges.
Pitfall: Returning a partial build order without explicitly checking whether all nodes were processed.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions

What's being tested
Grid and matrix algorithms here test whether you can model 2D state cleanly, choose the right traversal strategy, and reason about boundaries, mutation, and complexity. Expect divide-and-conquer, DFS/backtracking, dynamic programming, and graph reachability patterns rather than heavy system design.
Patterns & templates
-
QuadTree recursion with
build(r0, c0, size)— check uniform region, otherwise split into four quadrants;O(n^2 log n)naive,O(n^2)with prefix sums. -
2D prefix sums for constant-time area checks — compute
sum(r1,c1,r2,c2); uniform binary square if sum is0or area. -
Word search DFS/backtracking using
dfs(r, c, i)— mark visited, explore 4 directions, unmark on return; worst caseO(mn * 4^L). -
Fixed-direction grid scanning for simpler word search — test 8 directions with
(dr, dc)arrays; complexityO(mn * D * L). -
Largest square DP in binary matrices —
dp[i][j] = 1 + min(top, left, diag)when cell matches;O(mn)time,O(n)space possible. -
Graph eventual safety via DFS coloring —
0=unvisited,1=visiting,2=safe; cycles are unsafe, terminal-reaching DAG paths are safe. -
Boundary helpers like
in_bounds(r,c)and direction arrays reduce off-by-one bugs; clarify diagonal moves, cell reuse, and empty input early.
Common pitfalls
-
Pitfall: Reusing a cell in word search accidentally because visited state is not restored during backtracking.
-
Pitfall: Building QuadTrees by rescanning every subgrid without discussing prefix-sum optimization or worst-case complexity.
-
Pitfall: Treating “can reach a terminal node” as equivalent to “all paths eventually reach a terminal node” in graph safety problems.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions

What's being tested
Tree traversal over binary and N-ary hierarchies: choosing DFS vs BFS, preserving ordering, and attaching metadata like depth, column, or path position. Interviewers probe whether you can convert a structural requirement—vertical order, boundary, kth value, indentation—into a clean traversal invariant with correct edge-case handling.
Patterns & templates
-
DFS with depth state —
dfs(node, depth)gives indentation, boundary levels, or recursive formatting;O(n)time,O(h)call stack. -
BFS with coordinates — queue
(node, row, col)for vertical traversal; sort bycol, thenrow, then value if required. -
In-order traversal for BSTs —
left -> node -> rightyields sorted order; stop afterkvisits forO(h + k)time. -
Boundary traversal — split into left boundary, leaves, right boundary; avoid duplicating root or leaf nodes across phases.
-
N-ary ordered children — child index matters; first child often defines left view/boundary, last child defines right view/boundary.
-
Iterative stack template — use
(node, depth, state)when recursion depth may exceed limits; preserves explicit traversal control. -
Complexity discipline — most solutions should be
O(n)time; vertical traversal may addO(n log n)sorting unless grouping order is maintained.
Common pitfalls
Pitfall: Treating vertical traversal as simple BFS without tie-breaking; equal column and row cases must be ordered exactly as specified.
Pitfall: Duplicating nodes in boundary problems, especially when a boundary node is also a leaf.
Pitfall: Forgetting degenerate trees: empty tree, single node, linked-list-shaped tree, or N-ary nodes with zero children.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions

What's being tested
These problems test array invariants, monotonic stacks, sliding windows, and dynamic programming under tight time constraints. Interviewers are probing whether you can turn a brute-force scan over subarrays or future elements into an O(n) or near-O(n) solution with correct edge-case handling.
Patterns & templates
-
Prefix invariant for permutations — track
max_so_far; prefix0..iforms1..kiffmax_so_far == i + 1, assuming valid permutation input. -
Monotonic stack for next smaller/equal element — maintain increasing stack of indices; pop while
prices[top] >= prices[i];O(n)time,O(n)space. -
Sliding window with frequencies — expand right, update pair count by
freq[x]; contract left while condition still holds; count many subarrays at once. -
Identical-pair counting — adding value
xcreatesfreq[x]new pairs; removing it deletesfreq[x] - 1; avoid recomputing combinations. -
At least k subarrays — when window
[l..r]satisfies condition, all extensions to the right also satisfy it, contributingn - r. -
Dynamic programming for constrained jumps — define
dp[i]as paths to positioni; transition from allowed jump sizes; use modulo arithmetic consistently. -
Prime preprocessing — generate primes ending in
3using Sieve of Eratosthenes up to max jump/position; avoid primality checks inside nested loops.
Common pitfalls
Pitfall: Using
sum(freq[v] choose 2)after every pointer move turns an intendedO(n)sliding window intoO(n * distinct).
Pitfall: For final prices, using strictly smaller instead of smaller-or-equal changes correctness on duplicate prices.
Pitfall: In DP counting, forgetting
mod = 1_000_000_007during each addition can overflow in languages likeJavaorC++.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions

What's being tested
These problems test recursive descent parsing and expression evaluation under strict syntax rules. Interviewers are probing whether you can tokenize input, handle operator precedence, parentheses, unary operators, Lisp-style nested forms, lexical scoping, and invalid-input detection without turning the code into ad hoc string hacks.
Patterns & templates
-
Tokenizer first — implement
nextToken()or scan with indexi; distinguish numbers, identifiers, operators,(,), and whitespace inO(n)time. -
Recursive descent parser — use functions like
parseExpr(),parseTerm(),parseFactor()for arithmetic precedence; each function consumes exactly the tokens it owns. -
Stack-based arithmetic evaluator — for
+ - * /, maintain current sign/term stack; parentheses can recurse or push previous state. -
Lisp evaluator template — parse forms like
(let x 2 (add x 3))recursively; evaluate operator, then operands, while advancing a shared index. -
Scoped environments — model variable bindings with
Map<String, Deque<Integer>>or copy-on-writeMap; push on entry, pop on exit to preserve lexical scope. -
Validation as parsing invariant — reject unexpected EOF, extra trailing tokens, missing
), empty expressions, invalid identifiers, and operator arity mismatches immediately. -
Complexity target — aim for
O(n)time over input length andO(d + v)space, wheredis nesting depth andvactive bindings.
Common pitfalls
Pitfall: Treating parsing as repeated
substring()splitting often breaks on nested parentheses and can degrade toO(n^2).
Pitfall: Forgetting unary
-makes expressions like-3,1*-2, or(- x)fail despite valid syntax.
Pitfall: Using one global variable map for Lisp
letleaks inner bindings into outer scopes; use scoped push/pop or copied environments.
Practice these
The practice cards below cover the canonical variants — solve all of them and time yourself.
Practice questions
Behavioral & Leadership
What's being tested
Interviewers are probing technical leadership: whether you can take an ambiguous engineering problem, choose a practical design, ship it reliably, and explain the impact without hiding tradeoffs. For a Software Engineer at Uber, this matters because many systems sit on high-throughput, low-latency, failure-prone distributed infrastructure where local choices affect riders, drivers, merchants, and internal teams. Strong answers show ownership, system design judgment, debugging depth, cross-functional communication, and the ability to reason about measurable engineering outcomes such as p99 latency, availability, cost, deploy safety, and operational load. The interviewer is not just asking “what did you build?”; they are testing whether you understand why the design was correct for the constraints and what you would change with hindsight.
Core knowledge
-
Project framing should start with the business or user problem, then translate it into engineering constraints. A strong SWE answer names the traffic scale, latency target, availability requirement, data correctness needs, migration constraints, and blast radius before describing implementation details.
-
Impact metrics should be concrete and engineering-owned. Examples include reducing
p99API latency from850msto220ms, cutting error rate from1.2%to0.05%, lowering cloud spend by30%, improving deploy frequency, or reducing on-call pages per week. Avoid vague claims like “improved scalability.” -
Architecture narratives should cover request flow, data flow, storage, API boundaries, failure modes, and observability. For backend projects, mention components such as
HTTP/gRPCservices, caches likeRedis, queues such asKafka, durable stores likePostgresorCassandra, and how each choice served a constraint. -
Scalability reasoning should include rough capacity math. For example, if a service handles
10k QPSand each request performs 3 downstream calls, the downstream dependency sees up to30k QPSbefore retries. Use to justify caching, batching, backpressure, or rate limits. -
Reliability tradeoffs often involve consistency, latency, and availability. A synchronous write path gives simpler correctness but increases user-facing latency and coupling; an asynchronous queue improves resilience but introduces eventual consistency, replay handling, and duplicate processing. Name the chosen failure behavior explicitly.
-
Data modeling choices should match access patterns.
Postgresworks well for relational integrity, transactions, and moderate write volume;CassandraorDynamoDB-style stores fit high write throughput and predictable key-based reads;Redisis useful for ephemeral low-latency lookups but needs TTLs, invalidation strategy, and fallback behavior. -
API contract design should include versioning, idempotency, and backward compatibility. For state-changing operations, an idempotency key prevents duplicate effects during retries. For migrations, support old and new clients simultaneously, use additive schema changes first, and remove legacy fields only after adoption is verified.
-
Observability is part of the design, not an afterthought. Mention service-level indicators like
availability,p50/p95/p99latency, saturation, queue depth, cache hit rate, retry rate, and dependency error rate. Use logs for discrete events, metrics for trends, and traces for cross-service latency attribution. -
Deployment safety should include feature flags, canaries, staged rollout, rollback plans, and data backfill validation. For risky changes, ship dark reads, dual writes, or shadow traffic before switching production traffic. A strong story explains how you reduced blast radius during launch.
-
Debugging leadership means narrowing uncertainty systematically. Describe how you formed hypotheses, inspected dashboards, compared healthy versus unhealthy cohorts, reproduced locally or in staging, used distributed tracing, and validated the fix. Do not jump straight from “there was a bug” to “I fixed it.”
-
Disagreement handling should be evidence-driven. When engineers disagree on
Rediscache versus database optimization, synchronous versus asynchronous processing, or monolith versus service split, compare against agreed constraints: latency budget, operational complexity, team expertise, migration risk, and reversibility. -
Technical leadership without authority includes writing design docs, setting milestones, clarifying ownership, mentoring peers, surfacing risks early, and aligning stakeholders on decisions. For a SWE, avoid sounding like you managed performance through HR processes; focus on technical execution, feedback loops, and unblocking the team.
Worked example
For Describe end-to-end design of past project, a strong candidate would spend the first 30 seconds setting context: “I’ll use a project where we redesigned a high-traffic notification service; the goal was to reduce delivery latency while improving reliability during traffic spikes.” Clarify the system boundary: who called the service, expected QPS, latency target, durability requirement, and whether duplicates were acceptable. Then organize the answer into four pillars: problem and constraints, architecture before and after, key tradeoffs, and measured impact.
The skeleton might be: first, explain that the old synchronous path called multiple downstream providers inline, causing high p99 latency and cascading failures. Second, describe the new design: API service validates requests and writes a durable job to Kafka; workers consume jobs, batch provider calls, retry with exponential backoff, and persist delivery status in Postgres or a key-value store. Third, cover reliability: idempotency keys prevent duplicate sends, dead-letter queues capture poison messages, and dashboards track queue lag, provider error rate, and end-to-end delivery latency. Fourth, state the result: for example, p99 request latency dropped from 1.4s to 180ms, provider incidents no longer took down the write API, and on-call pages fell by half.
The explicit tradeoff to flag is eventual consistency: the client no longer gets immediate final delivery status, but the system becomes more resilient and faster for the user-facing request. A good answer also mentions what was not done, such as avoiding a full provider abstraction rewrite because it would delay the reliability fix. Close with hindsight: “If I had more time, I would add automated replay tooling for dead-letter messages and stricter load tests around provider throttling.”
A second angle
For Describe a Trade-off Design Change, the same leadership skill applies, but the interviewer wants the decision process more than the full architecture. Pick one meaningful pivot, such as moving from direct database reads to a Redis cache, splitting a service, replacing polling with event-driven processing, or denormalizing data for read latency. Frame the original constraint, the options considered, and the criteria used to choose: latency, correctness, cost, complexity, reversibility, and operational risk.
For example, if you introduced caching, do not just say “we added Redis to make it faster.” Explain the tradeoff: lower p99 latency and database load in exchange for cache invalidation complexity and possible stale reads. Then show safeguards: TTLs, write-through or cache-aside strategy, fallback to the database, cache hit-rate monitoring, and a feature flag to disable the cache if correctness issues appeared.
Common pitfalls
Pitfall: Giving a résumé summary instead of an engineering narrative.
A weak answer lists technologies: “I used Java, Kafka, Redis, and AWS.” A stronger answer explains why each component existed, what constraint it solved, what alternatives were considered, and how the design behaved under failure or scale.
Pitfall: Claiming impact without measurement.
Saying “the project improved performance a lot” sounds unverified. Use before-and-after numbers, even approximate ones: p99 latency, error rate, throughput, CPU utilization, cost, incident count, migration completion rate, or developer productivity metrics like build time.
Pitfall: Avoiding conflict or tradeoffs.
Many candidates present every decision as obvious. Interviewers prefer candidates who can say, “We chose asynchronous processing even though it introduced eventual consistency, because the synchronous path was causing cascading failures; we mitigated that with idempotency, retries, and status tracking.”
Connections
Interviewers may pivot from this topic into system design, especially API design, caching, queues, consistency, and failure handling. They may also dig into observability and debugging, asking how you diagnosed latency, dependency failures, or data inconsistencies. For senior candidates, expect follow-ups on technical influence, design review communication, mentoring, and aligning multiple engineers around a migration plan.
Further reading
-
Designing Data-Intensive Applications — Deep coverage of storage, replication, consistency, streams, and distributed-system tradeoffs.
-
Google SRE Book — Practical framework for reliability, error budgets, monitoring, incident response, and operational excellence.
-
The Twelve-Factor App — Concise principles for deployable, observable, maintainable services in production environments.
Practice questions