PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep
|Home/Coding & Algorithms/DoorDash

Compute nearest courier for each customer

Last updated: Jun 15, 2026

Quick Overview

A DoorDash software engineer onsite coding question on nearest-neighbor search: for every customer, find the nearest courier by distance, breaking ties by smallest id. It tests brute force vs. spatial indexing (uniform grid, k-d tree, R-tree), time/space complexity, Euclidean vs. great-circle distance, and follow-ups on dynamic courier updates, batch queries, and a maximum pickup radius.

  • Medium
  • DoorDash
  • Coding & Algorithms
  • Software Engineer

Compute nearest courier for each customer

Company: DoorDash

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

##### Question You are given two sets of 2D points: customers `C = {c1..cn}` and active couriers (dashers) `D = {d1..dm}`. Each customer `i` has coordinates `(xi, yi)`. Each courier `j` has coordinates `(aj, bj)` and an integer `id`. For every customer, return the `id` of the nearest courier by distance; on ties (equal distance), return the smallest `id`. Work through the following parts: 1. Implement a correct brute-force solution and state its time and space complexity. 2. Design and implement (or outline) an approach that beats the `O(n * m)` brute force when `n` and `m` are large (up to `1e5`). Discuss appropriate spatial data structures such as a k-d tree, R-tree, or uniform grid (spatial hashing), and analyze the time and space complexity of your chosen approach. 3. Specify which distance metric you assume — Euclidean on a plane vs. great-circle distance on latitude/longitude — and explain how the choice affects correctness and your data structures. 4. Handle the edge cases: tie-breaking by smallest `id`, duplicate courier locations, and floating-point precision issues when comparing distances. 5. Follow-ups: - How does your approach change if courier locations update frequently (insert, delete, move)? - How would you support large batches of customer queries efficiently? - How would you enforce a constraint such as a maximum pickup radius (a customer with no courier within the radius returns "no courier")?

Quick Answer: A DoorDash software engineer onsite coding question on nearest-neighbor search: for every customer, find the nearest courier by distance, breaking ties by smallest id. It tests brute force vs. spatial indexing (uniform grid, k-d tree, R-tree), time/space complexity, Euclidean vs. great-circle distance, and follow-ups on dynamic courier updates, batch queries, and a maximum pickup radius.

Solution

## Setup and key decisions This is a **nearest-neighbor (NN) search** problem: for each of `n` customers, find the closest of `m` couriers, breaking ties by smallest courier `id`. **Distance metric.** Compare **squared** distances, never the raw distance, to avoid `sqrt` cost and the precision loss it introduces: ``` dist2(c, d) = (c.x - d.x)^2 + (c.y - d.y)^2 ``` For real DoorDash geography the points are latitude/longitude and the correct metric is the **great-circle (haversine)** distance, not planar Euclidean. State your assumption explicitly. If the city is small you can project to a local planar frame (e.g. equirectangular with a `cos(latitude)` scale on longitude) and use Euclidean as a fast approximation; for correctness at scale use haversine. Most grid/k-d-tree structures still work because haversine is monotone in the chord length over a city-sized region. ## Part 1 - Brute force For every customer, scan all couriers and keep the minimum, breaking ties by smallest `id`. ```python def nearest_courier_bruteforce(customers, couriers): # customers: list of (x, y) # couriers: list of (a, b, id) result = [] for (cx, cy) in customers: best_d2 = None best_id = None for (ax, ay, cid) in couriers: d2 = (cx - ax) ** 2 + (cy - ay) ** 2 if best_d2 is None or d2 < best_d2 or (d2 == best_d2 and cid < best_id): best_d2 = d2 best_id = cid result.append(best_id) return result ``` - **Time:** `O(n * m)`. - **Space:** `O(1)` extra (besides the `O(n)` output). At `n = m = 1e5` this is `1e10` operations - far too slow. We need spatial indexing. ## Part 2 - Scalable approach Build a spatial index over the **couriers** once, then query it for each customer. **Option A - Uniform grid / spatial hashing (simplest, usually fastest in practice).** 1. Pick a cell size `s` near the typical NN distance, e.g. `s ≈ sqrt(area / m)` so each cell holds ~1 courier on average. 2. Bucket every courier into cell `(floor(a/s), floor(b/s))` using a hash map keyed by cell. 3. For a customer, search its own cell, then expand to surrounding rings of cells. Once you have a candidate at distance `r`, you only need to keep expanding until the ring's minimum possible distance exceeds `r`, then stop. - **Time:** `O(m)` to build, expected `O(1)` per query for roughly uniform data, so `O(n + m)` overall. Degrades toward `O(m)` per query under heavy clustering. - **Space:** `O(m)`. **Option B - k-d tree.** 1. Build a 2-d k-d tree over the couriers: `O(m log m)`. 2. Each NN query is `O(log m)` average (`O(m)` worst case for adversarial/degenerate data). - **Time:** `O(m log m)` build + `O(n log m)` queries. - **Space:** `O(m)`. **Option C - R-tree.** Better than a k-d tree when couriers are inserted/deleted dynamically and for range/radius queries; comparable query asymptotics, higher constants and implementation complexity. **Recommendation:** a uniform grid for roughly uniform density (common within a city) because of its simplicity and `O(1)`-expected queries; a k-d tree when density varies widely; an R-tree when you need frequent dynamic updates and radius queries. ## Part 4 - Edge cases - **Tie-breaking:** when two couriers are at equal (squared) distance, return the smaller `id`. With a grid or k-d tree, carry `(d2, id)` and compare lexicographically on `(d2, id)`; this guarantees deterministic output regardless of traversal order. - **Duplicate locations:** multiple couriers at the same coordinates are fine - they fall in the same grid cell / tree leaf; tie-breaking by `id` selects deterministically. - **Precision:** compare squared integer/Decimal distances when inputs are integers (exact). With floats, equal-distance ties are fragile; use an epsilon band (`abs(d2a - d2b) <= eps`) before applying the `id` tie-break, or scale to fixed-point. Avoid `sqrt` entirely except for the final reported value. - **No couriers:** return a sentinel (e.g. `-1`/`None`) per customer. ## Part 5 - Follow-ups **Frequent updates (insert / delete / move).** A static k-d tree is poor for updates (it gets unbalanced and needs periodic rebuilds). Prefer: - A **uniform grid / spatial hash**: insert and delete are `O(1)`; a move is delete-from-old-cell + insert-into-new-cell. This is the standard real-time choice for moving fleets. - An **R-tree** (or R*-tree) if you also need range queries with logarithmic updates. - For very high update rates, consider geohash buckets in a key-value store (e.g. Redis `GEO` commands) so updates and radius queries are distributed. **Batch queries.** When many customers query at once, bulk-process them: sort/bucket customers by grid cell so couriers loaded for one cell are reused across nearby customers (cache locality), or build a k-d tree over couriers and run queries in parallel - the index is read-only so queries shard cleanly across threads/workers. **Maximum pickup radius `R`.** This becomes a bounded NN / radius query: only consider couriers within `R`. With a grid, restrict ring expansion to cells overlapping the disk of radius `R` and stop early; with an R-tree, run a range query on the `R`-disk's bounding box then filter. If no courier lies within `R`, return "no courier" for that customer. The radius cap also speeds queries because it caps how far you must search.

Explanation

Rubric: (1) a correct brute force with the right tie-break (smallest id) and O(n*m) analysis; (2) a spatial index over couriers - uniform grid (O(n+m) expected), k-d tree (O((n+m) log m)), or R-tree - with honest worst-case caveats; (3) an explicit distance-metric choice (Euclidean projection vs. great-circle/haversine for real lat/lng) and use of squared distance to avoid sqrt; (4) deterministic handling of ties, duplicate points, and float precision; (5) follow-ups that pick a grid/R-tree for dynamic fleet updates, batch queries by spatial locality, and reduce the max-radius case to a bounded range query. Strong candidates compare distances squared and reason about clustering degrading per-query cost.

Related Interview Questions

  • Validate a Shopping Cart - DoorDash (medium)
  • Calculate Driver Payments - DoorDash (medium)
  • Implement Timeout Refund Workflow - DoorDash (medium)
  • Maximize Chef Assignment Profit - DoorDash (medium)
  • Compute Courier Delivery Pay - DoorDash (easy)
DoorDash logo
DoorDash
Aug 8, 2025, 12:00 AM
Software Engineer
Onsite
Coding & Algorithms
5
0
Question

You are given two sets of 2D points: customers C = {c1..cn} and active couriers (dashers) D = {d1..dm}. Each customer i has coordinates (xi, yi). Each courier j has coordinates (aj, bj) and an integer id. For every customer, return the id of the nearest courier by distance; on ties (equal distance), return the smallest id.

Work through the following parts:

  1. Implement a correct brute-force solution and state its time and space complexity.
  2. Design and implement (or outline) an approach that beats the O(n * m) brute force when n and m are large (up to 1e5 ). Discuss appropriate spatial data structures such as a k-d tree, R-tree, or uniform grid (spatial hashing), and analyze the time and space complexity of your chosen approach.
  3. Specify which distance metric you assume — Euclidean on a plane vs. great-circle distance on latitude/longitude — and explain how the choice affects correctness and your data structures.
  4. Handle the edge cases: tie-breaking by smallest id , duplicate courier locations, and floating-point precision issues when comparing distances.
  5. Follow-ups:
    • How does your approach change if courier locations update frequently (insert, delete, move)?
    • How would you support large batches of customer queries efficiently?
    • How would you enforce a constraint such as a maximum pickup radius (a customer with no courier within the radius returns "no courier")?

Solution

Show

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More Coding & Algorithms•More DoorDash•More Software Engineer•DoorDash Software Engineer•DoorDash Coding & Algorithms•Software Engineer Coding & Algorithms
PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.