Review Python geodata code
Company: Airbnb
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
##### Question
You are given a Python codebase that performs geolocation data processing. Conduct a thorough code review: identify readability, correctness, performance, and scalability issues, and propose concrete refactors or optimizations (e.g., data structures, vectorization, caching, modularization).
Quick Answer: This question evaluates proficiency in Python code review and geolocation data-processing, including competencies in readability, correctness, performance optimization, and architectural scalability.
Given a list of GPS points as (latitude, longitude) and an integer precision p, partition the plane into a uniform grid where each cell spans 10^-p degrees in both latitude and longitude. Map each point to its grid cell using integer tile indices i = floor(lat * 10^p + 1e-12) and j = floor(lon * 10^p + 1e-12). Return the number of distinct grid cells visited by the points. If the list is empty, return 0.
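As a worked illustration of the mapping, the sketch below computes the tile indices for a single point. The helper name `tile_index` is hypothetical (it does not appear in the task), and the sample coordinates were chosen because they are exactly representable in binary floating point, so the arithmetic can be followed by hand.

```python
import math


def tile_index(lat: float, lon: float, precision: int) -> tuple[int, int]:
    """Return the (i, j) tile of a point, using the formula from the problem statement.

    `tile_index` is an illustrative helper name, not part of the original task.
    """
    scale = 10 ** precision
    eps = 1e-12
    return math.floor(lat * scale + eps), math.floor(lon * scale + eps)


# floor rounds toward negative infinity, so a negative coordinate maps to the
# next lower index rather than being pulled toward zero.
print(tile_index(37.5, -122.25, 1))  # (375, -1223)
print(tile_index(37.5, -122.25, 0))  # (37, -123)
```

Truncating with `int()` instead of flooring would give -1222 for the longitude at precision 1, which is why the formula specifies floor.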
Constraints
- 0 <= n <= 200000
- -90.0 <= latitude <= 90.0
- -180.0 <= longitude <= 180.0
- 0 <= precision <= 6
- Quantization uses i = floor(lat * 10^precision + 1e-12), j = floor(lon * 10^precision + 1e-12)
- Return the count of unique (i, j) pairs
Solution
import math


def count_distinct_cells(points: list[tuple[float, float]], precision: int) -> int:
    scale = 10 ** precision  # number of grid cells per degree
    eps = 1e-12  # nudge required by the problem's quantization formula
    seen = set()  # distinct (i, j) tile indices
    for lat, lon in points:
        i = math.floor(lat * scale + eps)
        j = math.floor(lon * scale + eps)
        seen.add((i, j))
    # An empty input leaves the set empty, so this returns 0.
    return len(seen)
Explanation
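We quantize each point to an integer grid by scaling its coordinates by 10^p and taking the floor. The tiny epsilon nudges products that land just below an integer because of floating-point rounding. That nudge only helps while the scaled value is small: once it exceeds about 2^14 (roughly 16,000), 1e-12 is less than half the gap between adjacent doubles and the addition changes nothing, so at high precision the formula behaves like a plain floor. Each cell is represented as a pair of integers (i, j), and inserting these pairs into a set yields the number of distinct cells. Integer pairs hash and compare exactly, which avoids the ambiguity of comparing floating-point values.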
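Time complexity: O(n) expected, since each point costs one hash-set insertion. Space complexity: O(k) for k distinct cells, which is O(n) in the worst case.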
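The role of the epsilon described above can be checked directly. The snippet below is a minimal demonstration, with sample values chosen for illustration: one case where the nudge changes the cell, and one case at precision 6 where it has no effect.

```python
import math

eps = 1e-12

# 0.29 * 100 is stored as a double slightly below 29, so a plain floor
# lands in cell 28; the epsilon pushes it back over the boundary.
x = 0.29 * 100
print(math.floor(x))        # 28
print(math.floor(x + eps))  # 29

# At precision 6 the scaled values are far larger, and 1e-12 is smaller
# than half the gap between adjacent doubles, so the addition is a no-op.
y = 179.999999 * 10 ** 6
print(y + eps == y)  # True
```

If exact decimal boundaries matter at high precision, a reviewer might reasonably ask whether the inputs should be parsed as decimals or pre-scaled integers, but that goes beyond the formula the problem specifies.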
Hints
- Convert each point to integer tile indices and use a set to track uniqueness.
- Precompute scale = 10**precision. Use math.floor rather than Python's round (which rounds to the nearest integer instead of down), and add the tiny epsilon to counter floating-point noise.
- If many identical points appear, deduplicating them or caching their tile result can reduce repeated computation. The per-point arithmetic is cheap, so measure before relying on this.
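The last hint can be sketched by deduplicating raw points before quantizing, so that each distinct coordinate pair is scaled and floored only once. The function name `count_distinct_cells_dedup` is hypothetical, and the sketch assumes the points are hashable tuples. Whether it beats the direct loop depends on how many exact duplicates the input contains, so treat it as something to benchmark rather than a guaranteed improvement.

```python
import math


def count_distinct_cells_dedup(points: list[tuple[float, float]], precision: int) -> int:
    """Count distinct grid cells, quantizing each distinct raw point once.

    Hypothetical variant of the solution; assumes points are hashable tuples.
    """
    scale = 10 ** precision
    eps = 1e-12
    # set(points) collapses exact duplicates, so the arithmetic below runs
    # once per distinct (lat, lon) pair rather than once per input point.
    return len({
        (math.floor(lat * scale + eps), math.floor(lon * scale + eps))
        for lat, lon in set(points)
    })


points = [(37.5, -122.25), (37.5, -122.25), (37.5625, -122.25), (40.0, -74.0)]
print(count_distinct_cells_dedup(points, 1))  # 2
print(count_distinct_cells_dedup([], 3))      # 0
```

At precision 1 the first three points share the cell (375, -1223) and the fourth falls in (400, -740), giving two distinct cells.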