Find returning users from access logs
Company: Amazon
Role: Software Engineer
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Onsite
Given a large user access log, parse it and identify which user_ids are returning customers—i.e., they have at least one visit on two or more distinct calendar days. Assume log lines contain an ISO 8601 timestamp, user_id, and URL; if the format is different, state your assumptions. Implement a solution in Python or SQL. Address: time‑zone normalization, deduping multiple hits on the same day, memory for large files (streaming vs batch), and complexity. Provide unit tests and example input/output.
Quick Answer: This question evaluates a candidate's ability to parse and aggregate access logs, handle date/time normalization and deduplication across calendar days, and design memory-efficient, scalable solutions using SQL or Python.