PracHub
QuestionsPremiumLearningGuidesInterview PrepNEWCoaches

Quick Overview

This question evaluates skills in time-series data processing, timestamp parsing and time-zone normalization, scalable stream or batch handling for logs that exceed memory, algorithmic complexity analysis, and test-case design for edge conditions such as midnight boundaries and duplicate events.

  • Medium
  • Amazon
  • Coding & Algorithms
  • Software Engineer

Find returning users from access logs

Company: Amazon

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Onsite

Given an unordered log of web access events, identify which users are returning (i.e., they have visited on at least two different calendar days). Each event is a tuple (user_id, timestamp_iso 8601). Output the set of user_ids that meet the criterion. Address: 1) parsing and time-zone normalization; 2) handling large logs that don't fit in memory; 3) algorithmic complexity; 4) test cases including edge cases near midnight and duplicate events.

Quick Answer: This question evaluates skills in time-series data processing, timestamp parsing and time-zone normalization, scalable stream or batch handling for logs that exceed memory, algorithmic complexity analysis, and test-case design for edge conditions such as midnight boundaries and duplicate events.

You are given an unordered list of web access events. Each event is [user_id, timestamp_iso8601]. A user is considered returning if they have at least two visits on different UTC calendar dates. The timestamp always includes a timezone (e.g., 'Z' or '+HH:MM'). Normalize each timestamp to UTC before extracting the date. Return the list of user_ids who are returning, sorted in ascending lexicographic order. Ignore duplicate events (same user_id and identical timestamp).

Constraints

  • 0 <= len(events) <= 200000
  • events[i] = [user_id, timestamp_iso8601]
  • user_id is a non-empty string of length <= 64
  • timestamp_iso8601 format: YYYY-MM-DDTHH:MM:SS[.ffffff](Z|±HH:MM)
  • Different days are determined after converting timestamps to UTC
  • Duplicate events do not increase day counts
  • Return user_ids sorted lexicographically

Solution

from datetime import datetime, timezone


def returning_users(events: list[list[str]]) -> list[str]:
    """Return lexicographically sorted user_ids with visits on >= 2 different UTC dates."""
    returning = set()
    seen_dates = {}  # user_id -> set of seen UTC dates (we only keep until 2nd unique date)

    def to_utc_date(ts: str):
        # Normalize 'Z' to '+00:00' for fromisoformat compatibility
        if ts.endswith('Z'):
            ts = ts[:-1] + '+00:00'
        dt = datetime.fromisoformat(ts)
        dt_utc = dt.astimezone(timezone.utc)
        return dt_utc.date()

    for user_id, ts in events:
        if user_id in returning:
            # Already classified as returning; no need to track more
            continue
        d = to_utc_date(ts)
        prev = seen_dates.get(user_id)
        if prev is None:
            seen_dates[user_id] = {d}
        elif d not in prev:
            # Second unique day observed -> returning user
            returning.add(user_id)
            # Free memory for this user; no further tracking needed
            del seen_dates[user_id]
        # else: duplicate day for this user; ignore

    return sorted(returning)
Explanation
Convert each timestamp to a UTC date and count unique UTC dates per user. Maintain a dictionary mapping user_id to the (small) set of dates seen. When a user is observed on a different UTC date than previously recorded, add them to the result and stop tracking them to save memory. Duplicates (same user and timestamp) do not change the set of dates. Finally, return sorted user_ids for deterministic output. This approach can be streamed and does not require loading the entire log into memory at once.

Time complexity: O(n + r log r), where n is the number of events and r is the number of returning users (sorting cost).. Space complexity: O(u), where u is the number of users not yet classified as returning..

Hints

  1. Normalize timestamps to UTC and then take the calendar date (yyyy-mm-dd).
  2. In Python, replace a trailing 'Z' with '+00:00' and use datetime.fromisoformat, then astimezone(timezone.utc).date().
  3. Track per-user unique UTC dates; you only need to know whether a second, different date has been seen.
  4. Use a set for the resulting user_ids and sort before returning.
  5. Process events in a streaming manner to handle large logs without storing them all.
Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Count Connected Components in an Undirected Graph - Amazon (medium)
  • Find Unique Target-Sum Pairs - Amazon (easy)
  • Find Valid IP Addresses in Files - Amazon (medium)
  • Implement Optimal Bucket Batching - Amazon (hard)
  • Implement Cache and Rotate Matrix - Amazon (medium)