Analyze TSV File for User Page Visits and Patterns
Company: Apple
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
visits
+-----------+-----------+------+
| person_id | timestamp | page |
+-----------+-----------+------+
| 1 | 100 | A |
| 1 | 110 | B |
| 1 | 150 | C |
| 2 | 100 | B |
| 2 | 120 | C |
+-----------+-----------+------+
##### Scenario
You receive a TSV file in which each line contains one user's chronological page-visit history: a sequence of timestamp,page entries separated by tab characters ("\t"). The business wants insights on usage and performance optimizations.
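A minimal parsing sketch, assuming each line holds only tab-separated `timestamp,page` fields and that the user is identified by the line number (the scenario does not say where `person_id` comes from, so that choice is an assumption, as is the helper name `parse_visits`). It reshapes the file into the long `visits` layout shown in the table above.

```python
import io
import pandas as pd

def parse_visits(lines):
    """Turn lines of tab-separated 'timestamp,page' fields into a long table.

    Assumption: person_id is the 1-based line number, because the scenario
    does not state that an explicit ID column exists.
    """
    rows = []
    for person_id, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines; the line number is still consumed
        for field in line.split("\t"):
            ts, page = field.split(",", 1)
            rows.append((person_id, int(ts), page.strip()))
    return pd.DataFrame(rows, columns=["person_id", "timestamp", "page"])

# Two users, matching the sample visits table.
sample = "100,A\t110,B\t150,C\n100,B\t120,C\n"
visits = parse_visits(io.StringIO(sample))
print(visits)
```

For a real file, the same function could be called on an open file handle, for example `parse_visits(open(path, encoding="utf-8"))`, where `path` is a placeholder.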
##### Question
1) Parse the file and return the page with the highest total visit count.
2) For every visit, compute the residence time (next timestamp – current timestamp). Return the page with the greatest total residence time across all users.
3) Treat each user’s ordered page sequence as a path. Return the most frequent complete path (e.g., "A→B→C").
4) The computation in 2) can be slow with explicit loops. Rewrite the residence-time computation so it can execute in parallel / vectorized form (e.g., time[i+1] – time[i]) and explain the performance benefit.
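One possible loop-based baseline for parts 1 to 3 is sketched below in plain Python, using the rows from the sample `visits` table. It assumes that a user's last visit has no residence time (there is no next timestamp to subtract from) and that ties are acceptable in any order; the question specifies neither.

```python
from collections import Counter, defaultdict

# Rows from the sample visits table: (person_id, timestamp, page).
rows = [
    (1, 100, "A"), (1, 110, "B"), (1, 150, "C"),
    (2, 100, "B"), (2, 120, "C"),
]

# Group visits per user in chronological order.
by_user = defaultdict(list)
for person_id, ts, page in sorted(rows):
    by_user[person_id].append((ts, page))

# 1) Total visit count per page.
visit_counts = Counter(page for _, _, page in rows)

# 2) Residence time = next timestamp - current timestamp, summed per page.
#    Assumption: the final visit of each user contributes nothing.
residence = Counter()
for history in by_user.values():
    for (ts, page), (next_ts, _) in zip(history, history[1:]):
        residence[page] += next_ts - ts

# 3) Each user's full ordered page sequence, counted as one path.
path_counts = Counter(
    "→".join(page for _, page in history) for history in by_user.values()
)

print(visit_counts.most_common(1))
print(residence.most_common(1))
print(path_counts.most_common(1))
```

On the sample data, pages B and C tie on visit count and both paths occur once, so a real answer would need a tie-breaking rule.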
##### Hints
Load the data into pandas and sort by person_id and timestamp. Use groupby with diff/shift for residence times, Counter or a groupby aggregation for visit counts and paths, and vectorized NumPy operations in place of explicit loops.
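The hinted vectorized approach might look like the sketch below. It assumes the data is already in the long `visits` layout and that a user's last visit gets a residence time of 0. A single array subtraction replaces the per-row loop, and a boolean mask discards differences that cross from one user to the next.

```python
import numpy as np
import pandas as pd

visits = pd.DataFrame({
    "person_id": [1, 1, 1, 2, 2],
    "timestamp": [100, 110, 150, 100, 120],
    "page": ["A", "B", "C", "B", "C"],
})
visits = visits.sort_values(["person_id", "timestamp"]).reset_index(drop=True)

pid = visits["person_id"].to_numpy()
ts = visits["timestamp"].to_numpy()

# gap[i] = ts[i+1] - ts[i], computed for all rows in one array operation.
gap = ts[1:] - ts[:-1]
# A gap only counts when both rows belong to the same user.
same_user = pid[1:] == pid[:-1]

# Assumption: the last visit of each user has residence time 0.
residence = np.zeros(len(visits), dtype=np.int64)
residence[:-1] = np.where(same_user, gap, 0)

totals = visits.assign(residence=residence).groupby("page")["residence"].sum()
slowest_page = totals.idxmax()
print(totals)
print(slowest_page)
```

The benefit comes from moving the loop out of the Python interpreter: the subtraction and comparison run over contiguous arrays in compiled code, which avoids per-element interpreter overhead. An equivalent pandas-only form is `visits.groupby("person_id")["timestamp"].shift(-1) - visits["timestamp"]`, which leaves `NaN` rather than 0 for each user's last visit.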
Quick Answer: This question evaluates skills in parsing and manipulating time-series user-event data, performing aggregations and path-frequency analysis, and understanding vectorized or parallel computations for performance.