Fix dash dates and aggregate watch time
Company: Capital One
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Take-home Project
You receive a CSV of ad viewing logs where the date column repeats only on the first row of each block and subsequent rows use a single dash '-' to indicate 'same as above'. Columns: date (YYYY-MM-DD or '-'), time_slot (e.g., '08:00-09:00'), ad_id, watch_seconds (INT). Tiny excerpt:
+------------+-----------+-------+---------------+
| date | time_slot | ad_id | watch_seconds |
+------------+-----------+-------+---------------+
| 2008-01-01 | 08:00-09:00 | A1 | 120 |
| - | 08:00-09:00 | A2 | 90 |
| - | 09:00-10:00 | A1 | 150 |
| 2008-01-08 | 08:00-09:00 | A1 | 80 |
+------------+-----------+-------+---------------+
Task A (Excel):
1) Provide the exact Excel formula to forward-fill the date in cell A2 and down the column, replacing '-' with the most recent actual date above, without using VBA or Power Query.
2) Then, using only worksheet functions, compute the time_slot with the largest total watch_seconds for the week 2008-01-01 to 2008-01-07 inclusive. Give the precise formulas (e.g., SUMIFS-based) and explain how you ensure the date filter is correct and dash rows are included after forward-fill.
3) Provide a pivot-table approach that yields the same answer and state the two field placements and the exact date filter you would set.
Task B (pandas alternative): Write concise pandas code (no more than 5 lines) to: read the CSV; forward-fill the date column; filter to 2008-01-01 through 2008-01-07; group by time_slot; compute total watch_seconds; and return the single time_slot with the maximum total. Be explicit about handling '-' values.
Quick Answer: This question evaluates data-cleaning and aggregation competencies, focusing on handling nonstandard missing-date markers, forward-filling in spreadsheets, correct date-range filtering, pivot-table summarization, and concise pandas grouping and aggregation.