Write monthly touches and last-touch SQL
Company: Upstart
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
You have two tables tracking marketing touches and downstream conversions. Write SQL to answer the three prompts below. Assume a warehouse like Postgres/BigQuery/Snowflake; months are calendar months in UTC; when multiple touches tie on timestamp, break ties by the highest touch_id; ignore touches strictly after a company's conversion.
Schema:
- marketing_touch(
touch_id BIGINT PRIMARY KEY,
company_id INT NOT NULL,
touch_timestamp TIMESTAMP NOT NULL,
channel VARCHAR,
campaign VARCHAR
)
- conversion(
company_id INT NOT NULL,
conversion_timestamp TIMESTAMP NOT NULL
)
Sample data (minimal, for reasoning/testing):
marketing_touch
| touch_id | company_id | touch_timestamp | channel | campaign |
| 1 | 100 | 2025-01-05 10:00:00 | Email | E1 |
| 2 | 100 | 2025-01-20 12:00:00 | Paid | P1 |
| 3 | 100 | 2025-02-01 09:00:00 | Direct | D1 |
| 4 | 101 | 2025-02-10 08:00:00 | Paid | P2 |
| 5 | 101 | 2025-02-10 08:00:00 | Email | E2 |
| 6 | 102 | 2025-02-28 23:59:59 | Referral | R1 |
conversion
| company_id | conversion_timestamp |
| 100 | 2025-02-10 00:00:00 |
| 101 | 2025-02-10 08:00:00 |
| 103 | 2025-03-01 12:00:00 |
Prompts:
1) For each month (YYYY-MM), output month and avg_touches_per_company = total touches in that month divided by the number of distinct companies that had at least one touch in that same month. Include months present in data only. Be explicit about handling companies with zero touches in a month (exclude them from the denominator).
2) For each company that converted (exists in conversion), return the last touch at or before its conversion_timestamp: company_id, conversion_timestamp, last_touch_timestamp, channel, campaign, and whether the last touch occurred in the same calendar month as the conversion. If a company has no touch at or before conversion, exclude it.
3) Count distinct companies where the last touch (as defined in #2) and the conversion occur in the same calendar month. Bonus: add a second output where you also require DATEDIFF in days between conversion_timestamp and last_touch_timestamp <= 45.
Edge cases to handle in your SQL: (a) multiple touches at the exact same timestamp for a company (pick the one with the highest touch_id); (b) touches after conversion (ignore for #2/#3); (c) companies present in conversion with no prior touches (exclude in #2/#3). Provide performant SQL (CTEs are fine) and briefly explain your tie-break and month-extraction logic.
Quick Answer: This question evaluates proficiency in SQL-based data manipulation and time-based attribution, covering aggregations, joins, window functions, date truncation to calendar months, tie-breaking logic, and handling edge cases like post-conversion touches and duplicate timestamps.