Assume today is 2025-09-01; interpret 'last week' as 2025-08-25 through 2025-08-31 inclusive, using UTC dates. You have the following schema and sample data.
Schema:
-
friendships(requester_id INT, addressee_id INT, requested_at TIMESTAMP, approved_at TIMESTAMP NULL)
-
users(user_id INT PRIMARY KEY, spam BOOLEAN)
Sample tables (UTC timestamps):
Users
+---------+------+
| user_id | spam |
+---------+------+
| 1 | TRUE |
| 2 | FALSE|
| 3 | FALSE|
| 4 | TRUE |
| 5 | FALSE|
+---------+------+
Friendships
+--------------+--------------+---------------------+---------------------+
| requester_id | addressee_id | requested_at | approved_at |
+--------------+--------------+---------------------+---------------------+
| 2 | 1 | 2025-08-25 10:00:00 | 2025-08-25 18:00:00 |
| 3 | 5 | 2025-08-25 23:30:00 | 2025-08-26 00:05:00 |
| 4 | 2 | 2025-08-26 12:00:00 | NULL |
| 6 | 3 | 2025-08-27 09:00:00 | 2025-08-27 09:15:00 |
| 5 | 3 | 2025-08-27 22:59:00 | 2025-08-28 22:59:00 |
| 2 | 6 | 2025-08-31 01:00:00 | 2025-08-31 01:05:00 |
| 1 | 3 | 2025-08-24 11:00:00 | 2025-08-24 12:00:00 |
| 3 | 2 | 2025-08-30 23:50:00 | 2025-08-30 23:59:59 |
+--------------+--------------+---------------------+---------------------+
Tasks:
-
Write a single SQL query that returns, for each date in 2025-08-25..2025-08-31 (UTC), the columns: day (DATE), same_day_accepts, requests, same_day_accept_rate. Define same-day accept as DATE(requested_at)=DATE(approved_at). Include days with zero requests (i.e., emit 7 rows).
-
Write SQL to compute the percentage of friendship requests created last week where the requester is NOT spam (users.spam=false). Use only requests with requested_at in 2025-08-25..2025-08-31 (UTC). Report the percentage to two decimals.
-
If users is not comprehensive (some requesters are missing from users), write SQL that returns three percentages for last week: (a) known_only_pct (exclude rows where requester not in users), (b) pessimistic_pct (treat all unknown requesters as spam), and (c) optimistic_pct (treat all unknown requesters as not spam). Also return the counts used for each denominator.
-
List at least three edge cases you considered (e.g., NULL approved_at, approvals falling outside the window, duplicate requests between the same pair, timezone cutoffs), and briefly state how your SQL in (1)-(3) handles each. Be explicit about using UTC date boundaries.