Solve library SQL and Python tasks
Company: Meta
Role: Data Engineer
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
You are given a library domain. Assume these tables:
- books(book_id, author_id, title)
- authors(author_id, name)
- copies(copy_id, book_id, condition) -- condition in {'good','fair','poor'}
- checkouts(checkout_id, copy_id, member_id, checkout_date, return_date, renew_count)
- members(member_id, name, referrer_member_id) -- referrer_member_id may be NULL
- reservations(reservation_id, member_id, book_id, reserve_date)
Write SQL for the following:
1) Return two metrics in one row:
(a) total_active_good = count of checkouts where return_date IS NULL and the associated copy's condition = 'good';
(b) pct_renew_gt_2 = among those active-good checkouts, the percentage whose renew_count > 2 (as a decimal rounded to 2 decimals). If the denominator is 0, return 0.00.
2) Among books that have more than 10 copies (counted from copies), compute for each book the maximum completed-checkout duration in days using return_date - checkout_date (ignore rows with return_date IS NULL). Return the top 3 books by this maximum duration, with columns (book_id, max_duration_days). Break ties by book_id ASC.
3) For each referred member m (members.referrer_member_id IS NOT NULL), compute the absolute difference between the number of reservations made by m and the number made by their referrer r. Return the row with the largest absolute difference, with columns (member_id, referrer_member_id, reservations_m, reservations_r, abs_diff). Break ties by member_id ASC.
Python tasks:
4) Given a list of tuples like [('category1',
4), ('category2',
6), ...], implement summarize(scores) that returns two integers: (total_sum, top3_max_sum) where total_sum is the sum of all scores; to compute top3_max_sum, first take the maximum score per category, then sum the top 3 category maxima (if fewer than 3 categories, sum what's available). Break ties by category name ASC when selecting top 3.
5) You receive a list of LogEntry objects where True means checkout and False means return:
class LogEntry:
def __init__(self, book_id: int, is_checkout: bool):
self.book_id = book_id
self.is_checkout = is_checkout
Implement is_valid_checkout_log(logs) -> bool that validates the sequence with these rules for each book_id:
(a) a return cannot occur before the first checkout;
(b) two consecutive checkouts without an intervening return are invalid;
(c) two consecutive returns are invalid. The check should run in O
(n) time and O
(n) space.
Quick Answer: This question evaluates proficiency in advanced data manipulation and algorithmic reasoning across SQL and Python, covering aggregation, joins, filtering, windowing, date arithmetic, grouping, edge-case handling, and linear-time sequence validation for event logs.