Meta Data Engineer Data Manipulation (SQL/Python) Interview Questions
Practice the exact questions companies are asking right now.
Optimize SQL to minimize scans
Given a large analytics query, refactor it to minimize table scans. 1) Replace unnecessary CTEs that cause multiple scans with inline aggregations or ...
Solve Python and SQL data tasks
Complete both tasks: 1) Python: Implement a function flatten(nested) that takes a list whose elements are integers or arbitrarily nested lists of inte...
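A minimal sketch of the flatten(nested) part of this task. The statement above is cut off, so the ordering and return type are assumptions: the sketch returns a new list of the integers in left-to-right order and uses an explicit stack so that very deep nesting does not hit Python's recursion limit.

```python
def flatten(nested):
    """Return the integers found in `nested`, in left-to-right order.

    Assumption: every element is either an int or a (possibly empty) list.
    """
    flat = []
    # Stack of iterators: the top iterator is the list currently being walked.
    stack = [iter(nested)]
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            # Finished the current list; resume its parent.
            stack.pop()
            continue
        if isinstance(item, list):
            # Descend into the nested list before continuing with siblings.
            stack.append(iter(item))
        else:
            flat.append(item)
    return flat


result = flatten([1, [2, [3, 4]], [], 5])
```

A recursive version is shorter, but the iterative form is a safer default when the nesting depth is unbounded.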
Compute reservation diff for largest member
Given copies(copy_id, reserved_by_member_id) and members(member_id, referred_by_member_id), find the member with the largest member_id. Return a singl...
Solve library SQL and Python tasks
You are given a library domain. Assume these tables: - books(book_id, author_id, title) - authors(author_id, name) - copies(copy_id, book_id, conditio...
Find top 3 books by total borrowed time
Using copies(copy_id, book_id) and checkouts(copy_id, checkout_date, return_date), compute for each book_id the total borrowed duration as the sum ove...
Find customer with max rentals in consecutive weeks
You are given a table purchases(customer_id INT, purchase_date DATE, rented_copies INT). Consider only dates in calendar year 2024. Define a full week...
Return count and renewal percentage of unreturned good copies
Tables: copies(copy_id, condition), checkouts(copy_id, checkout_date, return_date, renewal_count). Write a single SQL query that returns one row with ...
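One possible shape for this query, run against an in-memory SQLite database so it can be tried directly. Because the required output columns are cut off above, several details are assumptions: a copy counts as unreturned when return_date IS NULL, "good" means condition = 'good', a checkout counts as renewed when renewal_count > 0, and the output column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE copies (copy_id INTEGER PRIMARY KEY, condition TEXT);
CREATE TABLE checkouts (
    copy_id INTEGER, checkout_date TEXT, return_date TEXT, renewal_count INTEGER
);
-- Illustrative sample rows, not from the original question.
INSERT INTO copies VALUES (1, 'good'), (2, 'good'), (3, 'poor'), (4, 'good');
INSERT INTO checkouts VALUES
    (1, '2024-01-01', NULL, 2),
    (2, '2024-01-05', NULL, 0),
    (3, '2024-01-07', NULL, 1),
    (4, '2024-01-02', '2024-01-20', 1);
""")

# One pass over the join: conditional aggregation gives count and percentage.
QUERY = """
SELECT COUNT(*) AS unreturned_good,
       ROUND(100.0 * SUM(CASE WHEN ch.renewal_count > 0 THEN 1 ELSE 0 END)
             / COUNT(*), 2) AS renewed_pct
FROM checkouts AS ch
JOIN copies AS c ON c.copy_id = ch.copy_id
WHERE c.condition = 'good'
  AND ch.return_date IS NULL
"""

row = conn.execute(QUERY).fetchone()
```

With the sample rows, copies 1 and 2 qualify and one of them has been renewed. If no rows qualify, SQLite returns NULL for the percentage rather than raising a division error; other engines may need NULLIF on the denominator.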
Tackle Python tasks under time pressure
In a 15-minute coding round, implement a small Python function or class to solve a well-scoped problem within about 5 minutes of coding. 1) State 1–2 ...
Write SQL for library analytics
Given a library database, write SQL to answer: 1) How many books are in good condition and not returned? Among these books, what is the percentage tha...
Count active follow connections
Write SQL to return the current number of active follow connections. Events table columns: user_id, target_id, event_type ('request_follow', ...
Query carpool ride metrics
For a ride-sharing product with carpool capability, answer a series of SQL questions (e.g., daily completed pooled rides, average seats utili...
Aggregate Netflix metrics in SQL
Netflix video-streaming analytics SQL: Write a simple aggregation (e.g., total watch-time per day). Build a cumulative metric: today’s metric...
Return top-3 content per category
Given a collection of items with fields (content_id, category, rating), implement top_k_by_category(items, k=3) that returns, for each category, the ...
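A hedged sketch of top_k_by_category. The return specification is truncated above, so this assumes that items are dicts with the three named fields, that "top" means highest rating, that ties are broken by ascending content_id, and that the result maps each category to a list of content_ids.

```python
import heapq
from collections import defaultdict


def top_k_by_category(items, k=3):
    """Map each category to the content_ids of its k highest-rated items."""
    by_category = defaultdict(list)
    for item in items:
        by_category[item["category"]].append(item)

    result = {}
    for category, group in by_category.items():
        # nsmallest on (-rating, content_id) yields highest rating first,
        # with ties broken by ascending content_id (an assumed rule).
        best = heapq.nsmallest(
            k, group, key=lambda it: (-it["rating"], it["content_id"])
        )
        result[category] = [it["content_id"] for it in best]
    return result


sample = [
    {"content_id": "a", "category": "drama", "rating": 4.5},
    {"content_id": "b", "category": "drama", "rating": 4.9},
    {"content_id": "c", "category": "drama", "rating": 4.5},
    {"content_id": "d", "category": "drama", "rating": 3.0},
    {"content_id": "e", "category": "comedy", "rating": 4.0},
]
top = top_k_by_category(sample)
```

Using heapq.nsmallest keeps the work per category at roughly O(n log k) instead of sorting every group in full, which matters when k is small relative to the group size.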
Compute cumulative metrics with full joins
Tables: - daily_metrics(date DATE, content_id STRING, daily_value BIGINT) - cumulative_metrics(date DATE, content_id STRING, cumulative_value BIGINT) ...
Recommend two-hop follows in Python
Given a directed "follows" graph as a Python dict[str, list[str]], implement recommend_two_hop(graph, user) that returns the set (or a sorted list) of...
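A minimal sketch of recommend_two_hop under assumed semantics, since the statement above is cut off: a recommendation is any account followed by someone the user follows, excluding the user and anyone the user already follows. The sketch returns a sorted list so the output is deterministic.

```python
def recommend_two_hop(graph, user):
    """Return accounts exactly two 'follows' hops away from `user`, sorted."""
    direct = set(graph.get(user, []))
    candidates = set()
    for followee in direct:
        # Accounts missing from the dict are treated as following nobody.
        candidates.update(graph.get(followee, []))
    # Drop the user and accounts they already follow (assumed exclusion rule).
    return sorted(candidates - direct - {user})


graph = {
    "a": ["b", "c"],
    "b": ["c", "d", "a"],
    "c": ["e"],
    "d": [],
}
recs = recommend_two_hop(graph, "a")
```

Here "b" contributes "c", "d", and "a", and "c" contributes "e"; after removing "a" itself and the already-followed "c", the recommendations are "d" and "e".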
Write SQL for active follow connections
Table: follow_events(requester_id INT, target_id INT, event STRING CHECK (event IN ('request_follow','follow_success','follow_reject','unfollow')), ev...
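One way to approach this, sketched with an in-memory SQLite database. The table definition above is truncated after the event column, so the timestamp column name event_ts is hypothetical, as is the use of integer timestamps. The sketch also assumes that a connection is active when the most recent event for a (requester_id, target_id) pair is 'follow_success', and that timestamps are unique within a pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE follow_events (
    requester_id INTEGER,
    target_id    INTEGER,
    event        TEXT CHECK (event IN
        ('request_follow', 'follow_success', 'follow_reject', 'unfollow')),
    event_ts     INTEGER  -- hypothetical name; the source is truncated here
);
-- Illustrative sample rows, not from the original question.
INSERT INTO follow_events VALUES
    (1, 2, 'request_follow', 1), (1, 2, 'follow_success', 2),
    (3, 4, 'request_follow', 1), (3, 4, 'follow_success', 2),
    (3, 4, 'unfollow', 3),
    (5, 6, 'request_follow', 1), (5, 6, 'follow_reject', 2),
    (2, 1, 'request_follow', 3), (2, 1, 'follow_success', 4);
""")

# Keep only each pair's latest event, then count those that are successes.
QUERY = """
SELECT COUNT(*)
FROM follow_events AS e
WHERE e.event = 'follow_success'
  AND e.event_ts = (
      SELECT MAX(x.event_ts)
      FROM follow_events AS x
      WHERE x.requester_id = e.requester_id
        AND x.target_id = e.target_id
  )
"""

active_count = conn.execute(QUERY).fetchone()[0]
```

In the sample data, (1, 2) and (2, 1) are active, (3, 4) ended with an unfollow, and (5, 6) was rejected. On engines with window functions, ROW_NUMBER() partitioned by the pair and ordered by timestamp descending is a common alternative to the correlated subquery.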
Write SQL for library analytics
Given a library database, write SQL to answer the following: 1) Count the number of books that are currently not returned (i.e., still checked out) an...
Write SQL and Python for data prep
Given clickstream events (user_id, event_type, ts, properties) and a users table (user_id, signup_ts, plan), write SQL to compute DAU/WAU/MAU, D1/W1 r...