PracHub

Quick Overview

This question evaluates data manipulation skills around time-based joins, event attribution, deduplication, and conversion metric computation within the Data Manipulation (SQL/Python) domain.

Join datasets and compute conversion by assignment

Company: Meta

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: Medium

Interview Round: Technical Screen

You are given two CSVs. Create tables and write SQL to produce both visit-level and visitor-level conversion datasets, then aggregate conversion by assignment and country. Use the following schema and sample data.

Schema:
- visit(id_visitor BIGINT, ts TIMESTAMP, country STRING, assign TINYINT)
- booking(id_booking BIGINT, id_visitor BIGINT, ts TIMESTAMP)

Sample tables (timestamps are UTC):

visit
+------------+---------------------+---------+--------+
| id_visitor | ts                  | country | assign |
+------------+---------------------+---------+--------+
| 101        | 2025-01-03 09:12:00 | US      | 1      |
| 101        | 2025-01-05 10:00:00 | US      | 1      |
| 102        | 2025-01-04 14:30:00 | CA      | 0      |
| 103        | 2025-01-04 15:00:00 | US      | 1      |
| 104        | 2025-01-06 08:00:00 | GB      | 0      |
| 105        | 2025-01-06 09:10:00 | US      | 0      |
+------------+---------------------+---------+--------+

booking
+------------+------------+---------------------+
| id_booking | id_visitor | ts                  |
+------------+------------+---------------------+
| 5001       | 101        | 2025-01-05 12:00:00 |
| 5002       | 102        | 2025-01-04 16:00:00 |
| 5003       | 103        | 2025-01-10 09:00:00 |
| 5004       | 101        | 2025-01-03 08:00:00 |
| 5005       | 105        | 2025-02-01 10:00:00 |
+------------+------------+---------------------+

Requirements:

1) Visit-level dataset: one row per visit with columns (id_visitor, visit_ts, country, assign, booked_flag). booked_flag=1 if there exists a booking for the same id_visitor with booking.ts >= visit.ts and < min(next_visit.ts, visit.ts + INTERVAL 28 DAY); otherwise 0. Ensure a single booking is not double-counted across multiple visits for the same visitor.

2) Visitor-level dataset: one row per visitor with columns (id_visitor, first_visit_ts, country_at_first_visit, assign_at_first_visit, booked_flag_28d). booked_flag_28d=1 if any booking.ts is in [first_visit_ts, first_visit_ts + 28 days); otherwise 0. If a visitor has conflicting assign values across visits, use the earliest observed assign.

3) Aggregations: for each of the two datasets, output counts by (assign, country): visits_or_visitors, bookers, conversion = bookers / visits_or_visitors.

Be explicit about handling duplicates and timezone assumptions. Provide ANSI SQL (CTEs allowed) that runs on a typical data warehouse (e.g., BigQuery/Snowflake/Postgres) and produces the specified aggregations.
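The three requirements above can be sketched end to end as follows. This is one possible approach, not a reference solution: it loads the sample rows into an in-memory SQLite database through Python's standard sqlite3 module so that the SQL can be run locally. The view names visit_level and visitor_level and the helper conversion() are illustrative names, not part of the question. Assumptions: timestamps are stored as UTC text in 'YYYY-MM-DD HH:MM:SS' form (so string comparison matches time order), exact duplicate visit rows are collapsed with DISTINCT, and duplicate booking rows are harmless because the flags use EXISTS. SQLite's datetime(ts, '+28 days') stands in for the warehouse-specific interval arithmetic (for example ts + INTERVAL '28 days' in Postgres).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE visit   (id_visitor INTEGER, ts TEXT, country TEXT, assign INTEGER);
CREATE TABLE booking (id_booking INTEGER, id_visitor INTEGER, ts TEXT);
INSERT INTO visit VALUES
  (101,'2025-01-03 09:12:00','US',1), (101,'2025-01-05 10:00:00','US',1),
  (102,'2025-01-04 14:30:00','CA',0), (103,'2025-01-04 15:00:00','US',1),
  (104,'2025-01-06 08:00:00','GB',0), (105,'2025-01-06 09:10:00','US',0);
INSERT INTO booking VALUES
  (5001,101,'2025-01-05 12:00:00'), (5002,102,'2025-01-04 16:00:00'),
  (5003,103,'2025-01-10 09:00:00'), (5004,101,'2025-01-03 08:00:00'),
  (5005,105,'2025-02-01 10:00:00');

-- Visit level: each visit owns the half-open window
-- [visit_ts, min(next visit, visit_ts + 28 days)). Windows of one visitor
-- never overlap, so a booking can be attributed to at most one visit.
CREATE VIEW visit_level AS
WITH v AS (  -- assumption: exact duplicate visit rows are one visit
  SELECT DISTINCT id_visitor, ts AS visit_ts, country, assign FROM visit
), w AS (
  SELECT v.*,
         LEAD(visit_ts) OVER (PARTITION BY id_visitor ORDER BY visit_ts) AS next_ts,
         datetime(visit_ts, '+28 days') AS cap_ts
  FROM v
)
SELECT id_visitor, visit_ts, country, assign,
       CASE WHEN EXISTS (
         SELECT 1 FROM booking b
         WHERE b.id_visitor = w.id_visitor
           AND b.ts >= w.visit_ts
           -- a NULL next_ts (last visit) falls through to the 28-day cap
           AND b.ts < CASE WHEN w.next_ts < w.cap_ts THEN w.next_ts ELSE w.cap_ts END
       ) THEN 1 ELSE 0 END AS booked_flag
FROM w;

-- Visitor level: country and assign come from the earliest visit.
CREATE VIEW visitor_level AS
WITH r AS (
  SELECT id_visitor, ts, country, assign,
         ROW_NUMBER() OVER (PARTITION BY id_visitor ORDER BY ts) AS rn
  FROM visit
)
SELECT id_visitor, ts AS first_visit_ts, country AS country_at_first_visit,
       assign AS assign_at_first_visit,
       CASE WHEN EXISTS (
         SELECT 1 FROM booking b
         WHERE b.id_visitor = r.id_visitor
           AND b.ts >= r.ts AND b.ts < datetime(r.ts, '+28 days')
       ) THEN 1 ELSE 0 END AS booked_flag_28d
FROM r WHERE rn = 1;
""")

def conversion(view, assign_col, country_col, flag_col):
    """Return (assign, country, rows, bookers, conversion) per group."""
    sql = f"""
        SELECT {assign_col}, {country_col}, COUNT(*), SUM({flag_col}),
               1.0 * SUM({flag_col}) / COUNT(*)
        FROM {view}
        GROUP BY {assign_col}, {country_col}
        ORDER BY {assign_col}, {country_col}"""
    return con.execute(sql).fetchall()

visit_conv = conversion("visit_level", "assign", "country", "booked_flag")
visitor_conv = conversion("visitor_level", "assign_at_first_visit",
                          "country_at_first_visit", "booked_flag_28d")
for row in visit_conv + visitor_conv:
    print(row)
```

On the sample data, booking 5004 precedes every visit by visitor 101 and is attributed to nothing, while booking 5001 falls after that visitor's second visit, so only the second visit is flagged. The (assign=1, US) group therefore shows 2 bookers out of 3 visits at visit level, but 2 out of 2 visitors at visitor level.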

Last updated: Mar 29, 2026
