ByteDance Data Scientist Interview Questions
Count buggy vs non-buggy by employer
Count buggy vs non-buggy submissions for each employer_id, including employers with zero submissions. Return employer_id, buggy_count, non_buggy_count...
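One way to approach this kind of question is a LEFT JOIN from the employer table with conditional aggregation, so employers with no submissions still appear with zero counts. The sketch below is only an illustration: the table names `employers` and `submissions`, the `is_buggy` flag, and the demo rows are all assumptions, since the teaser does not show the actual schema. It runs the query through Python's built-in `sqlite3` module.

```python
import sqlite3

# Hypothetical schema: employers(employer_id), submissions(employer_id, is_buggy).
# The real interview schema is not shown in the listing, so these names are assumptions.
QUERY = """
    SELECT e.employer_id,
           SUM(CASE WHEN s.is_buggy = 1 THEN 1 ELSE 0 END) AS buggy_count,
           SUM(CASE WHEN s.is_buggy = 0 THEN 1 ELSE 0 END) AS non_buggy_count
    FROM employers AS e
    LEFT JOIN submissions AS s
           ON s.employer_id = e.employer_id
    GROUP BY e.employer_id
    ORDER BY e.employer_id
"""


def build_demo_db():
    """Create an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE employers (employer_id INTEGER PRIMARY KEY);
        CREATE TABLE submissions (
            submission_id INTEGER PRIMARY KEY,
            employer_id   INTEGER NOT NULL,
            is_buggy      INTEGER NOT NULL  -- 1 = buggy, 0 = non-buggy
        );
        INSERT INTO employers (employer_id) VALUES (1), (2), (3);
        INSERT INTO submissions (employer_id, is_buggy)
        VALUES (1, 1), (1, 0), (1, 1), (2, 0);
        """
    )
    return conn


def count_submissions_by_employer(conn):
    """Return (employer_id, buggy_count, non_buggy_count) for every employer."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in count_submissions_by_employer(build_demo_db()):
        print(row)
```

Employer 3 has no submissions in the demo data, so the LEFT JOIN yields a single row with NULL submission columns, and both CASE expressions fall through to 0 rather than dropping the employer.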
Maximize products bought under budget
Given N products and M customers, for each customer find the list of distinct products they can buy without exceeding their budget such that the numbe...
Design recommendations objective balancing growth and monetization
Design a Multi-Objective Recommender for Long-Form Content You are designing the ranking objective and measurement plan for a long-form content recomm...
Design an interference-robust A/B test for monetization
A/B Test Design: New Tipping UI on Creator Posts Context: You are launching a new tipping UI on creator (PGC/OGC) posts to increase creator monetizati...
Compute and rank top bad advertisers
SQL on ad safety. Assume the following schema and sample rows. Use ANSI SQL. Today is 2025-09-01; interpret “last 7 days” as 2025-08-26 00:00:00 to 20...
Model overdispersed counts; estimate treatment lift
Weekly posts per creator are overdispersed and zero‑inflated. In a creator‑level randomized test of a nudge: - Control: n_c=40,000 creators, total pos...
Select max-discount product per category
You have a catalog of products. For each category, return exactly one product: the one with the largest absolute discount; if multiple products in the...
Implement streaming SRM detector with late events
Implement a streaming detector for sample ratio mismatch (SRM) across many concurrent experiments. Input is two topic-partitioned streams: assignments...
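The statistical core of an SRM check is often a chi-square goodness-of-fit test on the observed arm counts. The sketch below covers only that single step for a two-arm experiment; it does not attempt the streaming, partitioning, or late-event handling the question asks about. The function name `srm_p_value` and the 50/50 default split are assumptions made for illustration.

```python
import math


def srm_p_value(n_control, n_treatment, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value for a two-arm sample ratio check.

    With two arms the statistic has 1 degree of freedom, and the
    chi-square survival function reduces to erfc(sqrt(x / 2)).
    """
    if not 0.0 < expected_control_share < 1.0:
        raise ValueError("expected_control_share must be strictly between 0 and 1")
    total = n_control + n_treatment
    if total == 0:
        return 1.0  # no assignments yet, so there is no evidence of mismatch
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # A balanced split gives no evidence of SRM; a 5000 vs 5500 split is flagged.
    print(srm_p_value(5000, 5000))
    print(srm_p_value(5000, 5500))
```

In a streaming setting, a function like this would typically be evaluated against running per-experiment counts, with a strict alert threshold because the check is repeated many times across experiments and over time.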
Demonstrate leadership in cross-functional disagreement
Behavioral & Leadership (HR Screen, Data Scientist) Prompt Describe a time you disagreed with a partner team (e.g., product pushing for more aggressiv...
Write SQL for 7-day geo-localized revenue dashboard
Write a single SQL query (assume PostgreSQL; tz_offset is an integer hour offset from UTC) to compute a 7-day dashboard by local user date for US vs A...
Compute cluster-aware significance and sequential corrections
Cluster-Randomized Tipping UI Experiment: Power, Sequential Testing, and Multiplicity Context: A creator-level (cluster) randomized experiment evaluat...
Investigate visit–report correlation causality
Causal Diagnosis: Do More Ad Page Visits Cause More Reports? Context You observe a positive correlation between the number of ad page visits and the p...
Analyze shopping funnel with joins and windows
Write SQL (PostgreSQL) to analyze a 4-step shopping funnel: view_product → add_to_cart → checkout_start → purchase. Use the schema and sample data bel...
Rank factors for TikTok market entry
TikTok Market Z Launch Decision Framework (Q4 2025) Context You are a data scientist evaluating whether TikTok should launch in a new country (Market ...
Diagnose a sudden metric spike or drop
Investigate a 3-Day Jump in Checkout Conversion Rate (CCR) Context On 2025-06-12, the daily Checkout Conversion Rate (CCR) increased from 3.2% to 4.5%...
Show ownership in ambiguous creator-growth work
Describe a time you owned an ambiguous growth problem for creators end‑to‑end. Pick one project and cover: 1) the exact business goal and why it matte...
Write SQL for geo posting-frequency drops
Using the schema below, write a single ANSI SQL query (window functions allowed) that identifies countries with the largest share of creators whose po...
Design a creator posting-frequency experiment
You’re on the Creator Growth (PGC) team of a short‑video platform. Product proposes a push/email nudge expected to raise creators’ weekly posting freq...