ByteDance Data Scientist Interview Questions
Count buggy vs non-buggy by employer
Count buggy vs non-buggy submissions for each employer_id, including employers with zero submissions. Return employer_id, buggy_count, non_buggy_count...
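A minimal sketch of one way to approach this question, run through Python's built-in sqlite3 module. The teaser is truncated and shows no schema, so the table names `employers` and `submissions`, the `is_buggy` flag, and the sample rows are all assumptions made for illustration. The point it shows is that a LEFT JOIN from the employer table keeps employers with zero submissions in the result.

```python
import sqlite3


def count_by_employer(conn):
    """Return (employer_id, buggy_count, non_buggy_count) per employer."""
    query = """
        SELECT e.employer_id,
               SUM(CASE WHEN s.is_buggy = 1 THEN 1 ELSE 0 END) AS buggy_count,
               SUM(CASE WHEN s.is_buggy = 0 THEN 1 ELSE 0 END) AS non_buggy_count
        FROM employers AS e
        -- LEFT JOIN keeps employers that have no matching submissions
        LEFT JOIN submissions AS s ON s.employer_id = e.employer_id
        GROUP BY e.employer_id
        ORDER BY e.employer_id
    """
    return conn.execute(query).fetchall()


def demo():
    # Hypothetical schema and rows; the original question's tables are not shown.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employers (employer_id INTEGER PRIMARY KEY);
        CREATE TABLE submissions (
            submission_id INTEGER PRIMARY KEY,
            employer_id   INTEGER,
            is_buggy      INTEGER
        );
        INSERT INTO employers VALUES (1), (2), (3);
        INSERT INTO submissions VALUES (10, 1, 1), (11, 1, 0), (12, 1, 0), (13, 2, 1);
    """)
    return count_by_employer(conn)
```

For an employer with no submissions, the joined row has a NULL `is_buggy`, so both CASE expressions fall through to 0 and the employer appears with zero counts rather than disappearing.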
Maximize products bought under budget
Given N products and M customers, for each customer find the list of distinct products they can buy without exceeding their budget such that the numbe...
Design a recommendation objective balancing growth and monetization
Design a Multi-Objective Recommender for Long-Form Content: You are designing the ranking objective and measurement plan for a long-form content recomm...
Design an interference-robust A/B test for monetization
A/B Test Design: New Tipping UI on Creator Posts Context: You are launching a new tipping UI on creator (PGC/OGC) posts to increase creator monetizati...
Compute and rank top bad advertisers
SQL on ad safety. Assume the following schema and sample rows. Use ANSI SQL. Today is 2025-09-01; interpret “last 7 days” as 2025-08-26 00:00:00 to 20...
Model overdispersed counts; estimate treatment lift
Weekly posts per creator are overdispersed and zero‑inflated. In a creator‑level randomized test of a nudge: - Control: n_c=40,000 creators, total pos...
Plan DS approach for biker delivery project
You are a Data Scientist supporting a “biker” (delivery rider) product/project for a food-delivery platform. An interviewer gives only a short descrip...
Write monthly customer and sales SQL queries
You are analyzing a food-delivery marketplace. Tables: Assume the following schema (you may add minor helper CTEs as needed): orders - order_id (BIGINT...
Define and critique a user activity metric
Context: You are on a product team and need to define a metric for user activity to be used in dashboards and decision-making. Question 1. Propose 2–4 ...
When to prioritize precision vs. recall
Context: You are working on a product team and building (or evaluating) a binary classifier that triggers an action (e.g., show a warning, block conten...
How do you choose a classification threshold?
Context: You built a binary sentiment classification model (e.g., positive vs. negative) and need to deploy it in a product where actions depend on the...
Explain Type I/II errors vs precision/recall
Questions: 1. Define Type I error and Type II error in hypothesis testing, and map them to false positives and false negatives. 2. Explain how Type I/I...
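As a rough illustration of the mapping this question asks about, the sketch below computes the four quantities from a confusion matrix. The function name `error_rates` and the example counts are hypothetical. Here "positive" plays the role of rejecting the null hypothesis, so a false positive corresponds to a Type I error and a false negative to a Type II error.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def error_rates(tp, fp, tn, fn):
    """Relate hypothesis-testing error rates to classifier metrics."""
    return {
        # Type I rate: share of true negatives flagged as positive (false positive rate)
        "type_i_rate": safe_div(fp, fp + tn),
        # Type II rate: share of true positives that were missed (false negative rate)
        "type_ii_rate": safe_div(fn, fn + tp),
        # Precision conditions on the prediction, so it depends on class prevalence
        "precision": safe_div(tp, tp + fp),
        # Recall equals 1 - Type II rate, the analogue of statistical power
        "recall": safe_div(tp, tp + fn),
    }


metrics = error_rates(tp=80, fp=20, tn=880, fn=20)
```

Note that precision has no direct counterpart among the two error rates: with the same Type I and Type II rates, precision falls as the positive class becomes rarer.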
Find top-paid employee per department
Tables: Assume the company stores employee compensation by department assignment. employee_dept_salary - employee_id INT - employee_name VARCHAR - depa...
Design and decompose Trust & Safety risk metrics
You are a Data Scientist in a Trust & Safety team for a short-video platform (similar to TikTok/Reels). The team asks: “How would you design risk metr...
How would you manage precision/recall for fraud detection?
Scenario: You own (or significantly contribute to) a production fraud detection system that flags transactions/users as fraud vs legit. - The model out...
Select max-discount product per category
You have a catalog of products. For each category, return exactly one product: the one with the largest absolute discount; if multiple products in the...
Implement streaming SRM detector with late events
Implement a streaming detector for sample ratio mismatch (SRM) across many concurrent experiments. Input is two topic-partitioned streams: assignments...
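The statistical core of an SRM check can be sketched as a chi-square goodness-of-fit test on the assignment counts of one experiment. This is only that core, under stated assumptions: two arms, a known intended split, and counts that have already been aggregated. It does not address the streaming, partitioning, or late-event handling the question asks for, and the function name and parameters are hypothetical.

```python
import math


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """Chi-square (1 degree of freedom) p-value for a two-arm sample ratio mismatch."""
    if not 0.0 < expected_treatment_share < 1.0:
        raise ValueError("expected_treatment_share must be strictly between 0 and 1")
    total = n_control + n_treatment
    if total == 0:
        return 1.0  # no assignments yet, so no evidence of mismatch
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


balanced = srm_p_value(5000, 5000)
skewed = srm_p_value(5000, 5300)
```

In a streaming setting this check would be re-evaluated as counts update, which raises the repeated-testing issue; a strict alert threshold or a sequential correction is one common way to handle that, though the original question's requirements are truncated here.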
Demonstrate leadership in cross-functional disagreement
Behavioral & Leadership (HR Screen, Data Scientist) Prompt: Describe a time you disagreed with a partner team (e.g., product pushing for more aggressiv...
Write SQL for 7-day geo-localized revenue dashboard
Write a single SQL query (assume PostgreSQL; tz_offset is an integer hour offset from UTC) to compute a 7-day dashboard by local user date for US vs A...
Compute cluster-aware significance and sequential corrections
Cluster-Randomized Tipping UI Experiment: Power, Sequential Testing, and Multiplicity Context: A creator-level (cluster) randomized experiment evaluat...