What does the TikTok Data Scientist interview process look like?

Based on candidate reports compiled in this guide, the TikTok Data Scientist loop typically includes three stages: Technical Screen, Onsite, and Take-home Project. The topics for each stage are walked through in detail below.

What topics does TikTok focus on in Data Scientist interviews?

TikTok Data Scientist interviews cover Data Manipulation (SQL/Python), Analytics & Experimentation, Statistics & Math, Machine Learning, and Behavioral & Leadership. The guide below breaks each topic down into core concepts, worked examples, and the real questions candidates were asked.

How many real TikTok Data Scientist interview questions are in this guide?

This guide is anchored to 27 real TikTok Data Scientist interview questions sourced from candidate reports, each linked to a full practice page with starter code, solution discussion, and community comments.

TikTok Data Scientist Interview Prep Guide

Everything TikTok actually asks Data Scientist candidates — concept walkthroughs, worked examples, and the real interview questions, drawn from candidate reports. Free to read.


Technical Screen

Data Manipulation (SQL/Python)

SQL Window Functions And Analytical Querying — covered in depth under Take-home Project below.

Analytics & Experimentation

Cohort, Retention, Funnel And Product Metrics — covered in depth under Onsite below.
A/B Testing And Experiment Design — covered in depth under Onsite below.

Statistics & Math

Experiment Diagnostics, Power And Robust Inference — covered in depth under Onsite below.
Propensity Score Matching, DiD And Causal Inference — covered in depth under Onsite below.

Machine Learning

Recommendation, Ads Ranking And Marketplace Objectives — covered in depth under Onsite below.
Classification Thresholds, Imbalanced Learning And Risk — covered in depth under Onsite below.
Supervised ML Workflows, Interpretability And Deployment — covered in depth under Onsite below.

Behavioral & Leadership

Behavioral Communication And Stakeholder Leadership — covered in depth under Onsite below.

Onsite

Data Manipulation (SQL/Python)

SQL Window Functions And Analytical Querying — covered in depth under Take-home Project below.

Analytics & Experimentation

Cohort, Retention, Funnel And Product Metrics

Hierarchical metric tree: top-line Revenue & Engagement branching to Cohort & denominator, Retention types, Funnel units & ordering, and Revenue decomposition (DAU×sessions×impressions×fill×eCPM).

What's being tested

Interviewers are probing whether you can turn messy user-event data into defensible cohort, retention, funnel, and product metric conclusions. For TikTok, this matters because small changes in posting, viewing, shopping, or ad-load behavior can affect creator supply, viewer engagement, and monetization simultaneously. A strong Data Scientist should define the right denominator, align events in time, segment users meaningfully, and distinguish metric movement from causal impact. Expect to explain both the analysis logic and the business interpretation, not just write a query.

Core knowledge

Cohort analysis groups users by a shared starting condition, usually first registration, first post, first purchase, or first exposure date. The most common cohort key is DATE(MIN(event_ts)) per user_id; always clarify whether the cohort is based on signup, first activity, or experiment assignment.
Retention rate measures whether users return after an anchor event. For day-$N$ retention:
$D_N\ \text{retention} = \frac{\#\ \text{users in cohort active on day } (\text{cohort date} + N)}{\#\ \text{users in cohort}}$
Clarify whether “active” means opening the app, watching, posting, liking, purchasing, or any qualifying event.
Exact-day retention and window retention answer different questions. Day-7 exact retention counts users active exactly on day 7, while 7-day window retention counts users active at any time from day 1 through day 7. "Rolling" or "unbounded" retention is often defined differently, as active on day 7 or any later day, so state your definition up front. TikTok-style engagement analyses often need more than one view because habit formation and occasional usage behave differently.
Funnel analysis tracks ordered conversion through steps such as view_item → add_to_cart → purchase or video_view → profile_visit → follow. Define whether the funnel is user-level, session-level, or item-level; conversion rates change dramatically depending on that unit.
Temporal ordering is essential in funnels. Use event timestamps and patterns like ROW_NUMBER() OVER (PARTITION BY user_id, product_id ORDER BY event_ts) to deduplicate repeated events, then require step $k+1$ to occur after step $k$. Counting unordered events inflates conversion rates.
Metric decomposition separates topline movement into interpretable components. For ad revenue, a useful identity is:
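$\text{Revenue} = \text{DAU} \times \frac{\text{sessions}}{\text{user}} \times \frac{\text{ad slots}}{\text{session}} \times \text{fill rate} \times \frac{\text{eCPM}}{1000}$
Here ad slots/session counts ad opportunities before fill, and eCPM is revenue per 1,000 filled impressions. This lets you diagnose whether revenue changed because of traffic, engagement, ad load, fill, or auction pricing.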
DAU growth vs monetization tradeoffs require both guardrail and objective metrics. A growth feature might increase DAU but reduce ARPDAU, session depth, or creator posting. A monetization change might increase ad_revenue but hurt watch_time, retention, or long-term user value.
Segmentation is not optional. Retention and funnel behavior differ by new vs existing users, creator vs viewer, geography, device, traffic source, content vertical, and user maturity. A flat average can hide Simpson’s paradox, especially when traffic mix shifts across regions or acquisition channels.
Causal inference matters whenever you interpret metric changes. If a cohort has higher retention after a launch, ask whether it was exposed to a treatment, acquired through a different channel, or affected by seasonality. Prefer randomized A/B tests; otherwise consider difference-in-differences, matching, or regression adjustment.
Censoring and incomplete windows can silently bias retention. If today is 2026-05-23, users acquired on 2026-05-20 cannot yet have day-7 retention. Exclude immature cohorts from day-$N$ calculations or mark them incomplete instead of treating missing activity as non-retention.
Statistical uncertainty should accompany metric reads. For a binary retention metric, a rough standard error is $\sqrt{p(1-p)/n}$; for funnel rates, compare proportions with confidence intervals or logistic regression. With very large samples, focus on practical significance, not only tiny p-values.
Deduplication and event definition are analysis-layer responsibilities. If users can post multiple times or fire duplicate click events, decide whether to count distinct users, distinct sessions, or total events. For example, COUNT(DISTINCT user_id) answers active users; COUNT(*) answers total activity volume.
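The ordering and deduplication rules above can be sketched in plain Python. This is a minimal, hypothetical example: the function name `ordered_funnel` and the `(user_id, event_name, ts)` tuple layout are assumptions, not a TikTok schema. It builds a user-level funnel, counts only the first qualifying occurrence of each step, and ignores a step that happens before the step it depends on. For an item-level funnel you would group by `(user_id, product_id)` instead, mirroring the `PARTITION BY` in the window-function pattern.

```python
from collections import defaultdict

def ordered_funnel(events, steps):
    """Count distinct users reaching each funnel step in order (user-level funnel).

    events: iterable of (user_id, event_name, ts) tuples; ts is any sortable value.
    steps:  ordered list of event names, e.g. ["view_item", "add_to_cart", "purchase"].
    """
    # Group each user's events so ordering is evaluated per user
    by_user = defaultdict(list)
    for user_id, name, ts in events:
        by_user[user_id].append((ts, name))

    counts = [0] * len(steps)
    for user_events in by_user.values():
        user_events.sort(key=lambda e: e[0])  # stable sort by timestamp only
        next_step = 0  # index of the step this user must complete next
        for _, name in user_events:
            if next_step < len(steps) and name == steps[next_step]:
                counts[next_step] += 1  # first qualifying occurrence only (dedup)
                next_step += 1
            # repeats of earlier steps and out-of-order steps are ignored
    return counts

# Hypothetical toy data: u2 views twice, u3 purchases before viewing
events = [
    ("u1", "view_item", 1), ("u1", "add_to_cart", 2), ("u1", "purchase", 3),
    ("u2", "view_item", 1), ("u2", "view_item", 2), ("u2", "add_to_cart", 3),
    ("u3", "purchase", 1), ("u3", "view_item", 2),
]
steps = ["view_item", "add_to_cart", "purchase"]
print(ordered_funnel(events, steps))  # [3, 2, 1]
```

An unordered count would credit two purchasers (u1 and u3); the ordered version credits only u1, because u3's purchase precedes the view it should follow.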

Worked example

For the question “Analyze Trade-off Between DAU Growth and Ad Revenue,” a strong candidate would start by clarifying the setup: “Are we evaluating an experiment, an observed trend, or a proposed product change? Are DAU and ad revenue measured globally or by market, and over what time horizon?” Then they would define the primary metrics: DAU, ad_revenue, ARPDAU, retention, watch_time, ad_impressions_per_user, and possibly creator-side metrics if the feature affects posting supply.

The answer should be organized around four pillars. First, decompose revenue into traffic, engagement, ad inventory, fill rate, and price, so the tradeoff is not treated as one black-box number. Second, segment by user maturity, geography, acquisition source, and engagement level to see whether growth comes from low-monetizing or high-retaining users. Third, evaluate causality with an A/B test if available, using DAU or retention as engagement metrics and revenue per eligible user as monetization metrics. Fourth, make a decision framework: launch if long-term engagement gain outweighs short-term revenue loss, but block if guardrails like day-7 retention, session length, or ad fatigue degrade materially.
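The first pillar, decomposing revenue, can be made concrete with a log decomposition. This is a minimal sketch, assuming revenue is the product DAU × sessions/user × ad slots/session × fill rate × eCPM / 1000, where ad slots are opportunities before fill and eCPM is per 1,000 filled impressions. Because the log of a product is the sum of logs, each component's log change adds up exactly to the total log change in revenue. The component names and all numbers are hypothetical.

```python
import math

# Hypothetical component names for the multiplicative revenue identity
COMPONENTS = ["dau", "sessions_per_user", "ad_slots_per_session", "fill_rate", "ecpm"]

def revenue(m):
    # Revenue = DAU x sessions/user x ad slots/session x fill rate x eCPM / 1000
    return (m["dau"] * m["sessions_per_user"] * m["ad_slots_per_session"]
            * m["fill_rate"] * m["ecpm"] / 1000.0)

def log_decompose(before, after):
    """Attribute the log change in revenue to each multiplicative component.

    The per-component contributions sum exactly to log(R_after / R_before).
    """
    contrib = {c: math.log(after[c] / before[c]) for c in COMPONENTS}
    total = math.log(revenue(after) / revenue(before))
    return contrib, total

# Made-up scenario: DAU grows 5% while ad slots per session drop 10%
before = {"dau": 1_000_000, "sessions_per_user": 4.0, "ad_slots_per_session": 5.0,
          "fill_rate": 0.9, "ecpm": 10.0}
after = dict(before, dau=1_050_000, ad_slots_per_session=4.5)

contrib, total = log_decompose(before, after)
for name, value in contrib.items():
    print(f"{name:>22}: {value:+.4f}")
print(f"{'total':>22}: {total:+.4f}")
```

In this toy scenario DAU contributes about +0.049 log points and reduced ad load about -0.105, netting to roughly -0.057 (a 5.5% revenue drop), which is exactly the kind of "growth up, monetization down" split the tradeoff discussion needs to surface.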

One explicit tradeoff to flag is short-term versus long-term value. Increasing ad load may raise same-day revenue while reducing future retention; reducing ad load may lower ARPDAU today but increase future LTV. A strong close would be: “If I had more time, I’d estimate cumulative 7-, 14-, and 28-day revenue and retention by cohort, not just same-day DAU, because this decision depends on lifetime value rather than a single-day metric.”
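To attach uncertainty to those horizon reads in an A/B setting, the rough standard error $\sqrt{p(1-p)/n}$ from the core-knowledge notes extends to the difference between two arms. The sketch below is a plain normal-approximation (Wald) interval, assuming users are randomized independently and each arm is large; the function name `retention_diff_ci` and the counts are hypothetical. You would run it once per horizon (day 7, 14, 28).

```python
import math

def retention_diff_ci(retained_t, n_t, retained_c, n_c, z=1.96):
    """Wald CI for the difference in retention rates (treatment minus control).

    Assumes independent user-level randomization and large arms (normal approximation).
    """
    p_t = retained_t / n_t
    p_c = retained_c / n_c
    # Standard error of a difference of two independent proportions
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    diff = p_t - p_c
    return diff, (diff - z * se, diff + z * se)

# Made-up day-7 counts: 4,200 of 10,000 retained vs 4,000 of 10,000
diff, (lower, upper) = retention_diff_ci(4200, 10_000, 4000, 10_000)
print(f"diff={diff:.4f}, 95% CI=({lower:.4f}, {upper:.4f})")
```

With these counts the difference is 2 percentage points with an interval of roughly 0.6 to 3.4 points; whether that justifies the revenue cost is a practical-significance call, not just a p-value check.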

A second angle

For the question “Calculate User Registration Date and 7-Day Retention Rate,” the same core concept becomes more operational and cohort-oriented. Instead of debating business tradeoffs, the candidate must define each user’s registration or first activity date, assign them to a cohort, and check whether they performed a qualifying action exactly seven days later or within a defined 7-day window. The main constraint is precision: registration_date, timezone, event type, and inclusion of incomplete cohorts must be handled consistently. The interviewer is likely testing whether you can translate a product definition into a reproducible metric while avoiding denominator leakage. The best answer still includes interpretation: day-7 retention is not just a query output; it tells whether new users are forming a habit.
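As a minimal sketch of the operational version, the function below assumes activity arrives as `(user_id, activity_date)` pairs already converted to the reporting timezone, and treats each user's first observed date as the registration date (swap in a real `registration_date` column if one exists). The name `day7_retention` and the `as_of` parameter (the last fully loaded date) are hypothetical. It drops cohorts whose day 7 has not happened yet, counts distinct users, and has a flag to switch between exact day-7 and a day-1-through-day-7 window.

```python
from datetime import date, timedelta

def day7_retention(activity, as_of, window=False):
    """Day-7 retention over mature cohorts only.

    activity: iterable of (user_id, activity_date) pairs in the reporting timezone.
    as_of:    last complete date of data; cohorts younger than 7 days are excluded.
    window:   False -> active exactly on registration + 7 days
              True  -> active on any day from registration + 1 through + 7
    """
    first_seen = {}   # user_id -> registration proxy (first activity date)
    active_days = {}  # user_id -> set of dates with qualifying activity
    for user_id, d in activity:
        active_days.setdefault(user_id, set()).add(d)
        if user_id not in first_seen or d < first_seen[user_id]:
            first_seen[user_id] = d

    # Cohort maturity: keep users whose day 7 is already observable
    mature = [u for u, reg in first_seen.items() if reg + timedelta(days=7) <= as_of]
    if not mature:
        return None

    offsets = range(1, 8) if window else [7]
    retained = 0
    for u in mature:
        reg = first_seen[u]
        if any(reg + timedelta(days=k) in active_days[u] for k in offsets):
            retained += 1  # distinct users, not event counts
    return retained / len(mature)

# Hypothetical data; u4 registered 2026-05-20, so it is immature on 2026-05-23
activity = [
    ("u1", date(2026, 5, 1)), ("u1", date(2026, 5, 8)),
    ("u2", date(2026, 5, 1)), ("u2", date(2026, 5, 3)),
    ("u3", date(2026, 5, 2)),
    ("u4", date(2026, 5, 20)), ("u4", date(2026, 5, 21)),
]
as_of = date(2026, 5, 23)
print(day7_retention(activity, as_of))               # exact: 1 of 3 mature users
print(day7_retention(activity, as_of, window=True))  # window: 2 of 3 mature users
```

Leaving u4 in the denominator would mechanically depress the rate, which is the cohort-maturity pitfall discussed below.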

Common pitfalls

Pitfall: Treating event counts as user counts.

A tempting wrong answer is to compute retention as COUNT(posts_on_day_7) / COUNT(posts_on_day_0). That measures posting volume, not retained users, and heavy posters will dominate the metric. A better answer uses COUNT(DISTINCT user_id) for retention and separately reports posts per retained user if activity intensity matters.
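A toy illustration of this pitfall, with hypothetical data where each list entry is one post labeled by its author's `user_id`: a single heavy poster pushes the volume ratio above 100%, while the distinct-user rate stays at 25% and posting intensity is reported separately. The function names are illustrative only.

```python
def naive_post_ratio(posts_day0, posts_day7):
    # Wrong: ratio of post volumes; heavy posters dominate and it can exceed 1.0
    return len(posts_day7) / len(posts_day0)

def day7_user_retention(posts_day0, posts_day7):
    # Right: share of distinct day-0 posters who post again on day 7
    cohort = set(posts_day0)
    returned = set(posts_day7) & cohort
    return len(returned) / len(cohort)

def posts_per_retained_user(posts_day0, posts_day7):
    # Intensity, reported separately, only for cohort users who came back
    cohort = set(posts_day0)
    retained_posts = [u for u in posts_day7 if u in cohort]
    retained_users = set(retained_posts)
    return len(retained_posts) / len(retained_users) if retained_users else 0.0

posts_day0 = ["a", "b", "c", "d"]  # four users, one post each
posts_day7 = ["a"] * 8             # only "a" returns, posting eight times

print(naive_post_ratio(posts_day0, posts_day7))         # 2.0
print(day7_user_retention(posts_day0, posts_day7))      # 0.25
print(posts_per_retained_user(posts_day0, posts_day7))  # 8.0
```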

Pitfall: Ignoring time alignment and cohort maturity.

Candidates often include users who have not yet had enough time to reach day 7, which mechanically depresses recent cohort retention. State that you would filter to cohorts with cohort_date <= current_date - interval '7 days' and align timestamps to the product’s reporting timezone before aggregating.

Pitfall: Jumping to a launch recommendation without diagnosing the metric movement.

For a DAU versus ad revenue question, saying “choose revenue because it is business-critical” or “choose DAU because growth matters” is too shallow. A stronger response decomposes the metrics, checks user segments, estimates long-term impact, and frames the decision around objective metrics plus guardrails.

Connections

Interviewers may pivot from here into experiment design, especially choosing primary metrics and guardrails for an A/B test. They may also ask about causal inference, ranking/recommender evaluation, or metric anomaly diagnosis, such as explaining why DAU rose while retention fell.

Evaluate Cohort Posting Patterns Using Metrics and Tests

Evaluates metric design, causal reasoning, experiment setup, diagnostics, SQL/statistical checks, and recommendations in a realistic interview setting...
