Thumbtack Interview Questions
Practice the exact questions companies are asking right now.
Detail NLP preprocessing and n‑gram choices
Describe your text preprocessing pipeline given the source modality: typed text, scanned/handwritten OCR, or speech-to-text. Specify language handling...
Design cross-validation; explain bias–variance
Define cross-validation rigorously and compare k-fold, stratified k-fold, leave-one-out, nested CV, and time-series rolling/blocked CV. For a dataset ...
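As a rough illustration of two of the schemes named above, the sketch below generates index splits for shuffled k-fold CV and for rolling-origin time-series CV. The function names, the `seed` parameter, and the expanding-window choice are assumptions made for this example, not part of the question.

```python
import random


def kfold_indices(n, k, seed=0):
    """Yield (train, test) index lists for shuffled k-fold CV (hypothetical helper)."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


def rolling_origin_indices(n, n_splits, test_size):
    """Yield (train, test) splits where training data always precedes test data."""
    if n - n_splits * test_size <= 0:
        raise ValueError("not enough observations for the requested splits")
    for s in range(n_splits):
        test_end = n - (n_splits - 1 - s) * test_size
        test_start = test_end - test_size
        # Expanding window: train on everything strictly before the test block.
        yield list(range(test_start)), list(range(test_start, test_end))
```

In the k-fold variant every observation is used for testing exactly once; in the rolling variant no test index ever precedes a training index, which is the property that prevents look-ahead leakage on ordered data.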
Choose clustering vs regression; explain KNN
When would you use clustering vs. regression on a business problem with partially labeled outcomes? Specify the decision criteria (label availability,...
Implement min, mean, median robustly
Implement three functions in Python without using numpy/pandas: (1) my_min(nums) returning the minimum in O(n) time and O(1) space; (2) my_mean(nums) ...
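A minimal sketch of the three functions follows. The prompt is truncated after `my_mean(nums)`, so the name `my_median`, the choice to raise `ValueError` on empty input, and the use of `math.fsum` for numerically stable summation are assumptions rather than stated requirements.

```python
import math


def my_min(nums):
    """Return the minimum in one pass: O(n) time, O(1) extra space."""
    it = iter(nums)
    try:
        smallest = next(it)
    except StopIteration:
        raise ValueError("my_min() requires at least one value") from None
    for x in it:
        if x < smallest:
            smallest = x
    return smallest


def my_mean(nums):
    """Return the arithmetic mean; fsum limits floating-point round-off."""
    values = [float(x) for x in nums]
    if not values:
        raise ValueError("my_mean() requires at least one value")
    return math.fsum(values) / len(values)


def my_median(nums):
    """Return the median via sorting: O(n log n) time, O(n) space (assumed spec)."""
    ordered = sorted(nums)
    n = len(ordered)
    if n == 0:
        raise ValueError("my_median() requires at least one value")
    mid = n // 2
    if n % 2:
        return ordered[mid]
    # Even length: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2
```

Sorting a copy keeps the caller's list unchanged; a selection algorithm such as quickselect would bring the median down to expected O(n) time if the full prompt asks for that.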
Compare list/dict; parse JSON/CSV at scale
Compare Python list and dict precisely: for append/insert/lookup/update/delete, state average and worst-case time complexity, memory implications, and...
Explain a project and justify choices
Walk me through your most impactful project end-to-end: what problem and success metric did you define, what alternatives did you evaluate and reject,...
Design streaming new-vs-returning monthly metrics
Streaming design: Monthly NEW vs RETURNING request shares (event-time, with late/out-of-order and duplicates). Context: You receive a high-volume event ...
Optimize red-ball draw probability, prove optimality
Two-Box Ball Allocation to Maximize Probability of Drawing Red. Setup: You have 2 boxes and two colors of balls. In the 100/100 case: 100 red and 10...
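One way to sanity-check a candidate answer before proving it is to brute-force every allocation. The sketch below assumes the usual reading of this puzzle, which the truncated prompt does not fully state: a box is chosen uniformly at random, a ball is then drawn uniformly from that box, every ball must be placed, and neither box may be empty. The function name `best_allocation` is hypothetical.

```python
from fractions import Fraction


def best_allocation(red, other):
    """Return (probability, (red_in_box1, other_in_box1)) maximizing P(draw red)."""
    best_p, best_split = Fraction(-1), None
    half = Fraction(1, 2)
    for r1 in range(red + 1):
        for o1 in range(other + 1):
            n1 = r1 + o1
            n2 = (red - r1) + (other - o1)
            if n1 == 0 or n2 == 0:
                continue  # assumption: both boxes must hold at least one ball
            # Each box is picked with probability 1/2, then a ball uniformly.
            p = half * Fraction(r1, n1) + half * Fraction(red - r1, n2)
            if p > best_p:
                best_p, best_split = p, (r1, o1)
    return best_p, best_split


p, split = best_allocation(100, 100)
print(split, p, float(p))
```

Under these assumptions the search settles on a single red ball alone in one box and all remaining balls in the other, with probability 1/2 + (1/2)(99/199) = 149/199. Exact fractions avoid ties being decided by floating-point noise.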
Demonstrate rapid analysis and stakeholder debrief
Rapid Analysis and Stakeholder Debrief Plan. You have 1 hour to analyze a provided dataset (no pre-read) followed by a 45-minute debrief with a product...
Write monthly new-vs-returning requests SQL
Given the schema and sample data below, write a single PostgreSQL query (no dynamic SQL) that returns, for every calendar month present in requests, t...
Explain power drivers and resolve unexpected A/B results
A/B Testing: Power, Sample Size, Allocation, and Diagnostics. You are analyzing a two-proportion (binary conversion) A/B test with independent users, n...
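To make the power drivers concrete, the sketch below computes an approximate per-arm sample size for a two-sided two-proportion z-test using the standard normal approximation. Equal allocation between arms, the default `alpha` and `power` values, and the function name are assumptions for illustration; the truncated prompt may specify something different.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate users per arm for a two-sided two-proportion z-test."""
    if p_control == p_treatment:
        raise ValueError("the two rates must differ to define an effect size")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value under H0
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    p_bar = (p_control + p_treatment) / 2
    # Standard deviation terms under the null (pooled) and the alternative.
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    n = ((z_alpha * null_sd + z_beta * alt_sd) / (p_treatment - p_control)) ** 2
    return ceil(n)


print(sample_size_per_arm(0.10, 0.12))
```

The formula shows the drivers directly: the required sample grows as the effect shrinks (inverse square), as the baseline rate moves toward 0.5 (larger variance), and as `alpha` is tightened or `power` is raised.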
Compute weekly 3-week rolling sums in SQL
Using PostgreSQL, write a single query that outputs, for each calendar week in a given range, the sum of amounts in that week and a rolling sum over t...
Lead XFN decision under tight timeline
Scenario: 72-Hour VP-Level Recommendation on Expanding a New Quoting Workflow. You have 72 hours to deliver a VP-level deck recommending whether to exp...
Define success metrics for Instant Book
Instant Book: Metrics, Measurement, Rollout, and Risk Plan. Context: You are evaluating an "Instant Book" feature that allows customers to immediately b...
Design a robust pro-ranking A/B test
Experiment Design: Evaluating a New Pro Ranking Algorithm (Ranker) in a Two‑Sided Marketplace. You are designing an experiment to evaluate a new pro ra...
Write complex joins and window functions
You are given a simplified Thumbtack-like marketplace schema in PostgreSQL. Assume UTC timestamps and weeks start on Monday. Treat "today" as 2025-09-...
Build a defensible ML pipeline end-to-end
End-to-End Binary Classification Pipeline on Tabular Data (Numeric, Categorical, Text). Context: You are handed a tabular dataset that includes numerica...
Design and evaluate an A/B test for launch
A/B Test Design: New Matching Model for a Two‑Sided Marketplace. Context: You are testing a new matching/ranking model that determines which providers a...
Present a DS project with business impact
7-Minute Data Science Project Presentation (Onsite). Context: You are interviewing for a Data Scientist role and will present a past project to a mixed ...
Implement TF–IDF with sparse matrices
Implement TF–IDF from Scratch (Python + NumPy/SciPy). You are given a list of documents (strings). Build a TF–IDF vectorizer from scratch with the foll...
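Because the requirement list is cut off, the following is only one plausible shape for such a vectorizer. It assumes lowercase whitespace tokenization, raw term counts for TF, a smoothed IDF of ln((1 + n) / (1 + df)) + 1, and L2 row normalization; none of these choices are confirmed by the prompt, and the function name `tfidf` is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix


def tfidf(docs):
    """Return (sparse TF-IDF matrix, vocabulary dict) for a list of strings."""
    vocab = {}
    rows, cols, vals = [], [], []
    for i, doc in enumerate(docs):
        counts = {}
        for tok in doc.lower().split():  # assumed tokenizer: whitespace split
            j = vocab.setdefault(tok, len(vocab))
            counts[j] = counts.get(j, 0) + 1
        for j, c in counts.items():
            rows.append(i)
            cols.append(j)
            vals.append(c)

    n_docs = len(docs)
    X = csr_matrix((vals, (rows, cols)), shape=(n_docs, len(vocab)),
                   dtype=np.float64)

    # Document frequency: each stored entry is one (doc, term) pair.
    df = np.bincount(X.indices, minlength=len(vocab))
    idf = np.log((1 + n_docs) / (1 + df)) + 1.0  # assumed smoothing

    # Scale every stored count by the IDF of its column, staying sparse.
    X.data *= idf[X.indices]

    # L2-normalize each row without densifying the matrix.
    norms = np.sqrt(np.asarray(X.multiply(X).sum(axis=1)).ravel())
    norms[norms == 0] = 1.0  # leave empty documents as all-zero rows
    X.data *= np.repeat(1.0 / norms, np.diff(X.indptr))
    return X, vocab
```

Working directly on the CSR `data`, `indices`, and `indptr` arrays keeps memory proportional to the number of nonzero entries, which is the main reason to prefer a sparse representation over a dense document-term matrix.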