Design Basketball Shot Outcome Prediction
Company: Virtu
Role: Data Scientist
Category: ML System Design
Difficulty: medium
Interview Round: Technical Screen
Design a machine learning system that predicts whether a basketball shot will be made or missed using only information available at the moment the player releases the ball.
Discuss the problem formulation, data collection, feature design, modeling approach, loss function, evaluation metrics, and feature selection strategy. Example input signals may include:
- Shot-level physics features such as release speed, release angle, and distance to the rim.
- Player identity and player-specific historical behavior, potentially represented with learned embeddings.
- Contextual information such as arena conditions, game state, fatigue, minutes already played, and other available pre-shot context.
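
As a hedged illustration of the physics-feature idea above, the sketch below computes the release speed a drag-free projectile would need to pass through the rim center, then reports the actual release speed minus that value as a candidate feature. The release-height input, the function names, and the no-drag, no-spin model are assumptions made for illustration; none of them come from the question itself.

```python
import math

G = 9.81              # gravitational acceleration, m/s^2
RIM_HEIGHT_M = 3.048  # regulation rim height (10 ft), in meters


def required_release_speed(distance_m, angle_deg, release_height_m):
    """Speed needed for a drag-free projectile to pass through the rim center.

    Returns None when the release angle is too flat to reach rim height
    at the given horizontal distance.
    """
    theta = math.radians(angle_deg)
    # Height the straight launch line gains over the distance, minus the
    # height that actually has to be gained; gravity must remove exactly this.
    drop = distance_m * math.tan(theta) - (RIM_HEIGHT_M - release_height_m)
    if drop <= 0:
        return None
    return math.sqrt(G * distance_m ** 2 / (2 * math.cos(theta) ** 2 * drop))


def speed_error(release_speed, distance_m, angle_deg, release_height_m):
    """Hypothetical feature: actual release speed minus the ideal speed."""
    ideal = required_release_speed(distance_m, angle_deg, release_height_m)
    return None if ideal is None else release_speed - ideal
```

A derived feature like this encodes the interaction between speed, angle, and distance directly, which a model would otherwise have to learn from the raw inputs. Real shots also involve spin, drag, and a target larger than a single point, so a feature of this kind is at best a rough prior.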
Also answer this feature-selection question: if you fit a univariate linear regression of the target against one feature and the feature is not statistically significant, is it safe to discard that feature? Why or why not?
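
A small synthetic check suggests why the answer is generally no. In the sketch below, a shot is "made" only when the release angle is close to an optimum, so the linear slope of the outcome on the raw angle is essentially zero even though the angle fully determines the outcome. The data, the 50-degree optimum, the tolerance, and the helper name `slope_t_stat` are illustrative assumptions, and the target is deliberately deterministic so the effect is exact rather than seed-dependent.

```python
import numpy as np


def slope_t_stat(x, y):
    """t-statistic of the slope in a univariate OLS fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    slope = np.sum(xc * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    sigma2 = np.sum(resid ** 2) / (len(x) - 2)  # residual variance
    return slope / np.sqrt(sigma2 / sxx)


# Synthetic, deterministic toy data: angles symmetric around an assumed optimum.
angles = np.linspace(35.0, 65.0, 301)
made = (np.abs(angles - 50.0) < 6.05).astype(float)

# Raw angle: slope is ~0, so the univariate test finds nothing.
t_raw = slope_t_stat(angles, made)

# Squared distance from the optimum: strongly (negatively) related to the outcome.
t_squared = slope_t_stat((angles - 50.0) ** 2, made)
```

Here `t_raw` is numerically zero while `t_squared` is large in magnitude, so discarding the angle on the basis of the univariate test would throw away the most informative signal. The same failure can arise from interactions (a feature that matters only in combination with another) and from suppressor effects, and a linear fit to a binary target is itself a rough approximation. Safer screens typically judge features inside a multivariate or non-linear model, for example through regularization or held-out permutation importance.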
Quick Answer: This question tests whether a candidate can design an end-to-end machine learning system for sports analytics: problem formulation, data collection, feature engineering (physics-based shot metrics, player embeddings, and contextual game state), modeling choices, loss functions, evaluation metrics, and feature selection. It falls under ML System Design. Interviewers use it to see how well a candidate balances domain-specific feature design against modeling and evaluation trade-offs, and it probes both conceptual understanding (what the statistical significance of a single feature does and does not imply) and practical, system-level design decisions.
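
For the loss-function and evaluation-metric parts of the discussion, one common framing is to treat the task as probabilistic binary classification, train with log loss, and report a calibration-sensitive score such as the Brier score against a constant base-rate baseline. The sketch below is a minimal NumPy version under that assumption; the function names are illustrative and the choice of metrics is one reasonable option, not a prescribed answer.

```python
import numpy as np


def log_loss(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(p_pred, dtype=float)
    return float(np.mean((p - y) ** 2))


def base_rate_predictions(y_train, n):
    """Baseline: predict the training make rate for every shot."""
    return np.full(n, float(np.mean(y_train)))
```

Because make probability for most shots sits well away from 0 and 1, accuracy alone says little here. Comparing a model's log loss and Brier score against `base_rate_predictions` shows whether the features add information beyond the overall make rate.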