Critique a Linear Regression Workflow

Q: Critique a Linear Regression Workflow

This question evaluates a candidate's understanding of linear regression assumptions, model-selection pitfalls, diagnostics and validation methods, and the distinction between predictive, inferential, and causal goals when modeling skewed outcome data such as website dwell time.

Q: What difficulty level is this interview question?

This is an easy-difficulty Statistics & Math question, commonly asked during Technical Screen rounds at Apple.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Apple during technical interviews.

Question

You are reviewing another data scientist's approach to modeling website dwell time for users who arrive from Google Search.

Response variable: Y, the number of seconds a user stays on the website after clicking through from Google Search.
Candidate predictors: four variables, X1 through X4 (their exact definitions are not specified and should be clarified).

The other data scientist used the following process:

They observed that Y appears approximately normally distributed and concluded that ordinary least squares regression is appropriate.
They fit all possible combinations of the predictors, including squared terms and pairwise second-order interactions.
They selected the model with the best fit as the final model.
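
To get a sense of how large this search is, the short sketch below counts the candidate terms and models. It assumes, since the question does not say, that every subset of terms is allowed with no hierarchy constraint (an interaction may appear without its main effects) and that the intercept is always included. The term labels are illustrative only.

```python
from itertools import combinations

# The four candidate predictors named in the question.
predictors = ["X1", "X2", "X3", "X4"]

# Candidate terms: main effects, squared terms, pairwise interactions.
main_effects = list(predictors)
squared_terms = [f"{p}^2" for p in predictors]
interactions = [f"{a}:{b}" for a, b in combinations(predictors, 2)]

candidate_terms = main_effects + squared_terms + interactions
n_terms = len(candidate_terms)

# Assumption: any subset of terms is a valid model (no hierarchy rule),
# so the model count is 2 ** n_terms, including the intercept-only model.
n_models = 2 ** n_terms

print(n_terms, n_models)  # 14 16384
```

Under these assumptions, the "best fit" is the winner among roughly sixteen thousand fits to the same data, which is the setting in which selection bias becomes a concern.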

Critique this workflow. What clarifying questions would you ask before accepting the analysis, and what would you recommend instead?

In your answer, discuss:

whether the goal is prediction, inference, or causal estimation,
which assumptions actually matter for OLS and for valid statistical inference,
why the marginal normality of Y is not one of the Gauss-Markov assumptions,
how dwell-time data can violate standard linear-model assumptions,
the risks of exhaustive interaction search and model-selection bias,
and how you would redesign the modeling and validation process, including model diagnostics, evaluation metrics, cross-validation, and possible alternatives such as transformation, GLMs, regularization, or robust standard errors.
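
The model-selection-bias point in the list above can be illustrated with a small simulation. This is a minimal sketch using only the Python standard library and simulated data, not the question's real data: the response and all 14 candidate features are independent noise, each candidate is fit as a one-predictor least-squares line (a simplification of the full subset search), and the winner is chosen by in-sample R^2 alone. The sample sizes, seed, and helper names are assumptions made for illustration.

```python
import random

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y, intercept, slope):
    """R^2 = 1 - SSE/SST for a given line; can be negative out of sample."""
    my = sum(y) / len(y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst

rng = random.Random(0)  # fixed seed so the run is repeatable
N_TRAIN, N_HOLDOUT, N_CANDIDATES = 40, 5000, 14

def noise(n):
    return [rng.gauss(0, 1) for _ in range(n)]

# The response is pure noise: no candidate has any real relationship to it.
y_train, y_hold = noise(N_TRAIN), noise(N_HOLDOUT)
candidates = [(noise(N_TRAIN), noise(N_HOLDOUT)) for _ in range(N_CANDIDATES)]

results = []
for x_train, x_hold in candidates:
    a, b = fit_simple_ols(x_train, y_train)  # fit on training data only
    results.append((r_squared(x_train, y_train, a, b),
                    r_squared(x_hold, y_hold, a, b)))

# Select the "best" candidate by in-sample fit, as in the criticized workflow.
best_train_r2, best_holdout_r2 = max(results)

print(f"in-sample R^2 of selected model: {best_train_r2:.3f}")
print(f"holdout R^2 of selected model:   {best_holdout_r2:.3f}")
```

Because nothing here is truly predictive, any in-sample R^2 above zero for the selected model comes from the search itself, and the holdout R^2 should sit near or below zero. Evaluating the selected model on data that played no part in selection, or cross-validating the entire selection procedure, is one way to expose that gap.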
