PracHub

Improve low R² without p‑hacking

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competence in statistical modeling and causal inference, covering regression diagnostics, feature engineering and interactions, appropriate error distributions and link functions, leakage detection, model selection and validation, and the trade-off between predictive accuracy and valid effect estimation.

  • hard
  • Instacart
  • Machine Learning
  • Data Scientist


Company: Instacart

Role: Data Scientist

Category: Machine Learning

Difficulty: hard

Interview Round: Onsite

A linear regression predicting contribution per order has R² = 0.07.
a) List concrete steps to raise predictive performance without invalidating inference: feature transformations (e.g., splines for basket size), interactions (treatment × daypart), an appropriate error distribution (Gamma/log link), and leakage checks.
b) Will simply adding another covariate reliably increase R² out‑of‑sample? Demonstrate with CV why or why not, and propose alternatives (GAMs, quantile regression, gradient boosting) while keeping effect‑estimation goals in mind.
c) Show how you'd use nested cross‑validation and target‑leakage tests to guard against p‑hacking while iterating.
d) Explain when a low R² is acceptable for an unbiased ATE but unacceptable for accurate individual predictions.


Related Interview Questions

  • Explain Core ML Concepts - Instacart (hard)
  • Contrast Lasso vs Ridge trade‑offs - Instacart (hard)
Instacart
Oct 13, 2025, 9:49 PM
Data Scientist
Onsite
Machine Learning

Predicting Contribution per Order with Low R²

Context

You are modeling contribution per order (a continuous per-order outcome such as margin or profit contribution) using a linear regression. The current model achieves R² = 0.07, meaning it explains only about 7% of the variance in contribution per order, so its predictive performance is weak. You care about both prediction accuracy and valid inference on key covariates (e.g., treatment effects, policy variables).
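To make R² = 0.07 concrete, R² compares the model's squared error with that of always predicting the mean. Rearranging (in-sample, ignoring degrees-of-freedom corrections) shows that the model's typical error is only about 3.6% smaller than the error of the constant-mean predictor:

```latex
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
\quad\Longrightarrow\quad
\frac{\mathrm{RMSE}_{\text{model}}}{\mathrm{SD}(y)} = \sqrt{1 - R^2} = \sqrt{0.93} \approx 0.964
```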

Tasks

(a) List concrete, practical steps to raise predictive performance without invalidating inference. Include:

  • Feature transformations (e.g., splines for basket size).
  • Interactions (e.g., treatment × daypart).
  • Appropriate error distribution/link (e.g., Gamma with log link) and when to use them.
  • Systematic leakage checks.
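The first three levers in (a) can be sketched together. This is a minimal numpy-only sketch on synthetic data, assuming hypothetical variables `basket`, `treat` (randomized) and `evening` (a daypart flag). It builds a piecewise-linear spline basis for basket size with knots fixed at quantiles before looking at the outcome, adds a treatment × daypart interaction, and fits a Gamma GLM with log link by iteratively reweighted least squares (IRLS). In practice a library GLM (for example statsmodels) and a B-spline or natural-spline basis would be the usual choice. Gamma/log link only applies when the outcome is strictly positive and its variance grows with its mean; if contribution can be zero or negative (loss-making orders), a Gaussian model with heteroskedasticity-robust standard errors is the safer default.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical synthetic data: basket size, a randomized treatment flag and an
# "evening" daypart flag. The true log-mean is nonlinear in basket size and
# includes a treatment x daypart interaction.
basket = rng.uniform(1, 60, n)
treat = rng.integers(0, 2, n)
evening = rng.integers(0, 2, n)
eta_true = 0.5 + 0.6 * np.log1p(basket) + 0.10 * treat + 0.05 * evening + 0.15 * treat * evening
shape = 2.0
y = rng.gamma(shape, np.exp(eta_true) / shape)  # positive; variance proportional to mean^2


def spline_basis(x, knots):
    """Piecewise-linear spline basis: x plus one hinge max(x - k, 0) per knot."""
    return np.column_stack([x] + [np.maximum(x - k, 0.0) for k in knots])


knots = np.quantile(basket, [0.25, 0.5, 0.75])  # chosen without looking at y
X = np.column_stack([
    np.ones(n),
    spline_basis(basket / 10.0, knots / 10.0),  # rescaled for numerical stability
    treat,
    evening,
    treat * evening,  # interaction of interest
])
names = ["const", "b", "b_k1", "b_k2", "b_k3", "treat", "evening", "treat:evening"]


def fit_gamma_log(X, y, n_iter=50, tol=1e-10):
    """Gamma GLM with log link via IRLS (Fisher scoring).

    For this family/link pair the working weights are constant, so each step
    is an OLS fit of the working response z = eta + (y - mu) / mu on X.
    """
    beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]  # start from OLS on log(y)
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu
        new_beta = np.linalg.lstsq(X, z, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    mu = np.exp(X @ beta)
    phi = np.sum(((y - mu) / mu) ** 2) / (len(y) - X.shape[1])  # Pearson dispersion
    cov = phi * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


beta, se = fit_gamma_log(X, y)
for name, b, s in zip(names, beta, se):
    print(f"{name:>14s} {b:8.3f} (se {s:.3f})")
```

Because the link is log, coefficients are log-ratios: exp of the `treat` coefficient is the multiplicative change in mean contribution when `evening` is 0, and `treat:evening` is the additional log-ratio in the evening. Leakage checks sit alongside this: confirm every feature is available at the moment the prediction would be made (nothing recorded after the order completes).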

(b) Will simply adding another covariate reliably increase R² out-of-sample? Use cross-validation (CV) to demonstrate why or why not, and propose alternatives (GAMs, quantile regression, gradient boosting) that balance predictive performance with effect-estimation goals.
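Adding a covariate can never lower in-sample OLS R² when the models are nested, but each extra coefficient adds estimation variance, so out-of-sample R² can fall. The sketch below (numpy only; synthetic data with a population R² of roughly 0.07; all names hypothetical) compares in-sample R² with 10-fold CV R² for three real covariates, the same plus one pure-noise covariate, and plus 30 pure-noise covariates. The same fold assignment is reused for every model so the comparison is paired.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical data: 3 real covariates, population R^2 of about 0.07.
X_real = rng.normal(size=(n, 3))
y = X_real @ np.array([0.18, 0.15, 0.12]) + rng.normal(scale=np.sqrt(0.93), size=n)

# Nested feature sets: each contains every column of the previous one.
one_noise = np.column_stack([X_real, rng.normal(size=n)])
many_noise = np.column_stack([one_noise, rng.normal(size=(n, 29))])  # 30 noise columns total


def ols_fit_predict(X_train, y_train, X_test):
    """OLS with intercept: fit on the training rows, predict the test rows."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    B = np.column_stack([np.ones(len(X_test)), X_test])
    coef = np.linalg.lstsq(A, y_train, rcond=None)[0]
    return B @ coef


def r2(y, pred):
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)


def cv_r2(X, y, k=10, seed=0):
    """Pooled out-of-fold R^2; the same seed gives the same folds for every model."""
    folds = np.random.default_rng(seed).permutation(len(y)) % k
    oof = np.empty(len(y))
    for f in range(k):
        test = folds == f
        oof[test] = ols_fit_predict(X[~test], y[~test], X[test])
    return r2(y, oof)


results = {}
for label, X in [("3 real", X_real), ("+1 noise", one_noise), ("+30 noise", many_noise)]:
    results[label] = (r2(y, ols_fit_predict(X, y, X)), cv_r2(X, y))
    print(f"{label:>9s}  in-sample R2 = {results[label][0]:.3f}  10-fold CV R2 = {results[label][1]:.3f}")
```

In-sample R² rises with every added column, while CV R² stays flat or drops; with a single noise column the CV change can go either way, which is exactly why it is not reliable. GAM, quantile-regression or gradient-boosting candidates would be compared through the same CV loop. When the goal is effect estimation, one common pattern is to let the flexible model handle the outcome (nuisance) part while the treatment effect stays in an interpretable specification.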

(c) Show how to use nested cross-validation and target-leakage tests to guard against p-hacking while iterating on features/hyperparameters.
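Nested CV and leakage tests can run together. The outer loop of nested CV scores the entire selection procedure on folds that never influenced the hyperparameter choice; two leakage tests run alongside it: a single-feature screen that flags any feature whose out-of-fold R² on its own is implausibly high, and a shuffled-target run where the full pipeline should score about zero. This is a numpy-only sketch assuming ridge regression with a λ grid fixed in advance, a hypothetical leaky column built from the outcome itself, and an arbitrary 0.5 flagging threshold.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 400, 10
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([0.2, 0.15, 0.1]) + rng.normal(size=n)
# Hypothetical leaky feature: a post-order quantity that is essentially the outcome.
leaky = y + rng.normal(scale=0.3, size=n)
X_leaky = np.column_stack([X, leaky])


def ridge_fit(X, y, lam):
    """Ridge regression on centered data; returns a predict function."""
    mu_x, mu_y = X.mean(axis=0), y.mean()
    Xc = X - mu_x
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - mu_y))
    return lambda Z: mu_y + (Z - mu_x) @ coef


def r2(y, pred):
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)


def fold_ids(n, k, seed):
    return np.random.default_rng(seed).permutation(n) % k


def nested_cv(X, y, lams, k_outer=5, k_inner=5, seed=0):
    """Outer folds score the whole selection procedure; inner folds pick lambda."""
    outer = fold_ids(len(y), k_outer, seed)
    scores = []
    for f in range(k_outer):
        Xtr, ytr = X[outer != f], y[outer != f]
        inner = fold_ids(len(ytr), k_inner, seed + 1 + f)
        inner_scores = []
        for lam in lams:  # selection uses only the outer-training rows
            pred = np.empty(len(ytr))
            for g in range(k_inner):
                m = inner == g
                pred[m] = ridge_fit(Xtr[~m], ytr[~m], lam)(Xtr[m])
            inner_scores.append(r2(ytr, pred))
        best = lams[int(np.argmax(inner_scores))]
        model = ridge_fit(Xtr, ytr, best)  # refit on all outer-training rows
        scores.append(r2(y[outer == f], model(X[outer == f])))
    return np.array(scores)


def single_feature_cv_r2(x, y, k=5, seed=0):
    """Out-of-fold R^2 of a one-feature OLS fit (lam = 0)."""
    folds = fold_ids(len(y), k, seed)
    pred = np.empty(len(y))
    for f in range(k):
        m = folds == f
        pred[m] = ridge_fit(x[~m].reshape(-1, 1), y[~m], 0.0)(x[m].reshape(-1, 1))
    return r2(y, pred)


lams = [0.1, 1.0, 10.0, 100.0]  # candidate grid fixed before iterating
clean = nested_cv(X, y, lams)
with_leak = nested_cv(X_leaky, y, lams)
null = nested_cv(X, rng.permutation(y), lams)  # shuffled target: should be about 0
screen = [single_feature_cv_r2(X_leaky[:, j], y) for j in range(X_leaky.shape[1])]
flagged = [j for j, s in enumerate(screen) if s > 0.5]  # arbitrary threshold

print(f"nested CV R2, clean features:    {clean.mean():.3f}")
print(f"nested CV R2, with leaky column: {with_leak.mean():.3f}")
print(f"nested CV R2, shuffled target:   {null.mean():.3f}")
print(f"flagged feature indices: {flagged}")
```

The clean nested-CV score is the number to report. Fixing the candidate grid before iterating, and touching any final holdout only once, keeps repeated tinkering from turning into p-hacking. A clearly positive score on the shuffled target means information about y is entering the pipeline somewhere (for example, feature selection or target encoding done outside the folds).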

(d) Explain when a low R² is acceptable for an unbiased average treatment effect (ATE) but unacceptable for accurate individual predictions.
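In a randomized experiment, treatment is independent of the unexplained noise, so the OLS (or difference-in-means) estimate of the ATE is unbiased no matter how low R² is; a low R² only widens the confidence interval. The simulation below (numpy only; the effect size, noise scale and covariate are assumptions) repeats a hypothetical experiment 500 times with R² near 0.01.

```python
import numpy as np

rng = np.random.default_rng(3)
true_ate, n, reps = 0.3, 2000, 500  # hypothetical effect size and design
estimates, r2s, rmses = [], [], []

for _ in range(reps):
    t = rng.integers(0, 2, n)   # randomized treatment assignment
    x = rng.normal(size=n)      # weak prognostic covariate
    y = 1.0 + true_ate * t + 0.3 * x + rng.normal(scale=3.0, size=n)  # mostly noise
    A = np.column_stack([np.ones(n), t, x])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    resid = y - A @ coef
    estimates.append(coef[1])   # coefficient on t is the ATE estimate
    r2s.append(1 - resid @ resid / np.sum((y - y.mean()) ** 2))
    rmses.append(np.sqrt(np.mean(resid ** 2)))

estimates = np.array(estimates)
print(f"true ATE           {true_ate:.3f}")
print(f"mean ATE estimate  {estimates.mean():.3f} (sd across reps {estimates.std():.3f})")
print(f"mean R^2           {np.mean(r2s):.3f}")
print(f"mean residual RMSE {np.mean(rmses):.3f}")
```

The estimates average out to the true effect, yet the residual RMSE (about 3) is ten times the effect, so any single order's prediction is nearly useless. In observational data the risk to the ATE is omitted confounding, which R² does not measure: a high R² does not rule it out, and a low one does not imply it.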


© 2026 PracHub. All rights reserved.