What difficulty level is this interview question?

This is a hard difficulty Machine Learning question, commonly asked during HR Screen rounds at Amazon.

What role is this question designed for?

This question is commonly asked of Data Scientist candidates at Amazon during technical interviews.

Apply Double ML with text-address features

Quick Overview

This question tests causal inference with Double Machine Learning: estimating an average treatment effect from observational data. It covers representing and validating address-derived text features, nuisance estimation, overlap diagnostics, sensitivity analysis, subgroup effect reporting, and geographic privacy and fairness concerns. It falls under Machine Learning and applied causal inference. Interviewers use it to gauge both conceptual understanding (orthogonalization, sample splitting, identification assumptions) and practical skill (selecting and validating text and geospatial features, checking overlap/positivity, running sensitivity analyses, and controlling for multiple comparisons).

Estimate the ATE of a First Reminder on CSAT via Double Machine Learning (DML)

Context

You have observational data on customer satisfaction (CSAT) surveys. Some customers received a first reminder to complete the survey; others did not. You want the Average Treatment Effect (ATE) of receiving the first reminder on CSAT, using Double Machine Learning (DML). Addresses are available as text and can be used to construct geospatial features and embeddings.

Assume: (a) treatment assignment is not randomized but depends on observed pre-treatment covariates; (b) CSAT is measured on a numeric scale (e.g., 1–5) for completed surveys; (c) all features used are measured or defined prior to the reminder decision.
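One way to write these assumptions down formally is sketched below. The potential outcomes Y(1), Y(0) and the propensity score e(X) are notation introduced here for illustration and do not appear in the question itself. Assumption (a) is read as conditional ignorability given X. Overlap is not stated in the assumptions but is needed to identify the ATE, which is why the requirements ask for positivity diagnostics.

```latex
\begin{align*}
Y &= T\,Y(1) + (1 - T)\,Y(0) && \text{(consistency)} \\
\tau &= \mathbb{E}\,[\,Y(1) - Y(0)\,] && \text{(ATE, the target)} \\
(Y(0),\, Y(1)) &\perp\!\!\!\perp T \mid X && \text{(conditional ignorability, from (a))} \\
0 < e(X) &= \Pr(T = 1 \mid X) < 1 && \text{(overlap / positivity)}
\end{align*}
```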

Requirements

Define outcome Y, treatment T, and feature set X, including how address text is represented (e.g., geocoding + neighborhood attributes, an address text embedding).
Write the orthogonalized moment you will use for DML and describe sample-splitting and K-fold cross-fitting.
Choose nuisance learners for E[Y|X] and E[T|X], with hyperparameters, and explain how you prevent leakage from post-treatment variables.
Provide diagnostics for overlap/positivity and how you would trim or reweight.
Describe sensitivity analyses for unobserved confounding (e.g., Oster δ or partial R²) and how you will report subgroup effects (device, channel) while controlling FDR.
Explain how you will validate text features (ablation tests, SHAP stability across folds) and mitigate geographic privacy/fairness risks (e.g., excluding protected proxies, coarse geohashes).
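The cross-fitting and orthogonalized-moment requirements above can be illustrated with a minimal sketch. It assumes the partially linear model, whose orthogonal score is ψ = (Y − l(X) − θ(T − m(X)))(T − m(X)) with l(X) = E[Y|X] and m(X) = E[T|X]. Everything else is a stand-in: the data are simulated, a single covariate x in [0, 1) replaces the address-derived features, and a simple bin-mean learner replaces whatever nuisance learner you would actually choose (for example gradient boosting). The names `simulate`, `bin_mean_learner`, and `dml_plr` are hypothetical. With a binary treatment and heterogeneous effects this score recovers a weighted average of effects rather than the ATE exactly; a doubly robust (AIPW) score is the usual alternative in that case.

```python
import math
import random


def simulate(n, seed=0):
    """Simulated data (an assumption): x confounds both treatment and outcome."""
    rng = random.Random(seed)
    x, t, y = [], [], []
    for _ in range(n):
        xi = rng.random()                       # covariate in [0, 1)
        pi = 0.2 + 0.6 * xi                     # true propensity, kept inside (0, 1)
        ti = 1 if rng.random() < pi else 0      # reminder sent or not
        yi = 2.5 + 1.0 * xi + 0.3 * ti + rng.gauss(0, 0.5)  # true effect is 0.3
        x.append(xi)
        t.append(ti)
        y.append(yi)
    return x, t, y


def bin_mean_learner(xs, ys, n_bins=10):
    """Toy nuisance learner: mean of y within equal-width bins of x."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, yi in zip(xs, ys):
        b = min(int(xi * n_bins), n_bins - 1)
        sums[b] += yi
        counts[b] += 1
    overall = sum(ys) / len(ys)                 # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda xi: means[min(int(xi * n_bins), n_bins - 1)]


def dml_plr(x, t, y, k=5, seed=0):
    """K-fold cross-fitted partialling-out estimate of theta and its standard error."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]       # k disjoint folds covering all rows
    res_t, res_y, p_hat = [0.0] * n, [0.0] * n, [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit nuisances on the other folds only, then predict on the held-out fold.
        m_hat = bin_mean_learner([x[i] for i in train], [t[i] for i in train])
        l_hat = bin_mean_learner([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            p_hat[i] = m_hat(x[i])
            res_t[i] = t[i] - p_hat[i]
            res_y[i] = y[i] - l_hat(x[i])
    # Solve the empirical moment sum(psi) = 0 for theta (residual-on-residual OLS).
    denom = sum(v * v for v in res_t)
    theta = sum(v * u for v, u in zip(res_t, res_y)) / denom
    # Sandwich standard error based on the score evaluated at theta.
    psi_sq = sum((v * (u - theta * v)) ** 2 for v, u in zip(res_t, res_y))
    se = math.sqrt(psi_sq) / denom
    return theta, se, p_hat


if __name__ == "__main__":
    x, t, y = simulate(4000, seed=1)
    theta, se, p_hat = dml_plr(x, t, y)
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    naive = sum(treated) / len(treated) - sum(control) / len(control)
    print(f"naive difference in means: {naive:.3f}")
    print(f"DML estimate: {theta:.3f} (se {se:.3f})")
    print(f"cross-fitted propensity range: {min(p_hat):.2f} to {max(p_hat):.2f}")
```

The true effect in this simulation is 0.3 by construction, so the naive difference in means and the DML estimate can be compared against it directly. The returned cross-fitted propensities are the natural input for an overlap check: inspect their range or histogram by treatment arm before deciding on any trimming threshold.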
