PracHub

Perform no-intercept linear regression from two datasets

Last updated: Mar 29, 2026

Quick Overview

This question evaluates linear regression modeling, numerical linear algebra (normal equations and stable factorization methods), regularization, R² interpretation, and data-preprocessing/validation skills within the Machine Learning domain for a Data Scientist role.

  • Medium
  • Two Sigma
  • Machine Learning
  • Data Scientist

Company: Two Sigma

Role: Data Scientist

Category: Machine Learning

Difficulty: Medium

Interview Round: Take-home Project

Related Interview Questions

  • Analyze Temperatures and Update Regression - Two Sigma (medium)
  • How would you forecast bike demand? - Two Sigma (hard)
  • Predict Bike Dock Demand - Two Sigma (hard)
  • Predict bike demand and avoid overfitting - Two Sigma (hard)
  • How detect duplicate card records? - Two Sigma (medium)

Oct 13, 2025, 9:49 PM

You are given two pandas DataFrames and asked to fit an OLS model without an intercept (through the origin). Dataset A (features): df_X(user_id, clicks, impressions). Dataset B (target): df_y(user_id, conversions). Tiny samples:

df_X (columns: user_id, clicks, impressions)

user_id | clicks | impressions
A | 10 | 100
B | 5 | 40
C | 0 | 20

df_y (columns: user_id, conversions)

user_id | conversions
A | 4
B | 2
C | 1
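
The sample data and the inner join on user_id could be set up roughly as follows. This is a minimal sketch rather than a reference solution: the names `merged`, `X`, and `y` are assumptions introduced here for illustration, and `validate="one_to_one"` is an optional guard that makes pandas raise an error if user_id is duplicated in either frame.

```python
import pandas as pd

# Sample frames copied from the problem statement.
df_X = pd.DataFrame(
    {"user_id": ["A", "B", "C"], "clicks": [10, 5, 0], "impressions": [100, 40, 20]}
)
df_y = pd.DataFrame({"user_id": ["A", "B", "C"], "conversions": [4, 2, 1]})

# Inner join keeps only users present in both frames.
# validate="one_to_one" fails loudly on duplicate keys instead of silently
# multiplying rows.
merged = df_X.merge(df_y, on="user_id", how="inner", validate="one_to_one")

# Design matrix (no column of ones, since the model has no intercept) and target.
X = merged[["clicks", "impressions"]].to_numpy(dtype=float)
y = merged["conversions"].to_numpy(dtype=float)
```

Comparing `len(merged)` with `len(df_X)` and `len(df_y)` after the join is a cheap way to notice users that were dropped because they appear in only one frame.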

Tasks:

  1. Inner-join on user_id to build X ∈ R^{n×p} with columns [clicks, impressions] and y ∈ R^{n}. Fit the no-intercept model y = Xβ via: (i) the normal equations β̂ = (XᵀX)^{-1}Xᵀy; and (ii) a numerically stable factorization (QR or SVD). Show the intermediate matrices (XᵀX, Xᵀy) and the final β̂.
  2. Compute R² for the no-intercept model using the correct definition (with TSS = ∑ y_i^2). Explain why this R² can be negative and compare it against the standard R² of a model fitted with an intercept.
  3. Discuss when omitting the intercept is appropriate; verify by mean-centering X and y and refitting both with and without an intercept, commenting on how the coefficients change.
  4. If XᵀX is singular or ill-conditioned (e.g., clicks collinear with impressions), detect this and refit using ridge regression with λ = 1e−3; give the closed-form β̂_ridge = (XᵀX + λI)^{-1}Xᵀy and compute it on the sample.
  5. State how you would validate the model (residual diagnostics, cross-validation) and guard against data leakage when constructing df_X and df_y.
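
One possible way to work Tasks 1, 2, and 4 on the three-row sample with NumPy is sketched here. It is an illustration under stated assumptions, not the expected answer: the arrays are typed in directly from the sample, all variable names are hypothetical, and the condition number is only printed, since the problem does not specify a threshold for "ill-conditioned".

```python
import numpy as np

# Sample data after the inner join (columns: clicks, impressions).
X = np.array([[10.0, 100.0], [5.0, 40.0], [0.0, 20.0]])
y = np.array([4.0, 2.0, 1.0])

# (i) Normal equations: solve (X^T X) beta = X^T y without forming the inverse.
XtX = X.T @ X
Xty = X.T @ y
beta_ne = np.linalg.solve(XtX, Xty)

# (ii) Reduced QR: X = Q R, so R beta = Q^T y.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

# R^2 with the uncentered TSS (sum of y_i^2) and, for comparison,
# with the centered TSS used when the model has an intercept.
resid = y - X @ beta_qr
rss = resid @ resid
r2_uncentered = 1.0 - rss / (y @ y)
r2_centered = 1.0 - rss / np.sum((y - y.mean()) ** 2)

# Conditioning check, then ridge with lambda = 1e-3 in closed form.
cond_X = np.linalg.cond(X)
lam = 1e-3
beta_ridge = np.linalg.solve(XtX + lam * np.eye(X.shape[1]), Xty)

print("XtX =", XtX, "Xty =", Xty)
print("beta (normal eq.) =", beta_ne, "beta (QR) =", beta_qr)
print("R2 uncentered =", r2_uncentered, "R2 centered =", r2_centered)
print("cond(X) =", cond_X, "beta (ridge) =", beta_ridge)
```

Worked by hand, the sample gives XᵀX = [[125, 1200], [1200, 12000]], Xᵀy = [50, 500], and β̂ = [0, 1/24] ≈ [0, 0.0417], with RSS = 1/6 and an uncentered R² of 1 − 1/126 ≈ 0.992. The determinant of XᵀX is 60000, so the sample is not singular and ridge with λ = 1e−3 barely moves the coefficients. Note that with TSS = ∑ y_i^2 the OLS no-intercept R² cannot fall below 0, because β = 0 already achieves RSS = TSS; negative values appear when the centered TSS is paired with a no-intercept fit.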