PracHub

Quick Overview

This question tests data cleaning, type coercion, robust merging, and missing-data imputation on tabular data, with an emphasis on pandas DataFrame manipulation and joining auxiliary zipcode features.


Clean and Merge Housing Data

Company: Capital One

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: easy

Interview Round: Onsite

You are given two pandas DataFrames from a house-price screening exercise.

`house_sales_raw`

  • `listing_id` INT
  • `sale_price` FLOAT, nullable in rows reserved for scoring
  • `sqft_living` STRING
  • `lot_size` STRING
  • `bedrooms` STRING
  • `bathrooms` STRING
  • `year_built` STRING
  • `zipcode` STRING
  • `house_type` STRING
  • `garage_spaces` STRING, nullable
  • `condition_score` STRING, nullable

`zipcode_features`

  • `zipcode` STRING
  • `median_income` FLOAT
  • `school_score` FLOAT

Relationship:

  • `house_sales_raw.zipcode = zipcode_features.zipcode`

Tasks:

  1. Convert numeric-like string columns to the correct pandas numeric dtype. Invalid parses should become null rather than causing failures.
  2. Merge `house_sales_raw` with `zipcode_features` on `zipcode`.
  3. Fill missing values using simple imputation rules: median for numeric columns, mode for categorical columns, and create explicit indicator columns for imputed fields where appropriate.
  4. Return a cleaned DataFrame with one row per `listing_id` and the following output columns: `listing_id, sale_price, sqft_living, lot_size, bedrooms, bathrooms, year_built, zipcode, house_type, garage_spaces, condition_score, median_income, school_score`.

You may assume the exercise is purely tabular and no timezone handling is needed.
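A minimal pandas sketch of the four tasks follows. It is one possible approach under stated assumptions, not a reference solution. Assumptions not given in the prompt: duplicate `listing_id` rows are resolved by keeping the last one; commas in numeric strings are treated as thousands separators; `condition_score` is treated as numeric; `zipcode` stays a trimmed string so leading zeros survive; `zipcode_features` is deduplicated on `zipcode` so the left join cannot multiply rows; `sale_price` is not imputed because it is the target and is null by design in scoring rows; and the indicator columns use a hypothetical `<col>_imputed` naming and are appended after the 13 requested columns. The function names `clean_and_merge` and `_clean_str` are likewise illustrative.

```python
import pandas as pd

# String columns in house_sales_raw that hold numeric values.
# Treating condition_score as numeric (not categorical) is an assumption.
NUMERIC_STR_COLS = [
    "sqft_living", "lot_size", "bedrooms", "bathrooms",
    "year_built", "garage_spaces", "condition_score",
]
ZIP_FEATURE_COLS = ["median_income", "school_score"]
CATEGORICAL_COLS = ["zipcode", "house_type"]
OUTPUT_COLS = [
    "listing_id", "sale_price", "sqft_living", "lot_size", "bedrooms",
    "bathrooms", "year_built", "zipcode", "house_type", "garage_spaces",
    "condition_score", "median_income", "school_score",
]


def _clean_str(v, drop_commas=False):
    """Trim whitespace (and optionally commas) from strings; pass other values through."""
    if not isinstance(v, str):
        return v
    v = v.strip()
    return v.replace(",", "") if drop_commas else v


def clean_and_merge(house_sales_raw: pd.DataFrame,
                    zipcode_features: pd.DataFrame) -> pd.DataFrame:
    # One row per listing_id; keeping the last duplicate is an assumption.
    df = house_sales_raw.drop_duplicates(subset="listing_id", keep="last").copy()

    # Task 1: coerce numeric-like strings; unparseable values become NaN.
    for col in NUMERIC_STR_COLS:
        cleaned = df[col].map(lambda v: _clean_str(v, drop_commas=True))
        df[col] = pd.to_numeric(cleaned, errors="coerce")

    # Task 2: keep zipcode as a trimmed string (preserves leading zeros) and
    # dedupe the lookup table so the left join cannot add rows.
    df["zipcode"] = df["zipcode"].map(_clean_str)
    zf = zipcode_features[["zipcode"] + ZIP_FEATURE_COLS].copy()
    zf["zipcode"] = zf["zipcode"].map(_clean_str)
    zf = zf.dropna(subset=["zipcode"]).drop_duplicates(subset="zipcode")
    out = df.merge(zf, on="zipcode", how="left", validate="many_to_one")

    # Task 3: median for numeric features, mode for categoricals, plus a 0/1
    # "<col>_imputed" flag (hypothetical naming) for every column that was filled.
    # sale_price is the target and is deliberately left null.
    for col in NUMERIC_STR_COLS + ZIP_FEATURE_COLS + CATEGORICAL_COLS:
        missing = out[col].isna()
        if not missing.any():
            continue
        if col in CATEGORICAL_COLS:
            modes = out[col].mode(dropna=True)  # returned in sorted order
            fill = modes.iloc[0] if not modes.empty else None  # ties -> first sorted value
        else:
            fill = out[col].median()
        if pd.isna(fill):
            continue  # column is entirely null; nothing to impute from
        out[f"{col}_imputed"] = missing.astype(int)
        out[col] = out[col].fillna(fill)

    # Task 4: the requested columns first, then any indicator columns.
    flags = [c for c in out.columns if c.endswith("_imputed")]
    return out[OUTPUT_COLS + flags].sort_values("listing_id").reset_index(drop=True)
```

Integer-like columns such as `bedrooms` and `year_built` become float64 as soon as a value fails to parse or a fractional median (for example 1997.5) is filled in; rounding or casting to pandas' nullable `Int64` dtype is a further choice the prompt leaves open. In a real modeling pipeline the fill values would usually be learned on training rows and reused for scoring rows rather than recomputed over the whole table.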


Last updated: May 7, 2026


© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Find Lowest Prices for Highly Rated Categories - Capital One (Medium)
  • Write SQL to compute campaign net revenue - Capital One (Medium)
  • Merge CSVs and build revenue pivot with pandas - Capital One (Medium)
  • Find top category per region in Aug 2025 - Capital One (Medium)
  • Reconcile ledgers with SQL/Python and late events - Capital One (Medium)