Load and prepare JSON for modeling
Company: Reddit
Role: Machine Learning Engineer
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
Using Python in a Jupyter notebook, load a JSON dataset with the following fields:
(1) hours spent reading A posts (float),
(2) hours spent reading B posts (float),
(3) hours spent reading C posts (float),
(4) current post category (A/B/C), and
(5) click (binary label).
Convert it into a pandas DataFrame suitable for modeling: enforce correct data types, encode the categorical post category, validate the schema, and run checks confirming that there are no missing values and no severe class imbalance. Provide code that performs the load, transformation, and validation.
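One possible sketch of the load and transformation steps follows. The question does not give the JSON layout or key names, so the record-per-row array and the names `hours_a`, `hours_b`, `hours_c`, `category`, and `click` are assumptions made for illustration, as is the helper `load_and_prepare`. An in-memory string stands in for the file so the cell runs on its own; in a notebook you would pass an open file handle instead.

```python
import io
import json

import pandas as pd

# Hypothetical sample data: key names and layout are assumptions,
# not part of the original question.
RAW_JSON = """[
  {"hours_a": 1.5, "hours_b": 0.0, "hours_c": 2.25, "category": "A", "click": 1},
  {"hours_a": 0.5, "hours_b": 3.0, "hours_c": 0.0,  "category": "B", "click": 0},
  {"hours_a": 2.0, "hours_b": 1.0, "hours_c": 0.5,  "category": "C", "click": 0},
  {"hours_a": 0.0, "hours_b": 0.5, "hours_c": 4.0,  "category": "C", "click": 1}
]"""

HOUR_COLS = ["hours_a", "hours_b", "hours_c"]
CATEGORIES = ["A", "B", "C"]


def load_and_prepare(source) -> pd.DataFrame:
    """Load a JSON array of records and return a typed, encoded DataFrame."""
    records = json.load(source)
    df = pd.DataFrame.from_records(records)

    # Enforce dtypes. Unparseable values become NaN/<NA> so that a later
    # missing-value check can catch them instead of failing here.
    df[HOUR_COLS] = df[HOUR_COLS].apply(pd.to_numeric, errors="coerce").astype("float64")
    df["click"] = pd.to_numeric(df["click"], errors="coerce").astype("Int64")

    # A fixed category list guarantees the same one-hot columns on every
    # load, even when a category is absent from a given file.
    df["category"] = pd.Categorical(df["category"], categories=CATEGORIES)
    dummies = pd.get_dummies(df["category"], prefix="category", dtype="int8")

    return pd.concat([df.drop(columns="category"), dummies], axis=1)


features = load_and_prepare(io.StringIO(RAW_JSON))
print(features.dtypes)
```

One-hot encoding is used here because the three categories have no natural order; an ordinal code would suit some tree-based models equally well, so treat the choice as one reasonable option.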
Quick Answer: This question evaluates competency in data preprocessing and validation for machine learning: enforcing correct data types, encoding categorical variables, detecting missing values, and assessing class balance.
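The validation side (schema, missing values, class balance) can be sketched as a function that collects problems instead of stopping at the first one. It assumes a DataFrame that is already typed and one-hot encoded. The column names, the helper `validate`, and the 20% minimum class share are all hypothetical choices, since the question does not define a schema or say what counts as imbalance.

```python
import pandas as pd

# Hypothetical schema: column name -> expected NumPy dtype kind
# ("f" = float, "i" = signed integer). Names are assumptions.
EXPECTED_KINDS = {
    "hours_a": "f", "hours_b": "f", "hours_c": "f",
    "category_A": "i", "category_B": "i", "category_C": "i",
    "click": "i",
}
HOUR_COLS = ["hours_a", "hours_b", "hours_c"]
ONE_HOT_COLS = ["category_A", "category_B", "category_C"]


def validate(df: pd.DataFrame, min_class_share: float = 0.2) -> list:
    """Return a list of problem descriptions; an empty list means all checks passed."""
    missing_cols = sorted(set(EXPECTED_KINDS) - set(df.columns))
    if missing_cols:
        # Later checks index these columns, so stop early.
        return [f"schema: missing columns {missing_cols}"]

    problems = []

    # Schema: every column has the expected dtype kind.
    for col, kind in EXPECTED_KINDS.items():
        if df[col].dtype.kind != kind:
            problems.append(f"schema: {col} has dtype {df[col].dtype}")

    # Missing values: report per-column null counts.
    null_counts = df[list(EXPECTED_KINDS)].isna().sum()
    for col, count in null_counts[null_counts > 0].items():
        problems.append(f"null values: {col} has {count}")

    # Value ranges: non-negative hours, binary label, exactly one category per row.
    if (df[HOUR_COLS] < 0).any().any():
        problems.append("range: negative hours found")
    if not df["click"].dropna().isin([0, 1]).all():
        problems.append("range: click is not binary")
    if (df[ONE_HOT_COLS].sum(axis=1) != 1).any():
        problems.append("range: one-hot columns do not sum to 1")

    # Class balance: both classes present and the minority share above the threshold.
    shares = df["click"].value_counts(normalize=True)
    if len(shares) < 2 or shares.min() < min_class_share:
        problems.append(f"class imbalance: shares {shares.round(3).to_dict()}")

    return problems


sample = pd.DataFrame({
    "hours_a": [1.5, 0.5, 2.0, 0.0],
    "hours_b": [0.0, 3.0, 1.0, 0.5],
    "hours_c": [2.25, 0.0, 0.5, 4.0],
    "category_A": [1, 0, 0, 0],
    "category_B": [0, 1, 0, 0],
    "category_C": [0, 0, 1, 1],
    "click": [1, 0, 0, 1],
})
issues = validate(sample)
print(issues or "all checks passed")
```

Returning a list makes the result easy to inspect in a notebook cell; in a pipeline you might raise an exception when the list is non-empty.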