How do I approach Machine Learning interview questions?

Machine Learning questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master machine learning interviews.

What difficulty level is this interview question?

This is a medium difficulty Machine Learning question, commonly asked during Onsite rounds at Reddit.

What role is this question designed for?

This question is commonly asked for Machine Learning Engineer candidates at Reddit during technical interviews.

Analyze CTR Data and Train Model | Reddit Interview Question

Quick Overview

This question evaluates end-to-end supervised machine learning competencies for tabular click-through rate prediction, including exploratory data analysis, data quality and leakage assessment, categorical feature handling, model training, baseline comparison, and evaluation interpretation.

You are given a notebook-based live coding task for click prediction.

A tabular dataset contains the following columns:

post_id : unique identifier for a post
cat_hour , dog_hour , rabbit_hour : numeric features representing recent hourly activity signals
current_tag : categorical tag for the current post
is_click : binary label indicating whether the post was clicked

Your task is to:

Perform exploratory data analysis on the dataset.
Identify important data quality issues, feature distributions, correlations, and potential leakage risks.
Prepare the data for modeling, including handling categorical features and deciding what to do with post_id .
Train a model to predict click probability (CTR) for each row.
Compare a simple baseline model with at least one stronger model.
Explain your evaluation strategy, model selection reasoning, and how you would interpret the results.

Assume this is an interview setting, so clarity of reasoning, sensible tradeoffs, and a clean end-to-end workflow matter as much as final model performance.

Analyze CTR Data and Train Model

Quick Overview

Solution

Comments (0)