
This interview question tests core ML concepts, the assumptions behind them, mathematical intuition, training/evaluation trade-offs, and practical failure modes. A strong answer to "Explain PCA and L2 Normalization in Machine Learning" states its assumptions, handles edge cases, explains trade-offs, and shows how to validate the result.

What difficulty level is this interview question?

This is a medium-difficulty Machine Learning question, commonly asked during Technical Screen rounds at Experian.

What role is this question designed for?

This question is commonly asked of Data Scientist candidates at Experian during technical interviews.

Explain PCA and L2 Normalization in Machine Learning

Scenario

Experian DataLabs Data Scientist technical screen: a machine-learning deep-dive on the modelling choices used in your project, mixed with conceptual questions (some in online-assessment (OA) / multiple-choice style).

Question

Walk through the core ML concepts behind a binary-classification project, covering preprocessing, modelling, optimization, and evaluation:

1. Explain how PCA achieves dimensionality reduction and why you would (or would not) apply L2 normalization before training. Distinguish per-column standardization from per-sample (row) L2 normalization, and say when each matters.
2. Derive the logistic-regression gradient via back-propagation, then generalize: describe how backpropagation works in modern multi-layer neural nets.
3. What baseline models did you compare against, and why did you ultimately choose logistic regression?
4. Define knowledge-informed machine learning and give a concrete example.
5. When and how would you move the classification threshold to improve FPR or TPR? Can you improve both FPR and TPR at the same time by moving a single threshold?
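
For the first part of the question, a small sketch can make the distinction concrete. The snippet below is an illustration only, assuming NumPy and a synthetic two-feature dataset; the helper names `standardize_columns`, `l2_normalize_rows`, and `pca` are hypothetical and not taken from the prompt. It contrasts per-column standardization with per-row L2 normalization and shows how feature scale affects the explained-variance ratio of the first principal component.

```python
import numpy as np

def standardize_columns(X):
    """Per-column standardization: each feature gets zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std

def l2_normalize_rows(X):
    """Per-sample L2 normalization: each row is rescaled to unit Euclidean length."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # keep all-zero rows as they are
    return X / norms

def pca(X, n_components):
    """PCA via SVD of the centered data.

    Returns the projected data and the explained-variance ratio of the kept components.
    """
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    components = Vt[:n_components]  # rows are the principal directions (eigenvectors)
    return Xc @ components.T, ratio[:n_components]

rng = np.random.default_rng(0)
# Synthetic data (an assumption for illustration): two independent features,
# the second on a scale 1000x larger than the first.
X = rng.normal(size=(200, 2)) * np.array([1.0, 1000.0])

_, ratio_raw = pca(X, 1)                        # dominated by the large-scale feature
_, ratio_std = pca(standardize_columns(X), 1)   # scale no longer decides the direction
rows = l2_normalize_rows(X)                     # every sample now has norm 1
```

On unscaled data the first component mostly tracks whichever feature has the largest units, which is why standardization is usually applied before PCA. Row normalization answers a different question: it discards each sample's magnitude and keeps only its direction, which tends to matter for cosine-similarity or text-count features rather than for mixed-unit tabular columns.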

Hints

Discuss eigenvectors/explained variance, maximum-likelihood gradients (prediction error × input), the chain rule through layers, ROC/PR curves and cost-sensitive thresholds, model-selection criteria, and domain priors/constraints.
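
The "prediction error × input" hint can be checked numerically. The following is a minimal sketch, assuming NumPy, labels in {0, 1}, the mean negative log-likelihood as the loss, and synthetic data; the function names are hypothetical. It applies the chain rule to get dL/dz = p − y and dL/dw = Xᵀ(p − y)/n, then compares the result against a central finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, b, X, y):
    """Mean negative log-likelihood for labels y in {0, 1}.

    Uses the identity -[y log p + (1 - y) log(1 - p)] = log(1 + e^z) - y z,
    which avoids taking the log of values close to zero.
    """
    z = X @ w + b
    return np.mean(np.logaddexp(0.0, z) - y * z)

def gradients(w, b, X, y):
    """Chain rule: dL/dz = p - y, so dL/dw = X^T (p - y) / n and dL/db = mean(p - y)."""
    error = sigmoid(X @ w + b) - y  # prediction error per sample
    return X.T @ error / len(y), error.mean()

def numerical_grad_w(w, b, X, y, h=1e-6):
    """Central finite differences, used only to sanity-check the analytic gradient."""
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = h
        grad[j] = (log_loss(w + step, b, X, y) - log_loss(w - step, b, X, y)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # synthetic features (assumption for illustration)
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(float)
w, b = rng.normal(size=3), 0.1

grad_w, grad_b = gradients(w, b, X, y)
max_gap = np.max(np.abs(grad_w - numerical_grad_w(w, b, X, y)))

# A few plain gradient-descent steps from zero weights, to show the gradient in use.
w_fit, b_fit = np.zeros(3), 0.0
for _ in range(200):
    gw, gb = gradients(w_fit, b_fit, X, y)
    w_fit -= 0.5 * gw
    b_fit -= 0.5 * gb
```

Backpropagation in a multi-layer network repeats the same pattern layer by layer: the error signal at a layer's pre-activation is multiplied by that layer's input to get the weight gradient, and is passed backwards through the weights and the activation derivative to the layer before it.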

Constraints & Assumptions

Preserve the scope, facts, inputs, and requested outputs from the prompt above.
If the prompt leaves a detail unspecified, state a reasonable assumption before relying on it.
Keep the answer interview-ready: concise enough to present, but concrete enough to implement or evaluate.

Clarifying Questions to Ask

Clarify the task, data shape, labels, constraints, and evaluation metric.
State assumptions behind the math or modeling technique you choose.
Connect theory to practical training, debugging, and deployment implications.

What a Strong Answer Covers

Correct definitions and formulas where the prompt requires them.
A practical explanation of how the method behaves on real data.
Trade-offs, failure modes, diagnostics, and mitigation strategies.
Evaluation choices that match the product or modeling objective.
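
To tie the evaluation point back to the threshold question, here is a small sketch, assuming NumPy, synthetic scores drawn from two Beta distributions, and made-up misclassification costs; `tpr_fpr` and `cheapest_threshold` are hypothetical helper names. It sweeps a single threshold and picks the one with the lowest total cost.

```python
import numpy as np

def tpr_fpr(y_true, scores, threshold):
    """Predict positive when score >= threshold; return (TPR, FPR)."""
    pred = scores >= threshold
    pos = y_true == 1
    neg = ~pos
    tpr = np.sum(pred & pos) / np.sum(pos)
    fpr = np.sum(pred & neg) / np.sum(neg)
    return tpr, fpr

def cheapest_threshold(y_true, scores, cost_fp, cost_fn, grid):
    """Return the grid threshold that minimizes cost_fp * FP + cost_fn * FN."""
    costs = []
    for t in grid:
        pred = scores >= t
        fp = np.sum(pred & (y_true == 0))
        fn = np.sum(~pred & (y_true == 1))
        costs.append(cost_fp * fp + cost_fn * fn)
    return grid[int(np.argmin(costs))]

rng = np.random.default_rng(0)
# Synthetic scores (assumption): negatives skew low, positives skew high.
y_true = np.concatenate([np.zeros(500, dtype=int), np.ones(500, dtype=int)])
scores = np.concatenate([rng.beta(2, 5, 500), rng.beta(5, 2, 500)])

grid = np.linspace(0.05, 0.95, 19)
curve = [tpr_fpr(y_true, scores, t) for t in grid]  # points along the ROC curve

# Illustrative costs only: a missed positive is 10x as expensive, then the reverse.
t_fn_costly = cheapest_threshold(y_true, scores, cost_fp=1, cost_fn=10, grid=grid)
t_fp_costly = cheapest_threshold(y_true, scores, cost_fp=10, cost_fn=1, grid=grid)
```

Raising the threshold can only remove positive predictions, so TPR and FPR both fall or stay flat together; lowering it raises both. Moving a single threshold therefore trades one against the other along a fixed ROC curve. Improving both at once requires a better-separating model or better features, which shifts the curve itself.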

Follow-up Questions

How would noisy labels, class imbalance, or distribution shift affect the answer?
What would you monitor after deployment?
Which baseline would you compare against first?