Collect labels without existing data

This is an Analytics & Experimentation interview question from Shopify for Machine Learning Engineer roles. View the full question and solution on PracHub.

Question

Modeling Without Labels: End-to-End Plan

You are tasked with shipping an ML model but have no labeled data. Outline a rigorous approach to:

Define the label and guard against leakage.
Collect or create labels ethically and at scale.
Validate label quality and maintain it over time.

Discuss the following components concretely:

Instrumentation and logging schemas: event taxonomy, schema/versioning, user/session IDs, consent/PII handling, feature–label joins, time horizons.
Heuristic/weak supervision and programmatic labeling: labeling functions, noise-aware aggregation, calibration.
Human-in-the-loop pipelines: active learning, rater training, QA, throughput, costs.
Proxy labels: when to use, known biases, calibration to true outcomes.
Controlled experiments or exploration to elicit outcomes: A/B tests or bandits to ethically gather ground truth with minimal regret.
Sampling strategies to reduce bias: stratification, reweighting, handling delayed feedback and censoring.
Gold sets and inter-rater agreement: creation, maintenance, and agreement statistics.
Continuous data quality monitoring: drift, label delay, schema contracts, alerts.
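To make the weak supervision component concrete, here is a minimal sketch of programmatic labeling in Python. The task (flagging support messages as billing disputes), the three labeling functions, and the names `ABSTAIN` and `weak_label` are all hypothetical and chosen only for illustration. Aggregation here is a plain majority vote, which is a simple baseline rather than the noise-aware aggregation the component list asks for; a noise-aware label model would additionally estimate each function's accuracy.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion


# Hypothetical labeling functions: 1 = billing dispute, 0 = not a dispute.
def lf_mentions_refund(text):
    return 1 if "refund" in text.lower() else ABSTAIN


def lf_mentions_chargeback(text):
    return 1 if "chargeback" in text.lower() else ABSTAIN


def lf_mentions_thanks(text):
    return 0 if "thank" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_mentions_refund, lf_mentions_chargeback, lf_mentions_thanks]


def weak_label(text, lfs=LABELING_FUNCTIONS):
    """Majority vote over non-abstaining labeling functions.

    Returns ABSTAIN when no function fires or when the top two labels tie,
    so ambiguous items can be routed to human raters instead.
    """
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]
```

Items where `weak_label` abstains are natural candidates for the human-in-the-loop queue, since the heuristics either do not cover them or disagree.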

Provide a step-by-step plan, clear assumptions, and practical validation methods.
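One practical validation method for the gold-set and rater components is an agreement statistic. The sketch below computes Cohen's kappa for two raters who labeled the same items; the function name `cohens_kappa` and the example labels are illustrative assumptions, not part of the original question. Kappa corrects raw agreement for the agreement expected by chance given each rater's label frequencies.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over classes of the product of marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical class for every item; kappa is
        # undefined (0/0), and this sketch reports it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Example: the raters agree on 3 of 4 items; chance agreement is 0.5.
kappa = cohens_kappa(["spam", "spam", "ham", "ham"], ["spam", "ham", "ham", "ham"])
```

For more than two raters, or for items not rated by everyone, a statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual choice.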
