Build a model to infer home vs office vs public

This is a Machine Learning interview question from Meta for Data Scientist roles. View the full question and solution on PracHub.

Question

You must infer whether a Facebook session’s network context is a home, an office, or a public venue, to inform Portal targeting. Constraints: IPs may be shared (NAT), dynamic, or behind CGNAT; households have multiple users; only privacy‑preserving telemetry is allowed (timestamps, coarse geolocation, ASN/ISP, device/app vs web, session lengths, concurrent sessions, contact‑graph features). Today is 2025-09-01. Design an ML approach that covers the following:

Features: propose robust, leak‑free features capturing diurnal/weekly patterns, ISP/ASN type (residential vs enterprise vs mobile), IP stability, geolocation drift, concurrent user counts on the same IP, session inter‑arrival, device/browser/OS mix, reverse DNS hints, and calling‑graph closeness (e.g., kin vs coworker patterns). Explain how to handle apartments sharing a router and coffee‑shop Wi‑Fi.
Labels: design weak‑supervision strategies to obtain labels at scale (e.g., overnight dwell heuristics, business‑hours rules, known corporate ASNs, opted‑in seed users, store‑IP blacklists). Describe how you will de‑bias noisy labels.
Modeling: compare baseline rule lists vs gradient‑boosted trees vs sequence models (e.g., per‑IP HMM or transformer over events). Consider multi‑instance learning to aggregate session‑level predictions to user/household. Explain calibration and thresholding for asymmetric costs (misclassifying office as home).
Evaluation: define metrics (macro F1, expected cost), cross‑geo temporal CV, and backtests across holidays. Prevent leakage from future behavior and from using Portal adoption as a proxy. Quantify uncertainty.
Privacy/compliance: specify minimization, aggregation, retention, on‑device inference options, and red‑teaming for re‑identification risks.
Deployment: outline real‑time vs batch inference, drift monitoring, and a holdout plan to measure whether location‑type targeting improves conversion.
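
As one possible starting point for the Features item, the sketch below aggregates sessions per IP into diurnal/weekly shares and a distinct-user count, and drops anything at or after a cutoff so that future behavior cannot leak into the features. The function name `ip_features`, the tuple layout `(ip_key, user_key, local_start)`, and the hour windows (22:00 to 06:00 for overnight, weekdays 09:00 to 17:00 for business hours) are assumptions made for illustration; they are not part of the question.

```python
from collections import defaultdict
from datetime import datetime


def ip_features(sessions, cutoff):
    """Aggregate per-IP features from (ip_key, user_key, local_start) tuples.

    Hypothetical helper: only sessions that start strictly before `cutoff`
    are used, so features never see behavior from the prediction period.
    """
    by_ip = defaultdict(list)
    for ip_key, user_key, local_start in sessions:
        if local_start < cutoff:
            by_ip[ip_key].append((user_key, local_start))

    features = {}
    for ip_key, rows in by_ip.items():
        n = len(rows)
        # Assumed windows: overnight = 22:00-06:00, business = Mon-Fri 09:00-17:00.
        overnight = sum(1 for _, t in rows if t.hour >= 22 or t.hour < 6)
        business = sum(
            1 for _, t in rows if t.weekday() < 5 and 9 <= t.hour < 17
        )
        features[ip_key] = {
            "n_sessions": n,
            "overnight_share": overnight / n,
            "business_hours_share": business / n,
            "distinct_users": len({u for u, _ in rows}),
        }
    return features


# Toy data with hashed/pseudonymous keys; the last session falls after the cutoff.
demo_sessions = [
    ("ip_a", "u1", datetime(2025, 8, 25, 23, 0)),  # Monday night
    ("ip_a", "u2", datetime(2025, 8, 26, 2, 0)),   # Tuesday, early morning
    ("ip_a", "u1", datetime(2025, 8, 26, 10, 0)),  # Tuesday, business hours
    ("ip_a", "u1", datetime(2025, 8, 30, 10, 0)),  # Saturday morning
    ("ip_a", "u3", datetime(2025, 9, 2, 23, 0)),   # after cutoff, ignored
]
demo_features = ip_features(demo_sessions, cutoff=datetime(2025, 9, 1))
```

A high overnight share with few distinct users leans toward home, while many distinct users with short sessions leans toward a public venue; an apartment building behind one router would sit in between, which is why these aggregates are inputs to a model rather than rules on their own.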
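
For the asymmetric costs mentioned under Modeling, one common option is to pick the class with the lowest expected cost instead of the most probable class. The sketch below assumes the model's probabilities are already calibrated; the cost values in `COST` and the names `expected_costs` and `decide` are illustrative assumptions, not figures from the question.

```python
CLASSES = ("home", "office", "public")

# COST[true_class][predicted_class]; values are made up for illustration.
# Predicting "home" for a session that is really "office" is penalized most.
COST = {
    "home":   {"home": 0.0, "office": 1.0, "public": 1.0},
    "office": {"home": 5.0, "office": 0.0, "public": 1.0},
    "public": {"home": 3.0, "office": 1.0, "public": 0.0},
}


def expected_costs(probs):
    """Expected cost of each possible prediction under calibrated `probs`."""
    return {
        pred: sum(probs[true] * COST[true][pred] for true in CLASSES)
        for pred in CLASSES
    }


def decide(probs):
    """Return the prediction with the lowest expected cost."""
    costs = expected_costs(probs)
    return min(costs, key=costs.get)


# "home" is the most probable class here, but a wrong "home" call is expensive.
uncertain = {"home": 0.5, "office": 0.35, "public": 0.15}
confident = {"home": 0.9, "office": 0.05, "public": 0.05}
uncertain_choice = decide(uncertain)
confident_choice = decide(confident)
```

With these assumed costs, the uncertain session is assigned to office even though home has the highest probability, while the confident session is still assigned to home. The same expected-cost quantity, averaged over a labeled evaluation set, can serve as the "expected cost" metric named under Evaluation.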
