You must infer whether a Facebook session’s network context is a home, an office, or a public venue to inform Portal targeting. Constraints: IPs may be shared (NAT), dynamic, or behind carrier‑grade NAT (CGNAT); households have multiple users; only privacy‑preserving telemetry is allowed (timestamps, coarse geolocation, ASN/ISP, device/app vs web, session lengths, concurrent sessions, contact‑graph features). Today is 2025-09-01. Build an ML approach:
-
Features: propose robust, leak‑free features capturing diurnal/weekly patterns, ISP/ASN type (residential vs enterprise vs mobile), IP stability, geolocation drift, concurrent user counts on the same IP, session inter‑arrival, device/browser/OS mix, reverse DNS hints, and calling‑graph closeness (e.g., kin vs coworker patterns). Explain how to handle apartments sharing a router and coffee‑shop Wi‑Fi.
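As a rough illustration of the diurnal/weekly pattern features, the sketch below summarizes the session timestamps seen on one IP. The function name `diurnal_features`, the night window (22:00 to 06:00), and the business-hours window (weekdays 09:00 to 17:00) are all assumptions chosen for the example, not values given in the task. To stay leak-free, the timestamps passed in should come only from a window that ends before the prediction time.

```python
import math
from collections import Counter
from datetime import datetime


def diurnal_features(timestamps):
    """Summarize when sessions occur on one IP.

    `timestamps` is a list of local-time datetimes taken from a feature
    window that ends before the prediction time (assumed by the caller).
    """
    if not timestamps:
        raise ValueError("need at least one session timestamp")
    n = len(timestamps)
    hours = Counter(ts.hour for ts in timestamps)
    # Shannon entropy of the hour-of-day histogram, scaled to [0, 1].
    entropy = -sum((c / n) * math.log(c / n) for c in hours.values())
    # Assumed windows: night = 22:00-06:00, business = Mon-Fri 09:00-17:00.
    night = sum(1 for ts in timestamps if ts.hour >= 22 or ts.hour < 6)
    business = sum(
        1 for ts in timestamps if ts.weekday() < 5 and 9 <= ts.hour < 17
    )
    weekend = sum(1 for ts in timestamps if ts.weekday() >= 5)
    return {
        "hour_entropy": entropy / math.log(24),
        "night_share": night / n,
        "business_hours_share": business / n,
        "weekend_share": weekend / n,
    }


sessions = [
    datetime(2025, 8, 25, 23, 0),  # Monday night
    datetime(2025, 8, 26, 2, 0),   # Tuesday, early morning
    datetime(2025, 8, 27, 10, 0),  # Wednesday, business hours
    datetime(2025, 8, 30, 10, 0),  # Saturday
]
features = diurnal_features(sessions)
```

A high `night_share` with low `business_hours_share` would lean toward home, while the reverse would lean toward office. A shared apartment router or coffee-shop Wi-Fi would likely need these shares combined with concurrent-user counts rather than read on their own.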
-
Labels: design weak‑supervision strategies to obtain labels at scale (e.g., overnight dwell heuristics, business‑hours rules, known corporate ASNs, opted‑in seed users, store‑IP blacklists). Describe how you will de‑bias noisy labels.
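One possible shape for the weak-supervision step is a set of labeling functions that each vote or abstain, combined by a weighted vote. This is a minimal sketch, not a prescribed method: the function names, the feature keys (`night_share`, `business_hours_share`, `asn_type`, `on_venue_ip_list`), the thresholds, and the weights are hypothetical. In practice the weights might be estimated from each heuristic's accuracy on opted-in seed users.

```python
from collections import defaultdict

ABSTAIN = None


def lf_overnight_dwell(f):
    return "home" if f["night_share"] >= 0.3 else ABSTAIN


def lf_business_hours(f):
    if f["business_hours_share"] >= 0.7 and f["night_share"] < 0.05:
        return "office"
    return ABSTAIN


def lf_corporate_asn(f):
    return "office" if f["asn_type"] == "enterprise" else ABSTAIN


def lf_venue_ip_list(f):
    return "public" if f["on_venue_ip_list"] else ABSTAIN


# (labeling function, assumed accuracy-based weight); weights are illustrative.
LABELING_FUNCTIONS = [
    (lf_overnight_dwell, 0.7),
    (lf_business_hours, 0.6),
    (lf_corporate_asn, 0.9),
    (lf_venue_ip_list, 0.95),
]


def weak_label(features, min_margin=0.2):
    """Return (label or None, soft label dict) from a weighted vote.

    The label is withheld when the top two classes are closer than
    `min_margin`, so conflicting heuristics do not produce a hard label.
    """
    scores = defaultdict(float)
    for fn, weight in LABELING_FUNCTIONS:
        vote = fn(features)
        if vote is not ABSTAIN:
            scores[vote] += weight
    if not scores:
        return None, {}
    total = sum(scores.values())
    probs = {label: s / total for label, s in scores.items()}
    ranked = sorted(probs.values(), reverse=True)
    margin = ranked[0] - (ranked[1] if len(ranked) > 1 else 0.0)
    label = max(probs, key=probs.get) if margin >= min_margin else None
    return label, probs


clear_home = {"night_share": 0.5, "business_hours_share": 0.1,
              "asn_type": "residential", "on_venue_ip_list": False}
conflicted = {"night_share": 0.5, "business_hours_share": 0.1,
              "asn_type": "enterprise", "on_venue_ip_list": False}
```

Keeping the soft label alongside the hard one leaves room for de-biasing later, for example by training on the soft labels or by down-weighting low-margin examples.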
-
Modeling: compare baseline rule lists vs gradient‑boosted trees vs sequence models (e.g., per‑IP HMM or transformer over events). Consider multi‑instance learning to aggregate session‑level predictions to user/household. Explain calibration and thresholding for asymmetric costs (e.g., misclassifying an office as a home costs more than the reverse).
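To make the asymmetric-cost point concrete, the sketch below picks the class with the lowest expected cost given calibrated class probabilities, instead of the most probable class. The cost matrix values are placeholders invented for the example; the only property taken from the task is that calling an office a home is the expensive mistake.

```python
CLASSES = ("home", "office", "public")

# COST[true_class][predicted_class]; values are illustrative assumptions.
COST = {
    "home":   {"home": 0.0, "office": 1.0, "public": 1.0},
    "office": {"home": 5.0, "office": 0.0, "public": 1.0},
    "public": {"home": 3.0, "office": 1.0, "public": 0.0},
}


def expected_costs(probs):
    """Expected cost of each possible prediction under calibrated `probs`."""
    return {
        pred: sum(probs[true] * COST[true][pred] for true in CLASSES)
        for pred in CLASSES
    }


def decide(probs):
    """Choose the prediction that minimizes expected cost."""
    costs = expected_costs(probs)
    return min(costs, key=costs.get)


uncertain = {"home": 0.6, "office": 0.3, "public": 0.1}
confident = {"home": 0.95, "office": 0.03, "public": 0.02}
print(decide(uncertain), decide(confident))
```

With these placeholder costs, a session that is 60% likely to be home is still not labeled home, because a 30% chance of office carries too much expected cost. The rule is only as good as the calibration of the input probabilities, which is why calibration comes before thresholding.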
-
Evaluation: define metrics (macro F1, expected cost), cross‑geo temporal CV, and backtests across holidays. Prevent leakage from future behavior and from using Portal adoption as a proxy. Quantify uncertainty.
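A sketch of the temporal side of the cross-validation: forward-chaining folds where each test window lies strictly after its training window, with a gap between them so that features computed over trailing windows cannot see test-period behavior. The function name, the 14-day test window, and the 7-day gap are assumptions for illustration. Intervals are half-open, `[start, end)`.

```python
from datetime import date, timedelta


def forward_chaining_folds(start, end, test_days=14, gap_days=7, n_folds=3):
    """Build up to `n_folds` (train, test) date ranges, oldest fold first.

    Each train range starts at `start` and ends `gap_days` before its
    test range, so training never uses data from the test period.
    """
    folds = []
    test_end = end
    for _ in range(n_folds):
        test_start = test_end - timedelta(days=test_days)
        train_end = test_start - timedelta(days=gap_days)
        if train_end <= start:
            break  # not enough history left for another fold
        folds.append({"train": (start, train_end),
                      "test": (test_start, test_end)})
        test_end = test_start
    return list(reversed(folds))


folds = forward_chaining_folds(date(2025, 1, 1), date(2025, 9, 1))
for fold in folds:
    print(fold["train"], fold["test"])
```

The cross-geo part could be layered on top by holding out whole regions within each fold, and holiday backtests by placing test windows over known holiday periods.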
-
Privacy/compliance: specify minimization, aggregation, retention, on‑device inference options, and red‑teaming for re‑identification risks.
-
Deployment: outline real‑time vs batch inference, drift monitoring, and a holdout plan to measure whether location‑type targeting improves conversion.
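For the holdout plan, one simple way to read out the result is the difference in conversion rate between the targeted group and the holdout group, with a normal-approximation confidence interval. This is a minimal sketch assuming a randomized holdout and large samples; the function name and the counts in the example are made up for illustration.

```python
import math


def conversion_lift(conv_treated, n_treated, conv_holdout, n_holdout, z=1.96):
    """Absolute lift in conversion rate with a normal-approximation CI.

    z=1.96 corresponds to an approximate 95% interval.
    """
    p_t = conv_treated / n_treated
    p_h = conv_holdout / n_holdout
    diff = p_t - p_h
    se = math.sqrt(p_t * (1 - p_t) / n_treated + p_h * (1 - p_h) / n_holdout)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical counts: 120/10,000 targeted vs 100/10,000 holdout conversions.
lift, (low, high) = conversion_lift(120, 10_000, 100, 10_000)
print(f"lift={lift:.4f}, 95% CI=({low:.4f}, {high:.4f})")
```

In this made-up example the interval spans zero, so the apparent lift would not be distinguishable from noise at that sample size.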