This question evaluates the ability to design scalable, low-latency ML systems for two problems: detecting events in a global request stream, and labeling data rapidly under extreme class imbalance. It assesses competence in stream ingestion and partitioning, time-windowed aggregation, serving and alerting layers, and end-to-end labeling pipelines.
Answer the following ML system design questions. State assumptions, propose an architecture, and discuss scaling, latency, and reliability.
An internal platform receives IT requests from many device types across the world.
Design a system that detects “where requests are happening” (e.g., by region, site, or device type) in near real time.
Cover:
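As a starting point for the aggregation layer, here is a minimal single-process sketch of a tumbling-window counter keyed by (region, site, device type). The names `Request` and `TumblingWindowCounter`, the field names, and the 60-second window are assumptions made for illustration. A real design would partition the stream across workers and handle late or out-of-order events, which this sketch does not.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    ts: float          # event time in seconds since epoch (assumed field)
    region: str
    site: str
    device_type: str


class TumblingWindowCounter:
    """Counts requests per (region, site, device_type) in fixed, non-overlapping windows."""

    def __init__(self, window_s: int = 60):
        self.window_s = window_s
        self.counts: dict[int, Counter] = {}  # window start -> per-key counts

    def add(self, r: Request) -> None:
        # Assign the event to the window that contains its event time.
        window_start = int(r.ts // self.window_s) * self.window_s
        key = (r.region, r.site, r.device_type)
        self.counts.setdefault(window_start, Counter())[key] += 1

    def top(self, window_start: int, k: int = 3):
        # The k busiest keys in one window; an empty list if the window saw no events.
        return self.counts.get(window_start, Counter()).most_common(k)


counter = TumblingWindowCounter(window_s=60)
counter.add(Request(0, "emea", "lon-1", "laptop"))
counter.add(Request(30, "emea", "lon-1", "laptop"))
counter.add(Request(59.9, "apac", "syd-2", "phone"))
counter.add(Request(60, "emea", "lon-1", "laptop"))
print(counter.top(0))
```

In a distributed version, the same key tuple could serve as the partition key so that each worker owns a disjoint set of counters. Hot keys would then need separate treatment, for example by pre-aggregating upstream.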
You have a very large dataset where the positive class is extremely rare (highly imbalanced). You need to label examples quickly to build a model.
Design an end-to-end labeling strategy/pipeline. Cover:
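One common ingredient of such a pipeline is prioritized sampling: spend most of the labeling budget on the examples a cheap proxy scores as most likely positive, and reserve a random slice of the remainder to check what the proxy misses. The sketch below illustrates that idea only. The function name `select_for_labeling`, the `explore_frac` parameter, and the assumption that proxy scores already exist (from a heuristic or weak model) are all hypothetical.

```python
import random


def select_for_labeling(scores, budget, explore_frac=0.2, seed=0):
    """Split a labeling budget between top-scored items and a random audit slice.

    scores: one proxy score per example, higher meaning more likely positive (assumed).
    Returns (exploit, explore) as two disjoint lists of example indices.
    """
    if budget > len(scores):
        raise ValueError("budget exceeds the number of examples")
    n_explore = int(budget * explore_frac)
    n_exploit = budget - n_explore

    # Rank example indices from highest to lowest proxy score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    exploit = ranked[:n_exploit]

    # Draw the audit slice uniformly from everything not already chosen.
    rng = random.Random(seed)
    explore = rng.sample(ranked[n_exploit:], n_explore)
    return exploit, explore


scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.3, 0.7, 0.01, 0.02, 0.6]
exploit, explore = select_for_labeling(scores, budget=4, explore_frac=0.5)
print(exploit, explore)
```

The audit slice is uniform only over the examples outside the top-scored set, so its positive rate estimates how many positives the proxy is missing there rather than the overall prevalence.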