Answer the following ML system design questions. State assumptions, propose an architecture, and discuss scaling, latency, and reliability.
1) Global device request detection (streaming)
An internal platform receives IT requests from many device types across the world.
- Data volume is very large.
- Events update continuously.
- Timestamps are precise to milliseconds.
Design a system that can quickly detect “where requests are happening” (e.g., by region/site/device type) in near real-time.
Cover:
- Ingestion, partitioning/sharding, storage
- Stream processing and aggregations (time windows)
- Query/serving layer (dashboards/alerts)
- Handling out-of-order events, duplicates, clock skew
- Reliability and SLOs
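The windowing, dedup, and out-of-order handling listed above can be illustrated with a single in-memory operator. This is a minimal sketch under stated assumptions. The field names (`event_id`, `ts_ms`, `region`, `device_type`) and the `WindowedCounter` class are hypothetical. Dedup state is an unbounded set here, although a real deployment would use a TTL-bounded store. Stream engines such as Apache Flink or Kafka Streams provide event-time windows and watermarks natively, so this only shows the logic and is not a recommended implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class Event:
    event_id: str     # hypothetical unique id, reused on retries so duplicates can be detected
    ts_ms: int        # event time in epoch milliseconds (set by the device)
    region: str
    device_type: str

WindowKey = Tuple[int, str, str]  # (window_start_ms, region, device_type)

class WindowedCounter:
    """Tumbling event-time windows with a watermark and event_id dedup (in-memory sketch)."""

    def __init__(self, window_ms: int, allowed_lateness_ms: int) -> None:
        self.window_ms = window_ms
        self.allowed_lateness_ms = allowed_lateness_ms
        self.max_ts_ms: Optional[int] = None                # highest event time seen so far
        self.seen_ids: set = set()                          # unbounded here; use a TTL store in practice
        self.open: Dict[WindowKey, int] = defaultdict(int)  # counts for windows not yet emitted
        self.dropped_late = 0

    def watermark(self) -> Optional[int]:
        # Event time up to which we assume all events have arrived.
        if self.max_ts_ms is None:
            return None
        return self.max_ts_ms - self.allowed_lateness_ms

    def add(self, e: Event) -> None:
        if e.event_id in self.seen_ids:
            return  # duplicate delivery (e.g. producer retry): count it once
        start = e.ts_ms - e.ts_ms % self.window_ms
        wm = self.watermark()
        if wm is not None and start + self.window_ms <= wm:
            self.dropped_late += 1  # window end is behind the watermark; real systems use a side output
            return
        self.seen_ids.add(e.event_id)
        self.open[(start, e.region, e.device_type)] += 1
        if self.max_ts_ms is None or e.ts_ms > self.max_ts_ms:
            self.max_ts_ms = e.ts_ms

    def flush(self) -> Dict[WindowKey, int]:
        """Emit and forget every window whose end is at or before the watermark."""
        wm = self.watermark()
        if wm is None:
            return {}
        closed = {k: v for k, v in self.open.items() if k[0] + self.window_ms <= wm}
        for k in closed:
            del self.open[k]
        return closed

counter = WindowedCounter(window_ms=1_000, allowed_lateness_ms=500)
for ev in [
    Event("a", 100, "eu", "laptop"),
    Event("a", 100, "eu", "laptop"),   # duplicate, ignored
    Event("b", 900, "eu", "laptop"),
    Event("c", 1_200, "us", "phone"),
    Event("d", 2_600, "us", "phone"),  # advances the watermark to 2_100
    Event("e", 300, "eu", "laptop"),   # too late: window [0, 1000) ends before the watermark
]:
    counter.add(ev)
print(counter.flush())  # {(0, 'eu', 'laptop'): 2, (1000, 'us', 'phone'): 1}
```

The watermark trails the maximum observed event time, so a device whose clock runs ahead can push it forward and cause on-time events from other devices to be dropped. Clamping device timestamps to ingestion time plus a tolerance, or tracking watermarks per partition, is one way to limit that skew.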
2) Fast labeling under extreme class imbalance
You have a very large dataset where the positive class is extremely rare (highly imbalanced). You need to label examples quickly to build a model.
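A quick calculation shows why uniform random sampling alone is too slow at this level of imbalance. It assumes a hypothetical prevalence $p = 10^{-4}$ and a target of $k = 500$ labeled positives. With $n$ uniformly sampled items:

```latex
\begin{aligned}
\mathbb{E}[\#\text{positives}] &= np, \qquad n_{\text{needed}} \approx \frac{k}{p} = \frac{500}{10^{-4}} = 5 \times 10^{6} \\
\Pr[\text{no positives among } n] &= (1-p)^{n} \approx e^{-np}, \qquad n = 10^{4} \;\Rightarrow\; e^{-1} \approx 0.37
\end{aligned}
```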
Design an end-to-end labeling strategy/pipeline. Cover:
- Sampling strategy to find positives
- Human-in-the-loop workflow
- Weak supervision / heuristics
- Active learning
- How you measure progress and prevent bias
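The weak supervision, sampling, and active learning pieces can be combined into one batch-selection step, sketched here under assumptions. The labeling functions are hypothetical keyword rules. Majority vote is a simple stand-in for a learned label model such as Snorkel's. Each review batch mixes three buckets: the top model scores to find positives, items nearest a 0.5 threshold for uncertainty sampling, and a uniformly random slice. The random slice is kept so that prevalence and recall can be estimated without the selection bias of score-driven sampling. The 0.5 threshold and bucket sizes are illustrative choices.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

LabelFn = Callable[[dict], Optional[int]]  # returns 1 (positive), 0 (negative), or None (abstain)

def weak_label(x: dict, lfs: Sequence[LabelFn]) -> Optional[int]:
    """Majority vote over labeling functions that did not abstain; ties and all-abstain give None."""
    votes = [v for v in (lf(x) for lf in lfs) if v is not None]
    pos = sum(1 for v in votes if v == 1)
    neg = len(votes) - pos
    if pos == neg:
        return None
    return 1 if pos > neg else 0

def select_batch(scores: Dict[str, float], n_top: int, n_uncertain: int,
                 n_random: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Choose ids for human review, tagged with the bucket that selected them."""
    rng = random.Random(seed)
    pool = sorted(scores)  # sort ids first so ties break deterministically (sorted is stable)
    # 1) Exploit: the highest-scoring items are the likeliest rare positives.
    top = sorted(pool, key=lambda i: -scores[i])[:n_top]
    pool = [i for i in pool if i not in top]
    # 2) Active learning: items closest to the (assumed) 0.5 decision threshold.
    uncertain = sorted(pool, key=lambda i: abs(scores[i] - 0.5))[:n_uncertain]
    pool = [i for i in pool if i not in uncertain]
    # 3) Uniform random slice: independent of the model, used to estimate prevalence and recall.
    rand = rng.sample(pool, min(n_random, len(pool)))
    return ([("top", i) for i in top] + [("uncertain", i) for i in uncertain]
            + [("random", i) for i in rand])

lfs = [
    lambda x: 1 if "vpn" in x["text"] else None,           # hypothetical keyword rule
    lambda x: 0 if len(x["text"]) < 5 else None,           # hypothetical "too short" rule
    lambda x: 1 if x.get("priority") == "high" else None,  # hypothetical metadata rule
]
print(weak_label({"text": "vpn down", "priority": "high"}, lfs))  # 1
scores = {"a": 0.95, "b": 0.9, "c": 0.55, "d": 0.4, "e": 0.05, "f": 0.01}
print(select_batch(scores, n_top=2, n_uncertain=1, n_random=2))
```

Human labels from the "random" bucket are the only ones that give an unbiased view of the full distribution. Tracking precision on the "top" bucket and recall against the random bucket over successive rounds is one way to measure progress.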