Design a device telemetry pipeline for real-time and batch analytics
Company: Amazon
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Take-home Project
Design a distributed system that ingests telemetry from **millions of devices** and supports both:
- **Real-time** analytics (near-real-time dashboards/alerts)
- **Batch** analytics (historical queries, aggregates, offline jobs)
Your design should cover:
- Data ingestion (protocols, auth, buffering)
- Storage choices for hot vs cold data
- Stream processing vs batch processing
- Schema/partitioning strategy
- Scalability, reliability, and cost considerations
- A minimal set of APIs/queries you want to support
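To make the schema/partitioning item concrete, here is a minimal sketch of one possible approach, not a prescribed answer. Everything in it is an assumption made for illustration: the `TelemetryEvent` fields, the partition count of 64, and the helper names `stream_partition` and `cold_storage_prefix`. It hashes the device ID to pick a stream partition (so events from one device stay ordered within a partition) and derives a time-based prefix for cold storage (so batch jobs can prune by date and hour).

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical partition count; a real system would size this from throughput.
NUM_PARTITIONS = 64


@dataclass(frozen=True)
class TelemetryEvent:
    """Illustrative event shape; field names are assumptions."""
    device_id: str
    metric: str
    value: float
    event_time: datetime  # expected to be timezone-aware


def stream_partition(device_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a device to a stream partition with a stable hash.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is salted per process and therefore not stable across workers.
    """
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def cold_storage_prefix(event: TelemetryEvent) -> str:
    """Build a date/hour prefix so batch queries can skip irrelevant data."""
    t = event.event_time.astimezone(timezone.utc)
    return f"dt={t:%Y-%m-%d}/hour={t:%H}/"


if __name__ == "__main__":
    event = TelemetryEvent(
        device_id="dev-1",
        metric="temp_c",
        value=21.5,
        event_time=datetime(2024, 1, 2, 3, 4, tzinfo=timezone.utc),
    )
    print(stream_partition(event.device_id))
    print(cold_storage_prefix(event))
```

Keying the stream by device and the cold store by time is one common split: the hot path needs per-device ordering, while historical queries usually filter by time range first. A design with a few very chatty devices might need a different key to avoid hot partitions.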
Quick Answer: This question evaluates a candidate's competency in distributed systems architecture and data engineering, and their ability to reason about the operational trade-offs involved in designing telemetry pipelines that support both near-real-time and batch analytics.