Design distributed word count without MapReduce

Q: Design distributed word count without MapReduce

This is a System Design interview question from Adobe for Software Engineer roles. View the full question and solution on PracHub.

Q: How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

Question

System Design: Distributed Word-Frequency Computation (No MapReduce)

Context

Design a distributed system that computes word frequencies over terabytes of text data without using MapReduce. Your design should support both batch (historical backfills) and streaming (near real-time) modes and scale horizontally.

Requirements

Describe and justify your approach for the following:

Ingestion
- Batch input from large files in object storage.
- Streaming input from event/log pipelines.
Partitioning & Routing
- Hash-by-token with consistent hashing and virtual nodes.
- Handling hot keys and data skew (e.g., stop-word mitigation, heavy-hitter routing).
Intermediate Storage
- Durable buffering between stages; topic/queue design; compaction vs retention.
Aggregation Topology
- Combiner layer, key ownership, state storage (e.g., local state stores), and how updates flow.
Processing Semantics
- Exactly-once or at-least-once guarantees; idempotency strategy.
Fault Tolerance & Replay
- Checkpointing state and offsets, recovery, reprocessing.
Backpressure & Flow Control
- How slow consumers protect the system; throttling strategies.
Horizontal Scaling & Rebalancing
- Adding/removing nodes; consistent hashing and state migration.
Monitoring & Operations
- Metrics, alerts, tracing, skew detection, capacity signals.
SLA/SLO Considerations
- Latency/throughput targets for streaming; completion targets for batch; error budgets.
Batch vs Streaming Trade-offs
- Contrast the two approaches and justify your design choices.

Design distributed word count without MapReduce

System Design: Distributed Word-Frequency Computation (No MapReduce)

Context

Requirements

Solution (Locked)

Comments (0)