System Design: Distributed Word Frequency Counting (No MapReduce)
Context
You need to design a distributed system that computes word frequencies over terabytes of text data. The system must not use MapReduce but should still scale horizontally and produce both global counts and top‑K words. Assume the data can arrive as batch files or continuous streams.
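For scale reference, the single-node computation being distributed is just a token count. A minimal Python sketch (the lowercasing and `\w+` tokenization rule are illustrative assumptions, not part of the problem statement):

```python
import re
from collections import Counter

def word_counts(text: str) -> Counter:
    """Count word frequencies in a chunk of text.

    Tokenization here (lowercase, \\w+ runs) is an assumed placeholder;
    a real design would pin down normalization rules explicitly.
    """
    return Counter(re.findall(r"\w+", text.lower()))

counts = word_counts("To be, or not to be")
# counts["to"] == 2 and counts["be"] == 2
```

The distributed design is essentially this function applied per chunk, with the resulting partial `Counter`s partitioned, aggregated, and served at scale.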
Requirements
Describe an end‑to‑end design that covers:
- Ingestion
  - How raw text enters the system (e.g., log/stream service).
  - Chunking, schema, and ordering assumptions.
- Partitioning/sharding
  - How tokens are partitioned across shards (e.g., consistent hashing).
  - Mitigating hot keys (e.g., salting/splitting heavy tokens).
- Aggregation
  - How partial counts are produced and aggregated.
  - How to combine salted partitions to produce global per‑word counts.
- Global results and top‑K
  - How to compute exact global counts and global top‑K across shards.
  - Clarify whether approximation is acceptable and which algorithm you'd use.
- Reliability and correctness
  - Fault tolerance and recovery from node failures.
  - Idempotency and handling of retries.
  - Exactly‑once vs. at‑least‑once delivery semantics (trade‑offs and how you'd achieve each).
  - Backpressure and flow control.
- Storage and serving
  - Where results are stored (per‑word counts, snapshots, metadata).
  - How they are exposed (APIs, latency, consistency expectations).
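To illustrate the salted-partition point above: a heavy token such as "the" can be spread over R sub-keys at write time so no single shard owns it, then re-summed at read time. A minimal Python sketch, assuming a `word#i` salt format, R = 4, and a known hot-key set (all of these are illustrative choices):

```python
import random
from collections import defaultdict

R = 4                # number of salt buckets per hot key (assumed)
HOT = {"the"}        # hot-key set, assumed known (e.g., from profiling)

def salted_key(word: str) -> str:
    """Spread a hot word across R sub-keys; cold words pass through."""
    if word in HOT:
        return f"{word}#{random.randrange(R)}"
    return word

def combine(partials: dict) -> dict:
    """Merge salted sub-keys back into global per-word counts."""
    totals = defaultdict(int)
    for key, count in partials.items():
        totals[key.split("#", 1)[0]] += count
    return dict(totals)

# Partial counts as they might land across shards:
partials = {"the#0": 10, "the#1": 12, "the#2": 9, "the#3": 11, "cat": 3}
print(combine(partials))  # {'the': 42, 'cat': 3}
```

The write path pays an extra fan-out for hot keys only; the read path folds the sub-keys back together with a single sum per word.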
Make reasonable assumptions explicit. Describe diagrams verbally if helpful, and include small worked examples to clarify top‑K computation and salted‑key combination.
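As one concrete shape such a top‑K example could take: with exact per-shard counts, a coordinator can sum the per-word partials from every shard and keep the K heaviest words, which `heapq.nlargest` does in O(n log K). A sketch (the shard data is invented):

```python
import heapq
from collections import Counter

def global_top_k(shard_counts, k):
    """Exact global top-K: sum per-shard partial counts, then take
    the k words with the largest totals.

    shard_counts: iterable of {word: count} dicts, one per shard.
    Returns a list of (word, total) pairs, heaviest first.
    """
    total = Counter()
    for partial in shard_counts:
        total.update(partial)  # Counter.update adds counts, not replaces
    return heapq.nlargest(k, total.items(), key=lambda kv: kv[1])

shards = [
    {"the": 42, "cat": 3, "sat": 5},
    {"the": 17, "mat": 5, "cat": 4},
]
print(global_top_k(shards, 2))  # [('the', 59), ('cat', 7)]
```

This is the exact path; an answer allowing approximation would instead discuss sketch-based summaries that trade a bounded error for much smaller per-shard state.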