PracHub

Design distributed word count without MapReduce

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in designing scalable distributed systems and large-scale data processing pipelines, focusing on ingestion, partitioning/sharding, partial aggregation and merging, global top‑K computation, fault tolerance, idempotency, and storage/serving concerns.


Company: Adobe

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

Design a distributed system to compute word frequencies over terabytes of text without using MapReduce. Specify how you will ingest data (e.g., log or stream service), partition tokens across shards (e.g., consistent hashing with salting to mitigate hot keys), aggregate partial counts, and produce global results and top‑K across shards. Describe fault tolerance, idempotency, exactly‑once vs. at‑least‑once semantics, backpressure, recovery from node failures, and how results are stored and served.


Sep 6, 2025, 12:00 AM

System Design: Distributed Word Frequency Counting (No MapReduce)

Context

You need to design a distributed system that computes word frequencies over terabytes of text data. The system must not use MapReduce but should still scale horizontally and produce both global counts and top‑K words. Assume the data can arrive as batch files or continuous streams.

Requirements

Describe an end‑to‑end design that covers:

  1. Ingestion
    • How raw text enters the system (e.g., log/stream service).
    • Chunking, schema, and ordering assumptions.
  2. Partitioning/sharding
    • How tokens are partitioned across shards (e.g., consistent hashing).
    • Mitigating hot keys (e.g., salting/splitting heavy tokens).
  3. Aggregation
    • How partial counts are produced and aggregated.
    • How to combine salted partitions to produce global per‑word counts.
  4. Global results and top‑K
    • How to compute exact global counts and global top‑K across shards.
    • State whether approximation is acceptable and, if so, which algorithm you’d use.
  5. Reliability and correctness
    • Fault tolerance and recovery from node failures.
    • Idempotency and handling retries.
    • Exactly‑once vs at‑least‑once delivery semantics (trade‑offs and how you’d achieve each).
    • Backpressure and flow control.
  6. Storage and serving
    • Where results are stored (per‑word counts, snapshots, metadata).
    • How they are exposed (APIs, latency, consistency expectations).
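As one possible illustration of items 2 and 3, the sketch below routes tokens to shards, spreads a known hot word over several salted keys, and then folds the salted keys back into one count per word. It is a minimal single-process sketch, not a reference design: `NUM_SHARDS`, `SALT_BUCKETS`, `HOT_WORDS`, `route`, `count_partials`, and `merge_partials` are all hypothetical names, and plain modulo hashing stands in for a consistent-hash ring to keep the example short.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8        # hypothetical shard count
SALT_BUCKETS = 4      # hypothetical fan-out for a hot word
HOT_WORDS = {"the"}   # hypothetical; a real system would detect these by sampling


def stable_hash(text: str) -> int:
    """Deterministic across processes, unlike Python's built-in hash() for str."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


def route(word: str, seq: int):
    """Return (shard, key) where key = (word, salt); salt is always 0 for normal words."""
    salt = seq % SALT_BUCKETS if word in HOT_WORDS else 0
    key = (word, salt)
    return stable_hash(f"{word}\x00{salt}") % NUM_SHARDS, key


def count_partials(words):
    """Simulate the shards: each one keeps a Counter keyed by (word, salt)."""
    shards = [Counter() for _ in range(NUM_SHARDS)]
    for seq, word in enumerate(words):
        shard, key = route(word, seq)
        shards[shard][key] += 1
    return shards


def merge_partials(shards):
    """Drop the salt and sum, giving one global count per word."""
    total = Counter()
    for shard in shards:
        for (word, _salt), n in shard.items():
            total[word] += n
    return total
```

Because addition is associative and commutative, the salted partial counts can be merged in any order. Swapping the modulo step for a consistent-hash ring would change only `route`, not the merge.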

State your assumptions explicitly. Describe diagrams verbally where helpful, and include small examples that clarify top‑K computation and how salted keys are combined.
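A small top‑K example of the kind requested might look like the sketch below. It assumes each word's complete count lives on exactly one shard, meaning any salted sub-counts have already been summed back into a single per-word count. Under that assumption, the global top‑K is always contained in the union of the per-shard top‑K lists, so a coordinator only needs K candidates from each shard. The function names and the sample counts are hypothetical.

```python
from collections import Counter


def rank(items):
    """Order (word, count) pairs by count descending, then word ascending."""
    return sorted(items, key=lambda kv: (-kv[1], kv[0]))


def local_top_k(shard_counts, k):
    """Each shard reports only its k best (word, count) pairs."""
    return rank(shard_counts.items())[:k]


def global_top_k(shards, k):
    """Coordinator merges the per-shard candidates and keeps the k best."""
    candidates = [pair for shard in shards for pair in local_top_k(shard, k)]
    return rank(candidates)[:k]


# Hypothetical per-shard exact counts; no word appears on two shards.
shards = [
    Counter({"the": 10, "cat": 3}),
    Counter({"a": 7, "dog": 1}),
    Counter({"of": 5}),
]
print(global_top_k(shards, 2))  # [('the', 10), ('a', 7)]
```

If a word's count were still split across shards, for example a salted hot word that has not been recombined, per-shard top‑K lists could miss it, so the combine step has to run before ranking. Using the same tie-breaking rule on the shards and on the coordinator keeps the result deterministic.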
