Given a filesystem path that may contain nested subdirectories and files, compute the top K most frequent words across all files. Describe an in-memory solution and its time/space complexity. Follow-up: when the corpus no longer fits in memory, propose scalable approaches (e.g., external sorting/partitioning, or MapReduce-style sharding with a final merge) and an approximate heavy-hitters approach (e.g., a Count–Min Sketch, or the Space-Saving algorithm), including the accuracy/latency/storage trade-offs.
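A minimal sketch of the in-memory solution, using a hypothetical `top_k_words` helper and assuming UTF-8 text files with a simple regex tokenizer (both are illustrative choices, not requirements of the problem). Counting is O(W) over the W total words, and selecting the top K from D distinct words with a heap is O(D log K), cheaper than a full O(D log D) sort when K is small:

```python
import os
import re
import heapq
from collections import Counter

def top_k_words(root: str, k: int) -> list[tuple[str, int]]:
    """Walk the directory tree rooted at `root`, count words across
    all readable files, and return the K most frequent (word, count)
    pairs in descending order of count."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Stream line by line so a single huge file does not
                # need to be loaded whole; only the counts stay resident.
                with open(path, encoding="utf-8", errors="ignore") as f:
                    for line in f:
                        counts.update(re.findall(r"[a-z']+", line.lower()))
            except OSError:
                continue  # skip unreadable files (permissions, races)
    # O(D log K) heap selection over D distinct words.
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])
```

Space is O(D) for the distinct-word counter, which is exactly the term that blows up on a large corpus and motivates the partitioned and approximate follow-ups.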
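For the approximate follow-up, one possible sketch of the heavy-hitters approach: a Count–Min Sketch gives frequency estimates in fixed memory (width × depth counters), with the guarantee that estimates only overcount, while a small candidate set tracks the current top K. The class and the `top_k_stream` helper below are illustrative names, and the pruning policy (keep 2K candidates) is an assumption, not part of the standard algorithm:

```python
import hashlib
import heapq

class CountMinSketch:
    """Fixed-memory approximate counter. estimate(x) >= true count;
    the overestimate is bounded by eps * total_count with probability
    1 - delta for width ~ e/eps and depth ~ ln(1/delta)."""

    def __init__(self, width: int = 2048, depth: int = 4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _indexes(self, item: str):
        # One independent hash per row, derived by salting blake2b.
        for row in range(self.depth):
            h = hashlib.blake2b(item.encode(), digest_size=8,
                                salt=row.to_bytes(2, "big")).digest()
            yield row, int.from_bytes(h, "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row, col in self._indexes(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # Minimum across rows limits the damage from hash collisions.
        return min(self.table[row][col] for row, col in self._indexes(item))

def top_k_stream(words, k: int, sketch: CountMinSketch):
    """Single pass over a word stream: update the sketch, and keep a
    bounded candidate dict of likely heavy hitters (pruned to the top
    K whenever it exceeds 2K entries)."""
    candidates: dict[str, int] = {}
    for w in words:
        sketch.add(w)
        candidates[w] = sketch.estimate(w)
        if len(candidates) > 2 * k:
            keep = heapq.nlargest(k, candidates.items(), key=lambda kv: kv[1])
            candidates = dict(keep)
    return heapq.nlargest(k, candidates.items(), key=lambda kv: kv[1])
```

The trade-off surface: a wider/deeper sketch lowers the overestimation error at the cost of memory, the candidate-set bound trades recall of borderline items for space, and everything runs in one streaming pass, so latency per word is O(depth) regardless of corpus size.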