Implement file word count | Adobe Coding Question

Quick Overview

This question evaluates a candidate's skills in text processing, large-scale I/O, token normalization, memory-constrained computation, and algorithmic trade-offs for frequency aggregation. It falls under the Coding & Algorithms domain because it combines parsing, data structures, and external-processing concerns.

Implement file word count

Company: Adobe

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

Write a function that reads a large text file and returns the frequency count of each word. Define how you will normalize tokens (case, Unicode, punctuation, contractions), handle memory limits (streaming, chunking), and output the top-k most frequent words efficiently. Analyze time and space complexity and discuss trade-offs between hash maps, tries, and external sorting when the file barely fits in memory.
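On the trade-off question: a hash map needs memory proportional to the set of distinct words. A trie shares storage for common prefixes, although per-node overhead can outweigh the savings. Neither helps once the distinct-word set no longer fits. External sorting removes that requirement. It spills sorted runs of words to disk and merges them, so equal words become adjacent and can be counted in one streaming pass. This is a minimal sketch under several assumptions. The input is an iterable of already-normalized words that contain no newlines. `external_word_counts`, `_spill_run`, and `run_size` are hypothetical names, and `run_size` stands in for the memory budget.

```python
import heapq
import itertools
import os
import tempfile
from typing import Iterable, Iterator, List, Tuple


def _spill_run(batch: List[str], directory: str) -> str:
    """Sort one in-memory batch and write it to a temp file, one word per line."""
    batch.sort()
    fd, path = tempfile.mkstemp(suffix=".run", dir=directory, text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        out.writelines(word + "\n" for word in batch)
    return path


def external_word_counts(words: Iterable[str], run_size: int = 1_000_000) -> Iterator[Tuple[str, int]]:
    """Yield (word, count) pairs in ascending word order using sorted runs on disk.

    Assumes `words` are already normalized and contain no newline characters.
    """
    if run_size < 1:
        raise ValueError("run_size must be positive")
    with tempfile.TemporaryDirectory() as tmp:
        run_paths: List[str] = []
        batch: List[str] = []
        for word in words:
            batch.append(word)
            if len(batch) >= run_size:  # memory budget reached: spill a sorted run
                run_paths.append(_spill_run(batch, tmp))
                batch = []
        if batch:
            run_paths.append(_spill_run(batch, tmp))

        files = [open(path, encoding="utf-8") for path in run_paths]
        try:
            # k-way merge of the sorted runs; holds one pending line per run.
            streams = [(line.rstrip("\n") for line in f) for f in files]
            merged = heapq.merge(*streams)
            for word, group in itertools.groupby(merged):  # equal words are adjacent
                yield word, sum(1 for _ in group)
        finally:
            for f in files:
                f.close()
```

During the merge, peak memory is about `run_size` words plus one pending line per run. A very large number of runs would need a multi-pass merge to stay under open-file limits. A second streaming pass, such as `heapq.nsmallest(k, external_word_counts(words), key=lambda wc: (-wc[1], wc[0]))`, then extracts the top-`k`.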


A very large text file is provided as a list of text chunks in reading order to simulate streaming file reads. Write a function `solution(chunks, k)` that returns the top-`k` most frequent words in the file without concatenating the entire file into one giant string.

Normalize tokens using these rules:

1. Apply Unicode normalization with NFKC.
2. Compare words case-insensitively using Unicode `casefold()`.
3. A word consists of Unicode letters and digits, and may contain apostrophes only when the apostrophe is inside the word (for example, `don't` stays one word, but `'hello'` becomes `hello`, and `believin'` becomes `believin`). Treat common Unicode apostrophes like `’` as `'`.
4. Any other character is a separator.
5. Words may be split across chunk boundaries, so your parser must preserve partial tokens between chunks.

Return the result as a list of `(word, count)` tuples sorted by descending frequency, then ascending lexicographical word order for ties. If `k` is 0 or there are no words, return an empty list.

For the coding portion, implement the exact in-memory approach using a hash map for counts and an efficient top-`k` extraction strategy. In discussion, candidates should be able to compare this approach with tries and external sorting when the distinct-word set barely fits in memory.
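Here is a minimal sketch of `solution(chunks, k)` along the lines the prompt asks for. It uses a streaming tokenizer, a hash map (`collections.Counter`) for counts, and `heapq.nsmallest` for top-`k` extraction. It is one possible implementation, not an official solution. `_tokens` and `APOSTROPHES` are hypothetical names, and the apostrophe set is illustrative rather than exhaustive. NFKC is applied chunk by chunk. That is only exact if no chunk boundary falls inside a sequence NFKC would compose, such as a base letter at the end of one chunk followed by a combining accent at the start of the next.

```python
import heapq
import unicodedata
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Assumed, non-exhaustive set of apostrophe variants; each maps to ASCII "'".
APOSTROPHES = frozenset({"'", "\u2019", "\u02bc"})


def _tokens(chunks: Iterable[str]) -> Iterator[str]:
    """Yield casefolded words, carrying a partial word across chunk boundaries."""
    current: List[str] = []  # pieces of the word being built
    pending = 0              # apostrophes seen since the last letter/digit
    for chunk in chunks:
        # Assumption: no chunk boundary splits a sequence NFKC would compose.
        for ch in unicodedata.normalize("NFKC", chunk):
            if ch in APOSTROPHES:
                if current:          # leading apostrophes are dropped
                    pending += 1
            elif ch.isalnum():       # Unicode letters and digits
                if pending:          # the apostrophes turned out to be internal
                    current.append("'" * pending)
                    pending = 0
                current.append(ch)
            else:                    # any other character is a separator
                if current:
                    yield "".join(current).casefold()
                    current.clear()
                pending = 0          # trailing apostrophes are dropped
    if current:                      # flush the final word at end of input
        yield "".join(current).casefold()


def solution(chunks: List[str], k: int) -> List[Tuple[str, int]]:
    if k <= 0:
        return []
    counts = Counter(_tokens(chunks))
    # Rank by descending count, then ascending word. In CPython, nsmallest
    # keeps a bounded heap instead of sorting every distinct word.
    return heapq.nsmallest(k, counts.items(), key=lambda wc: (-wc[1], wc[0]))
```

Between chunks, the tokenizer keeps only the partial word (`current`) and a count of pending apostrophes. Memory beyond the counter is therefore bounded by the current chunk plus the longest word. Apostrophes are tested before `isalnum()` because some apostrophe-like characters, such as U+02BC, are classified as letters.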

Constraints

0 <= len(chunks) <= 100000
0 <= total number of characters across all chunks <= 10^7
0 <= k
Chunks must be processed in order, and words may span chunk boundaries
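For the complexity analysis the prompt asks for, the bounds of the hash-map approach can be stated as follows. Let N be the number of characters after NFKC normalization (at most a constant factor more than the raw input), W the number of words, and U the number of distinct words. The analysis assumes expected O(1) hash-map work per character hashed.

```latex
\begin{aligned}
U \le W &\le \left\lceil N/2 \right\rceil && \text{(adjacent words are separated by at least one character)} \\
T_{\text{tokenize+count}} &= O(N) && \text{(each character is scanned, copied, and hashed a constant number of times)} \\
T_{\text{top-}k} &= O(U \log k) && \text{(bounded heap; a full sort would cost } O(U \log U)\text{)} \\
S &= O(L_U + k) && (L_U = \text{total length of the distinct words})
\end{aligned}
```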

Examples

Input: (['Hello, world! HELLO... world?', 'hello'], 2)

Expected Output: [('hello', 3), ('world', 2)]

Explanation: After normalization, the words are hello, world, hello, world, hello. The top 2 are hello (3) and world (2).

Input: (["Don", "'t stop belie", "vin'!", " Don't, stop."], 3)

Expected Output: [("don't", 2), ('stop', 2), ('believin', 1)]

Explanation: The parser must join words across chunk boundaries. `Don't` appears twice, `stop` appears twice, and `believin'` is normalized to `believin` because the trailing apostrophe is not internal.
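To see why the carried state matters, this sketch tokenizes each chunk of this example independently. `WORD_RE` and `naive_chunk_counts` are hypothetical names. The regex is one way to express rule 3 for a single complete string, and it skips NFKC and apostrophe mapping for brevity. It relies on `[^\W_]` matching the same characters as `str.isalnum()` in Python `str` patterns.

```python
import re
from collections import Counter
from typing import List

# Rule 3 for one complete string: runs of letters/digits, optionally joined by
# internal apostrophes. [^\W_] is "word character except underscore".
WORD_RE = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def naive_chunk_counts(chunks: List[str]) -> "Counter[str]":
    """Anti-example: tokenizes each chunk independently, ignoring boundaries."""
    return Counter(w.casefold() for chunk in chunks for w in WORD_RE.findall(chunk))


naive = naive_chunk_counts(["Don", "'t stop belie", "vin'!", " Don't, stop."])
print(naive["don't"], naive["stop"], naive["believin"])  # 1 2 0
```

Per-chunk tokenization produces fragments such as `don`, `t`, `belie`, and `vin`, and it undercounts `don't`.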

Input: ([], 5)

Expected Output: []

Explanation: An empty file contains no words.

Input: (['Straße stra', 'sse café CAFÉ 123 123', " O’Reilly o'reilly"], 4)

Expected Output: [('123', 2), ('café', 2), ("o'reilly", 2), ('strasse', 2)]

Explanation: `Straße` and `strasse` normalize to `strasse`, both café variants normalize to `café`, and both apostrophe styles normalize to `o'reilly`. All four words have frequency 2, so tie-breaking is lexicographical.
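The normalization steps in this example can be checked directly with the standard library. This small sketch uses a hypothetical `normalize_word` helper on one complete token. The helper applies rule 1 (NFKC), maps `’` (U+2019) to `'`, and then applies rule 2 (`casefold()`).

```python
import unicodedata


def normalize_word(raw: str) -> str:
    """Hypothetical helper: NFKC, map U+2019 to ASCII apostrophe, then casefold."""
    text = unicodedata.normalize("NFKC", raw).replace("\u2019", "'")
    return text.casefold()


print(normalize_word("Straße"))         # strasse  (casefold maps ß to ss)
print("Straße".lower())                 # straße   (lower() keeps ß)
print(normalize_word("CAFE\u0301"))     # café     (E + combining acute composed, then folded)
print(normalize_word("O\u2019Reilly"))  # o'reilly
print(normalize_word("\ufb01le"))       # file     (NFKC expands the ﬁ ligature)
```

`casefold()` is more aggressive than `lower()`, which is why `Straße` and `strasse` collide under these rules.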

Input: (['One two two'], 0)

Expected Output: []

Explanation: If k is 0, the function should return no results.

Hints

Keep a current token and any pending apostrophes between chunks so a word split across two chunks is still counted correctly.
After building the frequency map, use a min-heap of size `k` to avoid sorting every distinct word when `k` is much smaller than the number of unique words.
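The min-heap from the second hint needs care with the tie-breaking rule. The heap root must be the worst kept entry: the lowest count and, among equal counts, the lexicographically largest word. Plain `(count, word)` tuples would evict the smallest word on ties, which is the wrong one. This minimal sketch uses a hypothetical `_Entry` wrapper that inverts the word comparison, together with a hypothetical `top_k(counts, k)` helper.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class _Entry:
    word: str
    count: int

    def __lt__(self, other: "_Entry") -> bool:
        # "Less than" means "ranks worse": lower count, or the same count and a
        # lexicographically larger word. The heap root is the worst kept entry.
        if self.count != other.count:
            return self.count < other.count
        return self.word > other.word


def top_k(counts: Dict[str, int], k: int) -> List[Tuple[str, int]]:
    """Return the k best (word, count) pairs using a min-heap of at most k entries."""
    if k <= 0:
        return []
    heap: List[_Entry] = []
    for word, count in counts.items():
        entry = _Entry(word, count)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif heap[0] < entry:  # new entry outranks the worst kept one
            heapq.heapreplace(heap, entry)
    # Popping yields worst-first; reverse for descending count, ascending word.
    ranked = [heapq.heappop(heap) for _ in range(len(heap))]
    ranked.reverse()
    return [(e.word, e.count) for e in ranked]
```

Each of the U distinct words costs at most O(log k) heap work, versus O(U log U) for sorting everything. `heapq.nsmallest(k, counts.items(), key=lambda wc: (-wc[1], wc[0]))` is a shorter alternative that avoids the wrapper class.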
