Find Top Hashtags
Company: F5Networks
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Take-home Project
Quick Answer: This question evaluates string parsing and tokenization, frequency counting of elements, time-window filtering of events, and the ability to sort results with deterministic tie-breaking rules.
Constraints
- 1 <= n <= 10^5 (tweets and timestamps have equal length)
- Each tweet is a whitespace-separated string of tokens
- A hashtag is a token starting with '#' followed by at least one character
- 0 <= timestamps[i], currentTime <= 10^9
- 0 <= timeWindow <= currentTime
- Window bounds are inclusive: [currentTime - timeWindow, currentTime]
Examples
Input: (['Loving #python and #coding', 'I love #python', '#python rocks but #java too', '#old tweet'], [100, 90, 80, 10], 100, 30)
Expected Output: ['#python', '#coding', '#java']
Explanation: Window is [70, 100]; the tweet at ts=10 is excluded. #python appears 3 times, #coding once, #java once. Ties (#coding vs #java) broken lexicographically: '#coding' < '#java'.
Input: (['#a #b #c #d #e', '#a #b', '#a'], [50, 50, 50], 50, 0)
Expected Output: ['#a', '#b', '#c']
Explanation: Window [50,50] includes all three tweets. #a=3, #b=2, #c=1 (#d, #e each 1). Top 3 by count then name: #a, #b, #c.
Input: (['#tie1 #tie2', '#tie1 #tie2'], [5, 5], 10, 10)
Expected Output: ['#tie1', '#tie2']
Explanation: Both hashtags appear twice; only 2 distinct hashtags so all are returned, ordered lexicographically since counts tie.
Input: ([], [], 100, 50)
Expected Output: []
Explanation: No tweets, so no hashtags.
Input: (['no hashtags here', 'just text'], [5, 6], 10, 10)
Expected Output: []
Explanation: In-window tweets contain no tokens starting with '#'.
Input: (['#late', '#intime'], [200, 50], 100, 60)
Expected Output: ['#intime']
Explanation: Window [40,100]; ts=200 is excluded, ts=50 is included. Only #intime survives.
Input: (['#z #z #y #y #x', '#w'], [5, 5], 5, 5)
Expected Output: ['#y', '#z', '#w']
Explanation: Counts: #z=2, #y=2, #x=1, #w=1. Sorted by count desc then name asc: #y and #z (both 2, #y<#z), then #w (#w<#x) fills the third slot. #x is dropped by the at-most-3 cap.
Hints
- First filter to the tweets whose timestamp lies in the inclusive window [currentTime - timeWindow, currentTime].
- Split each surviving tweet on whitespace and keep only tokens that start with '#' (with something after it); tally them in a hash map.
- Sort the (hashtag, count) pairs by count descending, then by hashtag ascending, and take the first 3. Tie-breaking on the hashtag string is what distinguishes equal-count tags.