Cross-reference logs to flag spam numbers
Company: Pinterest
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
Quick Answer: This question evaluates proficiency with data structures (such as hash maps), parsing and deduplication, frequency counting, input validation, and analysis of time and space complexity.
Constraints
- 0 <= len(call_log) <= 100000
- 0 <= len(reports) <= 100000
- Each raw phone number string has length at most 100
- Only spaces, hyphens, and parentheses are allowed as formatting characters; any other non-digit character makes the entry invalid
- After removing formatting characters, a valid phone number must contain exactly 10 digits
- Each valid report counts separately, so duplicate reports of the same number increase its count
- Duplicate call log entries should not create duplicate output rows
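The validation rules above can be sketched as a single normalization helper. This is a minimal sketch, not a reference solution: the name `normalize` is hypothetical, and it assumes formatting characters may appear anywhere in the string (the constraints do not say parentheses must be balanced or hyphens placed in specific positions).

```python
from typing import Optional

# Formatting characters permitted by the constraints.
FORMATTING_CHARS = set(" -()")
# Only ASCII digits are accepted (assumption); str.isdigit() would also
# accept other Unicode digit characters.
DIGITS = set("0123456789")


def normalize(raw: str) -> Optional[str]:
    """Return the canonical 10-digit form of raw, or None if it is invalid.

    Hypothetical helper: the name and signature are not from the problem.
    """
    digits = []
    for ch in raw:
        if ch in DIGITS:
            digits.append(ch)
        elif ch not in FORMATTING_CHARS:
            # Any character that is neither a digit nor an allowed
            # formatting character invalidates the whole entry.
            return None
    # A valid number has exactly 10 digits once formatting is stripped.
    return "".join(digits) if len(digits) == 10 else None
```

Under these assumptions, `normalize('(212) 555-0000')` yields `'2125550000'`, while `'123-45-6789'` (9 digits), `'abc-555-1234'` (letters), and `'555#123#4567'` (disallowed `#`) all yield `None`.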
Examples
Input: (['555-123-4567', '(212) 555-0000', '5551234567', '999-999-9999'], ['5551234567', '555-123-4567', '2125550000', '0000000000'])
Expected Output: [['5551234567', 2], ['2125550000', 1]]
Explanation: 5551234567 appears in the call log and is reported twice. 2125550000 appears in the call log and is reported once. 9999999999 appears in the call log but is never reported, and 0000000000 was reported but does not appear in the call log.
Input: (['123-45-6789', 'abc-555-1234', '1112223333', '(111) 222-3333'], ['111-222-3333', '1112223333x', '1112223333', '22233344444', '2223334444'])
Expected Output: [['1112223333', 2]]
Explanation: 123-45-6789 has only 9 digits and abc-555-1234 contains letters, so both call log entries are ignored. 1112223333 appears in the call log and has two valid reports. The reports 1112223333x (contains an x) and 22233344444 (11 digits) are invalid, and 2223334444 is valid but never appears in the call log.
Input: ([], ['5551234567', '5551234567'])
Expected Output: []
Explanation: There are valid spam reports, but the call log is empty, so no reported number can be cross-referenced.
Input: (['5551234567'], [])
Expected Output: []
Explanation: The phone number appears in the call log, but there are no spam reports.
Input: (['12', 'not a number', '555#123#4567'], ['5551234567', 'bad'])
Expected Output: []
Explanation: All call log entries are invalid, so nothing is returned even though one report is valid.
Input: (['3333333333', '2222222222', '333-333-3333', '1111111111'], ['1111111111', '2222222222', '222-222-2222', '3333333333', '3333333333', '4444444444'])
Expected Output: [['3333333333', 2], ['2222222222', 2], ['1111111111', 1]]
Explanation: Results follow the first valid occurrence order in the call log. Duplicate call log entries do not create duplicate output rows, while duplicate valid reports increase the count.
Hints
- Normalize both datasets into the same canonical 10-digit representation before comparing them.
- Use a hash map to count valid spam reports, then scan the call log while tracking which valid numbers have already been output.
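One way to put the two hints together is sketched below. The function name `flag_spam_numbers` and the helper `normalize` are hypothetical (the source only names the inputs `call_log` and `reports`), and the sketch assumes formatting characters may appear in any position and that only ASCII digits are valid.

```python
from collections import Counter
from typing import List, Optional, Union

FORMATTING_CHARS = set(" -()")
DIGITS = set("0123456789")


def normalize(raw: str) -> Optional[str]:
    """Hypothetical helper: canonical 10-digit string, or None if invalid."""
    digits = []
    for ch in raw:
        if ch in DIGITS:
            digits.append(ch)
        elif ch not in FORMATTING_CHARS:
            return None  # disallowed character
    return "".join(digits) if len(digits) == 10 else None


def flag_spam_numbers(
    call_log: List[str], reports: List[str]
) -> List[List[Union[str, int]]]:
    """Return [number, report_count] rows in first-occurrence call log order."""
    # Hint 2, first half: count valid reports per canonical number.
    report_counts: Counter = Counter()
    for raw in reports:
        number = normalize(raw)
        if number is not None:
            report_counts[number] += 1

    # Hint 2, second half: scan the call log once, skipping invalid
    # entries and numbers that have already been handled.
    result: List[List[Union[str, int]]] = []
    seen = set()
    for raw in call_log:
        number = normalize(raw)
        if number is None or number in seen:
            continue
        seen.add(number)
        count = report_counts.get(number, 0)
        if count > 0:
            result.append([number, count])
    return result
```

For the first example, `flag_spam_numbers(['555-123-4567', '(212) 555-0000', '5551234567', '999-999-9999'], ['5551234567', '555-123-4567', '2125550000', '0000000000'])` returns `[['5551234567', 2], ['2125550000', 1]]`. Each input string is scanned once, so the running time is linear in the total number of characters across both lists, and the extra space is proportional to the number of distinct valid numbers.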