Find Valid IP Addresses in Files
Company: Amazon
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Technical Screen
Quick Answer: This question tests recursive traversal of a nested file system, decoding of file contents, pattern-based text parsing, IPv4 format validation, deduplication, and sorted output. It emphasizes practical application over purely conceptual understanding.
Constraints
- 0 <= total number of files and folders <= 10^5
- The total length of all decodable file contents is at most 10^6 characters
- Only leaf values of type `str`, `bytes`, or `bytearray` are treated as regular files; nested dictionaries are folders
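The leaf-type rule above, together with the UTF-8 skipping described in the examples, can be captured in one small helper. This is a minimal sketch, assuming folders are plain `dict`s and that any leaf of another type is simply ignored; `leaf_text` is a hypothetical name, not part of the original problem.

```python
from typing import Optional


def leaf_text(value: object) -> Optional[str]:
    """Return the text of a regular-file leaf, or None if it should be skipped."""
    if isinstance(value, str):
        return value  # already text
    if isinstance(value, (bytes, bytearray)):
        try:
            # bytes and bytearray both provide decode()
            return value.decode("utf-8")
        except UnicodeDecodeError:
            return None  # not valid UTF-8: treat as a binary file
    # Assumption: dicts are folders handled by the traversal, and any
    # other leaf type is not a regular file.
    return None
```

Returning `None` instead of raising keeps the traversal loop simple: the caller skips any leaf whose text is `None`.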
Examples
Input: ({'readme.txt': 'Main IPs: 10.0.0.1 and 192.168.0.1. Invalid: 256.1.1.1, 192.168.001.1.', 'logs': {'today.log': b'Seen 10.0.0.1 twice and 0.0.0.0 once', 'bad.bin': b'\xff\xfe\xfa'}},)
Expected Output: ['0.0.0.0', '10.0.0.1', '192.168.0.1']
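Explanation: The valid unique addresses are 0.0.0.0, 10.0.0.1, and 192.168.0.1; 10.0.0.1 appears in two files but is reported once, and the period after 192.168.0.1 is sentence punctuation, not part of the address. The binary file bad.bin is skipped because it is not valid UTF-8. 256.1.1.1 is invalid because 256 exceeds 255, and 192.168.001.1 is invalid because the octet 001 has leading zeros.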
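The per-address rules from this explanation (exactly four octets, each in 0 to 255, no leading zeros) can be checked as follows. This is a sketch that assumes a candidate string has already been extracted from the text; `is_valid_ipv4` is a hypothetical helper name.

```python
def is_valid_ipv4(candidate: str) -> bool:
    """True if candidate is four dot-separated octets in 0..255 with no
    leading zeros ("0" is allowed, "01" and "00" are not)."""
    parts = candidate.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # isdigit() alone accepts non-ASCII digits such as '²', so require ASCII too
        if not (part.isascii() and part.isdigit()):
            return False
        if len(part) > 1 and part[0] == "0":
            return False  # leading zero, e.g. "001"
        if int(part) > 255:
            return False  # out of range, e.g. "256"
    return True


print(is_valid_ipv4("192.168.0.1"))    # True
print(is_valid_ipv4("256.1.1.1"))      # False
print(is_valid_ipv4("192.168.001.1"))  # False
```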
Input: ({},)
Expected Output: []
Explanation: An empty filesystem contains no files, so no IP addresses are found.
Input: ({'a.txt': 'Not valid: 999.1.1.1 1.2.3 1.2.3.4.5 01.2.3.4', 'sub': {'b.bin': b'\xff'}},)
Expected Output: []
Explanation: 999.1.1.1 is out of range, 1.2.3 is incomplete, 1.2.3.4.5 is part of a longer dot-separated numeric sequence (so neither 1.2.3.4 nor 2.3.4.5 is extracted from it), 01.2.3.4 has a leading zero, and b.bin is skipped due to a UTF-8 decode failure.
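One way to honor the "longer dot-separated numeric sequence" rule while still accepting a sentence-ending period (as with 192.168.0.1 in the first example) is to put lookaround assertions on the candidate regex. This is a sketch of one possible boundary rule inferred from the examples, not a confirmed specification; `CANDIDATE` and `find_candidates` are hypothetical names. Range and leading-zero checks are left to a separate validation step.

```python
import re
from typing import List

# Four runs of ASCII digits joined by dots.
#   (?<!\d)     no digit directly before the match
#   (?<!\d\.)   no "digit." before it, so "1.2.3.4.5" cannot yield "2.3.4.5"
#   (?!\d|\.\d) no digit or ".digit" after it, so "1.2.3.4.5" cannot yield
#               "1.2.3.4", while "192.168.0.1. Next" still matches
CANDIDATE = re.compile(r"(?<!\d)(?<!\d\.)\d+\.\d+\.\d+\.\d+(?!\d|\.\d)", re.ASCII)


def find_candidates(text: str) -> List[str]:
    """Return dotted-quad candidates; octet values are not validated here."""
    return CANDIDATE.findall(text)


print(find_candidates("Not valid: 999.1.1.1 1.2.3 1.2.3.4.5 01.2.3.4"))
# ['999.1.1.1', '01.2.3.4']  (both are later rejected by octet validation)
```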
Input: ({'x.txt': 'Valid: 1.2.3.4, 255.255.255.255, 0.10.20.30. Invalid: 00.0.0.0, 1.02.3.4', 'nested': {'y.dat': b'Also 1.2.3.4 and 0.0.0.0'}},)
Expected Output: ['0.0.0.0', '0.10.20.30', '1.2.3.4', '255.255.255.255']
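Explanation: The valid unique addresses are collected across both files. y.dat holds bytes that decode as UTF-8, so it is scanned, and 1.2.3.4, which appears in both files, is reported once. 00.0.0.0 and 1.02.3.4 are rejected because they contain octets with leading zeros.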
Hints
- Use a stack or recursion to traverse the nested directory structure so you visit every subdirectory exactly once.
- A regex can help you find dotted-number candidates, but you still need to validate each octet and reject leading zeros like `01`.
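Putting the hints together, here is a minimal end-to-end sketch: an explicit stack for the traversal, a boundary-aware regex for candidates, per-octet validation, a set for deduplication, and a final sort. It assumes folders are `dict`s (a tree with no cycles) and that the expected order is a plain string sort; `find_ip_addresses` and `valid_octet` are hypothetical names, not a confirmed reference solution.

```python
import re
from typing import List

# Four dot-separated digit runs not embedded in a longer dotted-number
# sequence on either side; a trailing sentence period is still allowed.
CANDIDATE = re.compile(r"(?<!\d)(?<!\d\.)\d+\.\d+\.\d+\.\d+(?!\d|\.\d)", re.ASCII)


def valid_octet(part: str) -> bool:
    # The regex guarantees a non-empty run of ASCII digits here.
    return (part == "0" or part[0] != "0") and int(part) <= 255


def find_ip_addresses(fs: dict) -> List[str]:
    found = set()  # deduplicates addresses seen in several files
    stack = [fs]   # folders still to visit; in a tree each is pushed once
    while stack:
        folder = stack.pop()
        for value in folder.values():
            if isinstance(value, dict):
                stack.append(value)  # sub-folder: visit later
                continue
            if isinstance(value, str):
                text = value
            elif isinstance(value, (bytes, bytearray)):
                try:
                    text = value.decode("utf-8")
                except UnicodeDecodeError:
                    continue  # not valid UTF-8: skip as binary
            else:
                continue  # not a regular file
            for candidate in CANDIDATE.findall(text):
                if all(valid_octet(p) for p in candidate.split(".")):
                    found.add(candidate)
    return sorted(found)  # lexicographic string order
```

The plain string sort reproduces all four expected outputs above. If the intended order is numeric, `sorted(found, key=lambda ip: tuple(map(int, ip.split("."))))` is a drop-in alternative; the two orders differ for inputs such as 9.0.0.1 versus 10.0.0.1.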