How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Amazon.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Amazon during technical interviews.

Find Valid IP Addresses in Files | Amazon Coding Question

Quick Overview

This question evaluates file system traversal, recursive I/O handling, text parsing and pattern recognition, input validation for IPv4 formatting, deduplication, and ordered output; it is in the Coding & Algorithms category and emphasizes practical application over purely conceptual understanding.

Find Valid IP Addresses in Files

Company: Amazon

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

You are given an absolute or relative path to a directory on the local file system. Write a program that recursively traverses every subdirectory under that root directory, reads every regular file, finds all valid IPv4 addresses contained in those files, and prints the addresses in lexicographic order. A valid IPv4 address has exactly four decimal octets separated by dots. Each octet must be an integer from `0` to `255`. Leading zeros are not allowed unless the octet is exactly `0`. For example, `192.168.0.1` and `0.0.0.0` are valid, while `256.1.1.1`, `01.2.3.4`, `1.2.3`, and `1.2.3.4.5` are invalid. Clarifications: - Recursively visit all nested folders under the input path. - Only scan regular files. - Treat file contents as text; if a file cannot be decoded or read, handle it gracefully according to your language's best practices. - If the same valid IP address appears multiple times, print it only once. - The final output should contain one IP address per line, sorted lexicographically as strings.

Quick Answer: This question evaluates file system traversal, recursive I/O handling, text parsing and pattern recognition, input validation for IPv4 formatting, deduplication, and ordered output; it is in the Coding & Algorithms category and emphasizes practical application over purely conceptual understanding.

This problem is based on scanning a real directory tree. To make it deterministic for an online judge, the file system is provided as a nested dictionary named `filesystem`: a value that is a `dict` is a subdirectory, a value that is a `str` is a text file, and a value that is `bytes` or `bytearray` is a file that should be decoded as UTF-8 and skipped if decoding fails. Traverse every subdirectory, inspect every regular file, find all unique valid IPv4 addresses in the file contents, and return them sorted lexicographically as strings. A valid IPv4 address has exactly four decimal octets separated by dots; each octet must be between `0` and `255`, and leading zeros are not allowed unless the octet is exactly `0`.

Constraints

0 <= total number of files and folders <= 10^5
The total length of all decodable file contents is at most 10^6 characters
Only leaf values of type `str`, `bytes`, or `bytearray` should be treated as regular files

Examples

Input: ({'readme.txt': 'Main IPs: 10.0.0.1 and 192.168.0.1. Invalid: 256.1.1.1, 192.168.001.1.', 'logs': {'today.log': b'Seen 10.0.0.1 twice and 0.0.0.0 once', 'bad.bin': b'\xff\xfe\xfa'}},)

Expected Output: ['0.0.0.0', '10.0.0.1', '192.168.0.1']

Explanation: The valid unique addresses are 10.0.0.1, 192.168.0.1, and 0.0.0.0. The binary file bad.bin is skipped because it is not valid UTF-8. Addresses with 256 or leading zeros are invalid.

Input: ({},)

Expected Output: []

Explanation: An empty filesystem contains no files, so no IP addresses are found.

Input: ({'a.txt': 'Not valid: 999.1.1.1 1.2.3 1.2.3.4.5 01.2.3.4', 'sub': {'b.bin': b'\xff'}},)

Expected Output: []

Explanation: 999.1.1.1 is out of range, 1.2.3 is incomplete, 1.2.3.4.5 is part of a longer dot-separated numeric sequence, 01.2.3.4 has a leading zero, and b.bin is skipped due to UTF-8 decode failure.

Input: ({'x.txt': 'Valid: 1.2.3.4, 255.255.255.255, 0.10.20.30. Invalid: 00.0.0.0, 1.02.3.4', 'nested': {'y.dat': b'Also 1.2.3.4 and 0.0.0.0'}},)

Expected Output: ['0.0.0.0', '0.10.20.30', '1.2.3.4', '255.255.255.255']

Explanation: The valid unique addresses are collected across both files. Entries with leading zeros like 00.0.0.0 and 1.02.3.4 are rejected.

Hints

Use a stack or recursion to traverse the nested directory structure so you visit every subdirectory exactly once.
A regex can help you find dotted-number candidates, but you still need to validate each octet and reject leading zeros like `01`.

Quick Overview