PracHub

Quick Overview

This question evaluates file system traversal, recursive I/O handling, text parsing and pattern recognition, input validation for IPv4 formatting, deduplication, and ordered output; it is in the Coding & Algorithms category and emphasizes practical application over purely conceptual understanding.

  • medium
  • Amazon
  • Coding & Algorithms
  • Software Engineer

Find Valid IP Addresses in Files

Company: Amazon

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

You are given an absolute or relative path to a directory on the local file system. Write a program that recursively traverses every subdirectory under that root directory, reads every regular file, finds all valid IPv4 addresses contained in those files, and prints the addresses in lexicographic order.

A valid IPv4 address has exactly four decimal octets separated by dots. Each octet must be an integer from `0` to `255`. Leading zeros are not allowed unless the octet is exactly `0`. For example, `192.168.0.1` and `0.0.0.0` are valid, while `256.1.1.1`, `01.2.3.4`, `1.2.3`, and `1.2.3.4.5` are invalid.

Clarifications:

  • Recursively visit all nested folders under the input path.
  • Only scan regular files.
  • Treat file contents as text; if a file cannot be decoded or read, handle it gracefully according to your language's best practices.
  • If the same valid IP address appears multiple times, print it only once.
  • The final output should contain one IP address per line, sorted lexicographically as strings.
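For the on-disk version of the task, one possible approach is sketched below. It is a minimal illustration, not a reference solution. The function name `scan_directory` is hypothetical. The sketch assumes files are UTF-8 text and small enough to read whole, treats symlinks and special files as non-regular (via `os.lstat`), and infers from the examples that a match embedded in a longer run of digits and dots (such as `1.2.3.4.5`) should be rejected.

```python
import os
import re
import stat

# One octet: 0-255 with no leading zeros.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
# Lookarounds (an assumption drawn from the examples) reject matches that are
# part of a longer digit/dot run, e.g. "1.2.3.4.5" or "999.1.1.1".
_IPV4 = re.compile(
    rf"(?<![\d.]){_OCTET}(?:\.{_OCTET}){{3}}(?!\d)(?!\.\d)", re.ASCII
)


def scan_directory(root):
    """Return the sorted unique valid IPv4 addresses found under root."""
    found = set()
    # os.walk visits every nested directory; it does not follow
    # symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Only scan regular files (skip symlinks, FIFOs, sockets).
                if not stat.S_ISREG(os.lstat(path).st_mode):
                    continue
                with open(path, encoding="utf-8") as handle:
                    text = handle.read()
            except (OSError, UnicodeDecodeError):
                continue  # unreadable or undecodable file: skip it
            found.update(_IPV4.findall(text))
    # Plain string sort, as the problem asks for lexicographic order.
    return sorted(found)
```

Calling `print("\n".join(scan_directory(path)))` would then emit one address per line. Collecting into a `set` handles deduplication, and sorting happens once at the end rather than per file.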

This problem is based on scanning a real directory tree. To make it deterministic for an online judge, the file system is provided as a nested dictionary named `filesystem`: a value that is a `dict` is a subdirectory, a value that is a `str` is a text file, and a value that is `bytes` or `bytearray` is a file that should be decoded as UTF-8 and skipped if decoding fails. Traverse every subdirectory, inspect every regular file, find all unique valid IPv4 addresses in the file contents, and return them sorted lexicographically as strings. A valid IPv4 address has exactly four decimal octets separated by dots; each octet must be between `0` and `255`, and leading zeros are not allowed unless the octet is exactly `0`.
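The traversal and decoding rules for the nested dictionary can be sketched as follows. This covers only the "visit every file and get its text" part, under the assumption that any leaf that is not a `str`, `bytes`, or `bytearray` is simply ignored. The generator name `iter_file_texts` is hypothetical.

```python
def iter_file_texts(filesystem):
    """Yield the decoded text of every file in a nested-dict filesystem."""
    # An explicit stack avoids hitting Python's recursion limit on
    # deeply nested directory structures.
    stack = [filesystem]
    while stack:
        directory = stack.pop()
        for value in directory.values():
            if isinstance(value, dict):
                stack.append(value)  # subdirectory: visit it later
            elif isinstance(value, str):
                yield value  # text file: use contents as-is
            elif isinstance(value, (bytes, bytearray)):
                try:
                    text = value.decode("utf-8")
                except UnicodeDecodeError:
                    continue  # undecodable file: skip it
                yield text
            # Any other leaf type is not a regular file and is ignored.
```

Each yielded string can then be passed to whatever IPv4 extraction routine is used, with the results collected in a set and sorted once at the end.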

Constraints

  • 0 <= total number of files and folders <= 10^5
  • The total length of all decodable file contents is at most 10^6 characters
  • Only leaf values of type `str`, `bytes`, or `bytearray` should be treated as regular files

Examples

Input: ({'readme.txt': 'Main IPs: 10.0.0.1 and 192.168.0.1. Invalid: 256.1.1.1, 192.168.001.1.', 'logs': {'today.log': b'Seen 10.0.0.1 twice and 0.0.0.0 once', 'bad.bin': b'\xff\xfe\xfa'}},)

Expected Output: ['0.0.0.0', '10.0.0.1', '192.168.0.1']

Explanation: The valid unique addresses are 10.0.0.1, 192.168.0.1, and 0.0.0.0. The binary file bad.bin is skipped because it is not valid UTF-8. Addresses with 256 or leading zeros are invalid.

Input: ({},)

Expected Output: []

Explanation: An empty filesystem contains no files, so no IP addresses are found.

Input: ({'a.txt': 'Not valid: 999.1.1.1 1.2.3 1.2.3.4.5 01.2.3.4', 'sub': {'b.bin': b'\xff'}},)

Expected Output: []

Explanation: 999.1.1.1 is out of range, 1.2.3 is incomplete, 1.2.3.4.5 is part of a longer dot-separated numeric sequence, 01.2.3.4 has a leading zero, and b.bin is skipped due to UTF-8 decode failure.

Input: ({'x.txt': 'Valid: 1.2.3.4, 255.255.255.255, 0.10.20.30. Invalid: 00.0.0.0, 1.02.3.4', 'nested': {'y.dat': b'Also 1.2.3.4 and 0.0.0.0'}},)

Expected Output: ['0.0.0.0', '0.10.20.30', '1.2.3.4', '255.255.255.255']

Explanation: The valid unique addresses are collected across both files. Entries with leading zeros like 00.0.0.0 and 1.02.3.4 are rejected.

Hints

  1. Use a stack or recursion to traverse the nested directory structure so you visit every subdirectory exactly once.
  2. A regex can help you find dotted-number candidates, but you still need to validate each octet and reject leading zeros like `01`.
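The second hint can be illustrated with a two-step sketch: a regex finds dotted-number candidates, then each octet is validated separately. The names `is_valid_octet` and `extract_valid_ips` are hypothetical, and the boundary rule (a candidate must not be preceded by a digit or dot, nor followed by a digit or by a dot plus a digit) is an assumption inferred from the examples above, where `1.2.3.4.5` yields nothing but `192.168.0.1.` at the end of a sentence is accepted.

```python
import re

# Four dot-separated digit runs that are not part of a longer digit/dot run.
# re.ASCII restricts \d to 0-9 so non-ASCII digits are never matched.
_CANDIDATE = re.compile(r"(?<![\d.])\d+(?:\.\d+){3}(?!\d)(?!\.\d)", re.ASCII)


def is_valid_octet(octet):
    """Return True if octet is a decimal string in 0..255 without leading zeros."""
    if len(octet) > 1 and octet[0] == "0":
        return False  # leading zero such as "01" or "00"
    return len(octet) <= 3 and int(octet) <= 255


def extract_valid_ips(text):
    """Return the set of valid IPv4 addresses found in text."""
    return {
        match.group()
        for match in _CANDIDATE.finditer(text)
        if all(is_valid_octet(part) for part in match.group().split("."))
    }


sample = "Valid: 1.2.3.4, 255.255.255.255, 0.10.20.30. Invalid: 00.0.0.0, 1.02.3.4"
print(sorted(extract_valid_ips(sample)))
# prints ['0.10.20.30', '1.2.3.4', '255.255.255.255']
```

Keeping validation out of the regex makes the leading-zero and range rules easy to read and test on their own, at the cost of a second pass over each candidate.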
Last updated: May 7, 2026
