IPv4 CIDR Range Membership Queries
Company: Databricks
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
## IPv4 CIDR Range Membership Queries
You are building a lookup component for a network access-control system. The
system is configured with a set of IPv4 **CIDR blocks**, and for any incoming
IPv4 address it must decide which configured block (if any) the address belongs
to.
### Background: CIDR notation
An IPv4 address is written in dotted-decimal form as four octets, e.g.
`192.168.1.10`, where each octet is an integer in `[0, 255]`. It can be viewed
as a 32-bit unsigned integer (`192*2^24 + 168*2^16 + 1*2^8 + 10`).
A CIDR block is written as `address/prefix`, e.g. `192.168.1.0/24`. The integer
`prefix` (in `[0, 32]`) is the number of leading bits that are fixed. The block
covers **every** IPv4 address whose top `prefix` bits match the top `prefix`
bits of the block's address. For example:
- `192.168.1.0/24` covers `192.168.1.0` through `192.168.1.255` (256 addresses).
- `10.0.0.0/8` covers `10.0.0.0` through `10.255.255.255`.
- `0.0.0.0/0` covers every IPv4 address.
- `192.168.1.42/32` covers exactly the single address `192.168.1.42`.
Bits of the block address below the prefix length are ignored for matching
(e.g. `192.168.1.5/24` matches the same addresses as `192.168.1.0/24`).
### Task
Implement a class `CidrMatcher`:
- `CidrMatcher(cidrs)` — initializes the matcher from a list of CIDR strings
(e.g. `["10.0.0.0/8", "192.168.1.0/24", "192.168.0.0/16"]`).
- `match(ip) -> str` — given a dotted-decimal IPv4 address string `ip`, return
the **most specific** matching CIDR block from the configured list. "Most
specific" means the matching block with the **longest prefix length** (the
smallest covered range). If two configured blocks tie on prefix length, an
address can match at most one of them only if they are identical; assume the
input list contains no duplicate blocks, so the longest-prefix match is
unique. Return the matching CIDR string **exactly as it appeared in the input
list**. If no configured block contains `ip`, return the empty string `""`.
This "return the longest-prefix match" rule is exactly how a router chooses a
route when several entries overlap.
### Constraints
- `0 <= len(cidrs) <= 2 * 10^5`.
- Every input CIDR is well-formed: four octets each in `[0, 255]` and a prefix
in `[0, 32]`.
- Each `match` query address is a well-formed dotted-decimal IPv4 string.
- There may be up to `2 * 10^5` `match` queries; build whatever structure you
need at construction time so queries are fast.
### Example
```text
Input:
CidrMatcher(["10.0.0.0/8", "192.168.0.0/16", "192.168.1.0/24"])
match("192.168.1.42") -> "192.168.1.0/24"
match("192.168.2.7") -> "192.168.0.0/16"
match("10.1.2.3") -> "10.0.0.0/8"
match("8.8.8.8") -> ""
Output (return values, in order):
"192.168.1.0/24"
"192.168.0.0/16"
"10.0.0.0/8"
""
```
Explanation: `192.168.1.42` is covered by both `192.168.0.0/16` and
`192.168.1.0/24`; the `/24` is more specific (longer prefix) so it wins.
### Output format
For each `match` query, return the matching CIDR string, or `""` if there is no
match.
Quick Answer: This question evaluates a candidate's ability to design an efficient longest-prefix-match lookup structure for IP address ranges, a core networking and algorithms concept. It tests bit manipulation, trie or interval-based indexing, and reasoning about overlapping ranges under scale constraints. Such questions are common in coding interviews to assess practical application of data structure design beyond basic array or hash-map solutions.
You are building a lookup component for a network access-control system configured with a set of IPv4 **CIDR blocks**. For any incoming IPv4 address it must decide which configured block (if any) the address belongs to.
### Background: CIDR notation
An IPv4 address in dotted-decimal form (e.g. `192.168.1.10`) is four octets, each in `[0, 255]`, and can be viewed as a 32-bit unsigned integer (`192*2^24 + 168*2^16 + 1*2^8 + 10`).
A CIDR block is written as `address/prefix` (e.g. `192.168.1.0/24`). The integer `prefix` in `[0, 32]` is the number of leading bits that are fixed; the block covers every IPv4 address whose top `prefix` bits match the top `prefix` bits of the block's address. For example:
- `192.168.1.0/24` covers `192.168.1.0` through `192.168.1.255` (256 addresses).
- `10.0.0.0/8` covers `10.0.0.0` through `10.255.255.255`.
- `0.0.0.0/0` covers every IPv4 address.
- `192.168.1.42/32` covers exactly the single address `192.168.1.42`.
Bits of the block address below the prefix length are ignored for matching (so `192.168.1.5/24` matches the same addresses as `192.168.1.0/24`).
### Task
Conceptually you implement a class `CidrMatcher(cidrs)` with a `match(ip)` method that returns the **most specific** matching CIDR block — the matching block with the **longest prefix length** (the smallest covered range). Return the matching CIDR string **exactly as it appeared in the input list**, or the empty string `""` if none matches. The input list has no duplicate blocks, so the longest-prefix match is unique.
In this console the two phases are combined into a single function:
`solution(cidrs, queries)` — `cidrs` is the list of configured CIDR strings, `queries` is a list of dotted-decimal IPv4 address strings. Return a list of strings: for each query, its longest-prefix matching CIDR or `""`. Build whatever structure you need up front so each query is fast.
### Example
```text
cidrs = ["10.0.0.0/8", "192.168.0.0/16", "192.168.1.0/24"]
queries = ["192.168.1.42", "192.168.2.7", "10.1.2.3", "8.8.8.8"]
returns ["192.168.1.0/24", "192.168.0.0/16", "10.0.0.0/8", ""]
```
`192.168.1.42` is covered by both `192.168.0.0/16` and `192.168.1.0/24`; the `/24` is more specific (longer prefix) so it wins.
Constraints
- 0 <= len(cidrs) <= 2 * 10^5
- Every input CIDR is well-formed: four octets each in [0, 255] and a prefix in [0, 32]
- Each query address is a well-formed dotted-decimal IPv4 string
- Up to 2 * 10^5 queries; no duplicate CIDR blocks, so the longest-prefix match is unique
Examples
Input: (["10.0.0.0/8", "192.168.0.0/16", "192.168.1.0/24"], ["192.168.1.42", "192.168.2.7", "10.1.2.3", "8.8.8.8"])
Expected Output: ["192.168.1.0/24", "192.168.0.0/16", "10.0.0.0/8", ""]
Explanation: 192.168.1.42 is covered by both the /16 and the /24; the /24 wins as the longer prefix. 192.168.2.7 falls outside the /24 but inside the /16. 10.1.2.3 only matches the /8. 8.8.8.8 matches nothing, so "".
Input: (["0.0.0.0/0", "192.168.1.42/32"], ["192.168.1.42", "1.2.3.4"])
Expected Output: ["192.168.1.42/32", "0.0.0.0/0"]
Explanation: The /0 default route covers every address, but 192.168.1.42 also matches the /32 host route, which is more specific and wins. 1.2.3.4 only matches the /0 catch-all.
Input: ([], ["1.2.3.4"])
Expected Output: [""]
Explanation: Empty configuration: no block can match, so every query returns "".
Input: (["10.0.0.0/8"], ["10.255.255.255", "11.0.0.0", "9.255.255.255"])
Expected Output: ["10.0.0.0/8", "", ""]
Explanation: 10.0.0.0/8 spans 10.0.0.0 through 10.255.255.255 inclusive, so the top boundary 10.255.255.255 matches, while the addresses just above (11.0.0.0) and just below (9.255.255.255) do not.
Input: (["192.168.1.5/24"], ["192.168.1.200"])
Expected Output: ["192.168.1.5/24"]
Explanation: Host bits below the /24 prefix are ignored, so 192.168.1.5/24 behaves like 192.168.1.0/24 and covers 192.168.1.200. The returned string is the block exactly as configured.
Hints
- Convert each IPv4 address to a 32-bit integer: (a<<24)|(b<<16)|(c<<8)|d. Comparisons and masking are then plain integer bit operations.
- For prefix p, the netmask is the top p bits set: ((1<<p)-1)<<(32-p), with p=0 giving mask 0 (matches everything). A block's canonical network is address & mask, which discards host bits so 192.168.1.5/24 and 192.168.1.0/24 collapse to the same key.
- Store a dict from (network, prefix) to the original CIDR string. For a query integer v, try prefixes from 32 down to 0 and look up (v & mask(p), p); the first hit is the longest-prefix (most specific) match. Return "" if none of the 33 probes hit.