Implement crawler, dedup, and persistent LRU
Company: Anthropic
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
##### Question
LeetCode 1236. Web Crawler: Crawl web pages starting from a given URL within the same hostname.
LeetCode 609. Find Duplicate File in System: Identify duplicate files in a filesystem based on content.
LeetCode 146. LRU Cache (extended): Implement an LRU cache decorator that correctly handles variable-length positional and keyword arguments, and add persistence (serialization/deserialization) support.
https://leetcode.com/problems/web-crawler/description/ https://leetcode.com/problems/find-duplicate-file-in-system/description/ https://leetcode.com/problems/lru-cache/description/
Quick Answer: This question tests algorithm design, data-structure choice, complexity analysis, and edge-case handling across three problems: a same-hostname crawler, content-based file deduplication, and an LRU cache decorator with persistence. A strong answer states its assumptions, handles edge cases, explains trade-offs, and shows how to validate the result.
Solution
# Solution Alignment
The prompt asks for an implementation-level answer. The safest way to present it is to define the state, maintain clear invariants, then walk through complexity and tests.
## Problem Restatement
LeetCode 1236. Web Crawler: Crawl web pages starting from a given URL within the same hostname.
LeetCode 609. Find Duplicate File in System: Identify duplicate files in a filesystem based on content.
LeetCode 146. LRU Cache (extended): Implement an LRU cache decorator that correctly handles variable-length positional and keyword arguments, and add persistence (serialization/deserialization) support.
## Recommended Approach
For the LRU cache, use a hash map from key to doubly linked-list node, plus a doubly linked list ordered by recency with the most recently used node at the front. `get` moves the node to the front. `put` updates and moves an existing node, or inserts a new node at the front and evicts the tail when capacity is exceeded.
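The structure described above can be sketched as follows. This is a minimal illustration, not a reference solution: the names `_Node`, `_unlink`, and `_push_front` are hypothetical, the `-1` return for a missing key follows the LeetCode 146 convention, and treating a non-positive capacity as "store nothing" is an assumption.

```python
class _Node:
    """Doubly linked-list node holding one cache entry (hypothetical helper)."""
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.prev = None
        self.next = None


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._map = {}  # key -> node
        # Sentinel nodes: head.next is most recent, tail.prev is least recent.
        self._head = _Node()
        self._tail = _Node()
        self._head.next = self._tail
        self._tail.prev = self._head

    def _unlink(self, node):
        # Detach a node from the list in O(1).
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node):
        # Insert a node right after the head sentinel in O(1).
        node.prev = self._head
        node.next = self._head.next
        self._head.next.prev = node
        self._head.next = node

    def get(self, key):
        node = self._map.get(key)
        if node is None:
            return -1  # LeetCode 146 convention for a missing key
        self._unlink(node)
        self._push_front(node)
        return node.value

    def put(self, key, value):
        if self.capacity <= 0:
            return  # assumption: a non-positive capacity stores nothing
        node = self._map.get(key)
        if node is not None:
            # Existing key: update the value and refresh its recency.
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        node = _Node(key, value)
        self._map[key] = node
        self._push_front(node)
        if len(self._map) > self.capacity:
            # Evict the least recently used entry at the tail.
            lru = self._tail.prev
            self._unlink(lru)
            del self._map[lru.key]
```

The sentinel head and tail avoid special cases for an empty list or a single-node list, which is where pointer bugs usually appear in an interview.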
## Correctness
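For the LRU cache, the invariant to maintain after every `get` and `put` is: the hash map holds exactly the keys present in the linked list, the list is ordered from most to least recently used, and the number of entries never exceeds the capacity. Given that invariant, the tail is always the least recently used entry, so evicting it is correct. For the crawler and the duplicate finder, the analogous invariant is that each URL or file is considered exactly once.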
## Complexity
`get` and `put` run in O(1) average time, since each performs one hash lookup and a constant number of pointer updates. Space is O(capacity).
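The decorator variant from the question can keep the same O(1) average bounds if the cache key is built from the call arguments in a hashable form. The sketch below is one possible shape under stated assumptions: the name `persistent_lru_cache` and the `dump`/`load`/`cache` attributes are hypothetical, all arguments are assumed hashable, results are assumed picklable, and `collections.OrderedDict` stands in for the hand-written linked list.

```python
import functools
import pickle
from collections import OrderedDict


def persistent_lru_cache(maxsize=128):
    """Hypothetical LRU decorator with pickle-based persistence."""

    def decorator(func):
        cache = OrderedDict()  # oldest entry first, newest entry last

        def make_key(args, kwargs):
            # Sort kwargs so f(a=1, b=2) and f(b=2, a=1) share one entry.
            return (args, tuple(sorted(kwargs.items())))

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = make_key(args, kwargs)
            if key in cache:
                cache.move_to_end(key)  # mark as most recently used
                return cache[key]
            result = func(*args, **kwargs)
            if maxsize > 0:
                cache[key] = result
                if len(cache) > maxsize:
                    cache.popitem(last=False)  # evict least recently used
            return result

        def dump(path):
            # Serialize entries in recency order so it survives a reload.
            with open(path, "wb") as fh:
                pickle.dump(list(cache.items()), fh)

        def load(path):
            # Only load files you trust: unpickling can execute code.
            with open(path, "rb") as fh:
                items = pickle.load(fh)
            cache.clear()
            kept = items[-maxsize:] if maxsize > 0 else []
            for key, value in kept:
                cache[key] = value

        wrapper.dump = dump
        wrapper.load = load
        wrapper.cache = cache
        return wrapper

    return decorator
```

In this sketch `f(1, 2)` and `f(a=1, b=2)` still produce different keys, which is worth stating as a trade-off; normalizing them would require binding the arguments against the function signature. Unhashable arguments such as lists raise `TypeError` and would need a separate conversion step.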
## Edge Cases and Tests
Cover capacity 0 or 1, updating an existing key, eviction order after a `get`, repeated `put` calls on the same key, and `get` on a missing key.
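The cases above can be turned into a small self-check. The sketch below uses a compact `OrderedDict`-based cache, hypothetically named `TinyLRU`, purely so the block runs on its own; the same assertions should hold for any implementation that returns `-1` on a miss. Note that LeetCode 146 guarantees a capacity of at least 1, so the capacity-0 behavior shown here is an assumption.

```python
from collections import OrderedDict


class TinyLRU:
    """Hypothetical compact LRU used only to exercise the edge cases."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # oldest first, newest last

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)
        return self.data[key]

    def put(self, key, value):
        if self.capacity <= 0:
            return  # assumption: capacity 0 stores nothing
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)


def run_edge_cases():
    # Capacity 0: nothing is ever stored.
    zero = TinyLRU(0)
    zero.put("a", 1)
    assert zero.get("a") == -1

    # Capacity 1: every new key evicts the previous one.
    one = TinyLRU(1)
    one.put("a", 1)
    one.put("b", 2)
    assert one.get("a") == -1 and one.get("b") == 2

    # Updating an existing key refreshes its recency.
    upd = TinyLRU(2)
    upd.put("a", 1)
    upd.put("b", 2)
    upd.put("a", 10)
    upd.put("c", 3)  # evicts "b", not "a"
    assert upd.get("a") == 10 and upd.get("b") == -1

    # A get also refreshes recency and changes the eviction order.
    rec = TinyLRU(2)
    rec.put("a", 1)
    rec.put("b", 2)
    assert rec.get("a") == 1
    rec.put("c", 3)  # evicts "b"
    assert rec.get("b") == -1 and rec.get("c") == 3

    # Missing key.
    assert rec.get("missing") == -1
    return True
```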