Design a violation log analyzer
Company: Pinterest
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
You are given an append-only list of violation events as tuples (id: string, policy: string, date: ISO-8601 string). Build an in-memory "Violation Log Analyzer" that supports:
1) Given an id, return all policies that this id violated.
2) Given a policy, return all ids that violated this policy.
3) Given a date, return all ids that violated any policy on that exact date.
Start with a straightforward approach, then:
- Choose data structures and analyze the time/space of building the index and answering each query.
- Optimize using appropriate inverted indexes (e.g., id→policies, policy→ids, date→ids). Show core APIs (build(log), query_by_id(id), query_by_policy(policy), query_by_date(date)) and pseudocode.
- If there will be many repeated queries, propose caching or precomputation strategies (e.g., memoization of frequent queries, materialized sets), and discuss invalidation when new events arrive.
- If the event list is sorted by date, explain how to use binary search to locate all events for a target date efficiently and return the corresponding ids; provide the algorithm and complexity.
- Discuss trade-offs between query latency and memory, and how your design scales with number of events, distinct ids, and distinct policies.
Quick Answer: This question evaluates proficiency in data structures, in-memory indexing and query design, algorithmic time/space complexity analysis, and reasoning about state management for dynamic event streams.