You are given an append-only list of violation events as tuples (id: string, policy: string, date: ISO-8601 string). Build an in-memory "Violation Log Analyzer" that supports:
-
Given an id, return all policies that this id violated.
-
Given a policy, return all ids that violated this policy.
-
Given a date, return all ids that violated any policy on that exact date.
Start with a straightforward approach, then:
-
Choose data structures and analyze the time/space of building the index and answering each query.
-
Optimize using appropriate inverted indexes (e.g., id→policies, policy→ids, date→ids). Show core APIs (build(log), query_by_id(id), query_by_policy(policy), query_by_date(date)) and pseudocode.
-
If there will be many repeated queries, propose caching or precomputation strategies (e.g., memoization of frequent queries, materialized sets), and discuss invalidation when new events arrive.
-
If the event list is sorted by date, explain how to use binary search to locate all events for a target date efficiently and return the corresponding ids; provide the algorithm and complexity.
-
Discuss trade-offs between query latency and memory, and how your design scales with number of events, distinct ids, and distinct policies.