PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates proficiency in data structures, in-memory indexing and query design, algorithmic time/space complexity analysis, and reasoning about state management for dynamic event streams.

  • Medium
  • Pinterest
  • Coding & Algorithms
  • Software Engineer

Design a violation log analyzer

Company: Pinterest

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

You are given an append-only list of violation events as tuples (id: string, policy: string, date: ISO-8601 string). Build an in-memory "Violation Log Analyzer" that supports: 1) Given an id, return all policies that this id violated. 2) Given a policy, return all ids that violated this policy. 3) Given a date, return all ids that violated any policy on that exact date. Start with a straightforward approach, then: - Choose data structures and analyze the time/space of building the index and answering each query. - Optimize using appropriate inverted indexes (e.g., id→policies, policy→ids, date→ids). Show core APIs (build(log), query_by_id(id), query_by_policy(policy), query_by_date(date)) and pseudocode. - If there will be many repeated queries, propose caching or precomputation strategies (e.g., memoization of frequent queries, materialized sets), and discuss invalidation when new events arrive. - If the event list is sorted by date, explain how to use binary search to locate all events for a target date efficiently and return the corresponding ids; provide the algorithm and complexity. - Discuss trade-offs between query latency and memory, and how your design scales with number of events, distinct ids, and distinct policies.

Quick Answer: This question evaluates proficiency in data structures, in-memory indexing and query design, algorithmic time/space complexity analysis, and reasoning about state management for dynamic event streams.

Part 1: Straightforward Violation Log Analyzer

Implement a simple in-memory violation log analyzer without building indexes. For each query, scan the entire append-only event list and collect the matching unique values. Each event is represented as [id, policy, date]. Supported query types are ['id', value], ['policy', value], and ['date', value]. Return every query result as a lexicographically sorted list of unique strings.

Constraints

  • 0 <= len(events) <= 2000
  • 0 <= len(queries) <= 2000
  • Each event has exactly three strings: [id, policy, date].
  • Dates use a consistent ISO-8601 date format such as YYYY-MM-DD.
  • Duplicate events may appear, but output values must be unique.

Examples

Input: ([['alice', 'spam', '2024-01-01'], ['bob', 'fraud', '2024-01-01'], ['alice', 'fraud', '2024-01-02'], ['alice', 'spam', '2024-01-01']], [['id', 'alice'], ['policy', 'fraud'], ['date', '2024-01-01'], ['id', 'carol']])

Expected Output: [['fraud', 'spam'], ['alice', 'bob'], ['alice', 'bob'], []]

Explanation: alice violated fraud and spam; fraud was violated by alice and bob; two ids appear on 2024-01-01; carol has no events.

Input: ([], [['id', 'x'], ['policy', 'P1'], ['date', '2024-01-01']])

Expected Output: [[], [], []]

Explanation: The empty log produces empty results for every query type.

Input: ([['u1', 'P1', '2025-12-31']], [['id', 'u1'], ['policy', 'P1'], ['date', '2025-01-01']])

Expected Output: [['P1'], ['u1'], []]

Explanation: The single event matches the id and policy queries but not the different date.

Input: ([['u3', 'B', '2024-02-02'], ['u1', 'C', '2024-02-02'], ['u2', 'A', '2024-02-03'], ['u3', 'A', '2024-02-02']], [['date', '2024-02-02'], ['id', 'u3'], ['policy', 'A']])

Expected Output: [['u1', 'u3'], ['A', 'B'], ['u2', 'u3']]

Explanation: The analyzer scans all events for each query and returns unique sorted values.

Solution

def solution(events, queries):
    results = []

    for query in queries:
        kind, value = query
        found = set()

        if kind == 'id':
            for eid, policy, date in events:
                if eid == value:
                    found.add(policy)
        elif kind == 'policy':
            for eid, policy, date in events:
                if policy == value:
                    found.add(eid)
        elif kind == 'date':
            for eid, policy, date in events:
                if date == value:
                    found.add(eid)
        else:
            raise ValueError('unknown query type')

        results.append(sorted(found))

    return results

Time complexity: Let N be the number of events, Q be the number of queries, and r_i be the number of distinct values returned by query i. Each query scans all events, so the total time is O(Q * N + sum(r_i log r_i)) because each result is sorted.. Space complexity: O(max r_i) auxiliary space for the set used by one query, not counting the returned output. Including output, space is O(sum r_i)..

Hints

  1. For a direct solution, you do not need extra data structures beyond a set for the current query.
  2. Think about which field to compare for each query type, and which field should be added to the answer.

Part 2: Violation Log Analyzer with Inverted Indexes

Implement an optimized static violation log analyzer. Before answering queries, build three inverted indexes: id to policies, policy to ids, and date to ids. Then answer each query using the appropriate index. Each event is represented as [id, policy, date]. Return every query result as a lexicographically sorted list of unique strings.

Constraints

  • 0 <= len(events) <= 200000
  • 0 <= len(queries) <= 200000
  • Each event has exactly three strings: [id, policy, date].
  • Dates use a consistent ISO-8601 date format such as YYYY-MM-DD.
  • The total size of all returned lists is bounded by available memory.

Examples

Input: ([['u1', 'A', '2024-05-01'], ['u2', 'A', '2024-05-01'], ['u1', 'B', '2024-05-02'], ['u3', 'A', '2024-05-03']], [['policy', 'A'], ['id', 'u1'], ['date', '2024-05-01'], ['policy', 'Z']])

Expected Output: [['u1', 'u2', 'u3'], ['A', 'B'], ['u1', 'u2'], []]

Explanation: The policy, id, and date indexes answer the queries directly. Missing policy Z returns an empty list.

Input: ([], [['id', 'missing'], ['policy', 'missing'], ['date', '2024-01-01']])

Expected Output: [[], [], []]

Explanation: All indexes are empty.

Input: ([['u1', 'P', '2024-01-01'], ['u1', 'P', '2024-01-01'], ['u1', 'Q', '2024-01-01']], [['id', 'u1'], ['policy', 'P'], ['date', '2024-01-01']])

Expected Output: [['P', 'Q'], ['u1'], ['u1']]

Explanation: Duplicate events do not create duplicate output values because index values are sets.

Input: ([['solo', 'Only', '2024-12-31']], [['policy', 'Only'], ['date', '2024-12-30'], ['id', 'solo']])

Expected Output: [['solo'], [], ['Only']]

Explanation: A single event is indexed in all three inverted indexes.

Solution

def solution(events, queries):
    id_to_policies = {}
    policy_to_ids = {}
    date_to_ids = {}

    for event in events:
        eid, policy, date = event

        if eid not in id_to_policies:
            id_to_policies[eid] = set()
        id_to_policies[eid].add(policy)

        if policy not in policy_to_ids:
            policy_to_ids[policy] = set()
        policy_to_ids[policy].add(eid)

        if date not in date_to_ids:
            date_to_ids[date] = set()
        date_to_ids[date].add(eid)

    results = []
    for query in queries:
        kind, value = query
        if kind == 'id':
            results.append(sorted(id_to_policies.get(value, set())))
        elif kind == 'policy':
            results.append(sorted(policy_to_ids.get(value, set())))
        elif kind == 'date':
            results.append(sorted(date_to_ids.get(value, set())))
        else:
            raise ValueError('unknown query type')

    return results

Time complexity: Let N be the number of events and r_i be the number of distinct values returned by query i. Building the indexes takes O(N) average time. Answering all queries takes O(Q + sum(r_i log r_i)) because each lookup is O(1) average and each returned set is sorted.. Space complexity: O(N) for the three inverted indexes in the worst case, plus O(sum r_i) for the returned output..

Hints

  1. Build dictionaries whose values are sets so duplicates are removed automatically.
  2. After the index is built, each query should avoid scanning the entire event list.

Part 3: Cached Violation Log Analyzer with Append Invalidation

Implement a violation log analyzer that supports repeated queries and new appended events. Build materialized indexes for id, policy, and date. Additionally, cache sorted query results so repeated identical queries can be returned quickly. When a new event is appended, update the indexes and invalidate only cached answers affected by that event.

Constraints

  • 0 <= len(events) <= 200000
  • 0 <= len(operations) <= 200000
  • Each initial event and append operation contains exactly one id, one policy, and one date.
  • Dates use a consistent ISO-8601 date format such as YYYY-MM-DD.
  • Duplicate events may be appended; outputs must still contain unique values.

Examples

Input: ([['u1', 'P1', '2024-01-01'], ['u2', 'P1', '2024-01-01']], [['query_id', 'u1'], ['query_id', 'u1'], ['append', 'u1', 'P2', '2024-01-02'], ['query_id', 'u1'], ['query_policy', 'P2'], ['query_date', '2024-01-02']])

Expected Output: [['P1'], ['P1'], ['P1', 'P2'], ['u1'], ['u1']]

Explanation: The repeated first query can be cached. After appending a new event for u1, the cached query_id result for u1 must be invalidated.

Input: ([], [['query_id', 'x'], ['append', 'x', 'A', '2025-01-01'], ['query_id', 'x'], ['query_policy', 'A'], ['query_date', '2025-01-01']])

Expected Output: [[], ['A'], ['x'], ['x']]

Explanation: A missing id initially returns empty. After append, the affected id, policy, and date queries return the new data.

Input: ([['a', 'P', '2024-01-01']], [['query_policy', 'P'], ['query_date', '2024-01-01'], ['append', 'b', 'P', '2024-01-02'], ['query_policy', 'P'], ['query_date', '2024-01-01'], ['query_date', '2024-01-02']])

Expected Output: [['a'], ['a'], ['a', 'b'], ['a'], ['b']]

Explanation: Appending b to policy P invalidates policy P and date 2024-01-02, but not the cached result for date 2024-01-01.

Input: ([], [['append', 'x', 'P', '2024-03-03'], ['query_id', 'x'], ['append', 'x', 'P', '2024-03-03'], ['query_id', 'x'], ['query_policy', 'P'], ['query_date', '2024-03-03']])

Expected Output: [['P'], ['P'], ['x'], ['x']]

Explanation: Appending a duplicate event does not duplicate values in the output.

Solution

def solution(events, operations):
    id_to_policies = {}
    policy_to_ids = {}
    date_to_ids = {}

    def add_event(eid, policy, date):
        if eid not in id_to_policies:
            id_to_policies[eid] = set()
        id_to_policies[eid].add(policy)

        if policy not in policy_to_ids:
            policy_to_ids[policy] = set()
        policy_to_ids[policy].add(eid)

        if date not in date_to_ids:
            date_to_ids[date] = set()
        date_to_ids[date].add(eid)

    for event in events:
        add_event(event[0], event[1], event[2])

    id_cache = {}
    policy_cache = {}
    date_cache = {}
    results = []

    for op in operations:
        kind = op[0]

        if kind == 'append':
            eid, policy, date = op[1], op[2], op[3]
            add_event(eid, policy, date)

            id_cache.pop(eid, None)
            policy_cache.pop(policy, None)
            date_cache.pop(date, None)

        elif kind == 'query_id':
            eid = op[1]
            if eid not in id_cache:
                id_cache[eid] = sorted(id_to_policies.get(eid, set()))
            results.append(list(id_cache[eid]))

        elif kind == 'query_policy':
            policy = op[1]
            if policy not in policy_cache:
                policy_cache[policy] = sorted(policy_to_ids.get(policy, set()))
            results.append(list(policy_cache[policy]))

        elif kind == 'query_date':
            date = op[1]
            if date not in date_cache:
                date_cache[date] = sorted(date_to_ids.get(date, set()))
            results.append(list(date_cache[date]))

        else:
            raise ValueError('unknown operation type')

    return results

Time complexity: Let N be the initial number of events. Building indexes takes O(N). Each append is O(1) average time for index updates and O(1) average time for cache invalidation. A cache-miss query returning r values costs O(r log r) to sort. A cache-hit query costs O(r) here because the cached list is copied into the output.. Space complexity: O(N + A) for the materialized indexes after A appended events, plus O(C) for cached sorted answers, where C is the total size of cached result lists. Returned output also uses O(total returned values)..

Hints

  1. Use sets in the indexes as the source of truth, and cache only the sorted list representation.
  2. An appended event with id x, policy p, and date d can only change cached answers for query_id x, query_policy p, and query_date d.

Part 4: Binary Search Date Lookup in a Sorted Violation Log

You are given a violation event list sorted in nondecreasing order by date. Implement efficient exact-date lookup using binary search. For each target date, find the contiguous range of events with that date and return the unique ids that violated any policy on that date.

Constraints

  • 0 <= len(events) <= 200000
  • 0 <= len(dates) <= 200000
  • events is sorted in nondecreasing order by the date field.
  • Dates use a consistent ISO-8601 date format such as YYYY-MM-DD, so lexicographic comparison matches chronological order.
  • Duplicate ids may appear multiple times on the same date, but each output list must contain unique ids.

Examples

Input: ([['u1', 'P1', '2024-01-01'], ['u2', 'P2', '2024-01-01'], ['u1', 'P3', '2024-01-02'], ['u3', 'P1', '2024-01-04']], ['2024-01-01', '2024-01-03', '2024-01-04'])

Expected Output: [['u1', 'u2'], [], ['u3']]

Explanation: Binary search locates the date ranges for the first and last queries. 2024-01-03 is absent.

Input: ([], ['2024-01-01'])

Expected Output: [[]]

Explanation: An empty sorted log has no ids for any date.

Input: ([['a', 'X', '2024-12-31']], ['2024-12-30', '2024-12-31'])

Expected Output: [[], ['a']]

Explanation: The target before the only event is absent; the exact date matches.

Input: ([['b', 'P', '2024-01-01'], ['a', 'Q', '2024-01-02'], ['a', 'R', '2024-01-02'], ['c', 'Q', '2024-01-02'], ['d', 'P', '2024-01-05']], ['2024-01-02', '2024-01-05'])

Expected Output: [['a', 'c'], ['d']]

Explanation: Multiple events for id a on 2024-01-02 appear only once in the result.

Input: ([['a', 'P', '2024-02-10'], ['b', 'Q', '2024-02-11']], ['2024-01-01', '2024-12-31'])

Expected Output: [[], []]

Explanation: Dates outside the sorted log boundaries return empty lists.

Solution

def solution(events, dates):
    n = len(events)

    def lower_bound(target):
        lo, hi = 0, n
        while lo < hi:
            mid = (lo + hi) // 2
            if events[mid][2] < target:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def upper_bound(target):
        lo, hi = 0, n
        while lo < hi:
            mid = (lo + hi) // 2
            if events[mid][2] <= target:
                lo = mid + 1
            else:
                hi = mid
        return lo

    results = []
    for target in dates:
        left = lower_bound(target)
        right = upper_bound(target)

        ids = set()
        for i in range(left, right):
            ids.add(events[i][0])

        results.append(sorted(ids))

    return results

Time complexity: Let N be the number of events, Q be the number of target dates, k_i be the number of events matching query i, and r_i be the number of distinct ids returned by query i. Each query costs O(log N + k_i + r_i log r_i). Total time is O(Q log N + sum k_i + sum(r_i log r_i)).. Space complexity: O(max r_i) auxiliary space for the set of ids for one date, not counting returned output. Including output, space is O(sum r_i)..

Hints

  1. Use one binary search to find the first event whose date is not less than the target.
  2. Use another binary search to find the first event whose date is greater than the target; the answer lies between those two positions.
Last updated: Jun 23, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Maximize Boxes Stored Through One Entrance - Pinterest (medium)
  • Solve Multiple Coding Interview Problems - Pinterest (medium)
  • Implement a Sparse Matrix Class - Pinterest (medium)
  • Assign Pins to Shortest Columns - Pinterest (medium)
  • Design Hierarchical Permission Checks - Pinterest (medium)