PracHub

Quick Overview

This question evaluates a candidate's ability to implement data validation, deduplication, stable filtering, time-range and allow-list checks, and to reason about time and space complexity for both batch and streaming scenarios.

Filter Invalid Data Events

Company: Atlassian

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: easy

Interview Round: Technical Screen

You are given a list of data events. Each event contains:

  • `event_id`: string
  • `creator_id`: string
  • `event_type`: string
  • `event_time`: Unix timestamp
  • `payload`: arbitrary object

Implement a function that returns only the valid events while preserving the original order. An event is considered invalid and must be filtered out if any of the following is true:

  1. `event_id` is missing or empty.
  2. `creator_id` is missing or empty.
  3. `event_time` is missing or malformed.
  4. `event_time` is outside a provided inclusive range `[start_time, end_time]`.
  5. `event_type` is not in a provided allow-list.
  6. The same `event_id` has already appeared earlier; keep only the first valid occurrence.

Return the filtered list of events.

Follow-up: What are the time and space complexities, and how would your approach change if the events arrived continuously as a stream instead of as a batch?

Part 1: Filter Invalid Data Events from a Batch

You are given a batch of data events and must return only the valid ones while preserving their original order. Each event is a dictionary that may contain the keys 'event_id', 'creator_id', 'event_type', 'event_time', and 'payload'. An event is invalid if: (1) 'event_id' is missing or empty, (2) 'creator_id' is missing or empty, (3) 'event_time' is missing or malformed, (4) 'event_time' is outside the inclusive range [start_time, end_time], (5) 'event_type' is not in the allow-list, or (6) the same 'event_id' has already been accepted earlier in the batch. Keep only the first valid occurrence of each event_id. If an earlier event with the same event_id is invalid, it does not block a later valid one.

Constraints

  • 0 <= len(events) <= 200000
  • 0 <= len(allowed_event_types) <= 10000
  • start_time <= end_time
  • An event_time is considered valid only if it is an integer Unix timestamp; booleans should be treated as malformed
  • Missing keys count as missing fields
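
The timestamp rule above has a Python-specific trap: `bool` is a subclass of `int`, so `isinstance(True, int)` is true. A minimal sketch of a check that follows the constraint, assuming events are plain dictionaries (the helper name `is_valid_event_time` is hypothetical, not part of the problem statement):

```python
def is_valid_event_time(value):
    """Accept only plain integers; reject bools, strings, floats, and None."""
    # bool must be excluded explicitly because it subclasses int in Python.
    return isinstance(value, int) and not isinstance(value, bool)


# A missing key is read as None via dict.get, so it is rejected as well.
event = {"event_id": "e1", "creator_id": "u1", "event_type": "click"}
has_valid_time = is_valid_event_time(event.get("event_time"))  # False
```

Reading fields with `dict.get` lets a missing key and an explicit `None` value follow the same rejection path.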

Examples

Input: ([{'event_id': 'e1', 'creator_id': 'u1', 'event_type': 'click', 'event_time': 10, 'payload': {'x': 1}}, {'event_id': '', 'creator_id': 'u2', 'event_type': 'click', 'event_time': 11, 'payload': None}, {'event_id': 'e2', 'creator_id': 'u3', 'event_type': 'purchase', 'event_time': 15, 'payload': {}}, {'event_id': 'e1', 'creator_id': 'u4', 'event_type': 'click', 'event_time': 12, 'payload': 'dup'}, {'event_id': 'e3', 'creator_id': 'u5', 'event_type': 'view', 'event_time': 21, 'payload': 0}, {'event_id': 'e4', 'creator_id': 'u6', 'event_type': 'view', 'event_time': 20, 'payload': []}], ['click', 'view'], 10, 20)

Expected Output: [{'event_id': 'e1', 'creator_id': 'u1', 'event_type': 'click', 'event_time': 10, 'payload': {'x': 1}}, {'event_id': 'e4', 'creator_id': 'u6', 'event_type': 'view', 'event_time': 20, 'payload': []}]

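Explanation: Four events are filtered out: the one with an empty event_id, the one with the disallowed event_type 'purchase', the second 'e1' (a duplicate of an already accepted event_id), and 'e3', whose timestamp 21 is out of range.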

Input: ([{'event_id': 'z1', 'creator_id': 'u1', 'event_type': 'click', 'event_time': '100', 'payload': None}, {'event_id': 'z1', 'creator_id': 'u1', 'event_type': 'click', 'event_time': 100, 'payload': {'ok': True}}], ['click'], 0, 200)

Expected Output: [{'event_id': 'z1', 'creator_id': 'u1', 'event_type': 'click', 'event_time': 100, 'payload': {'ok': True}}]

Explanation: The first event is malformed because event_time is a string, so it does not block the later valid event with the same event_id.

Input: ([], ['click', 'view'], 0, 10)

Expected Output: []

Explanation: Edge case: empty input should return an empty list.

Input: ([{'event_id': 'a', 'creator_id': 'u', 'event_type': 'view', 'event_time': 5, 'payload': None}, {'event_id': 'b', 'creator_id': 'u', 'event_type': 'view', 'event_time': True, 'payload': None}, {'event_id': 'c', 'creator_id': None, 'event_type': 'view', 'event_time': 10, 'payload': None}, {'event_id': 'd', 'creator_id': 'u', 'event_type': 'view', 'event_time': 15, 'payload': None}], ['view'], 5, 15)

Expected Output: [{'event_id': 'a', 'creator_id': 'u', 'event_type': 'view', 'event_time': 5, 'payload': None}, {'event_id': 'd', 'creator_id': 'u', 'event_type': 'view', 'event_time': 15, 'payload': None}]

Explanation: Boundary timestamps 5 and 15 are valid. True is treated as malformed, and creator_id=None is invalid.

Hints

  1. Use a hash set for the allowed event types so membership checks are fast.
  2. Only add an event_id to your seen set after the event passes every other validation rule.
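
Putting the two hints together, one possible single-pass solution is sketched below. The function name `filter_valid_events` and its parameter order are assumptions inferred from the example inputs. Treating non-string IDs and non-string event types as invalid is also an assumption, since the statement only says these fields are strings.

```python
def filter_valid_events(events, allowed_event_types, start_time, end_time):
    """Return the valid events in their original order (hypothetical signature)."""
    allowed = set(allowed_event_types)  # O(1) average membership checks
    accepted_ids = set()                # IDs of events that passed every rule
    result = []

    for event in events:
        event_id = event.get("event_id")
        creator_id = event.get("creator_id")
        event_type = event.get("event_type")
        event_time = event.get("event_time")

        # Rules 1 and 2: IDs must be non-empty strings (None and '' fail).
        if not isinstance(event_id, str) or not event_id:
            continue
        if not isinstance(creator_id, str) or not creator_id:
            continue
        # Rule 3: integer timestamp only; bool is rejected explicitly.
        if not isinstance(event_time, int) or isinstance(event_time, bool):
            continue
        # Rule 4: inclusive range check.
        if not (start_time <= event_time <= end_time):
            continue
        # Rule 5: allow-list check (the isinstance guard avoids unhashable values).
        if not isinstance(event_type, str) or event_type not in allowed:
            continue
        # Rule 6: duplicates are judged only against accepted events.
        if event_id in accepted_ids:
            continue

        accepted_ids.add(event_id)  # recorded only after all checks pass
        result.append(event)

    return result
```

With n events and an allow-list of size a, this runs in O(n + a) time on average and uses O(a + u) extra space, where u is the number of accepted event IDs.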

Part 2: Streaming Event Acceptance Decisions

In a streaming system, events arrive one at a time and you must decide immediately whether to accept or reject each event. Simulate that online behavior. Given a list of events in arrival order, return a list of booleans where the i-th value is True if the i-th event should be accepted when it arrives, and False otherwise. Use the same rules as the batch version: reject events with missing or empty 'event_id' or 'creator_id', missing or malformed 'event_time', out-of-range timestamps, disallowed 'event_type', or duplicate 'event_id' values that were already accepted earlier. A previous invalid event with the same event_id does not make later valid events duplicates. The set of accepted event_ids is the only state carried between arrivals, which models the state you would keep in a real stream processor.

Constraints

  • 0 <= len(events) <= 200000
  • 0 <= len(allowed_event_types) <= 10000
  • start_time <= end_time
  • An event_time is considered valid only if it is an integer Unix timestamp; booleans should be treated as malformed
  • For exact duplicate detection in an unbounded real stream, the set of accepted event_ids can grow without bound
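
The last constraint hints at a trade-off the problem does not resolve: exact deduplication needs unbounded memory. One common way to bound it, shown here purely as an illustrative sketch, is to remember accepted IDs only for a retention window and accept that a duplicate arriving after the window will slip through. The class `WindowedDeduper`, its `retention_seconds` parameter, and the use of the largest `event_time` seen as a watermark are all assumptions, not part of the question.

```python
import heapq


class WindowedDeduper:
    """Approximate dedup: remembers accepted IDs for a limited time window."""

    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.watermark = None   # largest event_time seen so far
        self.live_ids = set()   # IDs still inside the retention window
        self.expiry_heap = []   # (event_time, event_id), oldest first

    def is_new(self, event_id, event_time):
        """Return True and remember the ID if it is not in the live window."""
        if self.watermark is None or event_time > self.watermark:
            self.watermark = event_time

        # Evict IDs whose event_time fell out of the retention window.
        cutoff = self.watermark - self.retention
        while self.expiry_heap and self.expiry_heap[0][0] < cutoff:
            _, old_id = heapq.heappop(self.expiry_heap)
            self.live_ids.discard(old_id)

        if event_id in self.live_ids:
            return False
        self.live_ids.add(event_id)
        heapq.heappush(self.expiry_heap, (event_time, event_id))
        return True


deduper = WindowedDeduper(retention_seconds=10)
first = deduper.is_new("a", 100)    # True: never seen
repeat = deduper.is_new("a", 105)   # False: still inside the window
later = deduper.is_new("b", 120)    # True: also evicts "a" (100 < 120 - 10)
again = deduper.is_new("a", 121)    # True: "a" was forgotten, so it is accepted
```

The final call shows the cost of the bound: memory stays proportional to the IDs inside the window, but deduplication is no longer exact. The expected outputs in this part assume exact deduplication, so this sketch applies only to the open-ended follow-up.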

Examples

Input: ([{'event_id': 'a', 'creator_id': 'u1', 'event_type': 'click', 'event_time': 5, 'payload': None}, {'event_id': 'b', 'creator_id': '', 'event_type': 'click', 'event_time': 6, 'payload': None}, {'event_id': 'a', 'creator_id': 'u2', 'event_type': 'click', 'event_time': 7, 'payload': None}, {'event_id': 'c', 'creator_id': 'u3', 'event_type': 'view', 'event_time': 11, 'payload': None}, {'event_id': 'd', 'creator_id': 'u4', 'event_type': 'view', 'event_time': 10, 'payload': None}], ['click', 'view'], 5, 10)

Expected Output: [True, False, False, False, True]

Explanation: The first and last events are accepted. The others fail due to empty creator_id, duplicate accepted event_id, and out-of-range timestamp.

Input: ([{'event_id': 'z1', 'creator_id': 'u1', 'event_type': 'click', 'event_time': None, 'payload': None}, {'event_id': 'z1', 'creator_id': 'u1', 'event_type': 'click', 'event_time': 100, 'payload': {'ok': True}}], ['click'], 0, 200)

Expected Output: [False, True]

Explanation: An invalid event does not reserve its event_id, so the later valid event is accepted.

Input: ([], ['view'], 0, 10)

Expected Output: []

Explanation: Edge case: no arrivals means no decisions.

Input: ([{'event_id': 'm1', 'creator_id': 'u', 'event_type': 'view', 'event_time': 1, 'payload': None}, {'event_id': 'm2', 'creator_id': 'u', 'event_type': 'view', 'event_time': True, 'payload': None}, {'event_id': 'm3', 'creator_id': None, 'event_type': 'view', 'event_time': 2, 'payload': None}, {'event_id': 'm4', 'creator_id': 'u', 'event_type': 'view', 'event_time': 3, 'payload': None}], ['view'], 1, 3)

Expected Output: [True, False, False, True]

Explanation: Boundary timestamps are accepted, but True is malformed as a timestamp and creator_id=None is invalid.

Hints

  1. Think about what state must survive between arrivals. You do not need to store all past events.
  2. A hash set of accepted event_ids is enough for exact duplicate checks, but only add an ID after accepting the event.
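
The hints above can be sketched as a small stateful validator that makes one decision per arrival. The names `StreamValidator`, `accept`, and `stream_decisions` are hypothetical. As in the batch version, rejecting non-string IDs and event types is an assumption beyond the stated rules.

```python
class StreamValidator:
    """Keeps only the state needed between arrivals: the accepted event IDs."""

    def __init__(self, allowed_event_types, start_time, end_time):
        self.allowed = set(allowed_event_types)
        self.start_time = start_time
        self.end_time = end_time
        self.accepted_ids = set()

    def accept(self, event):
        """Return True if the event is accepted at the moment it arrives."""
        event_id = event.get("event_id")
        creator_id = event.get("creator_id")
        event_type = event.get("event_type")
        event_time = event.get("event_time")

        if not isinstance(event_id, str) or not event_id:
            return False
        if not isinstance(creator_id, str) or not creator_id:
            return False
        # bool subclasses int, so it must be excluded explicitly.
        if not isinstance(event_time, int) or isinstance(event_time, bool):
            return False
        if not (self.start_time <= event_time <= self.end_time):
            return False
        if not isinstance(event_type, str) or event_type not in self.allowed:
            return False
        if event_id in self.accepted_ids:
            return False

        # Reserve the ID only once the event has actually been accepted.
        self.accepted_ids.add(event_id)
        return True


def stream_decisions(events, allowed_event_types, start_time, end_time):
    """Simulate the stream by feeding events to the validator in arrival order."""
    validator = StreamValidator(allowed_event_types, start_time, end_time)
    return [validator.accept(event) for event in events]
```

Each decision takes O(1) time on average. Memory grows only with the number of accepted IDs, not with the total number of events seen.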

Last updated: Apr 26, 2026
