Filter Invalid Data Events
Company: Atlassian
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Easy
Interview Round: Technical Screen
Quick Answer: This question evaluates a candidate's ability to implement data validation, deduplication, stable filtering, time-range and allow-list checks, and to reason about time and space complexity for both batch and streaming scenarios.
Part 1: Filter Invalid Data Events from a Batch
Constraints
- 0 <= len(events) <= 200000
- 0 <= len(allowed_event_types) <= 10000
- start_time <= end_time
- An event_time is valid only if it is an integer Unix timestamp; booleans must be treated as malformed, even though bool is a subclass of int in Python
- A key that is absent from an event counts as a missing field
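The constraints above can be sketched as a single stable pass over the batch. This is a minimal sketch, not a reference solution, and it rests on several assumptions the text does not confirm: the required fields are event_id, creator_id, event_type, and event_time; the three string fields must be non-empty strings; payload is not validated; the time range is inclusive at both ends; and only the first accepted occurrence of each event_id is kept. The names is_valid_timestamp, filter_events, and REQUIRED_STRING_FIELDS are hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Assumption: these fields must be present and be non-empty strings.
REQUIRED_STRING_FIELDS = ("event_id", "creator_id", "event_type")


def is_valid_timestamp(value: Any) -> bool:
    """Accept plain integers only; bool is a subclass of int, so reject it."""
    return isinstance(value, int) and not isinstance(value, bool)


def filter_events(
    events: Iterable[Dict[str, Any]],
    allowed_event_types: Iterable[str],
    start_time: int,
    end_time: int,
) -> List[Dict[str, Any]]:
    """Return the valid events in their original order (stable filter)."""
    allowed = set(allowed_event_types)  # O(1) allow-list lookups
    seen_ids = set()                    # event_ids already accepted
    kept = []
    for event in events:
        # dict.get returns None for an absent key, so a missing key and a
        # missing value are rejected by the same check.
        fields = [event.get(name) for name in REQUIRED_STRING_FIELDS]
        if any(not isinstance(f, str) or not f for f in fields):
            continue
        event_time = event.get("event_time")
        if not is_valid_timestamp(event_time):
            continue
        if event["event_type"] not in allowed:
            continue
        # Assumption: both ends of the range are inclusive.
        if not (start_time <= event_time <= end_time):
            continue
        # Assumption: the first accepted occurrence of an event_id wins.
        if event["event_id"] in seen_ids:
            continue
        seen_ids.add(event["event_id"])
        kept.append(event)
    return kept
```

With n events and m allowed types, this runs in O(n + m) time and uses O(n + m) extra space for the two sets and the output list. Because only accepted events are recorded in seen_ids, an invalid event does not block a later valid event that reuses its event_id; whether that matches the intended rule is also an assumption.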
Examples
Input: (
  [
    {'event_id': 'e1', 'creator_id': 'u1', 'event_type': 'click', 'event_time': 10, 'payload': {'x': 1}},
    {'event_id': '', 'creator_id': 'u2', 'event_type': 'click', 'event_time': 11, 'payload': None},
    {'event_id': 'e2', 'creator_id': 'u3', 'event_type': 'purchase', 'event_time': 15, 'payload': {}},
    {'event_id': 'e1', 'creator_id': 'u4', 'event_type': 'click', 'event_time': 12, 'payload': 'dup'},
    {'event_id': 'e3', 'creator_id': 'u5', 'event_type': 'view', 'event_time': 21, 'payload': 0},
    {'event_id': 'e4', 'creator_id': 'u6', 'event_type': 'view', 'event_time': 20, 'payload': []}
  ],
  ['click', 'view'],
  10,
  20
)