Find shared objects across two log files

This is a technical interview question from Google for Software Engineer roles. View the full question and solution on PracHub.

Question

You are given two large log files representing activity on two different days. Each line of each log has three fields:

The logs are not necessarily sorted by any field.

Define an object as interesting if:

Its obj_id appears at least once in day 1's log and at least once in day 2's log; and
Across all of its appearances in both days combined, it is associated with at least two distinct client_id values.

Design and implement a function that, given file handles (or streaming iterators) for the two log files, returns the set of all such interesting obj_id s.
For your initial, straightforward in-memory solution (you may assume both logs fit in memory), state the time and space complexity in terms of the total number of log records.
Follow-up (memory constrained): Now assume the logs are very large and cannot both fit into memory at once.
- You may still sequentially scan each file and you may perform external sorting.
- Describe how you would modify your approach to work under this memory constraint.
- Analyze the time and space complexity of this memory-efficient solution.

Comments (0)