This question evaluates the ability to design efficient algorithms and data-processing strategies for identifying shared object identifiers across large log files, testing skills in streaming processing, set membership handling, and external-memory algorithm reasoning.
You are given two large log files representing activity on two different days. Each line of each log has three fields:
timestamp: a time value (you may treat it as an opaque string or integer)
obj_id: identifier of an object (string or integer)
client_id: identifier of a client (string or integer)
The logs are not necessarily sorted by any field.
Define an object as interesting if:
obj_id appears at least once in day 1's log and at least once in day 2's log; and
client_id values.
obj_ids.
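A minimal sketch of the streaming, set-membership approach the question points at, covering only the first condition stated above (an obj_id appears in both day 1's and day 2's log). The comma-separated line format and the names `parse_line` and `obj_ids_in_both_days` are assumptions made for illustration; the problem statement does not fix a file format. The client_id field is parsed but deliberately not used, since the second condition is not fully stated here.

```python
from typing import Iterable, Set, Tuple


def parse_line(line: str) -> Tuple[str, str, str]:
    """Split one log line into (timestamp, obj_id, client_id).

    Assumption: fields are comma-separated, e.g. "1700000000,obj42,clientA".
    """
    timestamp, obj_id, client_id = line.rstrip("\n").split(",")
    return timestamp, obj_id, client_id


def obj_ids_in_both_days(day1: Iterable[str], day2: Iterable[str]) -> Set[str]:
    """Return the obj_ids that occur at least once in each log.

    Both inputs are consumed line by line, so neither log has to be
    sorted or loaded into memory as a whole.
    """
    # Pass 1: stream day 1 and remember every distinct obj_id.
    seen_day1: Set[str] = set()
    for line in day1:
        if line.strip():  # skip blank lines
            seen_day1.add(parse_line(line)[1])

    # Pass 2: stream day 2 and keep obj_ids already seen on day 1.
    in_both: Set[str] = set()
    for line in day2:
        if line.strip():
            obj_id = parse_line(line)[1]
            if obj_id in seen_day1:
                in_both.add(obj_id)
    return in_both
```

Open file objects iterate line by line, so two handles from `open(...)` can be passed in directly. Memory use is proportional to the number of distinct obj_ids in day 1, not to the size of either log. If even that set is too large for memory, one possible external-memory variant is to hash-partition both logs by obj_id into smaller files and run the same function on each pair of partitions.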