PracHub
QuestionsCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates the ability to design efficient algorithms and data-processing strategies for identifying shared object identifiers across large log files, testing skills in streaming processing, set membership handling, and external-memory algorithm reasoning.

  • medium
  • Google
  • Coding & Algorithms
  • Software Engineer

Find shared objects across two log files

Company: Google

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Technical Screen

You are given two large log files representing activity on two different days. Each line of each log has three fields: - `timestamp`: a time value (you may treat it as an opaque string or integer) - `obj_id`: identifier of an object (string or integer) - `client_id`: identifier of a client (string or integer) The logs are not necessarily sorted by any field. Define an object as **interesting** if: 1. Its `obj_id` appears at least once in day 1's log **and** at least once in day 2's log; and 2. Across all of its appearances in both days combined, it is associated with at least **two distinct** `client_id` values. ### Tasks 1. Design and implement a function that, given file handles (or streaming iterators) for the two log files, returns the set of all such interesting `obj_id`s. 2. For your initial, straightforward in-memory solution (you may assume both logs fit in memory), state the time and space complexity in terms of the total number of log records. 3. Follow-up (memory constrained): Now assume the logs are very large and **cannot** both fit into memory at once. - You may still sequentially scan each file and you may perform external sorting. - Describe how you would modify your approach to work under this memory constraint. - Analyze the time and space complexity of this memory-efficient solution.

Quick Answer: This question evaluates the ability to design efficient algorithms and data-processing strategies for identifying shared object identifiers across large log files, testing skills in streaming processing, set membership handling, and external-memory algorithm reasoning.

Return sorted obj_ids that appear in both day logs and have at least two distinct client_ids across both days combined.

Constraints

  • Rows are [timestamp, obj_id, client_id] or dicts with those fields

Examples

Input: ([['t1', 'o1', 'c1'], ['t2', 'o2', 'c1']], [['t3', 'o1', 'c2'], ['t4', 'o2', 'c1']])

Expected Output: ['o1']

Input: ([], [['t', 'o', 'c']])

Expected Output: []

Input: ([['t', 'o', 'c']], [['t', 'o', 'c']])

Expected Output: []

Hints

  1. Track which days each object appears in and its set of clients.
Last updated: Jun 27, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Busiest Rental Car - Google (easy)
  • Find Common Free Time Slots Across Calendars - Google (easy)
  • Deterministic Task Execution Order - Google (easy)
  • Count Clusters of 2D Points Within a Radius - Google (medium)
  • Infection Spread on a Grid (Cellular Automaton) - Google (hard)