How to handle huge inputs?
Company: Google
Role: Software Engineer
Category: Software Engineering Fundamentals
Difficulty: medium
Interview Round: Onsite
In a coding interview, after solving an in-memory algorithmic problem, the interviewer asks: "What would you do if the input were extremely large?"
Give a structured answer that explains how to reason from constraints to design choices.
### Constraints & Assumptions
- The original problem could be common, such as interval scheduling, grouping, counting, or sorting.
- The follow-up is general rather than tied to one exact algorithm.
- Discuss exact versus approximate answers, memory limits, streaming, chunking, external storage, and distributed processing.
- Do not jump straight to "use distributed systems" without explaining why.
### Clarifying Questions to Ask
- Does the full input fit in memory?
- Is the input static, streamed, or continuously updated?
- Do we need an exact answer or is approximation acceptable?
- Do we need global ordering, random access, or only a summary?
- Are we constrained by memory, disk I/O, network, latency, or throughput?
- Is this single-machine or multi-machine?
### What a Strong Answer Covers
- In-memory space optimization when data still fits.
- Streaming algorithms when only compact state is needed.
- Chunk-and-merge techniques when the problem decomposes.
- External-memory sorting or disk-backed indexes when global coordination is required.
- Approximation techniques when exact answers are too expensive and allowed.
- Distributed partitioning only when scale exceeds one machine.
- Tradeoffs among correctness, memory, I/O, latency, and complexity.
### Follow-up Questions
- How would your answer change for a problem requiring sorting?
- What if there is severe key skew?
- What approximate data structures might help?
- How would you test correctness for chunked processing?
Quick Answer: Answer the coding interview follow-up "what if the input is huge?" Covers memory checks, streaming, chunking, external sort, approximate structures, distributed processing, and engineering tradeoffs.