Optimize file reads for password validation
Company: Instacart
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: HR Screen
Quick Answer: This question evaluates proficiency in on-disk file I/O, streaming algorithms, offline indexing for sublinear lookups, and system-level performance considerations for validating credentials in large files.
Constraints
- Each non-empty record splits into exactly three comma-separated fields: user_id, salt, password_hash.
- password_hash is a hexadecimal SHA-256 digest of (salt + password); comparison is case-insensitive.
- Blank lines and malformed records (not exactly three fields) are skipped.
- The file is unsorted; user_ids are not guaranteed unique-by-position but the first matching record decides the result.
- Only constant extra memory beyond the current line may be used; scanning stops at the first user_id match.
Examples
Input: (['u1,s1,ebd45ded486507e4a30383e322e28c6cfc1be36610b1be98d98f50ce6ae2f823', 'u2,saltA,69a10e54ed3b69ba08830135dca8bb7a24e5be0d372da029534be41e069ab945'], 'u1', 'hunter2')
Expected Output: True
Explanation: SHA-256('s1'+'hunter2') equals the stored hash for u1, so validation succeeds; the scan stops at the first record without reading u2.
Input: (['u1,s1,ebd45ded486507e4a30383e322e28c6cfc1be36610b1be98d98f50ce6ae2f823', 'u2,saltA,69a10e54ed3b69ba08830135dca8bb7a24e5be0d372da029534be41e069ab945'], 'u1', 'wrongpw')
Expected Output: False
Explanation: u1 exists but SHA-256('s1'+'wrongpw') does not match the stored hash, so validation fails.
Input: (['u2,saltA,69a10e54ed3b69ba08830135dca8bb7a24e5be0d372da029534be41e069ab945'], 'u2', 'pw')
Expected Output: True
Explanation: Single record: SHA-256('saltA'+'pw') matches u2's stored hash.
Input: (['u1,s1,ebd45ded486507e4a30383e322e28c6cfc1be36610b1be98d98f50ce6ae2f823'], 'ghost', 'whatever')
Expected Output: False
Explanation: The user 'ghost' never appears in any record, so the function returns False after scanning.
Input: ([], 'u1', 'pw')
Expected Output: False
Explanation: Empty file: no records to scan, so validation returns False.
Input: (['', ' ', 'u3,xyz,92e335e07991a93e6e69e99448921fe70160a03c94f1d0ea43e7a01298390d87'], 'u3', 'correct horse')
Expected Output: True
Explanation: Blank and whitespace-only lines are skipped; the real record for u3 validates with SHA-256('xyz'+'correct horse').
Input: (['badrecord_no_commas', 'u3,xyz,92e335e07991a93e6e69e99448921fe70160a03c94f1d0ea43e7a01298390d87'], 'u3', 'correct horse')
Expected Output: True
Explanation: A malformed line without commas is skipped defensively; the next well-formed record for u3 validates correctly.
Input: (['q,q,2D0010E5DB839F71744C2F2EC9F76922EC7ACCD74CEB7669CB14546F95777BD1'], 'q', 'abc')
Expected Output: True
Explanation: Stored hash is uppercase hex; comparison is case-insensitive, so SHA-256('q'+'abc') still matches.
Hints
- Iterate the records one at a time instead of building a dict of the whole file — that keeps memory constant regardless of file size.
- Split each line on commas and compare the first field to the target user_id; only hash when the user matches so you do no wasted work.
- Recompute the hash as SHA-256(salt + password) and compare hex digests case-insensitively; return immediately on the first matching user_id so you stop reading early.
- For part (b), a one-time scan can build a user_id -> byte-offset index so later runs seek directly to the record (sublinear lookups), trading preprocessing time and index storage for faster repeated queries.