How do I practice coding and algorithm questions?

Use PracHub's coding console to write, test, and debug your solutions in Python or JavaScript. View hints, test against sample inputs, and compare with official solutions.

What difficulty level is this coding question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during HR Screen rounds at Instacart.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Instacart during technical interviews.

Optimize file reads for password validation

Company: Instacart

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: HR Screen

Given a multi-gigabyte credentials file on disk with one record per line in the format user_id,salt,password_hash (hex), implement validate(user_id, password) -> bool. The file is unsorted and cannot be fully loaded into memory. Optimize space by only reading the content needed. Propose and implement: (a) a streaming approach that scans lines with constant memory and stops early when the user is found; (b) an optional offline indexing strategy (e.g., build a user_id -> byte offset index) to enable sublinear lookups on subsequent runs. Discuss trade-offs, file encoding/error handling, memory-mapped I/O vs buffered reads, and time/space complexity.

Quick Answer: This question evaluates proficiency in on-disk file I/O, streaming algorithms, offline indexing for sublinear lookups, and system-level performance considerations for validating credentials in large files.

You are given a large credentials file as a list of lines (simulating a streamed read of a multi-gigabyte file that cannot be loaded into memory). Each non-empty line is a record in the format `user_id,salt,password_hash`, where `password_hash` is the lowercase hexadecimal SHA-256 digest of the string `salt + password`. Implement `validate(lines, user_id, password) -> bool` that scans the records and returns `True` if and only if a record exists for `user_id` and the supplied `password`, when hashed with that record's `salt`, equals the stored `password_hash`. Requirements: - Process records one at a time and use only constant extra memory (do not materialize the whole file or build large structures). - Stop scanning as soon as the target `user_id` is found (early exit) — do not keep reading after the match. - Defensively skip blank lines and malformed records (lines that do not split into exactly three comma-separated fields). - Hash comparison is case-insensitive on the hex digest (treat `ABC...` and `abc...` as equal). - Return `False` if the user is never found. This mirrors part (a) of the original question: a constant-memory streaming scan with early termination. (Part (b), an offline `user_id -> byte offset` index for sublinear lookups on subsequent runs, is a design extension that trades preprocessing time and a smaller on-disk/in-memory index for O(1)-ish seek-and-read per query; the streaming version below is the directly testable core.)

Constraints

Each non-empty record splits into exactly three comma-separated fields: user_id, salt, password_hash.
password_hash is a hexadecimal SHA-256 digest of (salt + password); comparison is case-insensitive.
Blank lines and malformed records (not exactly three fields) are skipped.
The file is unsorted; user_ids are not guaranteed unique-by-position but the first matching record decides the result.
Only constant extra memory beyond the current line may be used; scanning stops at the first user_id match.

Examples

Input: (['u1,s1,ebd45ded486507e4a30383e322e28c6cfc1be36610b1be98d98f50ce6ae2f823', 'u2,saltA,69a10e54ed3b69ba08830135dca8bb7a24e5be0d372da029534be41e069ab945'], 'u1', 'hunter2')

Expected Output: True

Explanation: SHA-256('s1'+'hunter2') equals the stored hash for u1, so validation succeeds; the scan stops at the first record without reading u2.

Input: (['u1,s1,ebd45ded486507e4a30383e322e28c6cfc1be36610b1be98d98f50ce6ae2f823', 'u2,saltA,69a10e54ed3b69ba08830135dca8bb7a24e5be0d372da029534be41e069ab945'], 'u1', 'wrongpw')

Expected Output: False

Explanation: u1 exists but SHA-256('s1'+'wrongpw') does not match the stored hash, so validation fails.

Input: (['u2,saltA,69a10e54ed3b69ba08830135dca8bb7a24e5be0d372da029534be41e069ab945'], 'u2', 'pw')

Expected Output: True

Explanation: Single record: SHA-256('saltA'+'pw') matches u2's stored hash.

Input: (['u1,s1,ebd45ded486507e4a30383e322e28c6cfc1be36610b1be98d98f50ce6ae2f823'], 'ghost', 'whatever')

Expected Output: False

Explanation: The user 'ghost' never appears in any record, so the function returns False after scanning.

Input: ([], 'u1', 'pw')

Expected Output: False

Explanation: Empty file: no records to scan, so validation returns False.

Input: (['', ' ', 'u3,xyz,92e335e07991a93e6e69e99448921fe70160a03c94f1d0ea43e7a01298390d87'], 'u3', 'correct horse')

Expected Output: True

Explanation: Blank and whitespace-only lines are skipped; the real record for u3 validates with SHA-256('xyz'+'correct horse').

Input: (['badrecord_no_commas', 'u3,xyz,92e335e07991a93e6e69e99448921fe70160a03c94f1d0ea43e7a01298390d87'], 'u3', 'correct horse')

Expected Output: True

Explanation: A malformed line without commas is skipped defensively; the next well-formed record for u3 validates correctly.

Input: (['q,q,2D0010E5DB839F71744C2F2EC9F76922EC7ACCD74CEB7669CB14546F95777BD1'], 'q', 'abc')

Expected Output: True

Explanation: Stored hash is uppercase hex; comparison is case-insensitive, so SHA-256('q'+'abc') still matches.

Hints

Iterate the records one at a time instead of building a dict of the whole file — that keeps memory constant regardless of file size.
Split each line on commas and compare the first field to the target user_id; only hash when the user matches so you do no wasted work.
Recompute the hash as SHA-256(salt + password) and compare hex digests case-insensitively; return immediately on the first matching user_id so you stop reading early.
For part (b), a one-time scan can build a user_id -> byte-offset index so later runs seek directly to the record (sublinear lookups), trading preprocessing time and index storage for faster repeated queries.

Quick Overview

This question evaluates proficiency in on-disk file I/O, streaming algorithms, offline indexing for sublinear lookups, and system-level performance considerations for validating credentials in large files.