PracHub
QuestionsCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates proficiency in on-disk file I/O, streaming algorithms, offline indexing for sublinear lookups, and system-level performance considerations for validating credentials in large files.

  • Medium
  • Instacart
  • Coding & Algorithms
  • Software Engineer

Optimize file reads for password validation

Company: Instacart

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: HR Screen

Given a multi-gigabyte credentials file on disk with one record per line in the format user_id,salt,password_hash (hex), implement validate(user_id, password) -> bool. The file is unsorted and cannot be fully loaded into memory. Optimize space by only reading the content needed. Propose and implement: (a) a streaming approach that scans lines with constant memory and stops early when the user is found; (b) an optional offline indexing strategy (e.g., build a user_id -> byte offset index) to enable sublinear lookups on subsequent runs. Discuss trade-offs, file encoding/error handling, memory-mapped I/O vs buffered reads, and time/space complexity.

Quick Answer: This question evaluates proficiency in on-disk file I/O, streaming algorithms, offline indexing for sublinear lookups, and system-level performance considerations for validating credentials in large files.

You are given a large credentials file as a list of lines (simulating a streamed read of a multi-gigabyte file that cannot be loaded into memory). Each non-empty line is a record in the format `user_id,salt,password_hash`, where `password_hash` is the lowercase hexadecimal SHA-256 digest of the string `salt + password`. Implement `validate(lines, user_id, password) -> bool` that scans the records and returns `True` if and only if a record exists for `user_id` and the supplied `password`, when hashed with that record's `salt`, equals the stored `password_hash`. Requirements: - Process records one at a time and use only constant extra memory (do not materialize the whole file or build large structures). - Stop scanning as soon as the target `user_id` is found (early exit) — do not keep reading after the match. - Defensively skip blank lines and malformed records (lines that do not split into exactly three comma-separated fields). - Hash comparison is case-insensitive on the hex digest (treat `ABC...` and `abc...` as equal). - Return `False` if the user is never found. This mirrors part (a) of the original question: a constant-memory streaming scan with early termination. (Part (b), an offline `user_id -> byte offset` index for sublinear lookups on subsequent runs, is a design extension that trades preprocessing time and a smaller on-disk/in-memory index for O(1)-ish seek-and-read per query; the streaming version below is the directly testable core.)

Constraints

  • Each non-empty record splits into exactly three comma-separated fields: user_id, salt, password_hash.
  • password_hash is a hexadecimal SHA-256 digest of (salt + password); comparison is case-insensitive.
  • Blank lines and malformed records (not exactly three fields) are skipped.
  • The file is unsorted; user_ids are not guaranteed unique-by-position but the first matching record decides the result.
  • Only constant extra memory beyond the current line may be used; scanning stops at the first user_id match.

Examples

Input: (['u1,s1,ebd45ded486507e4a30383e322e28c6cfc1be36610b1be98d98f50ce6ae2f823', 'u2,saltA,69a10e54ed3b69ba08830135dca8bb7a24e5be0d372da029534be41e069ab945'], 'u1', 'hunter2')

Expected Output: True

Explanation: SHA-256('s1'+'hunter2') equals the stored hash for u1, so validation succeeds; the scan stops at the first record without reading u2.

Input: (['u1,s1,ebd45ded486507e4a30383e322e28c6cfc1be36610b1be98d98f50ce6ae2f823', 'u2,saltA,69a10e54ed3b69ba08830135dca8bb7a24e5be0d372da029534be41e069ab945'], 'u1', 'wrongpw')

Expected Output: False

Explanation: u1 exists but SHA-256('s1'+'wrongpw') does not match the stored hash, so validation fails.

Input: (['u2,saltA,69a10e54ed3b69ba08830135dca8bb7a24e5be0d372da029534be41e069ab945'], 'u2', 'pw')

Expected Output: True

Explanation: Single record: SHA-256('saltA'+'pw') matches u2's stored hash.

Input: (['u1,s1,ebd45ded486507e4a30383e322e28c6cfc1be36610b1be98d98f50ce6ae2f823'], 'ghost', 'whatever')

Expected Output: False

Explanation: The user 'ghost' never appears in any record, so the function returns False after scanning.

Input: ([], 'u1', 'pw')

Expected Output: False

Explanation: Empty file: no records to scan, so validation returns False.

Input: (['', ' ', 'u3,xyz,92e335e07991a93e6e69e99448921fe70160a03c94f1d0ea43e7a01298390d87'], 'u3', 'correct horse')

Expected Output: True

Explanation: Blank and whitespace-only lines are skipped; the real record for u3 validates with SHA-256('xyz'+'correct horse').

Input: (['badrecord_no_commas', 'u3,xyz,92e335e07991a93e6e69e99448921fe70160a03c94f1d0ea43e7a01298390d87'], 'u3', 'correct horse')

Expected Output: True

Explanation: A malformed line without commas is skipped defensively; the next well-formed record for u3 validates correctly.

Input: (['q,q,2D0010E5DB839F71744C2F2EC9F76922EC7ACCD74CEB7669CB14546F95777BD1'], 'q', 'abc')

Expected Output: True

Explanation: Stored hash is uppercase hex; comparison is case-insensitive, so SHA-256('q'+'abc') still matches.

Hints

  1. Iterate the records one at a time instead of building a dict of the whole file — that keeps memory constant regardless of file size.
  2. Split each line on commas and compare the first field to the target user_id; only hash when the user matches so you do no wasted work.
  3. Recompute the hash as SHA-256(salt + password) and compare hex digests case-insensitively; return immediately on the first matching user_id so you stop reading early.
  4. For part (b), a one-time scan can build a user_id -> byte-offset index so later runs seek directly to the record (sublinear lookups), trading preprocessing time and index storage for faster repeated queries.
Last updated: Jun 25, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • AI Coding Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Fix the Broken Search Filter in a Book Catalog API - Instacart (medium)
  • Find Largest Adjacent Stock Price Change - Instacart (medium)
  • Implement an In-Memory File Storage System - Instacart (medium)
  • Decode an encoded string - Instacart (medium)
  • Evaluate an arithmetic expression - Instacart (medium)