Design sequential reveal classification and policy
Company: Jane Street
Role: Software Engineer
Category: ML System Design
Difficulty: hard
Interview Round: Technical Screen
You are given a trained CNN for FashionMNIST and an evaluation notebook.
1) Implement a row-wise reveal evaluation: at step k, the top k rows are visible and the remaining rows are replaced by a fixed mask value m; record the model’s prediction for each k and compute accuracy versus k across the test set. Plot accuracy vs k and explain what the curve tells you about information sufficiency and robustness.
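The row-wise reveal loop above can be sketched as follows. This is a minimal NumPy sketch, assuming a `predict` callable that stands in for the trained CNN (mapping a batch of images to predicted labels); the function name and signature are illustrative, not from the source.

```python
import numpy as np

def accuracy_vs_k(predict, images, labels, m=0.0):
    """Row-wise reveal evaluation: at step k the top k rows are visible
    and the remaining rows are filled with mask value m.

    predict: callable mapping a batch (N, H, W) -> predicted labels (N,)
    images:  float array (N, H, W); labels: int array (N,)
    Returns accs of length H+1, where accs[k] is accuracy with k rows visible.
    """
    n, h, w = images.shape
    accs = np.empty(h + 1)
    for k in range(h + 1):
        masked = np.full_like(images, m)       # fill everything with m ...
        masked[:, :k, :] = images[:, :k, :]    # ... then reveal the top k rows
        preds = predict(masked)
        accs[k] = np.mean(preds == labels)
    return accs
```

Plotting `accs` against `k` then shows how quickly accuracy saturates, i.e., how many rows carry enough information for the model, and how gracefully it degrades under heavy masking.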
2) Define a reward R for partially revealed images: if the model’s final prediction is correct when you stop, R equals the number of pixels still masked (not yet revealed); otherwise R = 0. Using the accuracy–k results, propose and implement a method to pick a single global mask fill value m that maximizes expected reward over the dataset (e.g., sweep candidate m values, estimate expected reward for each, and select the best). Discuss trade-offs such as class imbalance and distribution shift from masking.
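One way to implement the sweep is sketched below, assuming a simple baseline in which every image is stopped at the same step k, so the expected reward at (m, k) is approximately accuracy(m, k) times the number of still-masked pixels, (H − k)·W. The `predict` stand-in and function names are hypothetical, not from the source.

```python
import numpy as np

def sweep_mask_value(predict, images, labels, candidates):
    """For each candidate fill value m, estimate accuracy vs. revealed rows k,
    convert to expected reward (reward = masked pixel count when correct,
    0 otherwise), and return the (m, reward, k) with the highest expected
    reward under a fixed-k stopping baseline.

    predict: callable (N, H, W) -> predicted labels (N,)  [CNN stand-in]
    """
    n, h, w = images.shape
    best = (None, -1.0, None)                  # (m, expected reward, best k)
    for m in candidates:
        for k in range(h + 1):
            masked = np.full_like(images, m)
            masked[:, :k, :] = images[:, :k, :]
            acc = np.mean(predict(masked) == labels)
            reward = acc * (h - k) * w         # E[R] if we always stop at step k
            if reward > best[1]:
                best = (m, reward, k)
    return best
```

Note the trade-offs flagged in the prompt: the chosen m can favor majority classes unless the estimate is stratified per class, and any m induces a train/test distribution shift since the model never saw masked inputs during training.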
3) Improve the expected reward via training-time augmentation that masks contiguous rows/blocks so the model learns to be accurate with limited visible pixels. Specify the augmentation policy (probability, region size range, fill value), how you would constrain randomness to avoid degenerate cases (e.g., masking almost all pixels), and how you would tune the policy. If limited to only two retraining runs, state the exact two configurations you would try and the metrics you would compare (accuracy-vs-k and expected reward).
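A concrete augmentation transform matching this policy might look like the sketch below. The parameter values (probability, row-count range, minimum visible rows) are illustrative assumptions, not prescribed by the source; they are exactly the knobs the two retraining runs would vary.

```python
import numpy as np

def block_mask_augment(img, rng, p=0.5, min_rows=2, max_rows=14, fill=0.0,
                       min_visible=4):
    """Training-time augmentation sketch (assumed policy): with probability p,
    mask a contiguous block of rows with `fill`. The block size is drawn
    uniformly from [min_rows, max_rows] and clipped so that at least
    `min_visible` rows stay visible, avoiding degenerate near-empty inputs.
    """
    h, w = img.shape
    if rng.random() >= p:                       # apply only with probability p
        return img
    size = int(rng.integers(min_rows, max_rows + 1))
    size = min(size, h - min_visible)           # constraint against degeneracy
    start = int(rng.integers(0, h - size + 1))
    out = img.copy()
    out[start:start + size, :] = fill
    return out
```

Tuning would compare accuracy-vs-k curves and expected reward across policy settings; with only two retraining runs, a natural pair is a mild policy (low p, small blocks) versus an aggressive one (high p, large blocks), using the same fill value selected in part 2.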
4) With the trained model fixed and pixels revealed sequentially at test time (1 pixel, 2 pixels, … full image), design an early-exit policy that decides when to output to maximize expected reward R. Propose a concrete strategy such as requiring the argmax class to be stable within a sliding window of the last W steps and/or exceed a confidence threshold; describe how to set W and thresholds via offline calibration, and how to handle ties or oscillations.
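The stability-window-plus-threshold strategy can be sketched as below. `window` and `threshold` are the calibration parameters the prompt asks you to set offline; the function shape is an assumed illustration, with ties broken deterministically toward the lowest class index.

```python
import numpy as np

def early_exit(prob_stream, window=5, threshold=0.9):
    """Early-exit policy sketch: emit a prediction once the argmax class has
    been stable over the last `window` reveal steps AND its probability
    exceeds `threshold`; otherwise fall through to the final step.

    prob_stream: iterable of probability vectors, one per reveal step
                 (1 pixel visible, 2 pixels, ..., full image).
    Returns (step_index, predicted_class).
    """
    recent = []                                 # argmax history (sliding window)
    last = None
    for t, probs in enumerate(prob_stream):
        c = int(np.argmax(probs))               # ties break toward lowest index
        recent.append(c)
        if len(recent) > window:
            recent.pop(0)
        last = (t, c)
        stable = len(recent) == window and len(set(recent)) == 1
        if stable and probs[c] > threshold:
            return t, c                         # stable and confident: stop now
    return last                                 # never triggered: answer at end
```

Offline calibration would sweep (window, threshold) on a held-out set, replaying each image's reveal sequence and picking the pair that maximizes expected reward; a larger window damps oscillations at the cost of later exits (fewer masked pixels, hence lower reward when correct).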
Quick Answer: This question evaluates understanding of sequential partial-observation evaluation, mask-value selection and reward optimization, augmentation strategies for robustness, and early-exit policy design for classifiers under progressively revealed inputs. It tests competencies in model calibration, evaluation metrics, distribution-shift reasoning, and trade-off analysis within the ML System Design domain, at both the conceptual and practical levels. It is commonly asked because it probes system-level thinking about information sufficiency, metric-driven trade-offs between accuracy and withheld information, robustness to masking and augmentation, and the ability to design and calibrate stopping and confidence policies for streaming or cost-sensitive inference pipelines.