OpenAI Machine Learning Engineer Interview Questions
Find earliest supporting version under constraints
You are given version strings formatted as {major}.{minor}.{patch}, e.g., "103.003.03". Each version either supports a feature or not. You may call is...
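One possible approach, sketched under stated assumptions: the snippet above is truncated, so the exact query function and constraints are unknown. The sketch assumes a hypothetical predicate `supports(version)` and assumes support is monotonic (once a version supports the feature, every later version does too). Under those assumptions, the versions can be ordered by their numeric components, which handles leading zeros such as "003", and binary-searched.

```python
from typing import Callable, List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split "major.minor.patch" into integers; int() drops leading zeros."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def earliest_supporting(
    versions: List[str],
    supports: Callable[[str], bool],  # hypothetical predicate, not from the source
) -> Optional[str]:
    """Return the earliest version for which supports() is True, or None.

    Assumes monotonic support: if a version supports the feature,
    all later versions do as well.
    """
    ordered = sorted(versions, key=parse_version)
    lo, hi = 0, len(ordered)
    while lo < hi:
        mid = (lo + hi) // 2
        if supports(ordered[mid]):
            hi = mid  # mid may be the answer; keep searching to the left
        else:
            lo = mid + 1  # answer, if any, lies strictly to the right
    return ordered[lo] if lo < len(ordered) else None
```

Comparing parsed tuples rather than raw strings matters here: as strings, "2.10.0" sorts before "2.9.5", while numerically it comes after. The search makes O(log n) predicate calls, which is the usual reason to prefer it when each call is expensive.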
Explain what torch.distributed.barrier does
Question: In PyTorch distributed training, what does torch.distributed.barrier() do? Follow-ups: - Give an example of when you would use it. - What are ...
Design an AWS fine-tuning platform for LLMs
Scenario: You need to build a system that lets customers fine-tune their own large language model (LLM) on AWS. Task: Design a managed platform where us...
Design and optimize a RAG system
Scenario: You are building a Retrieval-Augmented Generation (RAG) system for question answering over an internal document corpus. Task: Design the end-t...
Design a recommendation system end-to-end
Question: Design a large-scale recommendation system (e.g., short videos or e-commerce items). Requirements: - Personalized feed ranking for hundreds of...
Design a search query autocomplete system
Question: Design a search autocomplete system that suggests completions as the user types. Requirements: - Sub-100ms latency per keystroke. - Suggestion...
Design an image/video near-duplicate detection system
Question: Design a system to detect near-duplicate images/videos (e.g., reuploads, minor edits, different encodes) at large scale. Requirements: - Suppo...
Design a harmful video content moderation system
Question: Design an end-to-end system to detect and moderate harmful videos on a large platform. Requirements: - Detect multiple policy categories (viol...
Design a regional surge pricing strategy
Scenario: You operate a ride-hailing platform. You need to design a system that sets surge multipliers (dynamic pricing) for a given region. Task: Desig...
Select high-quality math documents from crawls
Scenario: You have a web crawler that collects raw HTML/PDF documents. You want to build a pipeline that identifies high-quality math documents suitabl...
Design a chatbot fallback for unknown questions
Scenario: You run a ChatGPT-like assistant. Users sometimes ask questions the model cannot answer reliably (unknown/uncertain/needs up-to-date facts). ...
Design an OOD detection system
Prompt: You are building a product that uses an ML classifier in production (e.g., for routing, ranking, safety, fraud, or categorization). Over time, ...
Compute time to infect all cells
You are given an n × m grid representing people in a city. - Each cell is either infected (1) or healthy (0). - Two cells are neighbors if they share ...
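A common way to approach this kind of problem is multi-source breadth-first search. The sketch below is illustrative only, since the snippet above is truncated: it assumes neighbors are the four cells sharing an edge, that every infected cell infects its healthy neighbors once per time step, and that -1 is returned when no cell is infected to begin with. None of these details are confirmed by the source.

```python
from collections import deque
from typing import List


def time_to_infect_all(grid: List[List[int]]) -> int:
    """Return the number of steps until every cell is infected.

    Assumptions (not confirmed by the truncated problem statement):
    4-directional adjacency, simultaneous spread each step, and -1
    when the grid has healthy cells but no initial infection.
    """
    if not grid or not grid[0]:
        return 0
    n, m = len(grid), len(grid[0])

    # infected_at[r][c] is the step at which the cell becomes infected.
    infected_at = [[-1] * m for _ in range(n)]
    queue = deque()
    for r in range(n):
        for c in range(m):
            if grid[r][c] == 1:
                infected_at[r][c] = 0
                queue.append((r, c))  # every initial infection is a BFS source

    if not queue:
        return -1  # nothing can ever spread

    latest = 0
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < m and infected_at[nr][nc] == -1:
                infected_at[nr][nc] = infected_at[r][c] + 1
                latest = max(latest, infected_at[nr][nc])
                queue.append((nr, nc))
    return latest
```

Seeding the queue with all infected cells at once means each cell is reached first from its nearest source, so the answer is the largest such distance. Each cell is enqueued at most once, giving O(n × m) time and space.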
Explain motivation and mission alignment
In a behavioral interview for a mission-driven tech company, you are asked two related questions: 1. Why do you want to join this company? 2. How do...
Design an enterprise RAG system
System Design Task: Retrieval-Augmented Generation (RAG) for Enterprise Users. You are designing a multi-tenant enterprise RAG system that answers user...
Design an ML search system
Design an ML-Powered Enterprise Document Search System. Context: You are designing a multi-tenant enterprise search system that indexes documents from m...
Train a classifier and analyze dataset
End-to-End Binary Classifier Workflow (EDA → Modeling → Fairness → Report). You are given a labeled tabular dataset and asked to implement a reproducib...
Debug a transformer training pipeline
Diagnose a Diverging PyTorch Transformer Training Run. You are given a PyTorch Transformer training pipeline whose loss diverges and validation accurac...
Diagnose Transformer training and inference bugs
Debugging a Transformer That Intermittently Throws Shape/Type Errors and Fails to Converge. You are given a Transformer-based sequence model that: - In...
Derive MLE and Bayesian posterior for Bernoulli
Bernoulli/Binomial Inference Task. You observe n independent Bernoulli trials with unknown success probability p, and you record k successes (so K ~ Bi...
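The standard derivation for this setup can be sketched as follows. The snippet above is truncated and does not name a prior, so the Beta(α, β) prior used in the Bayesian part is an assumption chosen because it is conjugate to the binomial likelihood; α and β are illustrative symbols, not taken from the source.

```latex
% Likelihood of k successes in n independent trials
L(p) = \binom{n}{k}\, p^{k} (1-p)^{n-k}, \qquad 0 \le p \le 1

% Log-likelihood and its derivative
\ell(p) = \log\binom{n}{k} + k \log p + (n-k)\log(1-p)
\frac{d\ell}{dp} = \frac{k}{p} - \frac{n-k}{1-p}

% Setting the derivative to zero (for 0 < k < n) gives the MLE
\frac{k}{p} = \frac{n-k}{1-p}
\;\Longrightarrow\; \hat{p}_{\mathrm{MLE}} = \frac{k}{n}

% Assumed prior: p ~ Beta(alpha, beta)
\pi(p) \propto p^{\alpha-1}(1-p)^{\beta-1}

% Posterior is proportional to likelihood times prior
\pi(p \mid k) \propto p^{\alpha+k-1}(1-p)^{\beta+n-k-1}
\;\Longrightarrow\; p \mid k \sim \mathrm{Beta}(\alpha+k,\ \beta+n-k)

% Posterior mean
\mathbb{E}[p \mid k] = \frac{\alpha+k}{\alpha+\beta+n}
```

The second derivative of the log-likelihood, −k/p² − (n−k)/(1−p)², is negative on (0, 1), so the stationary point is a maximum. In the boundary cases k = 0 and k = n the likelihood is monotone in p and is maximized at 0 and 1 respectively, which still agrees with k/n. Under the assumed prior, the posterior mean shrinks the MLE toward the prior mean α/(α+β), with the prior acting like α+β pseudo-observations.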