How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

What difficulty level is this interview question?

This is a medium difficulty System Design question, commonly asked during Technical Screen rounds at OpenAI.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at OpenAI during technical interviews.

Design a Distributed Crossword Solver | OpenAI Interview Question

Design a Distributed Crossword Solver

Company: OpenAI

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Technical Screen

Design a scalable, distributed service that **solves** crossword-style fill-in puzzles. A request contains a rectangular grid with blocked cells, empty cells, and optional prefilled letters, together with a dictionary version. The service must return one or more valid assignments of dictionary words to **every** horizontal and vertical slot, subject to: - Each non-blocked across or down slot is filled by exactly one dictionary word of the correct length. (A slot is a maximal run of $\ge 2$ adjacent non-blocked cells in a row or column; an isolated single cell is not an entry.) - Prefilled letters are respected. - Crossing words agree on the letter in their shared cell. - **Optional rule:** the same dictionary word may not be reused within a single puzzle. A single-machine, in-memory solver is acceptable for small puzzles, but the production system must scale to **very large dictionaries** (that may not fit on one node) and to puzzles whose search space explodes into many independent subtasks. Your design should cover the public API, dictionary storage and indexing, the core solving algorithm, distributed execution, scalability, fault tolerance, and monitoring. ```hint Framing Try to name the well-known class of problem that "assign one word to every slot subject to the crossing and prefill rules" belongs to. Once you pick the right abstraction, what do the words, the constraints, and the search procedure each map to — and does a standard body of theory then tell you how to attack it? ``` ```hint The hot operation Watch what the search does over and over as it fills slots and a newly placed letter pins a cell in a crossing slot. The repeated question is "which words of length $L$ have letter $c$ at position $p$?" If you answered that with a linear scan of the bucket every time, where would the cost go? What could you precompute, once per dictionary, so that this lookup becomes a cheap combination of a few prepared sets instead of a scan? ``` ```hint Taming the search on one node first Before reaching for infrastructure, make the single-node search itself fail fast. Two levers: which slot should you branch on next so a dead end shows up as early as possible, and how can the letter you just committed shrink the remaining choices in the slots that cross it? Is there any pruning you could do *before* the search even starts, just from the crossing structure? ``` ```hint Distributing the search A backtracking search over slots is one big tree, and you want to turn it into many pieces that workers can chase without talking to each other. How could you carve the top of that tree so each worker inherits a self-contained starting point? Then stress-test independence: when the optional **distinct-word rule** is on, what would two workers otherwise have to agree about, and can you resolve that at the moment you carve the tree so they never need to? ``` ### Constraints & Assumptions Treat these as illustrative scale points to size your indexes and parallelism — not real benchmarks. State your own where they matter. - **Grids:** range from tiny ($5\times5$) to large generic $N\times N$ boards; a board has $O(N^2)$ slots in the worst case. A dense $15\times15$ board has on the order of 60–80 slots. - **Dictionary:** up to $D \approx 10^7$ words across many length buckets; average word length ~8 characters. A dictionary is large but **immutable** once published, and is referenced by an explicit `dictionary_version`. - **Workload:** a mix of small interactive puzzles (sub-second expected) and large puzzles whose exhaustive search can take seconds to minutes. - **Return modes:** callers may want the *first* valid solution, the *top $N$*, or *all* solutions up to a cap — this materially changes the distributed strategy (aggressive cancellation vs. exhaustive search). - **Correctness is non-negotiable:** the service must never return an invalid grid, even under partial failure. ### Clarifying Questions to Ask - Does the caller want the *first* valid solution, the *top $N$*, or *all* solutions up to a cap? - Is the "no word reused in a puzzle" rule on or off by default, and is it per-request configurable? - How large are dictionaries and grids in practice, and what is the latency expectation — interactive or batch? - Are puzzles typically empty grids or partially prefilled? (Prefills prune the search heavily.) - Do dictionaries change in place, or is every published corpus immutable and versioned? - Is determinism / reproducibility for a given `(grid, dictionary_version, options)` a requirement? ### What a Strong Answer Covers - A clean CSP formulation (slots as variables, crossings as constraints) and a parse from raw grid to a slot/crossing model. - A candidate-lookup index design that makes "length + fixed letters" queries fast, plus how it is sharded and cached given a $10^7$-word dictionary. - A concrete search algorithm with real pruning heuristics (variable/value ordering, forward checking, arc consistency) — not just "backtrack." - A decomposition strategy that turns one puzzle into many independent, leasable subtasks, with an honest account of when the distributed path is worth its overhead versus solving in-memory. - Early termination for first / top-$N$ modes and straggler handling for uneven branches. - Fault tolerance (leases, retries, idempotent subtasks), reproducibility (immutable versions), and a final validation gate before any solution is returned. - A monitoring story spanning throughput, search health, distribution health, and correctness. ### Follow-up Questions - How does enabling the distinct-word rule change subtask independence, and how would you keep workers from needing to coordinate? - Branches in the search tree are wildly uneven. How do you keep one worker from grinding on a giant subtree while others sit idle? - A worker dies mid-search and its leased subtask is redelivered to another worker. How do you guarantee you don't produce or store duplicate solutions? - How do you decide, per request, whether to solve inline on one machine or fan the search out across the cluster?

Quick Answer: This question evaluates competency in distributed systems design, scalable indexing and storage, parallel search and constraint-satisfaction reasoning, and system-level concerns such as API design, fault tolerance, and observability.

Design a scalable, distributed service that solves crossword-style fill-in puzzles.

A request contains a rectangular grid with blocked cells, empty cells, and optional prefilled letters, together with a dictionary version. The service must return one or more valid assignments of dictionary words to every horizontal and vertical slot, subject to:

Each non-blocked across or down slot is filled by exactly one dictionary word of the correct length. (A slot is a maximal run of $\ge 2$ adjacent non-blocked cells in a row or column; an isolated single cell is not an entry.)
Prefilled letters are respected.
Crossing words agree on the letter in their shared cell.
Optional rule: the same dictionary word may not be reused within a single puzzle.

A single-machine, in-memory solver is acceptable for small puzzles, but the production system must scale to very large dictionaries (that may not fit on one node) and to puzzles whose search space explodes into many independent subtasks. Your design should cover the public API, dictionary storage and indexing, the core solving algorithm, distributed execution, scalability, fault tolerance, and monitoring.

Constraints & Assumptions

Treat these as illustrative scale points to size your indexes and parallelism — not real benchmarks. State your own where they matter.

Grids: range from tiny ( $5\times5$ ) to large generic $N\times N$ boards; a board has $O(N^2)$ slots in the worst case. A dense $15\times15$ board has on the order of 60–80 slots.
Dictionary: up to $D \approx 10^7$ words across many length buckets; average word length ~8 characters. A dictionary is large but immutable once published, and is referenced by an explicit dictionary_version .
Workload: a mix of small interactive puzzles (sub-second expected) and large puzzles whose exhaustive search can take seconds to minutes.
Return modes: callers may want the first valid solution, the top $N$ , or all solutions up to a cap — this materially changes the distributed strategy (aggressive cancellation vs. exhaustive search).
Correctness is non-negotiable: the service must never return an invalid grid, even under partial failure.

Clarifying Questions to Ask

Does the caller want the first valid solution, the top $N$ , or all solutions up to a cap?
Is the "no word reused in a puzzle" rule on or off by default, and is it per-request configurable?
How large are dictionaries and grids in practice, and what is the latency expectation — interactive or batch?
Are puzzles typically empty grids or partially prefilled? (Prefills prune the search heavily.)
Do dictionaries change in place, or is every published corpus immutable and versioned?
Is determinism / reproducibility for a given (grid, dictionary_version, options) a requirement?

What a Strong Answer Covers

A clean CSP formulation (slots as variables, crossings as constraints) and a parse from raw grid to a slot/crossing model.
A candidate-lookup index design that makes "length + fixed letters" queries fast, plus how it is sharded and cached given a $10^7$ -word dictionary.
A concrete search algorithm with real pruning heuristics (variable/value ordering, forward checking, arc consistency) — not just "backtrack."
A decomposition strategy that turns one puzzle into many independent, leasable subtasks, with an honest account of when the distributed path is worth its overhead versus solving in-memory.
Early termination for first / top- $N$ modes and straggler handling for uneven branches.
Fault tolerance (leases, retries, idempotent subtasks), reproducibility (immutable versions), and a final validation gate before any solution is returned.
A monitoring story spanning throughput, search health, distribution health, and correctness.

Follow-up Questions

How does enabling the distinct-word rule change subtask independence, and how would you keep workers from needing to coordinate?
Branches in the search tree are wildly uneven. How do you keep one worker from grinding on a giant subtree while others sit idle?
A worker dies mid-search and its leased subtask is redelivered to another worker. How do you guarantee you don't produce or store duplicate solutions?
How do you decide, per request, whether to solve inline on one machine or fan the search out across the cluster?

Design a Distributed Crossword Solver

Company: OpenAI

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Technical Screen

Design a Distributed Crossword Solver

Quick Overview

Constraints & Assumptions

Clarifying Questions to Ask

What a Strong Answer Covers

Follow-up Questions

Solution

Submit Your Answer to Earn 20XP

Design a Distributed Crossword Solver

Quick Overview

Constraints & Assumptions

Clarifying Questions to Ask

What a Strong Answer Covers

Follow-up Questions

Solution

Submit Your Answer to Earn 20XP