Design a distributed crossword solving service

Q: Design a distributed crossword solving service

This is a technical interview question from OpenAI for Software Engineer roles. View the full question and solution on PracHub.

Question

You are asked to design a backend service that can solve crossword puzzles at scale.

A single puzzle instance consists of:

A crossword board (grid) with numbered Across/Down slots and black squares.
A dictionary of valid words, each with a short definition, for example:
- TIRE : "A wheel on a car"
- RATS : "Small rodents"
- BATH : "A tub to clean in"
- BE : "To, or not to"

The system should fill the crossword board with valid words from the dictionary that satisfy all crossing constraints. The final output for a puzzle is a list of filled clues, for example:

1 Down: A wheel on a car
2 Across: Small rodents
3 Across: To, or not to
3 Down: A tub to clean in

You can assume there exists some backtracking / constraint-satisfaction algorithm that, given a board and dictionary, can search the solution space and determine one or more valid fillings. However, for large boards this search becomes extremely expensive, and a single machine/core doing a naive brute-force search will not finish in acceptable time.

Design a distributed service that can solve very large crossword puzzles by splitting the search work across many worker machines. In particular, address:

API and basic flow
- How clients submit a puzzle (board + dictionary) and retrieve results.
- Whether solving should be synchronous or asynchronous.
High-level architecture
- Main components: API layer, coordinator/orchestrator, task queues, worker services, storage.
- Where the crossword-solving algorithm runs and how it interacts with the rest of the system.
Job and task scheduling
- How you break down a single large puzzle-solving job into smaller tasks that can be executed in parallel.
- How tasks represent parts of the search space (e.g., partial board assignments, subsets of words, subsets of constraints).
- How workers pick up, execute, and generate new tasks.
Dynamic load balancing and task splitting
- Some tasks will quickly hit a dead end in the search and finish almost immediately.
- Other tasks may explode into a huge search subtree and run for a long time.
- Describe how you:
  - Avoid having some machines idle while others are stuck on huge tasks.
  - Dynamically split heavy tasks into smaller ones and redistribute them.
  - Potentially use work stealing or other techniques to balance the load.
Scalability and performance
- How you scale to many concurrent puzzles and very large boards.
- How you choose task granularity to balance overhead versus parallelism.
Fault tolerance and correctness
- How you handle worker failures, retries, and ensure that tasks are not lost.
- How you avoid or handle duplicate task execution.
- How you detect that a puzzle is fully solved or unsolvable and finalize the job.
Data and state management
- Where you store puzzle definitions, dictionaries, intermediate states/partial solutions, and final solutions.
- How you track the progress of each puzzle-solving job.

Clearly explain your design assumptions, trade-offs, and any heuristics you would use to make this practical in production. Focus on the distributed job scheduling and task management aspects rather than the detailed crossword-solving algorithm itself.

Design a distributed crossword solving service

Solution

Comments (0)