You are asked to design a backend service that can solve crossword puzzles at scale.
A single puzzle instance consists of:
-
A crossword
board
(grid) with numbered Across/Down slots and black squares.
-
A
dictionary
of valid words, each with a short definition, for example:
-
TIRE
: "A wheel on a car"
-
RATS
: "Small rodents"
-
BATH
: "A tub to clean in"
-
BE
: "To, or not to"
The system should fill the crossword board with valid words from the dictionary that satisfy all crossing constraints. The final output for a puzzle is a list of filled clues, for example:
-
1 Down: A wheel on a car
-
2 Across: Small rodents
-
3 Across: To, or not to
-
3 Down: A tub to clean in
You can assume there exists some backtracking / constraint-satisfaction algorithm that, given a board and dictionary, can search the solution space and determine one or more valid fillings. However, for large boards this search becomes extremely expensive, and a single machine/core doing a naive brute-force search will not finish in acceptable time.
Design a distributed service that can solve very large crossword puzzles by splitting the search work across many worker machines. In particular, address:
-
API and basic flow
-
How clients submit a puzzle (board + dictionary) and retrieve results.
-
Whether solving should be synchronous or asynchronous.
-
High-level architecture
-
Main components: API layer, coordinator/orchestrator, task queues, worker services, storage.
-
Where the crossword-solving algorithm runs and how it interacts with the rest of the system.
-
Job and task scheduling
-
How you break down a single large puzzle-solving job into smaller
tasks
that can be executed in parallel.
-
How tasks represent parts of the search space (e.g., partial board assignments, subsets of words, subsets of constraints).
-
How workers pick up, execute, and generate new tasks.
-
Dynamic load balancing and task splitting
-
Some tasks will quickly hit a dead end in the search and finish almost immediately.
-
Other tasks may explode into a huge search subtree and run for a long time.
-
Describe how you:
-
Avoid having some machines idle while others are stuck on huge tasks.
-
Dynamically split heavy tasks into smaller ones and redistribute them.
-
Potentially use work stealing or other techniques to balance the load.
-
Scalability and performance
-
How you scale to many concurrent puzzles and very large boards.
-
How you choose task granularity to balance overhead versus parallelism.
-
Fault tolerance and correctness
-
How you handle worker failures, retries, and ensure that tasks are not lost.
-
How you avoid or handle duplicate task execution.
-
How you detect that a puzzle is fully solved or unsolvable and finalize the job.
-
Data and state management
-
Where you store puzzle definitions, dictionaries, intermediate states/partial solutions, and final solutions.
-
How you track the progress of each puzzle-solving job.
Clearly explain your design assumptions, trade-offs, and any heuristics you would use to make this practical in production. Focus on the distributed job scheduling and task management aspects rather than the detailed crossword-solving algorithm itself.