Problem
You are playing a simplified 5-letter word guessing game.
-
There is a hidden
secret
word of length
5
.
-
You have a
dictionary
of valid 5-letter words. Each word may also have an associated
frequency/weight
representing how likely it is to be the secret.
-
When you
guess
a 5-letter word, the game returns
only “green” information
(correct letter in the correct position). There is
no
information about letters that exist in the word but are in the wrong position (“yellow”), and no explicit info about letters that are absent.
Feedback definition
For a guess g and secret s, define the feedback as a 5-bit mask m where:
-
m[i] = 1
iff
g[i] == s[i]
-
otherwise
m[i] = 0
Example: g="crane", s="cared" → mask is 1 0 0 0 0 (only position 0 matches).
Task
Given:
-
words
: a list of
N
valid 5-letter words
-
weight[w]
(optional): a positive number for each word (if omitted, assume uniform)
-
candidates
: the current set of possible secret words consistent with feedback so far (initially all
words
)
Write an algorithm/function to choose the next guess that is optimal under the following objective:
-
Choose a guess word
g
(typically from
words
) that
minimizes the expected size
of the remaining candidate set after receiving the green-only mask, where the expectation is taken over the secret word drawn from
candidates
according to
weight
.
Formally, if C is the current candidate set and C_m(g) is the subset of candidates that would produce mask m for guess g, choose g that minimizes:
E[∣Cmask(g)∣]=∑m∈{0,1}5P(mask=m)⋅∣Cm(g)∣
where P(mask=m) is induced by the weighted distribution over secrets in C.
Input/Output expectations
-
Input sizes:
1 ≤ N ≤ 5,000
(you may propose optimizations if needed), word length is always 5.
-
Output: the selected next guess word.
Follow-ups (optional)
-
How would you incorporate non-uniform word frequencies?
-
How would you scale if
N
is very large (e.g., 100k)?
-
How would you change the objective to minimize expected
number of guesses
instead of expected remaining candidates?