You are discussing a language-modeling / NLP project. The interviewer asks about surprisal.
- Define surprisal for an event/token with probability p.
- What are the possible units of surprisal (analogous to meters for distance), and what determines the unit?
- How is surprisal related to cross-entropy and perplexity in language modeling?
- What common pitfalls or misunderstandings arise when interpreting surprisal values?
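
As a point of reference for the questions above, here is a minimal sketch of the quantities involved. It assumes the standard definition of surprisal as the negative logarithm of p, and treats cross-entropy as the average surprisal the model assigns to the observed tokens. The function name `surprisal` and the values in `token_probs` are hypothetical and chosen only for illustration; they do not come from any particular model or library.

```python
import math


def surprisal(p: float, base: float = 2.0) -> float:
    """Return -log_base(p). Base 2 gives bits, base e gives nats."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log(p) / math.log(base)


# Hypothetical probabilities a model assigned to each observed token.
token_probs = [0.5, 0.25, 0.125]

# Per-token surprisal in bits (log base 2).
bits = [surprisal(p, base=2) for p in token_probs]

# Empirical cross-entropy: mean surprisal over the sequence, in bits per token.
cross_entropy_bits = sum(bits) / len(bits)

# Perplexity: exponentiate the cross-entropy using the same base as the log.
perplexity = 2 ** cross_entropy_bits

# The same computation in nats (natural log) yields the same perplexity,
# as long as the exponentiation uses e instead of 2.
nats = [surprisal(p, base=math.e) for p in token_probs]
cross_entropy_nats = sum(nats) / len(nats)
perplexity_from_nats = math.exp(cross_entropy_nats)

print(bits, cross_entropy_bits, perplexity, perplexity_from_nats)
```

The sketch also hints at one pitfall: a cross-entropy reported in nats and one reported in bits differ by a factor of ln 2, so values are only comparable once the log base (and the tokenization) is known to match.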