You are given a DataFrame df where each row summarizes a player’s performance until their first loss.
Input
df columns:
- player_id (string/int)
- wins_before_first_loss (int, k ≥ 0): number of consecutive wins observed before the first loss occurred
Interpretation: for each player, you observed a sequence of games that ended with a loss, e.g., WWW...WL, and wins_before_first_loss = k.
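To make the interpretation concrete, here is a minimal sketch of how k could be derived from a raw game sequence. The function name and the "W"/"L" string encoding are assumptions for illustration; the problem itself only provides the already-summarized column.

```python
def wins_before_first_loss(games: str) -> int:
    """Count the wins that precede the first loss in a string such as 'WWWL'.

    Assumes (hypothetically) that each game is encoded as 'W' or 'L'.
    """
    first_loss = games.find("L")
    if first_loss == -1:
        # No loss observed: the streak is censored, so k is not defined here.
        raise ValueError("sequence contains no loss")
    if any(ch != "W" for ch in games[:first_loss]):
        raise ValueError("unexpected symbol before the first loss")
    # The index of the first 'L' equals the number of leading wins.
    return first_loss


k_example = wins_before_first_loss("WWWL")   # three wins, then a loss
k_zero = wins_before_first_loss("L")         # lost the very first game
```

Note that k = 0 is a valid observation: the player lost the first game.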
Questions
- Compute the expected number of additional wins before the next loss for each player.
- You may assume each player has an underlying win probability p_i that is constant across games.
- Show how you would compute this expectation from the observed data (code/pseudocode is fine).
- Propose a probabilistic model for wins_before_first_loss and explain how you’d estimate parameters.
- Given a plot of the computed expectations across players (distribution/shape), interpret what it suggests (heterogeneity, outliers, model misfit).
- How would you evaluate the model and use it to make a decision (e.g., ranking players, allocating resources, setting thresholds)?
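One possible way to approach the computation, offered as a sketch rather than the expected answer: if p_i is constant, wins_before_first_loss can be modeled as geometric with P(K = k) = p_i^k (1 - p_i), and by memorylessness the number of wins before the next loss has mean p_i / (1 - p_i). With a single streak per player, the maximum-likelihood estimate is k / (k + 1), so the plug-in expectation is simply k. The optional Beta(prior_a, prior_b) prior below is an assumption not stated in the problem; under it the posterior is Beta(prior_a + k, prior_b + 1), and the posterior mean of p / (1 - p) is (prior_a + k) / prior_b. The function name and both hyperparameters are hypothetical.

```python
import pandas as pd


def expected_additional_wins(df, prior_a=None, prior_b=None):
    """Per-player expected wins before the next loss under a geometric model.

    prior_a and prior_b are hypothetical Beta hyperparameters (assumption);
    they must be positive when supplied.
    """
    k = df["wins_before_first_loss"].astype(float)
    out = df[["player_id"]].copy()

    # MLE of the win probability from one observed streak: k / (k + 1).
    out["p_hat"] = k / (k + 1.0)

    # Plug-in expectation p / (1 - p); algebraically this equals k.
    out["exp_wins_mle"] = out["p_hat"] / (1.0 - out["p_hat"])

    if prior_a is not None and prior_b is not None:
        # Posterior mean of p / (1 - p) under Beta(prior_a + k, prior_b + 1).
        out["exp_wins_shrunk"] = (prior_a + k) / prior_b
    return out


df = pd.DataFrame(
    {"player_id": ["a", "b", "c"], "wins_before_first_loss": [0, 3, 9]}
)
result = expected_additional_wins(df, prior_a=2.0, prior_b=2.0)
```

Because the unpooled estimate just echoes k, a plot of it across players is the plot of the raw data. A shared prior (or repeated streaks per player) is what lets the model separate genuine heterogeneity from one-streak noise, which matters when the estimates are used for ranking or thresholds.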