Stabilize LLM inference and estimate needed repeats

This is an ML System Design interview question from Citadel for Data Scientist roles (via PracHub).

Question

You run an LLM-based sentiment model to score a fixed dataset of texts. Because the inference API does not let you set the temperature and its outputs are stochastic, rerunning the model on the same texts produces slightly different score vectors on different days.

Day 1 inference output is a vector $y_1$ (one score per item).
Day 2 inference output is $y_2$.
The observed Pearson correlation is $\mathrm{corr}(y_1, y_2) = 0.95$.
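
The observed 0.95 is the Pearson correlation between the two day vectors. A minimal sketch of that measurement follows; it also ranks items by how far their score moved between days, which can help spot unstable inputs. The function name `day_to_day_agreement` and the synthetic data are illustrative assumptions, with the noise variance chosen so that the expected correlation is 0.95.

```python
from typing import Tuple

import numpy as np


def day_to_day_agreement(y1, y2) -> Tuple[float, np.ndarray]:
    """Pearson correlation between two runs over the same items, plus the
    absolute per-item change between runs."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    if y1.shape != y2.shape:
        raise ValueError("both runs must score the same items in the same order")
    r = float(np.corrcoef(y1, y2)[0, 1])
    return r, np.abs(y1 - y2)


# Hypothetical synthetic data: a fixed latent score per item plus fresh noise per day.
# Noise variance 0.05/0.95 (with unit signal variance) gives an expected correlation of 0.95.
rng = np.random.default_rng(0)
n_items = 100_000
true_score = rng.normal(0.0, 1.0, n_items)
noise_sd = np.sqrt(0.05 / 0.95)
y1 = true_score + rng.normal(0.0, noise_sd, n_items)
y2 = true_score + rng.normal(0.0, noise_sd, n_items)

r, diffs = day_to_day_agreement(y1, y2)
unstable = np.argsort(diffs)[::-1][:10]  # indices of the 10 items that moved most
print(f"corr(y1, y2) = {r:.3f}")
```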

Tasks:

System/ML design: How would you make inference outputs more reproducible (or at least stable) in production given limited decoding controls?
Modeling question: Propose a reasonable statistical model for this randomness and derive how many independent inference runs (e.g., days) you would need to aggregate so that two independent aggregates, each combining that many runs, have a correlation above 0.99. State your assumptions clearly.
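
For the system-design task, one common pattern is to stop treating each request as a fresh draw. You pin the exact model snapshot and prompt, score each item k times, aggregate the samples, and cache the aggregate so that later requests for the same item return an identical value. The sketch below assumes a hypothetical `score_fn` that performs one API call and returns a float. `StableScorer`, the cache-key layout, and the median aggregator are illustrative choices, not a prescribed design.

```python
import hashlib
import json
import random
import statistics
from typing import Callable, Dict, List


class StableScorer:
    """Sketch: wraps a stochastic scoring call with k-sample median aggregation
    and a cache keyed on (model version, prompt template, text)."""

    def __init__(self, score_fn: Callable[[str], float], model_version: str,
                 prompt_template: str, k: int = 5) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.score_fn = score_fn                # hypothetical: one API call -> float score
        self.model_version = model_version      # pinned model snapshot identifier
        self.prompt_template = prompt_template  # must contain a "{text}" placeholder
        self.k = k
        self._cache: Dict[str, float] = {}      # in production: a persistent store

    def _key(self, text: str) -> str:
        # Any change to model version, prompt, or input produces a new cache entry.
        payload = json.dumps([self.model_version, self.prompt_template, text])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def score(self, text: str) -> float:
        key = self._key(text)
        if key not in self._cache:
            prompt = self.prompt_template.format(text=text)
            samples = [self.score_fn(prompt) for _ in range(self.k)]
            self._cache[key] = statistics.median(samples)  # robust to an occasional outlier
        return self._cache[key]


# Usage with a hypothetical noisy stand-in for the real API.
rng = random.Random(42)
call_log: List[str] = []


def fake_api(prompt: str) -> float:
    call_log.append(prompt)                     # record every simulated API call
    base = (sum(map(ord, prompt)) % 100) / 100  # deterministic per prompt
    return base + rng.gauss(0.0, 0.05)          # plus run-to-run noise


scorer = StableScorer(fake_api, model_version="pinned-snapshot-id",
                      prompt_template="Rate the sentiment from 0 to 1: {text}", k=5)
first = scorer.score("The service was excellent.")
second = scorer.score("The service was excellent.")  # served from cache, no new calls
```

Caching makes reruns on unchanged items exactly reproducible. The k-sample aggregation is what reduces run-to-run variance when new items or a new model version force fresh calls. A mean would match an additive-noise averaging model. The median gives up a little efficiency under Gaussian noise in exchange for robustness to malformed or extreme outputs.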
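
For the modeling task, one standard choice is an additive measurement-error model, under which the required number of runs follows from the Spearman-Brown formula. Here is a worked derivation. It assumes that the noise is independent across runs and items, has mean zero and constant variance, and is independent of the true score $s_i$. Correlations are taken across items.

$$
\begin{aligned}
&\text{Model: } y_{k,i} = s_i + \varepsilon_{k,i}, \quad \operatorname{Var}(s_i) = \tau^2, \quad \mathbb{E}[\varepsilon_{k,i}] = 0, \quad \operatorname{Var}(\varepsilon_{k,i}) = \sigma^2 \\
&\text{Single run: } \rho = \operatorname{corr}(y_1, y_2) = \frac{\tau^2}{\tau^2 + \sigma^2} = 0.95 \;\Rightarrow\; \sigma^2 = \tau^2\,\frac{1-\rho}{\rho} \\
&\text{Average of } n \text{ runs: } \bar{y}_i = s_i + \bar{\varepsilon}_i, \quad \operatorname{Var}(\bar{\varepsilon}_i) = \frac{\sigma^2}{n} \\
&\text{Two disjoint averages: } \rho_n = \frac{\tau^2}{\tau^2 + \sigma^2/n} = \frac{n\rho}{1 + (n-1)\rho} \quad \text{(Spearman-Brown)} \\
&\rho_n > r^* \iff n > \frac{r^*(1-\rho)}{\rho\,(1-r^*)} = \frac{0.99 \times 0.05}{0.95 \times 0.01} \approx 5.21 \;\Rightarrow\; n = 6 \\
&\text{Check: } \rho_5 = \frac{4.75}{4.80} \approx 0.9896 < 0.99, \qquad \rho_6 = \frac{5.70}{5.75} \approx 0.9913 > 0.99
\end{aligned}
$$

Under these assumptions, each aggregate needs 6 runs, so two independent aggregates use 12 runs in total. The result relies on the runs being exchangeable, with the same pairwise correlation $\rho$ between any two of them. If, for example, calls made on the same day are more alike than calls on different days, then averaging several same-day calls reduces the noise less than the formula suggests.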
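
A small numeric check of the repeat-count calculation follows, as a sketch under the additive i.i.d.-noise assumption. `aggregated_corr` applies the Spearman-Brown formula $\rho_n = n\rho / (1 + (n-1)\rho)$, and `required_repeats` solves for the smallest $n$ that strictly exceeds a target. `simulate_aggregated_corr` is a Monte Carlo check that uses synthetic Gaussian scores. All three function names are illustrative.

```python
import math

import numpy as np


def aggregated_corr(rho: float, n: int) -> float:
    """Spearman-Brown: correlation between two independent averages of n runs
    each, given single-run correlation rho (additive i.i.d. noise model)."""
    return n * rho / (1 + (n - 1) * rho)


def required_repeats(rho: float, target: float) -> int:
    """Smallest integer n for which aggregated_corr(rho, n) strictly exceeds target."""
    if not (0 < rho < 1 and 0 < target < 1):
        raise ValueError("rho and target must lie strictly between 0 and 1")
    n_min = target * (1 - rho) / (rho * (1 - target))  # boundary where corr == target
    return max(1, math.floor(n_min) + 1)


def simulate_aggregated_corr(rho: float, n: int, n_items: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo check: unit-variance latent scores, Gaussian noise scaled so one
    run correlates with another at rho, then two disjoint n-run averages."""
    rng = np.random.default_rng(seed)
    sigma = math.sqrt((1 - rho) / rho)  # noise sd implied by unit signal variance
    s = rng.normal(0.0, 1.0, n_items)
    agg_a = s + rng.normal(0.0, sigma, (n, n_items)).mean(axis=0)
    agg_b = s + rng.normal(0.0, sigma, (n, n_items)).mean(axis=0)
    return float(np.corrcoef(agg_a, agg_b)[0, 1])


print(required_repeats(0.95, 0.99))       # → 6
print(f"{aggregated_corr(0.95, 5):.4f}")  # → 0.9896
print(f"{aggregated_corr(0.95, 6):.4f}")  # → 0.9913
print(f"{simulate_aggregated_corr(0.95, 6):.4f}")
```

`required_repeats` uses `floor(n_min) + 1` rather than a plain ceiling so that a boundary falling exactly on an integer still yields a correlation strictly above the target, as the question asks.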
