How do I approach ML System Design interview questions?

ML System Design questions require an understanding of core concepts and practice. PracHub provides solutions with explanations to help you master ML System Design interviews.

What difficulty level is this interview question?

This is a medium-difficulty ML System Design question, commonly asked during Technical Screen rounds at Citadel.

What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Citadel during technical interviews.

Stabilize LLM inference and estimate needed repeats

Last updated: Mar 29, 2026

Quick Overview

This question evaluates skills in designing reliable LLM inference pipelines and in statistical modeling of stochastic outputs, including reproducibility engineering, uncertainty quantification, and the use of correlation metrics (e.g., Pearson) to measure stability.

Citadel

Oct 9, 2025, 12:00 AM

medium | Data Scientist | Technical Screen | ML System Design

You run an LLM-based sentiment model to score a fixed dataset of texts. Because the inference API doesn’t let you set the temperature and its outputs are stochastic, the model produces slightly different score vectors on different days.

Day 1 inference output is a vector $y_1$ (one score per item).
Day 2 inference output is $y_2$.
The observed Pearson correlation is $\mathrm{corr}(y_1, y_2) = 0.95$.
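A minimal sketch of how this day-to-day stability number is computed, assuming each day's scores are stored as an equal-length list aligned item by item (same texts, same order). The `pearson` helper and the four scores below are illustrative placeholders, not data from the question.

```python
import math
from typing import Sequence


def pearson(y1: Sequence[float], y2: Sequence[float]) -> float:
    """Pearson correlation between two score vectors aligned item by item."""
    if len(y1) != len(y2) or len(y1) < 2:
        raise ValueError("score vectors must have the same length (>= 2)")
    n = len(y1)
    m1 = sum(y1) / n
    m2 = sum(y2) / n
    # Covariance and both variances share the same 1/n factor, so it cancels.
    cov = sum((a - m1) * (b - m2) for a, b in zip(y1, y2))
    var1 = sum((a - m1) ** 2 for a in y1)
    var2 = sum((b - m2) ** 2 for b in y2)
    if var1 == 0 or var2 == 0:
        raise ValueError("correlation is undefined for a constant score vector")
    return cov / math.sqrt(var1 * var2)


# Hypothetical scores for the same four texts, in the same order, on two days.
day1 = [0.10, 0.40, 0.35, 0.80]
day2 = [0.12, 0.38, 0.40, 0.75]
print(round(pearson(day1, day2), 3))
```

Alignment matters: if the two runs are joined in a different item order, the correlation reflects the join rather than model stability.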

Tasks:

System/ML design: How would you make inference outputs more reproducible (or at least stable) in production given limited decoding controls?
Modeling question: Propose a reasonable statistical model for this randomness and derive how many independent inference runs (e.g., days) you’d need to aggregate so that the correlation between aggregated outputs from two independent aggregations exceeds 0.99 (state assumptions clearly).
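One possible shape for the first task, sketched under assumptions rather than as a definitive design: pin everything you can control (a dated model snapshot if the provider offers one, a versioned prompt template, a constrained output format parsed deterministically), call the API k times per text and average, and cache the aggregated score keyed on the text plus those versions so that re-scoring a text returns the identical number. `StableScorer`, `score_fn`, `noisy_api`, and the version strings are hypothetical names; the in-memory dict stands in for a persistent store.

```python
import hashlib
import random
import statistics
from typing import Callable, Dict


class StableScorer:
    """Hypothetical wrapper: pins versions, averages k calls per text, caches the result."""

    def __init__(self, score_fn: Callable[[str], float], model_version: str,
                 prompt_version: str, k: int = 6) -> None:
        if k < 1:
            raise ValueError("k must be >= 1")
        self.score_fn = score_fn
        self.model_version = model_version
        self.prompt_version = prompt_version
        self.k = k
        self.cache: Dict[str, float] = {}  # stand-in for a persistent key-value store

    def _key(self, text: str) -> str:
        # Key on everything that can change the score, so a version or k change misses the cache.
        payload = "\x1f".join([self.model_version, self.prompt_version, str(self.k), text])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def score(self, text: str) -> float:
        key = self._key(text)
        if key not in self.cache:
            # Averaging k independent calls divides the run-to-run noise variance by k.
            self.cache[key] = statistics.fmean(self.score_fn(text) for _ in range(self.k))
        return self.cache[key]


rng = random.Random(0)


def noisy_api(text: str) -> float:
    # Stand-in for the real API call: a text-dependent score plus Gaussian noise.
    return min(len(text), 100) / 100 + rng.gauss(0.0, 0.05)


scorer = StableScorer(noisy_api, model_version="model-snapshot-A", prompt_version="v1", k=6)
first = scorer.score("Earnings beat expectations")
assert scorer.score("Earnings beat expectations") == first  # cache hit: identical score
```

Caching makes repeat scoring of the same text exactly reproducible, while averaging reduces noise for new texts at k times the API cost. A median is a more robust aggregate if occasional calls return malformed or extreme scores.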

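For the modeling task, a common choice (an assumption, not the only valid model) is additive noise: for item $i$ on run $r$, $y_{r,i} = s_i + \varepsilon_{r,i}$, where $s_i$ is the item's stable score with variance $\sigma_s^2$ across items and $\varepsilon_{r,i}$ has mean 0 and variance $\sigma_\varepsilon^2$, independent of $s$ and across runs and items. Then $\rho = \mathrm{corr}(y_1, y_2) = \sigma_s^2 / (\sigma_s^2 + \sigma_\varepsilon^2) = 0.95$. Averaging $k$ runs keeps $s_i$ and divides the noise variance by $k$, so two independent $k$-run averages satisfy the Spearman-Brown formula

$$\rho_k = \frac{\sigma_s^2}{\sigma_s^2 + \sigma_\varepsilon^2 / k} = \frac{k\rho}{1 + (k-1)\rho}.$$

Requiring $\rho_k > 0.99$ gives $k > \frac{0.99\,(1-\rho)}{\rho\,(1-0.99)} = \frac{0.0495}{0.0095} \approx 5.21$, so $k = 6$ runs per aggregate ($\rho_5 = 4.75/4.8 \approx 0.9896$, $\rho_6 = 5.7/5.75 \approx 0.9913$), or 12 runs for the two independent aggregations. The sketch below checks this arithmetic and runs a Monte Carlo simulation under the same assumptions; `spearman_brown`, `runs_needed`, and `simulate` are illustrative names.

```python
import math
import random


def spearman_brown(rho: float, k: int) -> float:
    """Correlation between two independent k-run averages, given single-run correlation rho."""
    return k * rho / (1 + (k - 1) * rho)


def runs_needed(rho: float, target: float, k_max: int = 10_000) -> int:
    """Smallest k whose k-run averages correlate strictly above `target`."""
    for k in range(1, k_max + 1):
        if spearman_brown(rho, k) > target:
            return k
    raise ValueError("target not reachable within k_max runs")


def _pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def simulate(rho: float, k: int, n_items: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo check of y = s + noise with Var(s) = rho and Var(noise) = 1 - rho."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, math.sqrt(rho)) for _ in range(n_items)]
    noise_sd = math.sqrt(1 - rho)

    def aggregate():
        # Average k independent noisy runs for every item (same signal, fresh noise).
        return [s + sum(rng.gauss(0.0, noise_sd) for _ in range(k)) / k for s in signal]

    return _pearson(aggregate(), aggregate())


k = runs_needed(0.95, 0.99)
print(k, round(spearman_brown(0.95, k), 4))  # 6 0.9913
print(round(simulate(0.95, k), 3))  # Monte Carlo estimate; should be close to the value above
```

The result hinges on the noise being independent across runs. A shared day-level drift or a persistent per-item bias is not reduced by averaging, so more runs would be needed (or the target could be unreachable), and the observed 0.95 is itself an estimate whose precision depends on the number of items.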