Design an LLM-based binary classifier
Company: Anthropic
Role: Software Engineer
Category: ML System Design
Difficulty: Medium
Interview Round: Technical Screen
You are given a helper function score_batch(prompt, inputs) that, for each input string, returns per-token probabilities (possibly as log-probabilities) from a large language model under the provided system prompt. Design a binary text classifier using only this helper:
(1) Construct system prompts for classes A and B.
(2) For each input, query the helper with both prompts and produce a continuous score s in [0,1] estimating P(class=A|x), not just a hard label.
(3) If the helper returns log-probabilities, show how to compute a numerically stable score (e.g., via log-sum-exp and log-prob normalization).
(4) Define batching and how token-level probabilities are aggregated into a single sequence-level score.
(5) Describe threshold selection and evaluation metrics (ROC-AUC, precision/recall, F1), including calibration.
(6) Propose performance improvements: prompt engineering, repeated sampling, temperature control, prompt ensembling, score calibration, and handling class imbalance.
(7) Discuss failure modes and how to validate/iterate offline without fine-tuning. Provide pseudocode and the inference-time complexity.
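Steps (2)–(4) can be sketched as follows. This is a minimal illustration, assuming score_batch(prompt, inputs) returns, for each input, a list of per-token log-probabilities; per-token log-probs are summed into a sequence score, and the two-prompt comparison is normalized with a stable log-sum-exp:

```python
import math

def logsumexp2(a, b):
    # Numerically stable log(exp(a) + exp(b)): subtract the max before
    # exponentiating so neither term can overflow.
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def classify(score_batch, prompt_a, prompt_b, inputs):
    # score_batch (the helper from the question) is assumed here to return
    # one list of per-token log-probabilities per input string.
    logp_a = [sum(toks) for toks in score_batch(prompt_a, inputs)]
    logp_b = [sum(toks) for toks in score_batch(prompt_b, inputs)]
    # s = P(class=A | x) via a two-way softmax over sequence log-probs.
    return [math.exp(la - logsumexp2(la, lb))
            for la, lb in zip(logp_a, logp_b)]
```

With a stub helper, the output is a continuous score in [0,1] rather than a hard label, as item (2) requires; summing per-token log-probs is one common aggregation choice (length normalization is another).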
Quick Answer: This question evaluates knowledge of LLM-based binary classification, probabilistic scoring and numerical stability (including log-prob handling and log-sum-exp), prompt construction, batching and sequence-level aggregation, calibration and evaluation metrics, plus analysis of failure modes and inference-time complexity.
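For item (5), threshold selection can be as simple as sweeping candidate thresholds over a labeled validation set and keeping the one that maximizes F1; a self-contained sketch (the sweep-over-observed-scores strategy is one reasonable choice, not the only one):

```python
def f1_at_threshold(scores, labels, t):
    # labels: 1 for class A, 0 for class B; predict A when score >= t.
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def pick_threshold(scores, labels):
    # Candidate thresholds need only be the observed scores, since F1 is
    # constant between consecutive score values.
    return max(set(scores), key=lambda t: f1_at_threshold(scores, labels, t))
```

The same sweep generalizes to other operating points (e.g., maximizing recall subject to a precision floor) when the two error types have different costs.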
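Item (6) lists score calibration; one standard approach is Platt scaling, fitting a sigmoid sigma(a*s + b) to held-out labels. A minimal sketch using plain gradient descent on the logistic loss (the learning rate and step count are illustrative assumptions):

```python
import math

def platt_calibrate(scores, labels, lr=0.1, steps=2000):
    # Fit sigmoid(a * s + b) to held-out binary labels by gradient descent
    # on the average logistic loss; returns a calibration function.
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s / n
            grad_b += (p - y) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))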