
Design an LLM-based binary classifier

Last updated: Mar 29, 2026

Quick Overview

This question evaluates knowledge of LLM-based binary classification, probabilistic scoring and numerical stability (including log-prob handling and log-sum-exp), prompt construction, batching and sequence-level aggregation, calibration and evaluation metrics, plus analysis of failure modes and inference-time complexity.

  • medium
  • Anthropic
  • ML System Design
  • Software Engineer

Design an LLM-based binary classifier

Company: Anthropic

Role: Software Engineer

Category: ML System Design

Difficulty: medium

Interview Round: Technical Screen



Sep 6, 2025, 12:00 AM

Design a Binary Text Classifier Using Only a Log-Probability Scoring Helper

Context

You are building a binary text classifier without fine-tuning. You have access only to a helper function:

  • score_batch(prompt, inputs) → for each input string x in inputs, returns per-token probabilities (or log-probabilities) of x under the provided system prompt.

Assume:

  • The same input x can be scored under different system prompts.
  • The helper returns either probabilities or log-probabilities (you must handle both cases robustly).
  • You can process inputs in batches.
  • If the API exposes temperature or sampling controls, you may optionally adjust them.
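Because the helper may return either probabilities or log-probabilities, a defensive first step is to coerce everything into log space. The detection heuristic below is an assumption (probabilities lie in (0, 1], log-probabilities are ≤ 0), not part of the stated API; a minimal sketch:

```python
import math

def to_logprobs(values):
    """Coerce a list of per-token scores to log-probabilities.

    Heuristic (assumption): any strictly negative value means the
    helper already returned log-probs; otherwise treat the values
    as probabilities and take logs, clamping to avoid log(0).
    """
    if any(v < 0 for v in values):
        return list(values)
    return [math.log(max(v, 1e-12)) for v in values]
```

If the API documents which form it returns, branch on that instead of guessing from the values.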

Tasks

  1. Construct system prompts for class A and class B.
  2. For each input, query the helper with both prompts and produce a continuous score s in [0, 1] estimating P(class = A | x), not just a hard label.
  3. If the helper returns log-probabilities, show how to compute a numerically stable score using log-sum-exp and log-prob normalization.
  4. Define batching and how token-level probabilities are aggregated into a single sequence-level score.
  5. Describe threshold selection and evaluation metrics (ROC-AUC, precision/recall, F1), including calibration.
  6. Propose performance improvements: prompt engineering, repeated sampling, temperature control, prompt ensembling, score calibration, and handling class imbalance.
  7. Discuss failure modes and how to validate/iterate offline without fine-tuning. Provide pseudocode and the inference-time complexity.
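A minimal sketch of tasks 2–4, assuming `score_batch` already returns per-token log-probabilities and that `prompt_a` / `prompt_b` are the class-A and class-B system prompts from task 1 (all names here are illustrative, not part of the stated API):

```python
import math

def sequence_loglik(token_logprobs, length_normalize=True):
    """Aggregate token-level log-probs into one sequence-level score.

    Summing gives log P(x | prompt); dividing by length acts like a
    per-input temperature, keeping long inputs from saturating the
    final score at exactly 0 or 1.
    """
    total = sum(token_logprobs)
    return total / max(len(token_logprobs), 1) if length_normalize else total

def classify_batch(score_batch, prompt_a, prompt_b, inputs):
    """Return s_i ~ P(class=A | x_i) in [0, 1] for each input."""
    logs_a = score_batch(prompt_a, inputs)   # one batched call per prompt
    logs_b = score_batch(prompt_b, inputs)
    scores = []
    for ta, tb in zip(logs_a, logs_b):
        ll_a = sequence_loglik(ta)
        ll_b = sequence_loglik(tb)
        # Stable two-class softmax via log-sum-exp:
        # s = exp(ll_a) / (exp(ll_a) + exp(ll_b))
        m = max(ll_a, ll_b)
        lse = m + math.log(math.exp(ll_a - m) + math.exp(ll_b - m))
        scores.append(math.exp(ll_a - lse))
    return scores
```

Inference cost is two batched model calls per batch, i.e. O(2·N·T) scored tokens for N inputs of average length T (plus prompt tokens); subtracting the max before exponentiating keeps the softmax finite even when both sequence log-likelihoods are large and negative.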
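For task 5, one common recipe is to sweep candidate thresholds over a labeled validation set and pick the one maximizing F1 (a sketch; ROC-AUC is threshold-free, so it is reported separately over the full score distribution):

```python
def precision_recall_f1(scores, labels, threshold):
    """Precision, recall, and F1 when predicting class A iff s >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def best_f1_threshold(scores, labels):
    # The observed scores themselves are sufficient candidate thresholds.
    return max(sorted(set(scores)),
               key=lambda t: precision_recall_f1(scores, labels, t)[2])
```

Under class imbalance, the threshold maximizing F1 usually differs substantially from 0.5, which is one reason to emit continuous scores rather than hard labels.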
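Raw softmax scores from prompt comparison are often miscalibrated, since the model's confidence depends on prompt wording. Platt scaling fits a one-dimensional logistic map on held-out data; a pure-Python sketch using gradient descent on the mean log loss:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_platt(scores, labels, lr=0.5, iters=2000):
    """Fit p = sigmoid(a * s + b) to (score, label) pairs.

    Plain full-batch gradient descent on log loss; returns (a, b).
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(iters):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y   # gradient of log loss wrt logit
            ga += err * s / n
            gb += err / n
        a -= lr * ga
        b -= lr * gb
    return a, b
```

At inference, report `sigmoid(a * s + b)` instead of the raw score, and refit whenever the prompts change.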
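For task 6, prompt ensembling averages the score over several semantically equivalent prompt pairs, reducing sensitivity to any single phrasing. In the sketch below, `score_fn` stands for whatever single-pair scorer is used (e.g. the two-prompt comparison of task 2); the name is illustrative:

```python
def ensemble_scores(score_fn, prompt_pairs, inputs):
    """Average per-input scores over multiple (prompt_a, prompt_b) pairs.

    score_fn(prompt_a, prompt_b, inputs) must return one score in
    [0, 1] per input.
    """
    n = len(prompt_pairs)
    totals = [0.0] * len(inputs)
    for pa, pb in prompt_pairs:
        for i, s in enumerate(score_fn(pa, pb, inputs)):
            totals[i] += s
    return [t / n for t in totals]
```

The cost scales linearly in the number of prompt pairs, so in practice a small ensemble (3–5 phrasings) is a reasonable trade-off between variance reduction and latency.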

