PracHub

Build an Ollama Eval Harness

Last updated: Jun 17, 2026

Quick Overview

This question evaluates a candidate's ability to design and implement a lightweight evaluation harness for a locally hosted language model, testing competencies in ML system design, service integration, metrics instrumentation, reliability engineering (timeouts, retries), concurrency control, configuration management, and extensibility.

Company: Mercor

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: medium

Interview Round: Technical Screen

Mar 19, 2026, 12:00 AM

Design and implement a lightweight evaluation harness for a language model served locally through Ollama.

The harness should:

  • Read a dataset of evaluation cases, where each case contains an `id`, a `prompt`, and optionally a `reference_answer` or expected label.
  • Send each prompt to an Ollama-hosted model and capture the model output.
  • Record metadata such as model name, latency, failures, and retry attempts.
  • Compute useful evaluation metrics such as exact match for deterministic tasks, pass rate, average latency, and error rate.
  • Save both per-example results and an aggregate summary report.
  • Expose configuration for model name, timeout, retries, temperature, and concurrency.
  • Be structured so that another model backend could be added later with minimal changes.
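The configuration, retry, and pluggable-backend requirements above could be sketched roughly as follows. This is one possible shape, not a reference solution: `HarnessConfig`, `Backend`, `OllamaBackend`, and `run_case` are hypothetical names, the default model `llama3` is a placeholder, and the sketch assumes Ollama is listening on its default local address and is called through its non-streaming `/api/generate` endpoint.

```python
import json
import time
import urllib.request
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class HarnessConfig:
    # Field names and defaults are illustrative assumptions.
    model: str = "llama3"
    timeout_s: float = 30.0
    max_retries: int = 2
    temperature: float = 0.0
    concurrency: int = 4
    base_url: str = "http://localhost:11434"  # Ollama's default local address


class Backend(Protocol):
    """Any backend only needs to turn a prompt into text."""

    def generate(self, prompt: str, cfg: HarnessConfig) -> str: ...


class OllamaBackend:
    """Calls Ollama's /api/generate endpoint with streaming disabled."""

    def generate(self, prompt: str, cfg: HarnessConfig) -> str:
        body = json.dumps({
            "model": cfg.model,
            "prompt": prompt,
            "stream": False,
            "options": {"temperature": cfg.temperature},
        }).encode("utf-8")
        req = urllib.request.Request(
            f"{cfg.base_url}/api/generate",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=cfg.timeout_s) as resp:
            return json.loads(resp.read())["response"]


def run_case(backend: Backend, case: dict, cfg: HarnessConfig,
             backoff_s: float = 0.5) -> dict:
    """Run one case, retrying on any exception with exponential backoff."""
    error = None
    for attempt in range(1, cfg.max_retries + 2):
        start = time.perf_counter()
        try:
            output = backend.generate(case["prompt"], cfg)
            latency = time.perf_counter() - start
            return {"id": case["id"], "model": cfg.model, "output": output,
                    "latency_s": latency, "attempts": attempt, "error": None}
        except Exception as exc:  # broad on purpose: any failure is recorded
            error = f"{type(exc).__name__}: {exc}"
            if attempt <= cfg.max_retries:
                time.sleep(backoff_s * 2 ** (attempt - 1))
    return {"id": case["id"], "model": cfg.model, "output": None,
            "latency_s": None, "attempts": cfg.max_retries + 1, "error": error}
```

Because `run_case` depends only on the `Backend` protocol, adding another model server would mean writing one more class with a `generate` method. A stricter version might retry only on timeouts and connection errors rather than on every exception.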

Explain the system design, major components, data flow, and how you would make the harness reliable and extensible.
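One way to picture the data flow is load, run, score, aggregate, save. The sketch below assumes a JSONL dataset (one case per line) and a hypothetical `run_one` callable that returns a dict with `output`, `latency_s`, and `error` keys. All function names and the output file names are illustrative. It also makes one debatable choice: failed calls are counted in the error rate and left out of the pass rate rather than scored as failures.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Optional


def load_cases(path: str) -> list[dict]:
    """Read one JSON object per line; skip blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def exact_match(output: Optional[str], reference: Optional[str]) -> Optional[bool]:
    """Case- and whitespace-insensitive match; None when nothing can be scored."""
    if output is None or reference is None:
        return None
    return output.strip().lower() == reference.strip().lower()


def evaluate(cases: list[dict], run_one: Callable[[dict], dict],
             concurrency: int = 4) -> list[dict]:
    """Run cases on a thread pool; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(run_one, cases))
    for case, res in zip(cases, results):
        res["passed"] = exact_match(res.get("output"), case.get("reference_answer"))
    return results


def summarize(results: list[dict]) -> dict:
    """Aggregate pass rate (scored cases only), error rate, and mean latency."""
    total = len(results)
    errors = sum(1 for r in results if r.get("error"))
    scored = [r for r in results if r["passed"] is not None]
    latencies = [r["latency_s"] for r in results if r.get("latency_s") is not None]
    return {
        "total": total,
        "error_rate": errors / total if total else 0.0,
        "pass_rate": sum(r["passed"] for r in scored) / len(scored) if scored else None,
        "avg_latency_s": sum(latencies) / len(latencies) if latencies else None,
    }


def save_report(results: list[dict], summary: dict, out_dir: str) -> None:
    """Write per-example results as JSONL and the aggregate summary as JSON."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "results.jsonl", "w", encoding="utf-8") as f:
        for r in results:
            f.write(json.dumps(r) + "\n")
    (out / "summary.json").write_text(json.dumps(summary, indent=2), encoding="utf-8")
```

Threads fit here because each call mostly waits on the model server. Keeping scoring separate from running means a different metric could replace `exact_match` without touching the request path.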

