PracHub

Quick Overview

This question evaluates competency in interacting with RESTful HTTP APIs, JSON parsing and filtering, automation of operational workflows, resilient error handling (including retries and malformed responses), and command-line configuration for production-like services.

Query Machines and Mark Them Offline

Company: CoreWeave

Role: Site Reliability Engineer

Category: Coding & Algorithms

Difficulty: medium

Interview Round: Onsite

You are given access to a local HTTP server that simulates a production machine-management service. Write a program that queries the service, identifies the machines that match the criteria returned by the API, and marks those machines offline.

Assume the server exposes endpoints similar to the following:

- `GET /task` returns a JSON object describing what to look for, for example:

  ```json
  { "rack": "rack-17", "machine_type": "gpu", "minimum_error_count": 3 }
  ```

- `GET /machines` returns a JSON array of machines, for example:

  ```json
  [
    {"id": "m1", "rack": "rack-17", "machine_type": "gpu", "error_count": 5, "state": "online"},
    {"id": "m2", "rack": "rack-17", "machine_type": "cpu", "error_count": 4, "state": "online"}
  ]
  ```

- `POST /machines/{id}/offline` marks a machine offline. The request should be sent only for machines that satisfy the task criteria and are currently online.

Implement a command-line program that:

1. Reads the base URL of the HTTP server from an argument or configuration value.
2. Calls `GET /task` to discover the filtering criteria.
3. Calls `GET /machines` to retrieve all machines.
4. Finds every online machine matching the criteria.
5. Sends `POST /machines/{id}/offline` for each matching machine.
6. Handles common production issues such as non-200 responses, malformed JSON, empty result sets, retries for transient failures, and clear logging or error messages.
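The six steps above could be wired together roughly as follows, using only the Python standard library. This is a minimal sketch, not a reference solution. The `--base-url` and `--retries` flags, the `MACHINE_API_URL` environment variable, the default address, the fixed retry delay, and the helper names `call`, `get_json`, and `matches` are all assumptions made for illustration. Treating `200` and `204` as success and `500`, `502`, `503`, and `504` as transient is also an assumption, since the endpoint description does not say.

```python
import argparse
import json
import logging
import os
import time
import urllib.error
import urllib.parse
import urllib.request

TRANSIENT = {500, 502, 503, 504}  # assumed set of retryable statuses
log = logging.getLogger("mark-offline")


def call(method, url, retries, delay=0.5):
    """Return (status, body); status is None if the server was unreachable."""
    status = None
    for attempt in range(retries + 1):
        try:
            data = b"" if method == "POST" else None
            req = urllib.request.Request(url, data=data, method=method)
            with urllib.request.urlopen(req, timeout=5) as resp:
                return resp.status, resp.read().decode("utf-8", "replace")
        except urllib.error.HTTPError as exc:
            status = exc.code
            if status not in TRANSIENT:
                return status, ""  # permanent failure: do not retry
        except OSError as exc:  # connection refused, timeout, DNS failure
            status = None
            log.warning("%s %s: %s", method, url, exc)
        log.warning("%s %s attempt %d failed (status %s)", method, url, attempt + 1, status)
        if attempt < retries:
            time.sleep(delay)
    return status, ""


def get_json(base, path, retries):
    """Fetch and parse one GET endpoint, raising RuntimeError on any problem."""
    status, body = call("GET", base + path, retries)
    if status != 200:
        raise RuntimeError(f"GET {path} returned status {status}")
    try:
        return json.loads(body)
    except ValueError:
        raise RuntimeError(f"Malformed JSON from {path}") from None


def matches(task, machine):
    """True for an online machine that satisfies every task criterion."""
    try:
        return (
            machine["state"] == "online"
            and "id" in machine
            and machine["rack"] == task["rack"]
            and machine["machine_type"] == task["machine_type"]
            and machine["error_count"] >= task["minimum_error_count"]
        )
    except (KeyError, TypeError):
        return False  # malformed record or task: never mark it offline


def main(argv=None):
    parser = argparse.ArgumentParser(description="Mark matching machines offline")
    parser.add_argument("--base-url",
                        default=os.environ.get("MACHINE_API_URL", "http://localhost:8080"))
    parser.add_argument("--retries", type=int, default=3)
    args = parser.parse_args(argv)
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    base = args.base_url.rstrip("/")
    try:
        task = get_json(base, "/task", args.retries)
        machines = get_json(base, "/machines", args.retries)
    except RuntimeError as exc:
        log.error("%s", exc)
        return 1
    if not isinstance(task, dict) or not isinstance(machines, list):
        log.error("Unexpected JSON shape from /task or /machines")
        return 1
    targets = [m for m in machines if matches(task, m)]
    if not targets:
        log.info("No online machines match the task")
    failures = 0
    for machine in targets:
        machine_id = urllib.parse.quote(str(machine["id"]), safe="")
        status, _ = call("POST", f"{base}/machines/{machine_id}/offline", args.retries)
        if status in (200, 204):
            log.info("Marked %s offline", machine["id"])
        else:
            failures += 1
            log.error("Could not mark %s offline (status %s)", machine["id"], status)
    return 1 if failures else 0
```

Saved as a script, it would end with `raise SystemExit(main())` so that the exit code tells the caller whether every matching machine was marked offline. Keeping `matches` free of I/O makes the filtering rule easy to unit test without a running server.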


Because an online judge cannot call a real HTTP server, this problem simulates the machine-management service. Implement the core logic of the command-line tool as a function. You are given the response from `GET /task`, the response from `GET /machines`, and the sequence of status codes that each `POST /machines/{id}/offline` would return.

Parse the JSON, find every machine that matches the task (`rack` matches, `machine_type` matches, `error_count` is at least `minimum_error_count`, and `state` is `online`), and try to mark each one offline.

- A POST succeeds on status `200` or `204`.
- Status codes `500`, `502`, `503`, and `504` are transient and should be retried up to `max_retries` times after the first attempt.
- Any other non-success status is a permanent failure and must not be retried.

If either GET returns a non-200 status or malformed JSON, stop immediately and return only an error summary. If `/machines` is a valid JSON array but some individual machine records are malformed, skip those records and add the error message `Skipped malformed machine record`.
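One way to implement the simulated version is sketched below. The function name `mark_machines_offline` and the helper names are hypothetical. The argument order and the error strings follow the example inputs and outputs in this question. Several details are assumptions because the statement leaves them open: the wording `Malformed JSON from /task`, checking `/task` before `/machines`, treating wrong field types as malformed, emitting one `Skipped malformed machine record` message per bad record, and counting a machine as failed when its simulated status list runs out before a success.

```python
import json

SUCCESS = {200, 204}
TRANSIENT = {500, 502, 503, 504}
# Required fields and their expected JSON types (the type checks are an assumption).
TASK_FIELDS = {"rack": str, "machine_type": str, "minimum_error_count": int}
MACHINE_FIELDS = {"id": str, "rack": str, "machine_type": str,
                  "error_count": int, "state": str}


def _has_fields(record, fields):
    """True if record is a dict holding every field with the expected type."""
    if not isinstance(record, dict):
        return False
    return all(
        isinstance(record.get(key), typ) and not isinstance(record.get(key), bool)
        for key, typ in fields.items()
    )


def _load(response, path, expected_type):
    """Return (data, None) on success or (None, error_message) on failure."""
    status, body = response
    if status != 200:
        return None, f"GET {path} returned status {status}"
    try:
        data = json.loads(body)
    except (TypeError, ValueError):
        return None, f"Malformed JSON from {path}"
    if not isinstance(data, expected_type):
        return None, f"Malformed JSON from {path}"
    return data, None


def mark_machines_offline(task_response, machines_response, post_statuses, max_retries):
    result = {"offline_marked": [], "failed": [], "errors": []}

    # Both GETs must succeed before any machine is touched.
    task, error = _load(task_response, "/task", dict)
    if error is None and not _has_fields(task, TASK_FIELDS):
        error = "Malformed JSON from /task"
    if error is None:
        machines, error = _load(machines_response, "/machines", list)
    if error is not None:
        result["errors"].append(error)
        return result

    for machine in machines:
        if not _has_fields(machine, MACHINE_FIELDS):
            result["errors"].append("Skipped malformed machine record")
            continue
        if not (machine["state"] == "online"
                and machine["rack"] == task["rack"]
                and machine["machine_type"] == task["machine_type"]
                and machine["error_count"] >= task["minimum_error_count"]):
            continue

        machine_id = machine["id"]
        attempts, last_status = 0, None
        # One initial attempt plus at most max_retries retries, left to right.
        for status in post_statuses.get(machine_id, [])[: max_retries + 1]:
            attempts, last_status = attempts + 1, status
            if status in SUCCESS:
                result["offline_marked"].append(machine_id)
                break
            if status not in TRANSIENT:
                result["failed"].append(machine_id)
                result["errors"].append(
                    f"POST /machines/{machine_id}/offline returned permanent status {status}")
                break
        else:
            # Retries (or simulated statuses) ran out without a success.
            result["failed"].append(machine_id)
            result["errors"].append(
                f"POST /machines/{machine_id}/offline failed after {attempts} "
                f"attempts with status {last_status}")
    return result
```

Each machine record is visited once and each simulated status is read at most once, so the work is linear in the input size, which fits the stated limits of 100000 records and 200000 status codes.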

Constraints

  • 0 <= number of machine records <= 100000
  • 0 <= max_retries <= 10
  • The total number of simulated POST status codes across all machines is at most 200000

Examples

Input: ((200, '{"rack":"rack-17","machine_type":"gpu","minimum_error_count":3}'), (200, '[{"id":"m1","rack":"rack-17","machine_type":"gpu","error_count":5,"state":"online"},{"id":"m2","rack":"rack-17","machine_type":"cpu","error_count":4,"state":"online"},{"id":"m3","rack":"rack-17","machine_type":"gpu","error_count":4,"state":"offline"},{"id":"m4","rack":"rack-17","machine_type":"gpu","error_count":3,"state":"online"}]'), {'m1': [503, 204], 'm4': [204]}, 2)

Expected Output: {'offline_marked': ['m1', 'm4'], 'failed': [], 'errors': []}

Explanation: Machines `m1` and `m4` match the task and are online. `m1` succeeds after one transient failure, and `m4` succeeds immediately.

Input: ((500, '{}'), (200, '[]'), {}, 1)

Expected Output: {'offline_marked': [], 'failed': [], 'errors': ['GET /task returned status 500']}

Explanation: A non-200 response from `GET /task` is a fatal error, so processing stops immediately.

Input: ((200, '{"rack":"rack-1","machine_type":"cpu","minimum_error_count":2}'), (200, 'not-json'), {}, 1)

Expected Output: {'offline_marked': [], 'failed': [], 'errors': ['Malformed JSON from /machines']}

Explanation: The machines body is not valid JSON, so the function returns an error summary without attempting any POST requests.

Input: ((200, '{"rack":"rack-1","machine_type":"cpu","minimum_error_count":2}'), (200, '[]'), {}, 2)

Expected Output: {'offline_marked': [], 'failed': [], 'errors': []}

Explanation: The machines list is empty, so there is nothing to mark offline. This is a valid edge case.

Input: ((200, '{"rack":"rack-17","machine_type":"gpu","minimum_error_count":3}'), (200, '[{"id":"m1","rack":"rack-17","machine_type":"gpu","error_count":6,"state":"online"},{"id":"m2","rack":"rack-17","machine_type":"gpu","error_count":4,"state":"online"}]'), {'m1': [503, 503, 503], 'm2': [409]}, 2)

Expected Output: {'offline_marked': [], 'failed': ['m1', 'm2'], 'errors': ['POST /machines/m1/offline failed after 3 attempts with status 503', 'POST /machines/m2/offline returned permanent status 409']}

Explanation: `m1` keeps returning a transient error until retries are exhausted. `m2` returns a permanent failure code, so it is not retried.

Input: ((200, '{"rack":"rack-2","machine_type":"cpu","minimum_error_count":1}'), (200, '[{"id":"a","rack":"rack-2","machine_type":"cpu","error_count":1,"state":"online"},{"id":"bad","rack":"rack-2"}]'), {'a': [200]}, 1)

Expected Output: {'offline_marked': ['a'], 'failed': [], 'errors': ['Skipped malformed machine record']}

Explanation: The first machine is valid and successfully marked offline. The second record is missing required fields, so it is skipped and logged as malformed.

Hints

  1. Stop early if `/task` or `/machines` cannot be parsed correctly; without both responses, you cannot safely choose machines to update.
  2. For each matching machine, simulate POST attempts from left to right and stop as soon as you hit success or the first permanent failure.
Last updated: Jun 9, 2026

