Query Machines and Mark Them Offline
Company: CoreWeave
Role: Site Reliability Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
You are given access to a local HTTP server that simulates a production machine-management service. Write a program that queries the service, identifies the machines that match the criteria returned by the API, and marks those machines offline.
Assume the server exposes endpoints similar to the following:
- `GET /task` returns a JSON object describing what to look for, for example:
```json
{
  "rack": "rack-17",
  "machine_type": "gpu",
  "minimum_error_count": 3
}
```
- `GET /machines` returns a JSON array of machines, for example:
```json
[
  {"id": "m1", "rack": "rack-17", "machine_type": "gpu", "error_count": 5, "state": "online"},
  {"id": "m2", "rack": "rack-17", "machine_type": "cpu", "error_count": 4, "state": "online"}
]
```
- `POST /machines/{id}/offline` marks a machine offline. The request should be sent only for machines that satisfy the task criteria and are currently online.
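The matching rule implied by the endpoints above can be sketched as a pure function, kept separate from any HTTP code so it is easy to test. This is a minimal sketch under stated assumptions: the function name `select_targets` is hypothetical, the task keys are exactly those in the example payload, and `minimum_error_count` is read as an inclusive threshold (`error_count >= minimum_error_count`). The real service may use different fields or semantics.

```python
def select_targets(task, machines):
    """Return the online machines that satisfy every criterion in `task`.

    Assumes the task has the three keys from the example payload and that
    `minimum_error_count` is an inclusive lower bound.
    """
    targets = []
    for machine in machines:
        # Skip malformed records instead of crashing on one bad entry.
        if not isinstance(machine, dict) or "id" not in machine:
            continue
        errors = machine.get("error_count")
        if (
            machine.get("state") == "online"
            and machine.get("rack") == task["rack"]
            and machine.get("machine_type") == task["machine_type"]
            and isinstance(errors, int)
            and errors >= task["minimum_error_count"]
        ):
            targets.append(machine)
    return targets
```

With the two example machines, only `m1` would be selected: `m2` has enough errors and is in the right rack, but its `machine_type` is `cpu`.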
Implement a command-line program that:
1. Reads the base URL of the HTTP server from an argument or configuration value.
2. Calls `GET /task` to discover the filtering criteria.
3. Calls `GET /machines` to retrieve all machines.
4. Finds every online machine matching the criteria.
5. Sends `POST /machines/{id}/offline` for each matching machine.
6. Handles common production issues such as non-200 responses, malformed JSON, empty result sets, retries for transient failures, and clear logging or error messages.
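One way to structure the steps above is sketched below using only the Python standard library. Everything named here is an assumption for illustration: the helper names (`parse_base_url`, `http_json`, `with_retries`, `run`), the `MACHINE_API_URL` environment variable, the choice to treat HTTP 5xx and 429 as transient, and the assumption that repeating `POST /machines/{id}/offline` is safe. The matching rule is passed in as `select(task, machines)` so that the I/O and the decision logic stay separate.

```python
import argparse
import json
import logging
import os
import time
import urllib.error
import urllib.parse
import urllib.request

log = logging.getLogger("offline-tool")


class TransientError(Exception):
    """A failure worth retrying: HTTP 5xx/429 or a network-level error."""


def parse_base_url(argv, env=os.environ):
    """Step 1: base URL from argv, else from MACHINE_API_URL (assumed name)."""
    parser = argparse.ArgumentParser(description="Offline machines matching /task")
    parser.add_argument("base_url", nargs="?", default=env.get("MACHINE_API_URL"))
    args = parser.parse_args(argv)
    if not args.base_url:
        parser.error("base URL required (argument or MACHINE_API_URL)")
    return args.base_url.rstrip("/")


def http_json(method, url, timeout=5.0):
    """Send one request and return the decoded JSON body (None if empty)."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read()
    except urllib.error.HTTPError as exc:  # urllib raises this for 4xx/5xx
        if exc.code >= 500 or exc.code == 429:
            raise TransientError(f"{method} {url}: HTTP {exc.code}") from exc
        raise RuntimeError(f"{method} {url}: HTTP {exc.code}") from exc
    except OSError as exc:  # URLError, timeouts, connection resets
        raise TransientError(f"{method} {url}: {exc}") from exc
    if not body.strip():
        return None
    try:
        return json.loads(body)
    except ValueError as exc:  # covers JSONDecodeError and bad encodings
        raise RuntimeError(f"{method} {url}: malformed JSON") from exc


def with_retries(call, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run call(), retrying TransientError with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TransientError as exc:
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            log.warning("attempt %d failed (%s); retry in %.1fs", attempt, exc, delay)
            sleep(delay)


def run(base_url, select, request=http_json, sleep=time.sleep):
    """Steps 2-5. `select(task, machines)` returns machine dicts with an "id".

    Returns (offlined_ids, failed_ids) so the caller can pick an exit code.
    """
    task = with_retries(lambda: request("GET", f"{base_url}/task"), sleep=sleep)
    machines = with_retries(lambda: request("GET", f"{base_url}/machines"), sleep=sleep)
    if not isinstance(task, dict) or not isinstance(machines, list):
        raise RuntimeError("unexpected JSON shape from /task or /machines")
    targets = select(task, machines)
    if not targets:
        log.info("no machines match %s; nothing to do", task)
    offlined, failed = [], []
    for machine in targets:
        machine_id = urllib.parse.quote(str(machine["id"]), safe="")
        url = f"{base_url}/machines/{machine_id}/offline"
        try:
            with_retries(lambda: request("POST", url), sleep=sleep)
            offlined.append(machine["id"])
        except (TransientError, RuntimeError) as exc:
            # Keep going: one bad machine should not block the rest.
            log.error("could not offline %s: %s", machine["id"], exc)
            failed.append(machine["id"])
    return offlined, failed
```

A caller would typically configure `logging`, call `run(parse_base_url(sys.argv[1:]), select)`, and exit non-zero when `failed` is non-empty. Injecting `request` and `sleep` lets the flow be exercised against a fake server without real network calls or delays, which is also a reasonable point to raise in the interview.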
Quick Answer: This question evaluates competency in interacting with RESTful HTTP APIs, JSON parsing and filtering, automation of operational workflows, resilient error handling (including retries and malformed responses), and command-line configuration for production-like services.