Design CSV upload endpoint with GPT classification
Company: Scale AI
Role: Software Engineer
Category: Software Engineering Fundamentals
Difficulty: medium
Interview Round: Onsite
You are building a backend service that needs to process two CSV files and then call an external GPT-like API for classification.
**Requirements**
1. **HTTP Endpoint**
   - Expose an HTTP endpoint, e.g. `POST /ingest-data`.
   - The client uploads two CSV files in a single request:
     - `users.csv`
     - `tasks.csv`
   - The header row of `users.csv` might be: `user_id,name,email`.
   - The header row of `tasks.csv` might be: `task_id,user_id,description`.
2. **CSV Parsing and Local JSON Storage**
   - The endpoint should:
     - Receive the two CSV files.
     - Parse them into in-memory data structures (e.g., lists of objects).
     - Serialize each dataset into JSON.
     - Persist the resulting JSON to the local filesystem (e.g., `users.json`, `tasks.json`).
3. **GPT Classification Step**
   - After parsing, the service should call an external GPT-like API to classify **one field** in the JSON data. For example:
     - For each task in `tasks.json`, classify the `description` into one of a small set of categories (e.g., `"bug"`, `"feature"`, `"documentation"`).
   - The GPT API:
     - Is accessed via HTTPS.
     - Takes a text prompt and returns a classification label in JSON.
   - You are free to design the prompt and to decide whether to call the GPT API per-record or in batches, as long as all tasks end up with a classification label.
4. **Response**
   - After classification, return an HTTP response that includes at least:
     - A success indicator.
     - Basic stats (e.g., number of users, number of tasks processed).
     - Optionally, the enriched `tasks` data with the new classification field.
5. **Non-functional Requirements**
   - Handle basic validation and error cases (missing file, malformed CSV, GPT API failure).
   - Assume multiple clients may call this endpoint concurrently.
   - The solution should be reasonably testable.
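One way to approach the parsing and persistence requirements is sketched below, using only the Python standard library. It is a minimal illustration and not a prescribed solution. The names `parse_csv`, `write_json_atomic`, `new_request_dir`, and `CsvValidationError` are hypothetical, and the sketch assumes UTF-8 uploads whose header row matches the example columns exactly. Writing each request into its own directory and renaming a temporary file into place are assumptions made here so that concurrent requests do not overwrite or observe each other's partial files.

```python
import csv
import io
import json
import os
import tempfile
import uuid
from typing import Dict, List, Tuple

# Column layouts taken from the example headers in the requirements.
USERS_COLUMNS = ("user_id", "name", "email")
TASKS_COLUMNS = ("task_id", "user_id", "description")


class CsvValidationError(ValueError):
    """Hypothetical error type: the upload is missing, empty, or malformed."""


def parse_csv(raw: bytes, expected_columns: Tuple[str, ...]) -> List[Dict[str, str]]:
    """Decode uploaded CSV bytes and return one dict per data row."""
    if not raw:
        raise CsvValidationError("file is missing or empty")
    try:
        text = raw.decode("utf-8-sig")  # also strips a leading BOM if present
    except UnicodeDecodeError as exc:
        raise CsvValidationError("file is not valid UTF-8") from exc

    reader = csv.DictReader(io.StringIO(text, newline=""))
    if tuple(reader.fieldnames or ()) != expected_columns:
        raise CsvValidationError(
            f"expected header {expected_columns}, got {reader.fieldnames}"
        )

    rows = []
    for row in reader:
        # DictReader stores surplus fields under the key None and fills
        # missing fields with the value None; treat both as malformed rows.
        if None in row or None in row.values():
            raise CsvValidationError(
                f"wrong number of fields near line {reader.line_num}"
            )
        rows.append(row)
    return rows


def new_request_dir(base_dir: str) -> str:
    """Return a unique per-request directory path under base_dir."""
    return os.path.join(base_dir, uuid.uuid4().hex)


def write_json_atomic(records: List[Dict[str, str]], directory: str, filename: str) -> str:
    """Write records as JSON via a temp file, then rename it into place."""
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(records, handle, indent=2)
        final_path = os.path.join(directory, filename)
        os.replace(tmp_path, final_path)  # replaces the target in one step
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

A request handler could call `parse_csv` on each uploaded file, map `CsvValidationError` to a 400 response, and then call `write_json_atomic(users, new_request_dir(base), "users.json")` and the same for tasks. Keeping these functions free of any web framework makes them straightforward to unit test.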
**Task**
Describe how you would design and implement this endpoint, including:
- The HTTP API contract (request format, response format).
- How you handle file uploads and CSV parsing.
- How you structure the code to write JSON to local storage.
- How you integrate with the GPT classification API (including error handling and possible batching).
- Considerations for concurrency, timeouts, and testing.
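The classification step can be sketched in a way that keeps batching, retries, and testability separate from any particular vendor SDK. The sketch below is one possible shape under stated assumptions: `classify_batch` is a hypothetical injected callable that wraps the real HTTPS call (including its timeout and prompt) and returns one label per description, and the batch size, retry count, and backoff values are illustrative defaults and not recommendations from the question.

```python
import time
from typing import Callable, Dict, List, Sequence

# Allowed labels, taken from the example categories in the requirements.
CATEGORIES = ("bug", "feature", "documentation")

# Hypothetical client contract: a batch of descriptions in, one label each out.
ClassifyBatch = Callable[[Sequence[str]], List[str]]


class ClassificationError(RuntimeError):
    """Hypothetical error type: the classifier failed after all retries."""


def _call_with_retry(
    classify_batch: ClassifyBatch,
    descriptions: Sequence[str],
    max_attempts: int,
    backoff_seconds: float,
    sleep: Callable[[float], None],
) -> List[str]:
    """Call the classifier, retrying with exponential backoff on any failure."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            labels = classify_batch(descriptions)
            # Validate the reply so a malformed answer is retried, not stored.
            if len(labels) != len(descriptions):
                raise ClassificationError("label count does not match batch size")
            if any(label not in CATEGORIES for label in labels):
                raise ClassificationError("classifier returned an unknown label")
            return list(labels)
        except Exception as exc:  # network errors, timeouts, bad replies
            last_error = exc
            if attempt < max_attempts:
                sleep(backoff_seconds * 2 ** (attempt - 1))
    raise ClassificationError(
        f"classification failed after {max_attempts} attempts"
    ) from last_error


def classify_tasks(
    tasks: List[Dict[str, str]],
    classify_batch: ClassifyBatch,
    batch_size: int = 20,
    max_attempts: int = 3,
    backoff_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, str]]:
    """Return copies of the tasks with a 'category' field added to each."""
    enriched = []
    for start in range(0, len(tasks), batch_size):
        batch = tasks[start:start + batch_size]
        descriptions = [task["description"] for task in batch]
        labels = _call_with_retry(
            classify_batch, descriptions, max_attempts, backoff_seconds, sleep
        )
        for task, label in zip(batch, labels):
            enriched.append({**task, "category": label})  # leave input untouched
    return enriched
```

Because the classifier and the `sleep` function are passed in, a test can substitute a fake that fails once and then succeeds, with no network access and no real delay. In the endpoint, a `ClassificationError` would typically be mapped to a 502 or 503 response, and the choice between failing the whole request and returning partially classified tasks is a design decision worth stating explicitly.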
Quick Answer: This question evaluates backend engineering skills including HTTP API design, multipart file handling, CSV parsing and serialization, local JSON persistence, and integration with an external GPT-style classification API.