You are building a backend service that needs to process two CSV files and then call an external GPT-like API for classification.
Requirements
- HTTP Endpoint
  - Expose an HTTP endpoint, e.g. `POST /ingest-data`.
  - The client uploads two CSV files in a single request:
    - A typical row in `users.csv` might be: `user_id,name,email`.
    - A typical row in `tasks.csv` might be: `task_id,user_id,description`.
- CSV Parsing and Local JSON Storage
  - The endpoint should:
    - Receive the two CSV files.
    - Parse them into in-memory data structures (e.g., lists of objects).
    - Serialize each dataset into JSON.
    - Persist the resulting JSON to the local filesystem (e.g., `users.json`, `tasks.json`).
- GPT Classification Step
  - After parsing, the service should call an external GPT-like API to classify one field in the JSON data. For example:
    - For each task in `tasks.json`, classify the `description` into one of a small set of categories (e.g., `"bug"`, `"feature"`, `"documentation"`).
  - The GPT API:
    - Is accessed via HTTPS.
    - Takes a text prompt and returns a classification label in JSON.
  - You are free to design the prompt and to decide whether to call the GPT API per-record or in batches, as long as all tasks end up with a classification label.
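One possible shape for the per-record variant is shown below. The endpoint URL, the `{"label": ...}` response format, and the backoff/fallback policy are all assumptions for illustration; injecting the transport as `post_fn` keeps the classification logic testable without real network calls:

```python
import json
import time
import urllib.request

ALLOWED_LABELS = {"bug", "feature", "documentation"}

def default_post(prompt: str) -> str:
    """POST the prompt to a hypothetical GPT-like HTTPS API.

    The URL and the {"label": "..."} response shape are assumptions,
    not a real API contract.
    """
    req = urllib.request.Request(
        "https://gpt.example.com/v1/classify",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())["label"]

def classify_tasks(tasks: list[dict], post_fn=default_post, retries: int = 3) -> list[dict]:
    """Return copies of the tasks with a 'category' field attached."""
    out = []
    for task in tasks:
        prompt = (
            "Classify this task description as exactly one of "
            "bug, feature, or documentation:\n" + task["description"]
        )
        label = "unknown"
        for attempt in range(retries):
            try:
                label = post_fn(prompt)
                break
            except Exception:
                time.sleep(2 ** attempt)  # simple exponential backoff
        if label not in ALLOWED_LABELS:
            label = "unknown"  # guard against malformed model output
        out.append({**task, "category": label})
    return out
```

Batching (sending several descriptions per prompt) would reduce API calls at the cost of a more fragile response-parsing step; either choice satisfies the requirement as long as every task gets a label.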
- Response
  - After classification, return an HTTP response that includes at least:
    - A success indicator.
    - Basic stats (e.g., number of users, number of tasks processed).
    - Optionally, the enriched `tasks` data with the new classification field.
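A minimal sketch of that response body, with illustrative field names (the requirements fix only the content, not the names):

```python
def build_response(users: list[dict], tasks: list[dict], include_tasks: bool = False) -> dict:
    """Assemble the JSON body returned by the endpoint."""
    body = {
        "success": True,  # success indicator
        "stats": {"users": len(users), "tasks": len(tasks)},  # basic stats
    }
    if include_tasks:
        body["tasks"] = tasks  # optionally include the enriched tasks
    return body
```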
- Non-functional Requirements
  - Handle basic validation and error cases (missing file, malformed CSV, GPT API failure).
  - Assume multiple clients may call this endpoint concurrently.
  - The solution should be reasonably testable.
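Because concurrent clients may write `users.json` and `tasks.json` at the same time, one common mitigation (a sketch, not the only option — per-request filenames or a lock would also work) is to write to a temporary file and atomically rename it into place:

```python
import json
import os
import tempfile

def atomic_write_json(records, path: str) -> None:
    """Write JSON via temp file + os.replace so a concurrent reader
    never observes a half-written file (os.replace atomically replaces
    the destination on both POSIX and Windows)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(records, f)
        os.replace(tmp, path)  # atomic rename over the destination
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)  # clean up the temp file on failure
        raise
```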
Task
Describe how you would design and implement this endpoint, including:
- The HTTP API contract (request format, response format).
- How you handle file uploads and CSV parsing.
- How you structure the code to write JSON to local storage.
- How you integrate with the GPT classification API (including error handling and possible batching).
- Considerations for concurrency, timeouts, and testing.
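On the testing point, the key design move is dependency injection: if the GPT transport is a parameter, tests can substitute a stub and exercise both success and failure paths offline. A self-contained sketch of that pattern (the `classify` stand-in is illustrative), runnable with `python -m unittest`:

```python
import unittest

def classify(description: str, post_fn) -> str:
    """Minimal stand-in for the GPT call, so the test pattern is self-contained."""
    return post_fn("Classify: " + description)

class ClassifyTests(unittest.TestCase):
    def test_uses_injected_client(self):
        # Stubbing the transport avoids any real HTTPS traffic.
        self.assertEqual(classify("Fix crash", lambda prompt: "bug"), "bug")

    def test_api_failure_surfaces(self):
        def failing(prompt):
            raise ConnectionError("GPT API down")
        with self.assertRaises(ConnectionError):
            classify("Fix crash", failing)
```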