Design and explain robust web APIs for ML inference

This is a Coding & Algorithms interview question from NVIDIA for Data Scientist roles.

Question

Design an HTTP API for Image-Based Model Predictions

Context: Design an HTTP REST API that serves predictions for image inputs (e.g., classification, detection). Assume the service may need both synchronous and asynchronous inference, and will be consumed by first- and third-party clients.

Requirements

Endpoints, Methods, Idempotency, and Versioning

Define core endpoints (e.g., POST /v1/predict for sync, POST /v1/jobs for async, GET /v1/jobs/{id}/status).
Specify HTTP methods and how idempotency is achieved (e.g., Idempotency-Key header).
Define versioning strategy.
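
One way to make POST /v1/jobs safe to retry is to record each Idempotency-Key together with a fingerprint of the request body and replay the stored result on a repeat. The sketch below is a minimal in-memory illustration, not a reference implementation: the class name IdempotencyStore, the 409 status for a key reused with a different body, and the error code string are all assumptions.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple

Result = Tuple[int, Dict[str, Any]]  # (HTTP status, JSON body)


class IdempotencyStore:
    """Hypothetical in-memory store; not thread-safe and without expiry.

    A production service would more likely use a shared cache with a TTL.
    """

    def __init__(self) -> None:
        self._seen: Dict[str, Tuple[str, Result]] = {}

    def run_once(
        self,
        key: str,
        body: Dict[str, Any],
        handler: Callable[[Dict[str, Any]], Result],
    ) -> Result:
        # Fingerprint the body so a reused key with a different payload is caught.
        fingerprint = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode("utf-8")
        ).hexdigest()

        if key in self._seen:
            stored_fingerprint, stored_result = self._seen[key]
            if stored_fingerprint != fingerprint:
                # Assumed policy: same key, different body is a client error.
                return 409, {"error": {"code": "idempotency_key_conflict"}}
            # Same key, same body: replay without running the handler again.
            return stored_result

        result = handler(body)
        self._seen[key] = (fingerprint, result)
        return result
```

With this shape, a client that times out and resends the same request with the same key gets the original job back instead of creating a second one.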

Request/Response Schemas, Content Types, Errors, Retries

Provide JSON and multipart request/response schemas and content types.
Define standard error codes and error schema.
Define retry semantics, exponential backoff, and use of idempotency keys.
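
The retry requirement can be illustrated from the client side. The following sketch uses exponential backoff with full jitter and prefers a numeric Retry-After header when the server sends one. The set of retryable status codes, the base and cap values, and the function names are assumptions chosen for illustration.

```python
import random
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str]]  # (HTTP status, response headers)

# Assumed policy: retry only throttling and transient upstream failures.
RETRYABLE = {429, 502, 503, 504}


def backoff_delay(
    attempt: int,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> Response:
    """Call send() until it returns a non-retryable status or attempts run out.

    send() is expected to reuse the same Idempotency-Key on every attempt so
    that a retried POST cannot create duplicate work.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, headers
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)  # the server's hint wins over local backoff
        else:
            delay = backoff_delay(attempt, rng=rng)
        sleep(delay)
    raise AssertionError("unreachable")
```

The sleep and rng parameters are injectable only so the behavior can be checked without real waiting.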

AuthN/AuthZ, Rate Limiting/Quotas, Audit Logging

Use OAuth2/OIDC with scopes.
Describe rate limiting and quotas.
Describe audit logging requirements.
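
For the rate-limiting requirement, a token bucket is one common choice because it allows short bursts while bounding the sustained rate. The sketch below is a single-process illustration under assumed parameters (rate_per_sec, burst); a real deployment would likely keep one bucket per client or token in a shared store.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical per-client limiter: refills at rate_per_sec up to burst."""

    def __init__(
        self,
        rate_per_sec: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.updated = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last update, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def allow(self, cost: float = 1.0) -> bool:
        """Consume cost tokens if available; False maps to HTTP 429."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def retry_after(self, cost: float = 1.0) -> float:
        """Seconds until cost tokens are available, usable as a Retry-After hint."""
        self._refill()
        return max(0.0, cost - self.tokens) / self.rate
```

Charging a higher cost for expensive calls (for example, large images or detection models) is one way to fold quotas into the same mechanism.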

Backward Compatibility and Deprecation Policy

State which changes are backward compatible and how deprecations are communicated.
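
Deprecations are often communicated in response headers in addition to documentation. The sketch below builds a Sunset header as an HTTP-date (RFC 8594) and a Link header with the sunset relation. The Deprecation value is written as a structured-field date, assumed here to follow RFC 9745; the function name and the policy URL are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict


def deprecation_headers(
    deprecated_at: datetime,
    sunset_at: datetime,
    policy_url: str,
) -> Dict[str, str]:
    """Build headers announcing that an endpoint or version is going away."""
    if deprecated_at.tzinfo is None or sunset_at.tzinfo is None:
        raise ValueError("datetimes must be timezone-aware")
    if sunset_at < deprecated_at:
        raise ValueError("sunset must not precede deprecation")
    return {
        # Assumed format: "@" followed by Unix seconds (structured-field date).
        "Deprecation": "@%d" % int(deprecated_at.timestamp()),
        # HTTP-date in GMT, e.g. "Wed, 31 Dec 2025 23:59:59 GMT".
        "Sunset": format_datetime(sunset_at.astimezone(timezone.utc), usegmt=True),
        # Points clients at a human-readable migration guide.
        "Link": '<%s>; rel="sunset"' % policy_url,
    }
```

A /v1 endpoint that has a /v2 successor could attach these headers to every response until the sunset date.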

Security and Observability

Require TLS, validate inputs, and verify JWTs.
Describe PII handling.
Emit structured logs, metrics, and traces, and propagate request IDs.
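
Input validation and request-scoped logging can be sketched together. The example below checks an upload's size and leading bytes against the declared content type, and emits a JSON log line keyed by a request ID. The 10 MiB cap, the two accepted formats, the X-Request-ID header name, and the error codes are assumptions; the log deliberately records only size and type, not image bytes or filenames, as one way to limit PII exposure.

```python
import json
import uuid
from typing import Dict, Optional

MAX_BYTES = 10 * 1024 * 1024  # assumed 10 MiB upload cap

# Leading "magic" bytes for the formats this sketch accepts.
SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"\x89PNG\r\n\x1a\n": "image/png",
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, or None if unknown."""
    for magic, mime in SIGNATURES.items():
        if data.startswith(magic):
            return mime
    return None


def validate_upload(data: bytes, declared_type: str) -> Optional[Dict[str, object]]:
    """Return an error dict for a bad upload, or None when it passes."""
    if len(data) > MAX_BYTES:
        return {"status": 413, "code": "payload_too_large"}
    sniffed = sniff_image_type(data)
    if sniffed is None:
        return {"status": 415, "code": "unsupported_media_type"}
    if sniffed != declared_type:
        return {"status": 400, "code": "content_type_mismatch"}
    return None


def ensure_request_id(headers: Dict[str, str]) -> str:
    """Reuse the caller's request ID if present, else mint one.

    The lookup is case-sensitive here; a real server would normalize header names.
    """
    return headers.get("X-Request-ID") or str(uuid.uuid4())


def log_line(request_id: str, event: str, **fields: object) -> str:
    """Render one structured log record as JSON with stable key order."""
    return json.dumps({"request_id": request_id, "event": event, **fields}, sort_keys=True)
```

Sniffing magic bytes is only a first gate; fully decoding the image in a sandboxed worker would catch malformed files that pass this check.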

Provide a concise OpenAPI 3.0 snippet for one endpoint showing parameters, schema, and error responses.
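
As one possible shape for such a snippet, the sketch below builds an OpenAPI 3.0 document for GET /v1/jobs/{id}/status as a Python dictionary and serializes it to JSON. The schema fields, scope name, status enum, and token URL are illustrative assumptions, not part of the original question.

```python
import json
from typing import Any, Dict


def error_response(description: str) -> Dict[str, Any]:
    """All error statuses share one Error schema."""
    return {
        "description": description,
        "content": {
            "application/json": {"schema": {"$ref": "#/components/schemas/Error"}}
        },
    }


SPEC: Dict[str, Any] = {
    "openapi": "3.0.3",
    "info": {"title": "Image Inference API (sketch)", "version": "1.0.0"},
    "paths": {
        "/v1/jobs/{id}/status": {
            "get": {
                "summary": "Get the status of an asynchronous inference job",
                "security": [{"oauth2": ["jobs:read"]}],  # assumed scope name
                "parameters": [
                    {"name": "id", "in": "path", "required": True,
                     "schema": {"type": "string"}},
                    {"name": "X-Request-ID", "in": "header", "required": False,
                     "schema": {"type": "string"}},
                ],
                "responses": {
                    "200": {
                        "description": "Current job status",
                        "content": {"application/json": {
                            "schema": {"$ref": "#/components/schemas/JobStatus"}}},
                    },
                    "401": error_response("Missing or invalid token"),
                    "404": error_response("Job not found"),
                    "429": error_response("Rate limit exceeded"),
                },
            }
        }
    },
    "components": {
        "securitySchemes": {
            "oauth2": {"type": "oauth2", "flows": {"clientCredentials": {
                "tokenUrl": "https://auth.example.com/token",  # hypothetical
                "scopes": {"jobs:read": "Read job status"}}}}
        },
        "schemas": {
            "JobStatus": {
                "type": "object",
                "required": ["id", "status"],
                "properties": {
                    "id": {"type": "string"},
                    "status": {"type": "string",
                               "enum": ["queued", "running", "succeeded", "failed"]},
                },
            },
            "Error": {
                "type": "object",
                "required": ["code", "message"],
                "properties": {
                    "code": {"type": "string"},
                    "message": {"type": "string"},
                    "request_id": {"type": "string"},
                },
            },
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(SPEC, indent=2))
```

Keeping the spec as data makes it easy to emit JSON or YAML from the same source and to check in tests that every error response points at the shared Error schema.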
