Design a PDF-to-Markdown Inference API
Company: Mistral AI
Role: Software Engineer
Category: ML System Design
Difficulty: hard
Interview Round: Technical Screen
Design an inference service that converts PDF files to Markdown. You can assume the following building blocks already exist:
- A CPU-intensive function that splits a PDF into individual pages and converts each page into a NumPy array
- A GPU-intensive OCR engine
- A memory-intensive post-processing step that converts OCR outputs into Markdown or assembles final page results
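A strong answer usually starts by pinning down the interfaces between the three building blocks. The sketch below is a minimal, order-preserving sync pipeline under assumed interfaces; `split_pdf`, `ocr_batch`, and `postprocess` are hypothetical stubs standing in for the real CPU, GPU, and memory-intensive components, and the batch size is illustrative.

```python
import numpy as np

# Hypothetical stand-ins for the three building blocks described above
# (names, signatures, and behavior are assumptions, not a real API).

def split_pdf(pdf_bytes: bytes) -> list[np.ndarray]:
    """CPU-intensive: split a PDF into one image array per page."""
    # Stub: pretend every 100 bytes of input is one page.
    n_pages = max(1, len(pdf_bytes) // 100)
    return [np.zeros((32, 32), dtype=np.uint8) for _ in range(n_pages)]

def ocr_batch(pages: list[np.ndarray]) -> list[str]:
    """GPU-intensive: OCR a batch of page images, one text per page."""
    return ["page text" for _ in pages]

def postprocess(page_texts: list[str]) -> str:
    """Memory-intensive: assemble per-page OCR outputs into Markdown."""
    return "\n\n".join(
        f"## Page {i + 1}\n\n{text}" for i, text in enumerate(page_texts)
    )

def convert(pdf_bytes: bytes, batch_size: int = 8) -> str:
    """Synchronous pipeline: split, OCR in GPU-friendly batches, assemble."""
    pages = split_pdf(pdf_bytes)                    # CPU stage
    texts: list[str] = []
    for i in range(0, len(pages), batch_size):      # GPU stage, batched
        texts.extend(ocr_batch(pages[i:i + batch_size]))
    return postprocess(texts)                       # memory stage, in page order
```

Batching pages before the OCR call keeps the GPU saturated, and extending `texts` batch by batch preserves page order without any reordering step; in the interview, the follow-up is how to overlap the CPU and GPU stages rather than run them sequentially.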
Discuss two scenarios:
1. A synchronous API for one very large document, such as a 1000-page PDF, where the user wants the full converted output as quickly as possible
2. An asynchronous API for many concurrent conversion requests, where the client can receive the result later
Explain the API contract, page-level parallelism, CPU and GPU scheduling, batching, result ordering, intermediate storage, fault tolerance, backpressure, and how the system should scale.
Quick Answer: This question evaluates whether a candidate can design a scalable, resource-aware ML inference system: defining a clean API contract, exploiting page-level parallelism, scheduling work across CPU, GPU, and memory-bound stages, batching for GPU efficiency, preserving result order, choosing intermediate storage, and reasoning about fault tolerance, backpressure, and scaling trade-offs.