How do I approach ML System Design interview questions?

ML System Design questions require an understanding of core concepts, plus practice. PracHub provides solutions with explanations to help you master ML system design interviews.

What difficulty level is this interview question?

This is a medium-difficulty ML System Design question, commonly asked during onsite rounds at Harvey.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Harvey during technical interviews.

Design a Memo Q&A Agent for a Large Law Firm

This question evaluates ML system design skills, specifically retrieval-augmented generation (RAG) and LLM-grounded document retrieval at production scale. It tests the ability to reason across ingestion, retrieval quality, answer faithfulness, and evaluation — core competencies in senior AI engineering roles.

Design an AI system that lets attorneys at a large ("big law") firm ask natural-language questions and get answers grounded in the firm's published legal memos. The corpus is assembled by crawling memos from law-firm websites (the firm's own and, where permitted, peer firms'), then chunked and indexed so the system can retrieve the most relevant passages and have an LLM compose an answer with citations back to the source memos.

This is an open-ended design discussion. The interviewer cares less about a single "correct" architecture than about whether you can identify what actually makes an LLM-grounded retrieval system work in production — and what makes it fail. Retrieval-augmented generation (RAG) is the obvious backbone, but it is one component, not the whole answer. Budget your depth across ingestion, retrieval quality, answer grounding/faithfulness, and evaluation — do not spend the whole session on RAG mechanics.

Constraints & Assumptions

Corpus size: on the order of $10^5$ memos (tens of millions of tokens), growing as new memos are crawled weekly.
Each memo is a long document (often 5-50 pages) covering a legal topic (e.g., a regulatory update, a deal structure, a litigation development).
Users: a few thousand attorneys; query volume is modest (thousands of queries/day, bursty during business hours), so this is a quality-first, not throughput-first, system.
Latency target: interactive — first token within a couple of seconds, full answer within ~10 seconds is acceptable.
Correctness bar is very high: a wrong or unsupported legal statement is far worse than "I don't know." Every claim in an answer must be traceable to a cited memo passage.
Assume access to a commercial LLM API and an embedding model; you are not training a foundation model from scratch.

Clarifying Questions to Ask

Whose memos are in scope — only the firm's own published memos, or also crawled memos from other firms' public sites? What are the licensing / robots.txt / copyright constraints on crawling and storing third-party memos?
Is the answer meant to be advisory drafting support for an attorney (human always in the loop), or could it ever be surfaced to a client? This sets the bar for hedging and disclaimers.
How fresh must answers be — does a memo published yesterday need to be answerable today, or is weekly ingestion acceptable?
Are there access-control requirements (e.g., some memos are confidential to certain practice groups or matters) that retrieval must respect?
What does "an answer" look like — a short synthesized paragraph, a list of relevant memos, or a long drafted analysis? And must it always cite sources?
What is the acceptable behavior when the corpus does not contain a grounded answer?

Part 1 — Ingestion: crawling, parsing, and chunking

Design the pipeline that turns memos on the web into an indexed, queryable corpus. Cover how you crawl and re-crawl law-firm sites politely and legally, how you parse heterogeneous source formats (HTML pages, PDFs) into clean text while preserving structure (headings, sections, defined terms, footnotes), and how you chunk long memos for retrieval. Justify your chunking strategy — fixed-size vs. structure-aware — and how you attach metadata (firm, practice area, publication date, source URL, section heading) to each chunk.
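
As one way to make the structure-aware option concrete, the sketch below splits a parsed memo on heading-like lines, then cuts each section into overlapping word windows and attaches metadata to every chunk. It is a minimal illustration, not a reference design: the heading pattern, the `Chunk` fields, and the `max_words`/`overlap` values are all assumptions, and words stand in for tokens.

```python
import re
from dataclasses import dataclass

# Hypothetical heading rule: numbered headings such as "1. Background" or
# "2.1. Scope", or short ALL-CAPS lines. Real memos need a parser-specific rule.
HEADING_RE = re.compile(r"^(?:\d+(?:\.\d+)*\.\s+\S.{0,80}|[A-Z][A-Z &/-]{3,})$")


@dataclass
class Chunk:
    text: str
    firm: str
    source_url: str
    published: str
    section: str


def split_sections(text):
    """Yield (heading, body) pairs by scanning for heading-like lines."""
    heading, body = "Preamble", []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and HEADING_RE.match(stripped):
            if body:
                yield heading, " ".join(body)
            heading, body = stripped, []
        elif stripped:
            body.append(stripped)
    if body:
        yield heading, " ".join(body)


def chunk_memo(text, meta, max_words=200, overlap=40):
    """Cut each section into overlapping word windows tagged with metadata."""
    chunks = []
    step = max_words - overlap
    for heading, body in split_sections(text):
        words = body.split()
        for start in range(0, len(words), step):
            window = words[start:start + max_words]
            chunks.append(Chunk(" ".join(window), section=heading, **meta))
            if start + max_words >= len(words):
                break  # the last window already reached the end of the section
    return chunks
```

Because windows never cross a section boundary, each chunk carries exactly one section heading, which is what lets a citation point back to a specific part of the memo. A fixed-size chunker would be simpler but can split a defined term from its definition.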

Clarifying Questions for this Part

Are the source pages static HTML, or JS-rendered (requiring a headless browser)? Are memos behind gated "download" forms?
Do we have permission to store full text of third-party memos, or only index/link to them?
How do we detect that a previously crawled memo was updated or retracted?

Part 2 — Retrieval and grounded answer generation

Design the query path. Given an attorney's question, how do you retrieve the most relevant chunks and have the LLM produce a grounded, cited answer? Address the embedding/index choice and why; whether you use pure vector search or hybrid (lexical + vector) retrieval, and why legal text in particular benefits from lexical signals; reranking; how many chunks you feed the model and how you fit the context budget; the prompt structure that forces the model to answer only from retrieved passages and to cite them; and the behavior when retrieval returns nothing relevant.
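
The hybrid-retrieval and context-budget steps can be sketched as follows. Reciprocal rank fusion (RRF) is used here as one common way to merge a lexical ranking with a vector ranking; the question does not prescribe it. The function names, the `k=60` constant, the word-count budget (a stand-in for a real tokenizer), and the prompt wording are all assumptions for illustration.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def build_prompt(question, ranked_ids, chunks, budget_words=600):
    """Pack top-ranked chunks into a grounded prompt until the budget is spent.

    Returns None when no passage fits or none was retrieved, so the caller
    can refuse instead of calling the LLM with an empty context.
    """
    passages, used = [], 0
    for chunk_id in ranked_ids:
        size = len(chunks[chunk_id].split())  # word count as a rough token proxy
        if used + size > budget_words:
            break
        passages.append(f"[{len(passages) + 1}] {chunks[chunk_id]}")
        used += size
    if not passages:
        return None
    context = "\n".join(passages)
    return (
        "Answer using only the numbered passages below. "
        "Cite every claim as [n]. If the passages do not support an answer, "
        "reply exactly: I don't know.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}"
    )
```

In a fuller design a reranker would sit between `rrf_fuse` and `build_prompt`, reordering the fused candidates before the budget is applied. The lexical ranking matters for legal text because exact strings such as statute numbers and defined terms are often what the attorney is searching for.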

Part 3 — Evaluation, faithfulness, and monitoring

How do you know the system is good, and how do you keep it good? Define the offline evaluation (retrieval metrics vs. answer-quality metrics — they are different things), how you build a labeled eval set in a low-volume legal domain, how you detect hallucination / ungrounded claims automatically, and what you monitor in production (citation-coverage, refusal rate, latency, user feedback, drift as the corpus grows).
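
To illustrate why retrieval metrics and answer-quality metrics are separate measurements, the sketch below computes two retrieval metrics (recall@k and reciprocal rank) against labeled relevant chunks, and one answer-side signal (citation coverage) from the generated text alone. The function names and the `[n]` citation format are assumptions, and the sentence splitter is deliberately naive: it assumes each citation appears before the sentence's closing punctuation.

```python
import re

CITATION_RE = re.compile(r"\[(\d+)\]")


def recall_at_k(retrieved, relevant, k):
    """Fraction of labeled-relevant chunk ids found in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def citation_coverage(answer, num_passages):
    """Fraction of answer sentences with at least one valid [n] citation."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    covered = 0
    for sentence in sentences:
        cited = [int(n) for n in CITATION_RE.findall(sentence)]
        # A citation only counts if it points at a passage that was supplied.
        if any(1 <= n <= num_passages for n in cited):
            covered += 1
    return covered / len(sentences)
```

Citation coverage is a cheap monitoring signal, not a faithfulness proof: a sentence can carry a valid citation marker while the cited passage does not actually support it, so a separate check (for example, a human review sample or an entailment-style judge) would still be needed.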

Follow-up Questions

The crawled corpus contains two memos from different firms that disagree on the same legal question. How should the agent answer? How do you surface the conflict rather than silently pick one?
A regulator publishes a change that supersedes the conclusion of several memos already in your index. How does your pipeline make the agent stop citing the now-stale memos as authority?
Attorneys complain the agent is "too cautious" and refuses on questions it could partially answer. How do you tune the grounding/refusal threshold without increasing hallucinations, and how do you measure that trade-off?
How would your design change if query volume grew 100×, or if you needed to support multi-turn conversations where follow-up questions depend on earlier answers?
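
For the "too cautious" follow-up above, one way to measure the trade-off is to sweep the refusal threshold over a labeled eval set and report both failure modes side by side. This is a sketch under stated assumptions: each example is a hypothetical `(grounding_score, is_answerable)` pair, where the score comes from whatever grounding check the system uses and the label says whether the corpus really supports an answer.

```python
def sweep_thresholds(examples, thresholds):
    """Report over-refusal vs. unsupported-answer rates at each threshold.

    examples: list of (grounding_score, is_answerable) pairs.
    The system answers when grounding_score >= threshold, else refuses.
    """
    answerable_total = sum(1 for _, ok in examples if ok)
    rows = []
    for t in thresholds:
        answered = [(score, ok) for score, ok in examples if score >= t]
        refused_answerable = sum(1 for score, ok in examples if score < t and ok)
        unsupported = sum(1 for _, ok in answered if not ok)
        rows.append({
            "threshold": t,
            # Share of answerable questions the system refused.
            "over_refusal_rate": (
                refused_answerable / answerable_total if answerable_total else 0.0
            ),
            # Share of given answers that the corpus does not support.
            "unsupported_answer_rate": (
                unsupported / len(answered) if answered else 0.0
            ),
        })
    return rows
```

Lowering the threshold reduces over-refusal but tends to raise the unsupported-answer rate, so the two columns are best read together when picking an operating point.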
