Design a Memo Q&A Agent for a Large Law Firm
Company: Harvey
Role: Software Engineer
Category: ML System Design
Difficulty: medium
Interview Round: Onsite
# Design a Memo Q&A Agent for a Large Law Firm
Design an AI system that lets attorneys at a large ("big law") firm ask natural-language questions and get answers grounded in the firm's published legal memos. The corpus is assembled by crawling memos from law-firm websites (the firm's own and, where permitted, peer firms'), then chunked and indexed so the system can retrieve the most relevant passages and have an LLM compose an answer with citations back to the source memos.
This is an open-ended design discussion. The interviewer cares less about a single "correct" architecture than about whether you can identify what actually makes an LLM-grounded retrieval system work in production — and what makes it fail. Retrieval-augmented generation (RAG) is the obvious backbone, but it is **one component**, not the whole answer. Budget your depth across ingestion, retrieval quality, answer grounding/faithfulness, and evaluation — do not spend the whole session on RAG mechanics.
### Constraints & Assumptions
- Corpus size: on the order of $10^5$ memos (roughly $10^8$ to $10^9$ tokens at 5-50 pages per memo), growing as new memos are crawled weekly.
- Each memo is a long document (often 5-50 pages) covering a legal topic (e.g., a regulatory update, a deal structure, a litigation development).
- Users: a few thousand attorneys; query volume is modest (thousands of queries/day, bursty during business hours), so this is a **quality-first**, not throughput-first, system.
- Latency target: interactive. First token within a couple of seconds; a full answer within ~10 seconds is acceptable.
- Correctness bar is very high: a wrong or unsupported legal statement is far worse than "I don't know." Every claim in an answer must be traceable to a cited memo passage.
- Assume access to a commercial LLM API and an embedding model; you are not training a foundation model from scratch.
### Clarifying Questions to Ask
- Whose memos are in scope — only the firm's own published memos, or also crawled memos from other firms' public sites? What are the licensing / robots.txt / copyright constraints on crawling and storing third-party memos?
- Is the answer meant to be advisory drafting support for an attorney (human always in the loop), or could it ever be surfaced to a client? This sets the bar for hedging and disclaimers.
- How fresh must answers be — does a memo published yesterday need to be answerable today, or is weekly ingestion acceptable?
- Are there access-control requirements (e.g., some memos are confidential to certain practice groups or matters) that retrieval must respect?
- What does "an answer" look like — a short synthesized paragraph, a list of relevant memos, or a long drafted analysis? And must it always cite sources?
- What is the acceptable behavior when the corpus does not contain a grounded answer?
### Part 1 — Ingestion: crawling, parsing, and chunking
Design the pipeline that turns memos on the web into an indexed, queryable corpus. Cover how you crawl and re-crawl law-firm sites politely and legally, how you parse heterogeneous source formats (HTML pages, PDFs) into clean text while preserving structure (headings, sections, defined terms, footnotes), and how you chunk long memos for retrieval. Justify your chunking strategy — fixed-size vs. structure-aware — and how you attach metadata (firm, practice area, publication date, source URL, section heading) to each chunk.
```hint Where to start
Separate three stages: (1) discovery + polite crawling (sitemaps, robots.txt, rate limits, dedup of unchanged pages via ETag/content hash), (2) parsing to structured text, (3) chunking + metadata. Each has its own failure modes.
```
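The crawl-stage checks from the hint above can be sketched in a few lines. This is a minimal illustration, not a full crawler: the function names, the `seen` dictionary, and the `memo-bot` user agent are hypothetical. It also assumes the page body has already been fetched; a real crawler would more likely send `If-None-Match` and skip the download on a 304 response.

```python
import hashlib
from typing import Dict, Optional, Tuple
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against an already-downloaded robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def content_fingerprint(body: str) -> str:
    """Hash whitespace-normalized text so trivial reformatting is not a change."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_reindex(
    url: str,
    body: str,
    etag: Optional[str],
    seen: Dict[str, Tuple[Optional[str], str]],
) -> bool:
    """Return True if the page is new or changed since the previous crawl.

    `seen` (hypothetical store) maps url -> (etag, fingerprint) and is
    updated in place with the latest values.
    """
    fingerprint = content_fingerprint(body)
    previous = seen.get(url)
    seen[url] = (etag, fingerprint)
    if previous is None:
        return True  # first time we see this URL
    prev_etag, prev_fingerprint = previous
    if etag is not None and etag == prev_etag:
        return False  # server reports the same entity
    return fingerprint != prev_fingerprint


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(allowed_by_robots(robots, "memo-bot", "https://example.com/insights/a"))
    state: Dict[str, Tuple[Optional[str], str]] = {}
    print(needs_reindex("https://example.com/insights/a", "<p>memo</p>", None, state))
```

Keeping the fingerprint alongside the ETag matters because many sites regenerate ETags on every deploy even when the memo text is unchanged.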
```hint Chunking
Long legal memos have strong internal structure. Naive fixed-token chunks split mid-argument and lose the heading context a retriever needs. Consider section-aware chunks with overlap, and prepend the document title + section path into each chunk so an isolated chunk is still self-describing.
```
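One way to make the section-aware idea concrete is sketched below. It assumes the parser has already split a memo into sections with a heading path; the `Section` and `Chunk` types, the word-based window, and the default sizes are illustrative assumptions, not values from the prompt. A production system would likely count tokens with the embedding model's tokenizer rather than whitespace-separated words.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Section:
    path: List[str]  # heading trail, e.g. ["II. Analysis", "A. Safe Harbor"]
    text: str


@dataclass
class Chunk:
    memo_title: str
    section_path: str
    text: str  # header line + body window, so the chunk is self-describing


def chunk_memo(
    title: str, sections: List[Section], max_words: int = 200, overlap: int = 40
) -> List[Chunk]:
    """Split each section into overlapping word windows; never cross sections."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")
    step = max_words - overlap
    chunks: List[Chunk] = []
    for section in sections:
        words = section.text.split()
        path = " > ".join(section.path)
        header = f"[{title} | {path}]"
        start = 0
        while start < len(words):
            body = " ".join(words[start:start + max_words])
            chunks.append(Chunk(title, path, header + "\n" + body))
            if start + max_words >= len(words):
                break  # this window already reached the end of the section
            start += step
    return chunks


if __name__ == "__main__":
    demo = [Section(["II. Analysis", "A. Safe Harbor"], "word " * 450)]
    for c in chunk_memo("Reg D Update", demo):
        print(c.section_path, len(c.text.split("\n")[1].split()))
```

Because windows never span a section boundary, every chunk carries exactly one heading path, which also makes the per-chunk metadata (section heading, source URL) unambiguous.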
#### Clarifying Questions for this Part
- Are the source pages static HTML, or JS-rendered (requiring a headless browser)? Are memos behind gated "download" forms?
- Do we have permission to store full text of third-party memos, or only index/link to them?
- How do we detect that a previously crawled memo was updated or retracted?
### Part 2 — Retrieval and grounded answer generation
Design the query path. Given an attorney's question, how do you retrieve the most relevant chunks and have the LLM produce a grounded, cited answer? Address the embedding/index choice and why; whether you use pure vector search or hybrid (lexical + vector) retrieval, and why legal text in particular benefits from lexical signals; reranking; how many chunks you feed the model and how you fit the context budget; the prompt structure that forces the model to answer **only** from retrieved passages and to cite them; and the behavior when retrieval returns nothing relevant.
```hint Retrieval quality
Legal queries hinge on exact terms — statute names, defined terms, party names, citations — that pure semantic embeddings blur together. Hybrid retrieval (BM25/keyword + dense vectors) plus a cross-encoder reranker usually beats vector-only here.
```
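The hint does not say how to combine the lexical and dense result lists. One common option is reciprocal rank fusion (RRF), sketched here as an assumption rather than a prescribed method. The chunk IDs are made up, and `k=60` is the conventional default for RRF, not a tuned value.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each list contributes 1 / (k + rank) per chunk, with rank starting at 1.
    Only rank positions are used, so BM25 scores and cosine similarities
    never need to be put on a common scale.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for deterministic output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


if __name__ == "__main__":
    lexical = ["c1", "c2", "c3"]  # e.g. from a BM25 index
    dense = ["c3", "c1", "c4"]    # e.g. from a vector index
    print(reciprocal_rank_fusion([lexical, dense]))
```

The fused list would then be truncated and passed to the cross-encoder reranker, which is the expensive step and benefits from a smaller, higher-recall candidate set.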
```hint Grounding
The model must cite, and must refuse when unsupported. Put retrieved passages with stable IDs in the prompt, instruct the model to answer only from them and attach a citation to each claim, and add a verification step that checks every cited span actually exists in the retrieved context.
```
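A rough sketch of the prompt-plus-verification loop described in the hint follows. The citation format (`[P1 "verbatim quote"]`), the one-claim-per-line convention, and the `INSUFFICIENT_SUPPORT` sentinel are all assumptions made for illustration. The check only confirms that a quoted span exists in the cited passage; whether the span actually supports the claim needs a separate entailment or judge step.

```python
import re
from typing import Dict, List

# Assumed citation format (not from the source): [P3 "verbatim quote"]
CITATION = re.compile(r'\[(P\d+)\s+"([^"]+)"\]')
REFUSAL = "INSUFFICIENT_SUPPORT"


def _norm(text: str) -> str:
    """Collapse whitespace and case so line wrapping does not break matching."""
    return " ".join(text.split()).casefold()


def build_prompt(question: str, passages: Dict[str, str]) -> str:
    """Lay out retrieved passages under stable IDs and state the grounding rules."""
    context = "\n\n".join(f"[{pid}] {text}" for pid, text in passages.items())
    return (
        "Answer using ONLY the passages below. Write one claim per line and end "
        'each line with a citation of the form [P1 "verbatim quote"]. '
        f"If the passages do not support an answer, reply exactly: {REFUSAL}\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}\n"
    )


def verify_citations(answer: str, passages: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every line checks out."""
    if not answer.strip():
        return ["empty answer"]
    if answer.strip() == REFUSAL:
        return []  # an explicit refusal is an acceptable outcome
    problems: List[str] = []
    for line in filter(None, (ln.strip() for ln in answer.splitlines())):
        citations = CITATION.findall(line)
        if not citations:
            problems.append(f"uncited claim: {line}")
        for pid, quote in citations:
            if pid not in passages:
                problems.append(f"unknown passage {pid}: {line}")
            elif _norm(quote) not in _norm(passages[pid]):
                problems.append(f"quote not found in {pid}: {line}")
    return problems
```

If `verify_citations` returns any problems, the system can drop the offending lines, retry generation, or fall back to the refusal message, depending on how strict the firm wants the grounding bar to be.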
### Part 3 — Evaluation, faithfulness, and monitoring
How do you know the system is good, and how do you keep it good? Define the offline evaluation (retrieval metrics and answer-quality metrics are different things and need separate measurement), how you build a labeled eval set in a low-volume legal domain, how you detect hallucinated or ungrounded claims automatically, and what you monitor in production (citation coverage, refusal rate, latency, user feedback, and drift as the corpus grows).
```hint Two layers
Evaluate retrieval and generation separately. Retrieval: recall@k / nDCG against known-relevant chunks. Generation: faithfulness (is every claim supported by a cited chunk?) and answer correctness, scored by humans and/or an LLM-as-judge calibrated against human labels.
```
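For the retrieval layer, the two metrics named in the hint can be computed as below. This is a minimal sketch that assumes each eval query comes with a labeled set of relevant chunk IDs (and, for nDCG, graded gains), and that the retrieved list contains no duplicate IDs. Faithfulness scoring for the generation layer is not covered here.

```python
import math
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the known-relevant chunks that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def ndcg_at_k(retrieved: List[str], gains: Dict[str, float], k: int) -> float:
    """nDCG@k with gain / log2(rank + 1) discounting, rank starting at 1.

    `gains` maps chunk ID -> graded relevance; unlabeled chunks count as 0.
    """
    dcg = sum(
        gains.get(cid, 0.0) / math.log2(rank + 1)
        for rank, cid in enumerate(retrieved[:k], start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    retrieved = ["a", "x", "b"]
    print(recall_at_k(retrieved, {"a", "b", "c"}, k=3))
    print(ndcg_at_k(retrieved, {"a": 1.0, "b": 1.0, "c": 1.0}, k=3))
```

Recall@k indicates whether the right passages reach the reranker at all, while nDCG rewards placing them near the top, which matters when only a handful of chunks fit in the context budget.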
### Follow-up Questions
- The crawled corpus contains two memos from different firms that **disagree** on the same legal question. How should the agent answer? How do you surface the conflict rather than silently pick one?
- A regulator publishes a change that **supersedes** the conclusion of several memos already in your index. How does your pipeline make the agent stop citing the now-stale memos as authority?
- Attorneys complain the agent is "too cautious" and refuses on questions it could partially answer. How do you tune the grounding/refusal threshold without increasing hallucinations, and how do you measure that trade-off?
- How would your design change if query volume grew 100×, or if you needed to support multi-turn conversations where follow-up questions depend on earlier answers?
Quick Answer: This question evaluates ML system design skills, specifically retrieval-augmented generation (RAG) and LLM-grounded document retrieval at production scale. It tests the ability to reason across ingestion, retrieval quality, answer faithfulness, and evaluation — core competencies in senior AI engineering roles.