PracHub

Design a Secure PDF Data Room

Last updated: Jun 17, 2026

Quick Overview

This question evaluates a candidate's mastery of system design and secure authorization models for multi-tenant file-sharing platforms, focusing on organization-level access control, data modeling, consistency, scalable serving of large PDF assets, and auditability.


Company: Harvey

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Onsite

```hint Where to start
Separate the two hard sub-problems: (1) a metadata/ACL system that needs strong consistency, and (2) bulk PDF storage + serving that needs scale and short-lived secure access. Object storage for blobs, a relational DB for metadata and ACLs.
```

```hint Where do access decisions live?
List every path that can surface a document: browse, view, download, search, any API. If each one re-implements its own check, what's the chance they stay perfectly in sync as the system grows? Think about what structural choice would make a leak hard *by construction* rather than by careful review.
```

```hint Inheritance & evaluation
ACLs are grants attached to resources (`room` / `folder` / `document`) for principals (chiefly `organization`). How would you compute a document's effective permissions from its room grant, ancestor-folder grants, and any direct grant? Decide early whether v1 even needs explicit `deny`; if you allow it, work out which way a conflict between an allow and a deny must resolve, and what that costs in explainability.
```

```hint How do bytes actually reach the viewer?
If the app tier streamed every PDF itself it wouldn't scale; if it handed out a durable storage URL, who could re-share it? Find a middle path that offloads the bytes yet stays gated by your authorization check. Then ask what "view-only" should mean for a document the user can still screenshot, and how you'd make a leaked copy traceable back to who leaked it.
```

```hint What does search do with permissions?
A full-text index that ignores ACLs will happily return documents the caller can't see. How does authorization interact with the index: do you filter before, during, or after the query, and what does each choice cost you in latency and in staleness when an ACL changes mid-flight? Whatever you pick, ask whether it can ever *show* a document that was just revoked.
```

Apr 20, 2026, 12:00 AM

Design a virtual data room product that lets companies organize and securely share confidential PDF documents with other organizations.

The product should feel like a cloud drive: users create data rooms, organize PDFs into folders, upload and view documents, and invite other organizations to access selected content. For this interview, assume the system supports only PDF files at launch.

The central, non-negotiable focus is organization-level access control. A company must be able to grant another organization access to a data room, folder, or individual document with permissions such as view-only or admin. The system must enforce these permissions consistently across browsing, downloading, viewing, and search — there must be no path that leaks a document the caller is not authorized to see.
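One way to make "no path that leaks" hold by construction is to route every path through a single decision function that denies by default. The sketch below illustrates that idea only. The `Authorizer` class, the `Action` names, and the role-to-action mapping are all hypothetical, and it looks up grants directly on the resource without any folder inheritance. Whether view-only should permit download is a product decision the question leaves open; here it is assumed not to.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    BROWSE = "browse"
    VIEW = "view"
    DOWNLOAD = "download"
    SEARCH = "search"
    MANAGE = "manage"


# Hypothetical mapping: view-only can browse, view, and search; admin can do everything.
ROLE_ACTIONS = {
    "view_only": {Action.BROWSE, Action.VIEW, Action.SEARCH},
    "admin": set(Action),
}


@dataclass(frozen=True)
class Grant:
    org_id: str
    resource_id: str
    role: str


class Authorizer:
    """Single decision point: handlers call is_allowed and never re-implement it."""

    def __init__(self, grants):
        self._roles = {(g.org_id, g.resource_id): g.role for g in grants}

    def is_allowed(self, org_id, action, resource_id):
        role = self._roles.get((org_id, resource_id))
        # Default deny: a missing grant or an unknown role yields an empty action set.
        return action in ROLE_ACTIONS.get(role, set())


def download_document(authz, org_id, doc_id):
    """Download path: refuses before touching any bytes."""
    if not authz.is_allowed(org_id, Action.DOWNLOAD, doc_id):
        raise PermissionError(f"{org_id} may not download {doc_id}")
    return f"bytes-of-{doc_id}"


def search_documents(authz, org_id, candidate_doc_ids):
    """Search path: filters index hits through the same check as every other path."""
    return [d for d in candidate_doc_ids
            if authz.is_allowed(org_id, Action.SEARCH, d)]
```

Because both handlers depend on `is_allowed`, a change to the permission rules lands in one place, and a new endpoint that forgets the check stands out in review.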

Constraints & Assumptions

  • Files: PDF only. Individual files can be large (high-page-count diligence documents), and a single room may hold many thousands of documents.
  • Tenancy: Multi-tenant. Every user belongs to one or more organizations. A room is owned by one organization and may be shared with many others.
  • Workloads: Read-heavy (browsing, viewing, search) far outweighs writes (upload, ACL changes). Assume viewing/browsing should feel interactive (sub-second metadata, fast first-page render).
  • Security & compliance: Documents are highly confidential (legal, financial, M&A diligence). Encryption at rest and in transit is mandatory. A tamper-resistant audit trail is required. Treat "no unauthorized access, ever" as a hard correctness requirement, not a best-effort SLO.
  • Availability: Prioritize availability for reads/viewing; uploads and ACL writes can tolerate slightly higher latency.
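The audit-trail constraint is often approached with a hash chain, where each entry's hash covers the previous entry's hash, so editing or deleting an earlier record breaks every later link. The following is a minimal in-memory sketch of that technique, not a storage design. The `AuditLog` class and the event fields are hypothetical. A chain makes tampering evident rather than impossible, so a real deployment would also need append-only storage and some external anchoring of the latest hash.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only list of (event, hash) pairs forming a hash chain."""

    def __init__(self):
        self._entries = []

    def append(self, event):
        prev = self._entries[-1][1] if self._entries else GENESIS
        entry_hash = _digest(prev, event)
        self._entries.append((dict(event), entry_hash))
        return entry_hash

    def verify(self):
        """Recompute the chain; False means some entry was altered or removed."""
        prev = GENESIS
        for event, stored_hash in self._entries:
            if _digest(prev, event) != stored_hash:
                return False
            prev = stored_hash
        return True
```

A usage example under the same assumptions: append a `document.view` event and an `acl.revoke` event, call `verify()` to confirm the chain is intact, then observe that changing any field of the first event makes `verify()` return `False`.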

The Problem

Produce an end-to-end design covering: functional & non-functional requirements; the major services and storage choices; a data model for organizations, users, rooms, folders, documents, and ACLs; permission-evaluation rules including inheritance and overrides; APIs for creating rooms, uploading PDFs, inviting organizations, and checking access; how PDFs are served and protected for viewing/downloading; and auditing, logging, and monitoring. Make the authorization model the spine of the design.
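To make the permission-evaluation requirement concrete, here is a small sketch of one possible rule set: allow-only grants, where the effective role is the strongest role found on the document, any ancestor folder, or the room, and the owning organization is implicitly admin. These semantics are an assumption chosen for explainability, not the expected answer. The `Node` type, the `ROLE_RANK` ordering, and the shape of `grants` are hypothetical. The hierarchy is assumed to be an acyclic tree.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical ordering: a higher number means more privilege.
ROLE_RANK = {"view_only": 1, "admin": 2}


@dataclass(frozen=True)
class Node:
    id: str
    kind: str                        # "room", "folder", or "document"
    parent_id: Optional[str]         # None for a room
    owner_org: Optional[str] = None  # set on rooms only


def ancestors(nodes, node_id):
    """Yield the node itself, then each ancestor up to the room."""
    current = nodes.get(node_id)
    while current is not None:
        yield current
        current = nodes.get(current.parent_id) if current.parent_id else None


def effective_role(nodes, grants, org_id, node_id):
    """Return the strongest role org_id holds on node_id, or None (deny).

    grants maps (org_id, resource_id) -> role name.
    """
    best = None
    for node in ancestors(nodes, node_id):
        if node.kind == "room" and node.owner_org == org_id:
            return "admin"  # owning org's implicit rights
        role = grants.get((org_id, node.id))
        if role and (best is None or ROLE_RANK[role] > ROLE_RANK[best]):
            best = role
    return best
```

With union semantics a direct grant can only widen access, never narrow it. Supporting a more restrictive override would require explicit deny or "stop inheritance" markers, which is the explainability tradeoff the clarifying questions ask about.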

Clarifying Questions to Ask

  • What is the granularity of sharing — can an external org be granted access to a single document, or only to a whole room/folder?
  • Do we need explicit deny rules, or is allow-only (union of grants) sufficient for v1?
  • Is view-only access expected to prevent download/printing/screenshotting, or only to gate the download endpoint?
  • What are the compliance/retention requirements for the audit log (immutability window, retention period, who can read it)?
  • Do we need full-text search inside PDF content, or only over document/folder names and metadata?
  • Are there data-residency or per-tenant key-isolation requirements?

What a Strong Answer Covers

  • A clear split of functional vs. non-functional requirements, with security treated as a correctness constraint.
  • A single, centralized authorization decision point that every read/write/serve/search path consults — and an explicit argument for why that prevents leaks.
  • A coherent data model: organizations, users, memberships, rooms, folders (hierarchy), documents (with object-storage keys + lifecycle status), and ACL grants keyed by resource + principal.
  • A precise, explainable permission-evaluation algorithm: inheritance from room → folder chain → document, the union/override semantics, and how the owning org's implicit rights are handled.
  • A secure upload pipeline (pre-signed direct upload, validation, async virus/PDF scan, status lifecycle) and a secure serving pipeline (short-lived signed URLs, optional page-rendering + watermarking).
  • ACL-aware search that cannot return unauthorized documents, with the index-vs-post-filter tradeoff named.
  • A tamper-resistant audit log: what events are captured, the event schema, append-only storage, and retention.
  • Caching/consistency reasoning that does not trade away authorization correctness (short TTLs, invalidation on ACL change, fast-expiring URLs).
  • Scalability choices (blobs in object storage, async workers, partitioned audit logs) and the key tradeoffs, stated as tradeoffs.
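The "short-lived signed URLs" item can be illustrated with a generic HMAC scheme: after the authorization check passes, the service signs the path, the caller's organization, and an expiry, and the serving layer verifies all three before returning bytes. This sketch does not reproduce any object store's pre-signed URL format. The function names, query parameter names, and hard-coded `SECRET` are assumptions for illustration, and a real key would come from a secret manager.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-signing-key"  # assumption: placeholder key for the sketch only


def _signature(path, org_id, expires):
    message = f"{path}|{org_id}|{expires}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(path, org_id, ttl_seconds=60, now=None):
    """Issue a URL valid for ttl_seconds; call only after authorization succeeds."""
    now = int(time.time()) if now is None else int(now)
    expires = now + ttl_seconds
    query = urlencode({
        "org": org_id,
        "exp": expires,
        "sig": _signature(path, org_id, expires),
    })
    return f"{path}?{query}"


def verify_url(url, now=None):
    """Return True only if the signature matches and the URL has not expired."""
    now = int(time.time()) if now is None else int(now)
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        org_id = params["org"][0]
        expires = int(params["exp"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    expected = _signature(parsed.path, org_id, expires)
    # Constant-time comparison avoids leaking how many characters matched.
    matches = hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))
    return matches and now < expires
```

The TTL is the main tuning knob. A shorter value narrows the window in which a leaked or revoked URL still works, at the cost of more re-signing for long viewing sessions.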

Follow-up Questions

  • A document is moved into a folder with more restrictive sharing. How do you ensure the effective permissions update atomically, and how do you handle in-flight signed URLs already issued under the old permissions?
  • An org's access to a room is revoked. What is the maximum window during which a previously-cached authorization decision or an outstanding signed URL could still grant access, and how do you bound it?
  • How would you extend the ACL model to support per-user (not just per-org) exceptions and time-bounded access (e.g., access that expires at a deal's close) without making permission evaluation unexplainable?
  • How would you support legally-defensible "who viewed what, when" reporting and detect anomalous bulk-download behavior?
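For the revocation follow-up, one way to reason about the maximum stale-access window is to stamp each cached decision with an ACL version and treat any version mismatch as a miss. The sketch below makes several assumptions. The current ACL version (for example a per-room counter bumped in the same transaction as the ACL change) is assumed to be readable cheaply and consistently. Already-issued signed URLs are assumed to stay valid until they expire. `DecisionCache` and `worst_case_exposure` are hypothetical names.

```python
class DecisionCache:
    """Caches allow/deny decisions, bounded by a TTL and an ACL version stamp."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (decision, acl_version, stored_at)

    def put(self, key, decision, acl_version, now):
        self._entries[key] = (decision, acl_version, now)

    def get(self, key, current_acl_version, now):
        entry = self._entries.get(key)
        if entry is None:
            return None
        decision, version, stored_at = entry
        if version != current_acl_version or now - stored_at >= self.ttl:
            del self._entries[key]
            return None  # caller must re-evaluate against the source of truth
        return decision


def worst_case_exposure(cache_ttl, url_ttl, version_checked):
    """Upper bound, in seconds, on stale access after a revocation commits."""
    cache_window = 0 if version_checked else cache_ttl
    return max(cache_window, url_ttl)
```

Under these assumptions, checking the version on every lookup removes the cache from the bound, leaving the signed-URL TTL as the remaining exposure. Without the version check, the bound is the larger of the two TTLs.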
