Prompt
Design a multi-tenant data analysis platform for government and enterprise customers.
The interviewer emphasizes that the key challenge is not scale, but:
- Extremely complex permissions / access control
- Compliance requirements (data handling, retention, approvals)
- Audit logging (tamper-evident, explainable)
- Data lineage (trace data origins and transformations)
They repeatedly stress:
“Assume mistakes will happen. How do we detect and recover?”
Requirements (make reasonable assumptions)
- Multiple tenants (organizations) share the platform. Some customers may require strict isolation.
- Users include analysts, data engineers, auditors, admins, and external reviewers.
- Data sources: batch ingestion and/or uploaded datasets; transformations and query/BI-style analysis.
- Authorization must support complex real-world rules (attributes, projects, clearance, purpose, time, geography, approvals, etc.).
- Platform must provide:
  - Strong authentication + authorization
  - Fine-grained data access enforcement (dataset/table/column/row and API-level)
  - Immutable audit trails and reporting
  - End-to-end lineage across ingestion → transforms → derived datasets → queries
  - Detection and recovery mechanisms for inevitable human/config mistakes
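To make the authorization requirement concrete, here is a minimal sketch of an attribute-based check that combines tenant, clearance, purpose, geography, approval, and time. It is one possible shape, not a prescribed answer: the `Request` and `Policy` types, their field names, and the `evaluate` function are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Request:
    """Hypothetical access request; every field name is illustrative."""
    tenant: str
    clearance: int
    purpose: str
    region: str
    approved: bool
    at: datetime


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy attached to a single dataset."""
    tenant: str
    min_clearance: int
    allowed_purposes: frozenset
    allowed_regions: frozenset
    requires_approval: bool
    expires_at: datetime


def evaluate(policy: Policy, req: Request) -> tuple[bool, list[str]]:
    """Return (allowed, failures). Access is granted only if no check fails.

    Every failed check is collected rather than stopping at the first one,
    so a denial can be explained in full to a user or an auditor.
    """
    failures = []
    if req.tenant != policy.tenant:
        failures.append("tenant mismatch")
    if req.clearance < policy.min_clearance:
        failures.append("insufficient clearance")
    if req.purpose not in policy.allowed_purposes:
        failures.append("purpose not permitted")
    if req.region not in policy.allowed_regions:
        failures.append("region not permitted")
    if policy.requires_approval and not req.approved:
        failures.append("approval missing")
    if req.at >= policy.expires_at:
        failures.append("policy expired")
    return (not failures, failures)


# Example values below are made up for the sketch.
policy = Policy(
    tenant="agency-a",
    min_clearance=2,
    allowed_purposes=frozenset({"fraud-review"}),
    allowed_regions=frozenset({"eu"}),
    requires_approval=True,
    expires_at=datetime(2030, 1, 1, tzinfo=timezone.utc),
)
request = Request(
    tenant="agency-a",
    clearance=3,
    purpose="fraud-review",
    region="eu",
    approved=True,
    at=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
allowed, failures = evaluate(policy, request)
```

Returning the list of failed checks alongside the decision is a deliberate choice here: it gives the audit trail something explainable to record for each denial.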
Deliverables
Walk through:
- Architecture (control plane vs data plane)
- Tenant isolation model
- Permission model (how policies are expressed, evaluated, and enforced)
- Audit logging + evidence generation
- Lineage design
- “Mistakes will happen”: detection signals, blast-radius control, rollback/recovery, and operational processes
- Key edge cases and trade-offs
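For the audit logging deliverable, one common way to get tamper evidence is a hash chain, where each entry commits to the hash of the entry before it. The sketch below assumes SHA-256 over a canonical JSON encoding. The function names, the `GENESIS` constant, and the event fields are hypothetical, and the in-memory list stands in for whatever append-only store a real design would use.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical event encoding."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> None:
    """Append an event, linking it to the hash of the current last entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def first_bad_index(log: list) -> int:
    """Return the index of the first entry that fails verification, or -1."""
    prev_hash = GENESIS
    for i, entry in enumerate(log):
        expected = _digest(prev_hash, entry["event"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return i
        prev_hash = entry["hash"]
    return -1


# Example events are made up for the sketch.
log = []
append(log, {"actor": "alice", "action": "query", "dataset": "claims"})
append(log, {"actor": "bob", "action": "export", "dataset": "claims"})
append(log, {"actor": "carol", "action": "grant", "dataset": "claims"})
intact = first_bad_index(log)

# Simulate someone rewriting history in the middle of the log.
log[1]["event"]["action"] = "query"
tampered_at = first_bad_index(log)
```

A chain like this detects edits to, or removal of, entries inside the log, but not truncation of its tail or a wholesale rewrite by someone who can recompute every hash. Catching those requires publishing or countersigning the latest hash somewhere outside the log owner's control, which is a trade-off worth raising in the discussion.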