Adobe Multi-Tenant Sharding And Access Control
Asked of: Software Engineer
What's being tested
The interviewer is probing your ability to design and reason about multi-tenant architectures that scale and remain secure: choosing a sharding strategy, enforcing tenant isolation, and implementing tenant-aware access control. Expect to justify tradeoffs between operational complexity, performance (hot-tenant handling, cross-shard transactions), and security (authorization at API vs DB). Adobe cares because many services must host many customers with predictable SLAs, tenant-level billing, and strict isolation guarantees.
Core knowledge
- Sharding fundamentals: choose a shard key (e.g., tenant_id) and a mapping (modulo, consistent hashing, rendezvous hashing) to distribute tenants across N shards; consistent hashing minimizes remap cost on rescaling.
- Tenant partitioning models: shared-schema (single DB/table with a tenant_id column), schema-per-tenant, and database-per-tenant; each trades isolation against operational overhead and connection-count limits.
- Capacity computation: shards = ceil(total_tenants / tenants_per_shard); size each shard by expected bytes and QPS so that shard capacity > Σ(tenant_QPS) + headroom; plan for skew, where the top-k tenants can produce >50% of load.
- Hot-tenant treatment: detect via per-tenant QPS/p99 latency; options: migrate tenant to dedicated shard/DB, add read replicas, or use per-tenant throttles and caches. Migration cost includes rebalancing and possible downtime.
- Cross-shard consistency: avoid distributed transactions when possible; if needed, weigh two-phase commit (2PC: strong consistency, but complex) against saga patterns (eventual consistency, with compensating actions). 2PC adds a coordinator, and participants can block if it fails mid-protocol.
- Access control models: RBAC for role inheritance and simplicity, ABAC for fine-grained, attribute-driven policies; store policies centrally and enforce at the service/API boundary and optionally at DB via row-level security.
- DB-level enforcement: Postgres RLS or per-tenant DB users prevent accidental tenant leakage even if app bugs exist; pair with parameterized queries and ensure tenant_id cannot be overridden.
- Tokens and auth: use JWTs with a signed tenant_id claim, or opaque tokens that the server resolves to a tenant; validate signature and expiry; do not trust client-provided tenant identifiers without server-side verification.
- Caching and indices: include tenant_id in cache keys and DB indices to prevent cross-tenant cache poisoning and inefficient full-table scans; multi-tenant global secondary indexes can cause leakage and contention.
- Observability & throttling: emit tenant-tagged metrics and traces, implement per-tenant rate limiting (e.g., token bucket) and circuit-breakers; alert on top-k tenant latency or error-rate spikes.
- Secrets and encryption: use per-tenant encryption keys via a KMS-backed key hierarchy if isolation or regulatory needs demand key separation; rotate keys and design for key recovery.
- Operational concerns: automate onboarding, per-tenant backup/restore, cost attribution, and tenant-migration testing; cap connections per shard (e.g., Postgres handles on the order of hundreds of connections) and use connection pooling.
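The capacity rule in the list above can be turned into a quick back-of-the-envelope check. This is a minimal sketch: the function names, the additive per-shard headroom, and all numbers are illustrative assumptions, not figures from any real system.

```python
import math


def shards_needed(total_tenants, tenants_per_shard, tenant_qps,
                  shard_qps_capacity, headroom_qps):
    """Shard count that satisfies both the tenant-count and the QPS constraint."""
    # Constraint 1: shards = ceil(total_tenants / tenants_per_shard)
    by_count = math.ceil(total_tenants / tenants_per_shard)
    # Constraint 2: shard capacity > sum(tenant_QPS on shard) + headroom,
    # so only (capacity - headroom) is usable per shard.
    usable_per_shard = shard_qps_capacity - headroom_qps
    by_qps = math.ceil(sum(tenant_qps) / usable_per_shard)
    return max(by_count, by_qps)


def top_k_share(tenant_qps, k):
    """Fraction of total load produced by the k busiest tenants (skew check)."""
    total = sum(tenant_qps)
    if total == 0:
        return 0.0
    return sum(sorted(tenant_qps, reverse=True)[:k]) / total


# Hypothetical workload: 10 heavy tenants and 1,990 light ones.
tenant_qps = [2000] * 10 + [1] * 1990
needed = shards_needed(
    total_tenants=len(tenant_qps),
    tenants_per_shard=500,
    tenant_qps=tenant_qps,
    shard_qps_capacity=5000,
    headroom_qps=1000,
)
skew = top_k_share(tenant_qps, k=10)
print(needed, round(skew, 3))
```

In this made-up workload the QPS constraint, not the tenant count, decides the shard count, and the top 10 tenants carry most of the load. An average-based plan would miss both, which is why the skew check belongs next to the capacity formula.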
Worked example
Design a sharding strategy that supports hundreds of thousands of tenants with tenant-specific access control.
- Frame quickly: clarify expected scale (tenants, average and p99 QPS/data), SLA, allowed downtime for rebalancing, and security requirements (logical vs cryptographic isolation).
- Skeleton: (1) pick a partitioning model (start with shared-schema for operational simplicity if tenants are small; escalate to schema-per-tenant or a dedicated DB for large tenants), (2) choose a shard mapping (use consistent hashing to add/remove nodes with minimal remap), (3) enforce access control at two levels (API gateway token verification with a tenant_id claim, plus DB RLS), (4) plan for hot-tenant detection and migration automation.
- Tradeoff to flag: choosing shared-schema reduces operational overhead but increases blast radius on bugs; database-per-tenant gives isolation but does not scale to 100k+ tenants because of connection counts and per-database resource overhead.
- Close: mention automation (migration scripts, health checks) and the metrics for deciding promotion to a dedicated DB; with more time, you'd prototype migration tooling and run failure-mode simulations.
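Step (3) of the skeleton, gateway-side verification of a signed tenant_id claim, might look roughly like the sketch below. It uses only the standard library and HS256 so that it is self-contained; a production gateway would more likely use a vetted JWT library and asymmetric keys. The function names and the claim layout are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Issue an HS256 JWT (only here so the example can be exercised)."""
    parts = [{"alg": "HS256", "typ": "JWT"}, claims]
    signing_input = ".".join(
        _b64url_encode(json.dumps(p, separators=(",", ":")).encode("utf-8"))
        for p in parts
    )
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url_encode(sig)


def tenant_from_token(token: str, secret: bytes, now=None) -> str:
    """Return the verified tenant_id, or raise PermissionError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError as exc:
        raise PermissionError("malformed token") from exc
    # Pin the algorithm so a token cannot downgrade itself (e.g. alg "none").
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise PermissionError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(signature, expected):
        raise PermissionError("bad signature")
    if not isinstance(claims, dict):
        raise PermissionError("malformed claims")
    exp = claims.get("exp")
    current = time.time() if now is None else now
    if not isinstance(exp, (int, float)) or current >= exp:
        raise PermissionError("token expired or missing exp")
    tenant_id = claims.get("tenant_id")
    if not isinstance(tenant_id, str) or not tenant_id:
        raise PermissionError("missing tenant_id claim")
    return tenant_id


def check_request_tenant(token, requested_tenant_id, secret, now=None) -> str:
    """Reject requests whose path/body tenant differs from the signed claim."""
    tenant_id = tenant_from_token(token, secret, now)
    if tenant_id != requested_tenant_id:
        raise PermissionError("tenant mismatch")
    return tenant_id
```

The tenant used for routing and for the DB session should be the value returned here, never the one supplied by the client. The client-supplied value is only compared against the signed claim.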
A second angle
Consider a service where a small fraction of tenants generate most of the load and require strict per-customer encryption.
- The same concepts apply, but the constraints shift: the design must quickly identify hot tenants (tenant-tagged metrics) and allow elastic promotion to a database-per-tenant or dedicated shard with isolated encryption keys (KMS) for regulatory compliance.
- Runtime routing needs to be tenant-aware at the API gateway or routing layer to direct hot tenants to their dedicated resources; consistent hashing must be complemented with metadata-aware routing for pinned tenants.
- Access control becomes stricter: enforce cryptographic separation and audit trails, and ensure service discovery and migration preserve session affinity and token validity during cutovers.
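The metadata-aware routing described above can be sketched as a lookup in a pinned-tenant table that falls back to hashing for everyone else. This is an illustrative sketch only: the TenantRouter class and its in-memory pinned dictionary are hypothetical stand-ins for a real routing-metadata store, and rendezvous hashing is used for the fallback purely because it is short to write (a consistent-hash ring would serve equally well).

```python
import hashlib


def rendezvous_pick(tenant_id, shards):
    """Highest-random-weight choice: the shard with the largest hash(shard, tenant) wins."""
    def weight(shard):
        digest = hashlib.sha256(f"{shard}|{tenant_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return max(shards, key=weight)


class TenantRouter:
    """Pinned tenants go to dedicated targets; all others are hashed onto shared shards."""

    def __init__(self, shared_shards):
        self.shared_shards = list(shared_shards)
        self.pinned = {}  # tenant_id -> dedicated target (hypothetical metadata)

    def promote(self, tenant_id, dedicated_target):
        """Record that a hot tenant now lives on its own shard or database."""
        self.pinned[tenant_id] = dedicated_target

    def route(self, tenant_id):
        pinned_target = self.pinned.get(tenant_id)
        if pinned_target is not None:
            return pinned_target
        return rendezvous_pick(tenant_id, self.shared_shards)


router = TenantRouter(["shard-a", "shard-b", "shard-c"])
neighbour_before = router.route("tenant-small")
router.promote("tenant-hot", "dedicated-tenant-hot")
print(router.route("tenant-hot"), router.route("tenant-small") == neighbour_before)
```

Promoting one tenant changes only that tenant's route, so the cutover does not disturb other tenants. The data copy and the switch of the pinned entry still need to be coordinated, which this sketch does not attempt.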
Common pitfalls
Pitfall: Treating a single mapping (e.g., modulo by tenant_id) as permanent. This ignores tenant growth and skew: changing N under modulo remaps most tenants, so plan for rebalancing and minimize tenant movement with consistent hashing.
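To make the remap cost concrete, the following sketch compares how many tenants change shards when going from 8 to 9 shards under modulo and under a simple consistent-hash ring. The ring implementation, the virtual-node count, and the tenant names are all illustrative assumptions.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


def modulo_shard(tenant_id: str, shard_count: int) -> int:
    return stable_hash(tenant_id) % shard_count


class HashRing:
    """Consistent-hash ring with virtual nodes to smooth the distribution."""

    def __init__(self, nodes, vnodes=100):
        ring = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in ring]
        self._owners = [node for _, node in ring]

    def lookup(self, tenant_id: str) -> str:
        # First ring point clockwise from the tenant's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, stable_hash(tenant_id)) % len(self._points)
        return self._owners[idx]


tenants = [f"tenant-{i}" for i in range(5000)]
old_ring = HashRing([f"shard-{i}" for i in range(8)])
new_ring = HashRing([f"shard-{i}" for i in range(9)])

moved_modulo = sum(modulo_shard(t, 8) != modulo_shard(t, 9) for t in tenants)
moved_ring = [t for t in tenants if old_ring.lookup(t) != new_ring.lookup(t)]

print(f"modulo moved: {moved_modulo / len(tenants):.1%}")
print(f"ring moved:   {len(moved_ring) / len(tenants):.1%}")
```

With a uniform hash, modulo is expected to move about 8/9 of tenants in this scenario, while the ring is expected to move roughly the 1/9 share taken over by the new shard. Every tenant that moves on the ring lands on the newly added shard, so existing shards only give data away and never exchange it with each other.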
Pitfall: Relying solely on application-level checks for tenant isolation. A single SQL injection or developer bug can leak data; add DB-level guardrails such as Postgres RLS, and use parameterized queries throughout the application.
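As a rough illustration of the DB-level guardrail, the sketch below holds the Postgres statements that enable row-level security on a table and a helper that builds the parameterized statement scoping a transaction to one tenant. The table name documents, the text-typed tenant_id column, the custom setting app.tenant_id, and the %s placeholder style (as used by psycopg) are assumptions; nothing here is executed against a database.

```python
# One-time DDL, run by a migration. FORCE makes the policy apply to the table
# owner too; superusers and roles with BYPASSRLS are still exempt.
RLS_SETUP_SQL = """
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
ALTER TABLE documents FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = current_setting('app.tenant_id'))
    WITH CHECK (tenant_id = current_setting('app.tenant_id'));
"""


def tenant_scope_statement(tenant_id: str):
    """Return (sql, params) that pins the current transaction to one tenant.

    set_config(name, value, is_local=true) limits the setting to the current
    transaction, so a pooled connection does not carry it to the next request.
    The tenant is passed as a bound parameter, never interpolated into the SQL.
    """
    if not isinstance(tenant_id, str) or not tenant_id:
        raise ValueError("tenant_id must be a non-empty string")
    return ("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,))


sql, params = tenant_scope_statement("acme")
print(sql, params)
```

The intent is that the application runs the scoping statement at the start of each transaction, using the tenant taken from the verified token. Even a query that forgets its WHERE tenant_id = ... clause then sees only that tenant's rows.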
Pitfall: Underestimating operational cost of database-per-tenant: it's tempting for isolation but explodes connection counts and backup complexity; articulate a migration path and automation before choosing it.
Connections
This topic commonly leads to adjacent pivots: designing cross-region replication for tenant locality, or distributed rate-limiting and cost attribution for billing. Interviewers may also ask about failure injection and chaos-testing tenant migrations.
Further reading
- Designing Data-Intensive Applications (Kleppmann): strong grounding on partitioning, replication, and transactions.
- Postgres Row-Level Security docs: shows practical DB-side enforcement patterns and examples.
Related concepts
- Adobe Sharded Tenant Data And Transaction Integrity
- Adobe Creative Cloud asset search, indexing, autocomplete, and sharding
- Adobe Entitlements And Transactional Integrity
- Security, Multitenancy, And Authorization (System Design)
- Adobe Document Cloud real-time collaboration and offline sync
- Multi-Tenant Isolation And Sandboxing (System Design)