Adobe Multi-Tenant Sharding And Access Control

What's being tested

The interviewer is probing your ability to design and reason about multi-tenant architectures that scale and remain secure: choosing a sharding strategy, enforcing tenant isolation, and implementing tenant-aware access control. Expect to justify tradeoffs between operational complexity, performance (hot-tenant handling, cross-shard transactions), and security (authorization at API vs DB). Adobe cares because many services must host many customers with predictable SLAs, tenant-level billing, and strict isolation guarantees.

Core knowledge

Sharding fundamentals: choose a shard key (e.g., tenant_id) and a mapping (modulo, consistent hashing, rendezvous hashing) to distribute tenants across N shards. Modulo remaps most tenants whenever N changes; consistent and rendezvous hashing move only about 1/N of them when a shard is added or removed.
Tenant partitioning models: shared-schema (single DB/table with tenant_id column), schema-per-tenant, and database-per-tenant; each trades isolation versus operational overhead and number-of-connections limits.
Capacity computation: start from shards = ceil(total_tenants / tenants_per_shard), then size by expected bytes and QPS: each shard's capacity must exceed Σ(tenant_QPS) over the tenants placed on that shard, plus headroom. Account for skew, where the top-k tenants can produce more than 50% of the load.
Hot-tenant treatment: detect via per-tenant QPS/p99 latency; options: migrate tenant to dedicated shard/DB, add read replicas, or use per-tenant throttles and caches. Migration cost includes rebalancing and possible downtime.
Cross-shard consistency: avoid distributed transactions when possible; if needed, weigh two-phase commit (2PC; atomic commit across shards, but complex) against saga patterns (eventual consistency with compensating actions). 2PC adds a coordinator, and participants can block while holding locks if the coordinator fails.
Access control models: RBAC for role inheritance and simplicity, ABAC for fine-grained, attribute-driven policies; store policies centrally and enforce at the service/API boundary and optionally at DB via row-level security.
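DB-level enforcement: Postgres RLS or per-tenant DB users limit tenant leakage even when the application has bugs; pair them with parameterized queries and set tenant_id from verified credentials so a request cannot override it. In Postgres, superusers, roles with BYPASSRLS, and (unless FORCE ROW LEVEL SECURITY is set) the table owner bypass RLS, so the application should connect as a separate, unprivileged role.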
Tokens and auth: use JWTs carrying a signed tenant_id claim, or opaque tokens that the server resolves to a tenant via lookup; validate signature and expiry; do not trust client-provided tenant identifiers without server-side verification.
Caching and indices: include tenant_id in cache keys so one tenant's entry is never served to another, and lead DB indices with tenant_id to avoid full-table scans; global secondary indexes shared across tenants can cause leakage and contention.
Observability & throttling: emit tenant-tagged metrics and traces, implement per-tenant rate limiting (e.g., token bucket) and circuit-breakers; alert on top-k tenant latency or error-rate spikes.
Secrets and encryption: use per-tenant encryption keys via KMS-backed key hierarchy if isolation/regulatory needs demand key separation; rotate keys and design for key-recovery.
Operational concerns: automate onboarding, per-tenant backup/restore, cost attribution, and tenant-migration testing; cap connections per shard (Postgres typically defaults max_connections to 100 and uses one backend process per connection) and put a connection pooler in front.
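
The capacity rule above can be turned into a quick sizing calculation. This is a minimal sketch under stated assumptions: the function names, the 30% headroom, and the sample numbers are illustrative, and the QPS bound assumes tenants can be packed evenly across shards, which skew may prevent.

```python
import math

def shards_needed(tenant_qps, tenants_per_shard, shard_qps_capacity, headroom=0.3):
    """Shard count that satisfies both the tenant-count limit and the QPS limit."""
    by_count = math.ceil(len(tenant_qps) / tenants_per_shard)
    # Reserve a fraction of each shard's capacity as headroom (assumed policy).
    usable_qps = shard_qps_capacity * (1 - headroom)
    by_qps = math.ceil(sum(tenant_qps) / usable_qps)
    return max(by_count, by_qps)

def top_k_share(tenant_qps, k):
    """Fraction of total load produced by the k busiest tenants (a skew indicator)."""
    busiest = sorted(tenant_qps, reverse=True)[:k]
    return sum(busiest) / sum(tenant_qps)

# Hypothetical fleet: 990 small tenants at 1 QPS and 10 large tenants at 200 QPS.
fleet = [1] * 990 + [200] * 10
```

For this hypothetical fleet, the tenant-count rule alone suggests 4 shards at 250 tenants per shard, while the QPS rule with 1000 QPS shards and 30% headroom requires 5, so the QPS rule dominates once skew is included.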

Worked example

Design a sharding strategy that supports hundreds of thousands of tenants with tenant-specific access control.

Frame quickly: clarify expected scale (tenants, average and p99 QPS/data), SLA, allowed downtime for rebalancing, and security requirements (logical vs cryptographic isolation).
Skeleton: (1) pick partitioning model (start shared-schema for operational simplicity if tenants are small; escalate to schema-per-tenant or dedicated DB for large tenants), (2) choose shard mapping (use consistent hashing to add/remove nodes with minimal remap), (3) enforce access control at two levels (API gateway token verification with tenant_id claim + DB RLS), (4) plan for hot-tenant detection and migration automation.
Tradeoff to flag: shared-schema reduces operational overhead but widens the blast radius of a bug; database-per-tenant gives strong isolation but does not scale to hundreds of thousands of tenants because of connection and per-database resource overhead.
Close: mention automation (migration scripts, health checks) and the metrics that trigger promotion to a dedicated DB; given more time, you would prototype the migration tooling and run failure-mode simulations.

A second angle

Consider a service where a small fraction of tenants generate most of the load and require strict per-customer encryption.

Same concepts apply but constraints shift: design must quickly identify hot tenants (tenant-tagged metrics) and allow elastic promotion to a database-per-tenant or dedicated shard with isolated encryption keys (KMS) for regulatory compliance.
Runtime routing needs to be tenant-aware at the API gateway or routing layer to direct hot tenants to their dedicated resources; consistent hashing must be complemented with metadata-aware routing for pinned tenants.
Access control becomes stricter: enforce cryptographic separation and audit trails, and ensure service discovery and migration preserve session affinity and token validity during cutovers.
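
Metadata-aware routing for pinned tenants can be sketched as a lookup table consulted before the hash-based mapping. This minimal example uses rendezvous hashing for the shared pool because it fits in a few lines; the class name `TenantRouter`, the shard names, and the in-memory `pinned` dict are assumptions for illustration (a real system would keep pin metadata in a durable store).

```python
import hashlib

def _score(tenant_id, shard):
    """Stable per-(shard, tenant) weight; sha256 avoids Python's salted hash()."""
    digest = hashlib.sha256(f"{shard}:{tenant_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

class TenantRouter:
    def __init__(self, shared_shards):
        self.shared_shards = list(shared_shards)
        self.pinned = {}  # tenant_id -> dedicated shard or database

    def pin(self, tenant_id, target):
        """Promote a hot tenant to dedicated resources."""
        self.pinned[tenant_id] = target

    def unpin(self, tenant_id):
        """Return a tenant to the shared pool."""
        self.pinned.pop(tenant_id, None)

    def route(self, tenant_id):
        # Pinned tenants bypass hashing entirely.
        if tenant_id in self.pinned:
            return self.pinned[tenant_id]
        # Rendezvous hashing: pick the shard with the highest score for this tenant.
        return max(self.shared_shards, key=lambda s: _score(tenant_id, s))
```

Pinning one tenant leaves every other tenant's route unchanged, and adding a shared shard can only move a tenant onto the new shard, never between existing ones.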

Common pitfalls

Pitfall: Treating a single mapping (e.g., modulo by tenant_id) as permanent. This ignores tenant growth and skew; plan for rebalancing from the start, and use consistent hashing to minimize tenant movement when shards are added or removed.
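
The difference in tenant movement can be measured directly. This is a minimal sketch, assuming a sha256-based hash, 100 virtual nodes per shard, and synthetic tenant names; `HashRing` and `moved_fraction` are illustrative helpers, not a production ring.

```python
import bisect
import hashlib

def h(key):
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

def modulo_shard(tenant_id, n):
    return h(tenant_id) % n

class HashRing:
    def __init__(self, shards, vnodes=100):
        # Each shard owns several points on the ring to smooth the distribution.
        self._points = sorted(
            (h(f"{shard}#{v}"), shard) for shard in shards for v in range(vnodes)
        )
        self._hashes = [point for point, _ in self._points]

    def shard_for(self, tenant_id):
        # First ring point clockwise from the tenant's hash, wrapping at the end.
        i = bisect.bisect_right(self._hashes, h(tenant_id)) % len(self._points)
        return self._points[i][1]

def moved_fraction(before, after, tenants):
    """Share of tenants whose shard differs between two mapping functions."""
    return sum(before(t) != after(t) for t in tenants) / len(tenants)

tenants = [f"tenant-{i}" for i in range(10_000)]
shards_8 = [f"shard-{i}" for i in range(8)]
ring_8 = HashRing(shards_8)
ring_9 = HashRing(shards_8 + ["shard-8"])

modulo_moved = moved_fraction(
    lambda t: modulo_shard(t, 8), lambda t: modulo_shard(t, 9), tenants
)
ring_moved = moved_fraction(ring_8.shard_for, ring_9.shard_for, tenants)
```

With a uniform hash, growing from 8 to 9 shards should move roughly 8/9 of tenants under modulo and roughly 1/9 on the ring, and every tenant that moves on the ring lands on the new shard.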

Pitfall: Relying solely on application-level checks for tenant isolation. A single SQL injection or developer bug can leak data; use parameterized queries everywhere and add DB-level guardrails such as Postgres RLS.
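
The parameterized-query half of this advice can be sketched with the standard-library sqlite3 module. SQLite has no row-level security, so this example does not demonstrate RLS; it only shows tenant_id being bound from server-side context rather than from request input. The table, the sample rows, and the `TenantScopedDocs` class are hypothetical.

```python
import sqlite3

def make_db():
    """In-memory database with two tenants that both own a document 'd1'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (tenant_id TEXT NOT NULL, doc_id TEXT NOT NULL, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?, ?)",
        [("acme", "d1", "Acme roadmap"), ("globex", "d1", "Globex budget")],
    )
    return conn

class TenantScopedDocs:
    def __init__(self, conn, tenant_id):
        self._conn = conn
        # Set once from verified credentials; callers cannot change it per query.
        self._tenant_id = tenant_id

    def get_title(self, doc_id):
        # Placeholders keep doc_id as data, so quotes in it cannot alter the SQL.
        row = self._conn.execute(
            "SELECT title FROM documents WHERE tenant_id = ? AND doc_id = ?",
            (self._tenant_id, doc_id),
        ).fetchone()
        return row[0] if row else None
```

Because every query goes through the tenant-scoped wrapper, an injection-style doc_id such as `d1' OR '1'='1` is treated as a literal string and matches nothing.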

Pitfall: Underestimating the operational cost of database-per-tenant. It is tempting for isolation, but connection counts and backup complexity grow with every tenant; articulate a migration path and automation before choosing it.
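
A back-of-the-envelope calculation shows how quickly connection counts grow. The figures below (100,000 tenants, 20 application instances, pool sizes) are hypothetical, and the model assumes the worst case where every instance keeps a pool open to every database.

```python
def total_connections(databases, app_instances, pool_size_per_db):
    """Connections opened if every app instance keeps a pool to every database."""
    return databases * app_instances * pool_size_per_db

# Database-per-tenant: one database for each of 100,000 tenants.
per_tenant = total_connections(databases=100_000, app_instances=20, pool_size_per_db=2)

# Shared-schema: the same tenants spread over 50 shards with larger pools.
shared = total_connections(databases=50, app_instances=20, pool_size_per_db=10)
```

Under these assumptions the per-tenant layout needs 4,000,000 connections against 10,000 for the sharded layout, which is why lazy pools, a pooler tier, or promoting only large tenants usually comes up in the discussion.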

Connections

This topic commonly leads to adjacent pivots: designing cross-region replication for tenant locality, or distributed rate-limiting and cost attribution for billing. Interviewers may also ask about failure injection and chaos-testing tenant migrations.