Design Slack-like multi-tenant global messaging system

This is a System Design interview question from OpenAI for Software Engineer roles.

Question

Design a team messaging platform similar to Slack that supports multiple organizations (multi-tenancy) and is deployed globally.

Functional requirements

Users can belong to one or more workspaces (tenants/organizations).
Each workspace has multiple channels (public and private) and direct messages (DMs).
Users can:
- Send and receive real-time text messages in channels and DMs.
- See message history in channels and DMs.
- See basic presence (online/away) for other users in the same workspace.
Messages must be delivered with low latency (e.g., p95 < 200 ms) for active users.

Non-functional & multi-tenant requirements

The service must support millions of users across tens of thousands of workspaces.
Multi-tenancy:
- Strict data isolation between workspaces: users in one workspace must never see data from another workspace.
- Different workspaces can have different configurations and limits (e.g., message retention, file size limits).
- The system should defend against noisy neighbors (one tenant over-consuming shared resources).
Global deployment:
- Users are geographically distributed (e.g., Americas, Europe, Asia).
- Users should connect to a nearby region for good latency.
- Many large organizations have employees in multiple regions in the same workspace.

Design tasks

Describe a design that covers at least the following aspects:

API and high-level architecture
- Key services (e.g., gateway/API layer, auth, workspace/channel management, messaging, presence, search, notification).
- How clients (web/desktop/mobile) connect to the system for real-time messaging (e.g., WebSockets, long polling).
Data model and storage
- Core entities: Workspace (Tenant), User, Membership, Channel, Message.
- What storage technologies you would use for:
  - Metadata (users, workspaces, channels, memberships).
  - Messages and their history.
- How you would partition/shard data to scale to many tenants and users.
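One possible starting point for the entities and sharding question above is sketched below. It is an illustration, not a prescribed answer: the field names, the `NUM_SHARDS` value, and the `shard_for` helper are all assumptions. Every tenant-owned entity carries `tenant_id`, and messages are placed by hashing `(tenant_id, channel_id)` so that one channel's history stays on a single shard.

```python
from dataclasses import dataclass
import hashlib

NUM_SHARDS = 64  # assumed shard count, chosen only for illustration


@dataclass(frozen=True)
class Workspace:
    tenant_id: str
    name: str


@dataclass(frozen=True)
class User:
    user_id: str  # users are global; they join workspaces via Membership
    email: str


@dataclass(frozen=True)
class Membership:
    tenant_id: str
    user_id: str
    role: str


@dataclass(frozen=True)
class Channel:
    tenant_id: str
    channel_id: str
    name: str
    is_private: bool


@dataclass(frozen=True)
class Message:
    tenant_id: str
    channel_id: str
    message_id: int  # assumed to increase monotonically within a channel
    sender_id: str
    body: str


def shard_for(tenant_id: str, channel_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a (tenant, channel) pair to a shard with a stable hash."""
    key = f"{tenant_id}:{channel_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Hashing on the channel as well as the tenant spreads a very large workspace over many shards, at the cost of making workspace-wide queries (such as search) touch several shards. Sharding by `tenant_id` alone would make the opposite trade.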
Multi-tenant architecture
- How you will represent tenant boundaries in the data model and APIs (e.g., tenant_id everywhere).
- Options for physically storing tenant data: fully shared DB with a tenant_id column, separate DB per tenant, or a hybrid; discuss pros/cons.
- How you enforce security and isolation across all layers (auth, services, storage).
- Handling noisy neighbors (rate limiting, quotas, priority or dedicated resources for large tenants).
Global deployment and replication
- How you would deploy the system into multiple regions.
- How users get routed to the closest region (e.g., DNS, anycast, global load balancers).
- How data for a single global workspace is handled when users are in multiple regions:
  - Where is the source of truth for messages of a workspace?
  - How are messages replicated across regions (e.g., asynchronous replication, regional caches)?
  - What consistency guarantees do you provide (e.g., eventual consistency across regions vs strong consistency within a region)?
- Strategies for regional failover and disaster recovery.
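One way to answer the source-of-truth question is to pin each workspace to a home region that orders its messages, and to replicate asynchronously to the other regions. The toy model below illustrates only that idea. `Region`, `GlobalRouter`, and the in-memory replication queue are assumptions made for the sketch; failover and conflict handling are deliberately left out.

```python
from collections import defaultdict


class Region:
    def __init__(self, name: str):
        self.name = name
        # tenant_id -> ordered list of (sequence_number, body)
        self.log: dict[str, list[tuple[int, str]]] = defaultdict(list)


class GlobalRouter:
    def __init__(self, regions: list[Region], home_region_of: dict[str, str]):
        self.regions = {r.name: r for r in regions}
        self.home_region_of = home_region_of  # tenant_id -> home region name
        # Pending async replication: (target_region, tenant_id, seq, body)
        self.pending: list[tuple[str, str, int, str]] = []

    def send(self, from_region: str, tenant_id: str, body: str) -> tuple[int, bool]:
        """Write to the tenant's home region; return (seq, was_forwarded)."""
        home = self.regions[self.home_region_of[tenant_id]]
        forwarded = from_region != home.name  # cross-region hop adds latency
        seq = len(home.log[tenant_id]) + 1    # home region assigns the order
        home.log[tenant_id].append((seq, body))
        for name in self.regions:
            if name != home.name:
                self.pending.append((name, tenant_id, seq, body))
        return seq, forwarded

    def replicate(self) -> int:
        """Drain the replication queue; returns how many entries were applied."""
        applied = len(self.pending)
        for name, tenant_id, seq, body in self.pending:
            self.regions[name].log[tenant_id].append((seq, body))
        self.pending.clear()
        return applied
```

In this model, reads in the home region see a message immediately, while other regions see it only after `replicate()` runs. That corresponds to strong ordering within the home region and eventual consistency across regions.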
Scalability and performance
- How you would scale:
  - WebSocket / real-time connections.
  - Message fan-out to many subscribers in a busy channel.
  - Message storage and retrieval.
- Caching strategies and indexing for recent history vs deep history.
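For fan-out in a busy channel, a common approach is to group recipients by the gateway node that holds their connection, so each gateway receives one copy of a message rather than one per subscriber. The sketch below assumes a hypothetical `FanoutIndex` with in-memory maps. Skipping offline members and excluding the sender are simplifying assumptions, not requirements from the question.

```python
from collections import defaultdict


class FanoutIndex:
    def __init__(self):
        self._gateway_of: dict[str, str] = {}  # user_id -> gateway_id
        # (tenant_id, channel_id) -> member user_ids
        self._members: dict[tuple[str, str], set[str]] = defaultdict(set)

    def connect(self, user_id: str, gateway_id: str) -> None:
        self._gateway_of[user_id] = gateway_id

    def disconnect(self, user_id: str) -> None:
        self._gateway_of.pop(user_id, None)

    def join(self, tenant_id: str, channel_id: str, user_id: str) -> None:
        self._members[(tenant_id, channel_id)].add(user_id)

    def plan(self, tenant_id: str, channel_id: str, sender_id: str) -> dict[str, list[str]]:
        """Return gateway_id -> sorted online recipients for one message."""
        by_gateway: dict[str, list[str]] = defaultdict(list)
        # .get avoids creating an entry for an unknown channel
        for user_id in sorted(self._members.get((tenant_id, channel_id), ())):
            if user_id == sender_id:
                continue
            gateway_id = self._gateway_of.get(user_id)
            if gateway_id is not None:  # offline members are skipped here
                by_gateway[gateway_id].append(user_id)
        return dict(by_gateway)
```

The membership key includes `tenant_id`, so a channel lookup can never return members of another workspace even if two workspaces reuse the same channel identifier.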
Other considerations (at a high level)
- Search and message indexing across channels in a workspace.
- File attachments (storage and access controls) if you have time.
- Security (encryption in transit/at rest, per-tenant encryption keys, audit logging).

Explain the trade-offs you are making (e.g., consistency vs availability, shared vs isolated tenant storage) and justify your choices in terms of reliability, cost, and operational complexity.
