PracHub

Design a global notification service

Last updated: Mar 29, 2026

Quick Overview

This System Design question evaluates architecture and operations skills for building a globally distributed, multi-tenant notification platform. It covers data residency, high availability, API and data model design, idempotency and deduplication, rate limiting, storage and queueing choices, worker orchestration, disaster recovery, capacity planning, and observability. Interviewers use it to assess how well an engineer reasons about architectural trade-offs and operational decisions, from system-level design down to practical details such as regional partitioning, replication strategies, API behavior, and SLO-driven monitoring.

Company: TikTok

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

Design a globally distributed notification service that sends real-time and scheduled messages (email, SMS, push) to tens of millions of users under regional compliance constraints. Define APIs and data models; describe deduplication, idempotency, and rate limiting; choose storage and queueing layers; outline worker orchestration, retry/backoff, and ordering guarantees; design multi-region failover and disaster recovery; and provide capacity planning with rough estimates and monitoring/alerting.

Jul 17, 2025, 12:00 AM

System Design: Globally Distributed Notification Service

Context

You are designing a multi-tenant notification platform that delivers real-time and scheduled messages (email, SMS, push) to tens of millions of users worldwide. The system must comply with regional data residency and privacy regulations, support high availability across multiple regions, and provide strong operational controls (idempotency, deduplication, rate limiting, retries, monitoring).

Assume:

  • Users and data are partitioned by region (e.g., US, EU, APAC) with strict residency for PII.
  • The platform offers APIs to trigger individual and bulk notifications and manage templates and user preferences.
  • Peak traffic can spike rapidly (e.g., incident alerts, promotions).
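
The first assumption implies that a request is pinned to the user's home region before any PII is read or written. A minimal Python sketch of that routing step follows. The region table, queue names, partition counts, and the `route` helper are all hypothetical, and partitioning by user and channel is an assumption made here for illustration, not something the question prescribes.

```python
import hashlib
from typing import Tuple

# Hypothetical region table: queue names and partition counts are illustrative only.
REGIONS = {
    "US": {"queue": "notif-us", "partitions": 64},
    "EU": {"queue": "notif-eu", "partitions": 64},
    "APAC": {"queue": "notif-apac", "partitions": 32},
}


def route(home_region: str, user_id: str, channel: str) -> Tuple[str, int]:
    """Return (queue name, partition) inside the user's home region only."""
    if home_region not in REGIONS:
        # Fail closed: never fall back to another region for PII-bearing work.
        raise ValueError(f"unknown region: {home_region}")
    cfg = REGIONS[home_region]
    # Use a stable hash (not Python's salted hash()) so every process
    # maps the same user and channel to the same partition.
    digest = hashlib.sha256(f"{user_id}:{channel}".encode()).digest()
    partition = int.from_bytes(digest[:8], "big") % cfg["partitions"]
    return cfg["queue"], partition
```

Because the partition depends only on the user and channel, all messages for one user on one channel land on the same partition within the region, which is one way to keep them in order.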

Requirements

Design a system that addresses the following:

  1. APIs and Data Models
    • Define REST APIs for:
      • Sending real-time and scheduled notifications (single and bulk)
      • Managing templates and variables
      • Managing user preferences and subscriptions
      • Retrieving message status and delivery receipts
    • Specify core data models (Message, DeliveryAttempt, Template, Campaign/Job, UserPreference, IdempotencyRecord, RateLimitBucket).
  2. Deduplication, Idempotency, and Rate Limiting
    • Describe how to prevent duplicate sends across retries and concurrent requests.
    • Provide an idempotency strategy at the API and worker levels.
    • Define rate limiting scopes (per-user, per-tenant, per-channel, per-provider) and algorithms.
  3. Storage and Queueing Layers
    • Choose storage for:
      • Control-plane metadata (tenants, templates, campaigns)
      • Regional data-plane (messages, attempts, user preferences)
      • Caching (idempotency keys, rate limiter state)
      • Object/blob storage (large templates, assets)
    • Choose queueing/streaming for fan-out, ordering, retries, scheduled delivery, and DLQs.
  4. Worker Orchestration, Retry/Backoff, Ordering
    • Describe worker topology and autoscaling.
    • Define retry/backoff policies and DLQ handling by failure type.
    • Specify ordering guarantees (e.g., per-user per-channel) and how partitions/keys enforce them.
  5. Multi-Region Architecture, Failover, and Disaster Recovery
    • Active-active by region with data residency.
    • Control-plane and data-plane split; inter-region replication where lawful.
    • Provider redundancy and failover strategy.
    • RTO/RPO targets and DR workflows.
  6. Capacity Planning (Rough Estimates)
    • QPS, throughput, partitions, worker counts, storage footprint, cache sizing.
    • Include formulas and a worked example for tens of millions of users.
  7. Monitoring and Alerting
    • SLOs/SLIs, key metrics, logs/traces, synthetic checks.
    • Alerting policies and on-call runbooks.
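
For the rate-limiting requirement above, one commonly used algorithm is the token bucket, applied once per scope (per-user, per-tenant, and so on). The sketch below is a minimal in-memory illustration, not a reference answer. The `TokenBucket` and `ScopedLimiter` names, the limits, and the injectable clock are assumptions made here. A real deployment would presumably keep bucket state in the regional cache and update it atomically.

```python
import time
from typing import Callable, Dict, List, Tuple


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity          # start full so bursts are allowed
        self.updated = clock()

    def refill(self) -> None:
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now


class ScopedLimiter:
    """One bucket per (scope, id); a send must pass every scope it belongs to."""

    def __init__(self, limits: Dict[str, Tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic):
        self.limits = limits            # scope -> (rate, capacity), assumed values
        self.clock = clock
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def allow(self, keys: Dict[str, str]) -> bool:
        touched: List[TokenBucket] = []
        for scope, ident in keys.items():
            bucket = self.buckets.get((scope, ident))
            if bucket is None:
                rate, capacity = self.limits[scope]
                bucket = TokenBucket(rate, capacity, self.clock)
                self.buckets[(scope, ident)] = bucket
            bucket.refill()
            touched.append(bucket)
        # Check every scope before consuming, so a denial in one scope
        # does not burn tokens in the others.
        if any(b.tokens < 1 for b in touched):
            return False
        for b in touched:
            b.tokens -= 1
        return True
```

A caller might build the limiter as `ScopedLimiter({"user": (1.0, 2.0), "tenant": (100.0, 100.0)})` and call `allow({"user": "u1", "tenant": "acme"})` before enqueueing each send. Requests that are denied can be delayed or rejected, depending on the channel.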

State key assumptions, call out trade-offs, and justify major design choices.
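
As an illustration of stating assumptions explicitly, the capacity-planning formulas can be written as a small Python helper. Every input below is an assumed figure chosen for the example (50 million users, 5 messages per user per day, a 10x peak factor, 200 messages per second per worker, 1,000 per partition, 1 KiB per stored message, 30-day retention). None of these numbers come from the question, and the `estimate` function is hypothetical.

```python
import math


def estimate(users: int, msgs_per_user_per_day: int, peak_factor: float,
             per_worker_rps: float, per_partition_rps: float,
             bytes_per_msg: int, retention_days: int) -> dict:
    """Back-of-envelope sizing from a handful of stated assumptions."""
    daily = users * msgs_per_user_per_day
    avg_qps = daily / 86_400                 # seconds per day
    peak_qps = avg_qps * peak_factor         # size for the spike, not the average
    return {
        "daily_messages": daily,
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "workers": math.ceil(peak_qps / per_worker_rps),
        "partitions": math.ceil(peak_qps / per_partition_rps),
        "storage_bytes": daily * bytes_per_msg * retention_days,
    }


result = estimate(
    users=50_000_000,            # assumed
    msgs_per_user_per_day=5,     # assumed
    peak_factor=10,              # assumed
    per_worker_rps=200,          # assumed
    per_partition_rps=1_000,     # assumed
    bytes_per_msg=1_024,         # assumed
    retention_days=30,           # assumed
)
```

With these inputs the helper gives 250 million messages per day, an average of roughly 2,900 messages per second, a peak of roughly 29,000 per second, 145 workers, 29 partitions, and about 7.68 TB of message storage before replication and indexes. Changing any single assumption shifts the result proportionally, which makes the trade-offs easy to discuss.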

