You are asked to design several large-scale backend systems. For each one, describe:
- Core requirements and assumptions
- High-level architecture and main components
- Data model and storage choices
- How you will meet scale, reliability, and latency requirements
- How you will handle failures, retries, and consistency
1. Design a webhook delivery service
Design a webhook service that allows users to register callbacks for specific events.
Functional requirements
- Clients can register a subscription by providing:
  - eventId (string or numeric)
  - callback URL (HTTP endpoint)
- The mapping from eventId to callback URL is unique: each eventId triggers exactly one callback URL.
- When an event with a given eventId is triggered, the system must send an HTTP request to the registered callback URL containing a payload with event data.
- The system should provide visibility into delivery status (success, failure, retry attempts) per event.
Non-functional requirements
- Scale: up to 1 billion events per day.
  - Rough ballpark: about 11.6k events per second on average, with traffic spikes above that.
- High availability and durability of events.
- Delivery guarantees: at-least-once delivery is acceptable; effectively exactly-once processing can be achieved if the receiver handles callbacks idempotently.
- Low but not real-time latency: it is acceptable if callbacks are delivered within a few seconds.
- Multi-tenant and secure: only authorized clients can create or manage their subscriptions.
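The ballpark rate above follows from dividing the daily volume by the number of seconds in a day. A minimal sketch of that arithmetic is shown below; the peak factor of 3 is an assumed value for illustration only, since the prompt says only that spikes exist.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate(events_per_day: int) -> float:
    """Average events per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


def peak_rate(events_per_day: int, peak_factor: float) -> float:
    """Rate to provision for, given an assumed peak-to-average ratio."""
    return average_rate(events_per_day) * peak_factor


avg = average_rate(1_000_000_000)
print(round(avg))                            # 11574, i.e. about 11.6k events/s
print(round(peak_rate(1_000_000_000, 3.0)))  # 34722, with the assumed 3x peak factor
```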
Design the full system: APIs, storage, event ingestion, dispatch workers, retry logic, monitoring, and how you would scale this to 1B events per day.
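As one possible starting point for the retry logic and delivery-status parts of this question, the sketch below retries a callback with exponential backoff and full jitter, recording each attempt. All names (`DeliveryRecord`, `backoff_delay`, `deliver`), the base and cap values, and the attempt limit are assumptions for illustration. The `send` function is injected so that no real HTTP client is implied, and a production dispatcher would more likely reschedule the event on a delay queue than sleep inside the worker.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DeliveryRecord:
    event_id: str
    callback_url: str
    status: str = "pending"  # pending | success | failed
    attempts: List[int] = field(default_factory=list)  # status code per attempt, 0 = network error


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return rng() * min(cap, base * (2 ** attempt))


def deliver(record: DeliveryRecord,
            send: Callable[[str, str], int],
            max_attempts: int = 5,
            sleep: Callable[[float], None] = lambda seconds: None) -> DeliveryRecord:
    """Try the callback up to max_attempts times; any 2xx response counts as success."""
    for attempt in range(max_attempts):
        code = send(record.callback_url, record.event_id)
        record.attempts.append(code)
        if 200 <= code < 300:
            record.status = "success"
            return record
        if attempt + 1 < max_attempts:
            # Assumption: the worker waits here; a real system would requeue with a delay.
            sleep(backoff_delay(attempt))
    record.status = "failed"  # candidates for a dead-letter queue
    return record


# Usage with a fake sender that fails twice and then succeeds.
responses = iter([500, 0, 200])
result = deliver(DeliveryRecord("evt-1", "https://example.invalid/hook"),
                 lambda url, event_id: next(responses))
print(result.status, result.attempts)
```

Because a retry can follow a callback that actually succeeded but timed out, the receiver may see the same event more than once, which is why at-least-once delivery is paired with idempotent handling on the receiving side.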
2. Design a places-of-interest search service (Yelp-like or Foursquare-like)
Design a service that manages and searches places of interest around the world.
Data model
- Each place has at least:
  - id
  - name
  - location: latitude and longitude
  - place type: for example restaurant, hotel, or park
- Optional metadata like rating, address, opening hours.
Functional requirements
- Support registering, updating, and removing places.
- Support search queries:
  - Given a user location (latitude, longitude) and a place type, find the N nearest places.
  - Optionally support a radius or maximum distance.
  - Sort primarily by distance; you can mention additional ranking signals like rating.
Non-functional requirements
- Data scale: on the order of hundreds of millions of places globally.
- High read throughput with low latency search (for example p95 under 100 ms for queries).
- Updates to places do not need to be visible in real time; eventual consistency is acceptable.
Design the system, including:
- How you store place data.
- How you index geospatial locations to support efficient nearby search.
- How you handle filtering by place type and other attributes.
- How you shard and replicate data globally.
- How you handle hotspots (for example city centers) and caching.
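To make the geospatial indexing question concrete, here is a minimal in-memory sketch of one common approach: bucket places into fixed-size latitude/longitude grid cells keyed by place type, scan only the cells that cover the search radius, and rank candidates by great-circle (haversine) distance. The cell size, class name, and method names are assumptions for illustration. The sketch ignores the antimeridian and refuses queries near the poles, and a real design might use geohashes, quadtrees, or a library such as S2 or H3 instead.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0
CELL_DEG = 0.01  # assumed cell size in degrees (roughly 1.1 km of latitude)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def cell_of(lat, lon):
    """Integer grid coordinates of the cell containing (lat, lon)."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


class GridIndex:
    def __init__(self):
        # (place_type, cell) -> list of (place_id, lat, lon)
        self.cells = defaultdict(list)

    def add(self, place_id, lat, lon, place_type):
        self.cells[(place_type, cell_of(lat, lon))].append((place_id, lat, lon))

    def nearby(self, lat, lon, place_type, n, radius_km):
        """Return up to n (distance_km, place_id) pairs within radius_km, nearest first."""
        d = radius_km / EARTH_RADIUS_KM  # angular radius in radians
        ratio = math.sin(d) / math.cos(math.radians(lat))
        if ratio >= 1:
            raise ValueError("query too close to a pole for this sketch")
        dlat = math.degrees(d)
        dlon = math.degrees(math.asin(ratio))  # widest longitude offset of the search circle
        x0, y0 = cell_of(lat - dlat, lon - dlon)
        x1, y1 = cell_of(lat + dlat, lon + dlon)
        hits = []
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                for pid, plat, plon in self.cells.get((place_type, (cx, cy)), ()):
                    dist = haversine_km(lat, lon, plat, plon)
                    if dist <= radius_km:
                        hits.append((dist, pid))
        hits.sort()
        return hits[:n]


index = GridIndex()
index.add("a", 37.7750, -122.4194, "restaurant")
index.add("b", 37.7800, -122.4194, "restaurant")
index.add("c", 37.7751, -122.4194, "hotel")
print([pid for _, pid in index.nearby(37.7749, -122.4194, "restaurant", n=2, radius_km=5)])
```

Keying the buckets by place type keeps the type filter out of the distance loop. The same cell key is also a natural shard or cache key, although dense cells such as city centers would need to be subdivided or replicated.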
3. Design a Slack-like chat system
Design a messaging system similar to Slack with support for individual and group chats.
Functional requirements
- Users can send messages to:
  - Another individual user (one-to-one chat).
  - A group chat with multiple members.
- Users can:
  - Create new group chats.
  - Add or remove users from group chats (assume permissions can be simplified).
- The system delivers notifications to recipients when new messages arrive.
- Messages can contain rich media such as images; large media content should be stored efficiently.
- Users can delete a message they previously sent.
  - Clarify semantics: delete for everyone in the conversation, or only hide for the deleting user.
Non-functional requirements
- Support a large number of concurrent users and active conversations.
- Low latency message delivery (near real time for online users).
- Message history should be stored and queryable (for example loading chat history when opening a conversation).
- High availability and graceful handling of clients that disconnect and reconnect.
Design the system, including:
- Overall architecture and main services.
- How clients maintain real-time connections.
- Data model for users, conversations, membership, and messages.
- Storage of message content versus media attachments.
- How to deliver messages reliably and in order per conversation.
- How to implement message deletion.
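The ordering and deletion questions can be explored with a small in-memory model. The sketch below assigns each message a per-conversation sequence number, so clients can order messages and ask for everything after the last sequence they saw when they reconnect. It also shows both deletion semantics mentioned above: a tombstone for "delete for everyone" and a per-user hidden set for "hide for me". All class and method names are hypothetical, and a real system would persist this state and assign sequence numbers atomically per conversation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Message:
    seq: int                     # position within the conversation, starting at 1
    sender: str
    text: Optional[str]
    deleted: bool = False        # tombstone: deleted for everyone
    hidden_for: Set[str] = field(default_factory=set)  # users who hid it for themselves


class Conversation:
    def __init__(self, members):
        self.members = set(members)
        self.messages: List[Message] = []

    def send(self, sender: str, text: str) -> int:
        if sender not in self.members:
            raise PermissionError("sender is not a member of this conversation")
        seq = len(self.messages) + 1  # single-writer assumption keeps this monotonic
        self.messages.append(Message(seq, sender, text))
        return seq

    def delete_for_everyone(self, user: str, seq: int) -> None:
        msg = self.messages[seq - 1]
        if msg.sender != user:
            raise PermissionError("only the sender can delete a message")
        msg.deleted = True
        msg.text = None  # drop the content but keep the slot so ordering is stable

    def hide_for_me(self, user: str, seq: int) -> None:
        self.messages[seq - 1].hidden_for.add(user)

    def history(self, user: str, after_seq: int = 0) -> List[Tuple[int, Optional[str]]]:
        """Messages visible to user with seq > after_seq, in order."""
        return [(m.seq, "[deleted]" if m.deleted else m.text)
                for m in self.messages
                if m.seq > after_seq and user not in m.hidden_for]


conv = Conversation(["alice", "bob"])
conv.send("alice", "hi")
conv.send("bob", "hello")
conv.send("alice", "oops")
conv.delete_for_everyone("alice", 3)
conv.hide_for_me("bob", 1)
print(conv.history("bob"))
```

Keeping a tombstone rather than removing the row means sequence numbers never shift, so a client that reconnects with `after_seq` still receives the deletion as an ordinary update.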
4. Design a distributed CI/CD workflow system (GitHub Actions-like)
Design a continuous integration and continuous delivery workflow engine similar to GitHub Actions.
Scenario
- The system is integrated with a source code hosting service.
- When repository events happen (for example git push, pull request opened, tag created), the system should automatically trigger predefined workflows.
- Workflow definitions are stored as configuration files (for example YAML) in the repository itself.
Functional requirements
- Detect repository events (for example a git push) and map them to workflows that should run.
- For each trigger, create a workflow run that may contain multiple jobs and steps.
- Schedule jobs to run on a distributed pool of workers (for example container-based runners or VMs).
- Manage the lifecycle of each workflow run:
  - Queueing, execution, retries, timeouts, cancellation.
  - Collection and storage of logs and artifacts.
- Report status back to the source control system for display (for example success, failure, in progress).
Non-functional requirements
- Multi-tenant system: many organizations and repositories.
- Scalable to tens or hundreds of thousands of concurrent workflow runs.
- Fair scheduling across projects and organizations with rate limiting.
- Strong isolation between workflows for security.
Design the architecture, including:
- Event ingestion from the git service.
- How to find and parse the workflow configuration from the repository commit that triggered it.
- The orchestration layer that manages workflow state and dependencies between jobs.
- The worker or runner infrastructure that executes jobs.
- Storage for workflow runs, logs, and artifacts.
- How you scale, ensure reliability, and handle failures.
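For the orchestration layer, one way to reason about dependencies between jobs is to treat a workflow run as a directed acyclic graph and release jobs in "waves" once all of their prerequisites have finished. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The job names and the `schedule_waves` helper are hypothetical, and a real orchestrator would mark jobs done as workers report back rather than completing a whole wave at once.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def schedule_waves(jobs: Dict[str, List[str]]) -> List[List[str]]:
    """Group jobs into waves; every job in a wave has all its dependencies in earlier waves.

    `jobs` maps each job name to the list of jobs it depends on.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(jobs)
    sorter.prepare()  # validates the graph and detects cycles
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # assumption: the whole wave succeeds before the next starts
    return waves


workflow = {
    "build": [],
    "lint": [],
    "test": ["build"],
    "deploy": ["test", "lint"],
}
print(schedule_waves(workflow))  # [['build', 'lint'], ['test'], ['deploy']]

try:
    schedule_waves({"a": ["b"], "b": ["a"]})
except CycleError:
    print("cyclic workflow rejected")
```

Jobs within a wave are independent of each other, so they can be dispatched to different runners in parallel. Rejecting cycles at parse time avoids workflow runs that could never complete.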
5. Design a payment processor service
Design a simplified but realistic payment processing system.
Functional requirements
- Merchants integrate with your service via an API to process payments.
- For a payment request, the system should:
  - Receive a payment initiation request from the merchant (for example charge a card or digital wallet).
  - Forward the request to an appropriate downstream payment gateway or processor (for example card network, bank, third-party provider).
  - Receive the response from the downstream processor.
  - Return a result to the merchant with a clear status such as authorized, declined, or error.
- Related operations such as refund or capture can also be mentioned.
Non-functional requirements
- Strong security and compliance, including safe handling of sensitive payment data.
- High availability and low latency for the synchronous part of payment authorization.
- Exactly-once behavior, or at-least-once behavior combined with idempotency, so that the customer is never double charged, even under retries.
- Auditability: all payment events must be logged and traceable; a ledger or transaction history must be maintained.
Design the overall architecture, including:
- External API design for merchants and how you handle authentication and idempotency.
- Core services involved in routing requests to different payment processors.
- Data model for payments, merchants, and transaction history.
- How you integrate with external payment gateways and handle their failures.
- How you maintain a reliable ledger, support reconciliation, and ensure consistency.
- How you address security and compliance concerns at a high level.
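The idempotency requirement is often the crux of this question, so a small sketch may help frame it. The merchant supplies an idempotency key with each charge, the service stores the first result under (merchant, key), and any retry with the same key returns the stored result instead of calling the gateway again. The names `ChargeRequest` and `PaymentService`, the in-memory dictionaries, and the injected `gateway` callable are all assumptions for illustration. A real service would enforce the key with an atomic unique constraint in durable storage and record an in-flight state before contacting the gateway.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ChargeRequest:
    merchant_id: str
    idempotency_key: str
    amount_minor: int  # amount in minor units (for example cents) to avoid float rounding
    currency: str


class PaymentService:
    def __init__(self, gateway: Callable[[ChargeRequest], str]):
        self.gateway = gateway  # returns "authorized", "declined", or "error"
        self.results: Dict[Tuple[str, str], Tuple[ChargeRequest, str]] = {}
        self.ledger: List[Tuple[str, str, int, str, str]] = []  # append-only audit trail

    def charge(self, req: ChargeRequest) -> str:
        key = (req.merchant_id, req.idempotency_key)
        if key in self.results:
            original, status = self.results[key]
            if original != req:
                # Same key with different parameters is a client bug, not a retry.
                raise ValueError("idempotency key reused with different parameters")
            return status  # replay the stored outcome without charging again
        status = self.gateway(req)
        self.results[key] = (req, status)
        self.ledger.append((req.merchant_id, req.idempotency_key,
                            req.amount_minor, req.currency, status))
        return status


gateway_calls = []


def fake_gateway(req: ChargeRequest) -> str:
    gateway_calls.append(req)
    return "authorized"


service = PaymentService(fake_gateway)
request = ChargeRequest("merchant-1", "key-1", 1999, "USD")
print(service.charge(request), service.charge(request), len(gateway_calls))
```

Scoping the key by merchant prevents collisions between tenants, and the append-only ledger gives reconciliation a record of every gateway outcome.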