Design webhook, POI, chat, CI/CD, payments
Company: OpenAI
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Onsite
You are asked to design several large-scale backend systems. For each one, describe:
- Core requirements and assumptions
- High level architecture and main components
- Data model and storage choices
- How you will meet scale, reliability, and latency requirements
- How you will handle failures, retries, and consistency
---
### 1. Design a webhook delivery service
Design a webhook service that allows users to register callbacks for specific events.
**Functional requirements**
- Clients can register a subscription by providing:
  - eventId (string or numeric)
  - callback URL (HTTP endpoint)
- The mapping from eventId to callback URL is unique: each eventId triggers exactly one callback URL.
- When an event with a given eventId is triggered, the system must send an HTTP request to the registered callback URL containing a payload with event data.
- The system should provide visibility into delivery status (success, failure, retry attempts) per event.
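Below is a minimal in-memory sketch of the subscription registry and the per-delivery status tracking described above. It assumes that each trigger of an eventId creates a separate delivery record. The names (`SubscriptionRegistry`, `DeliveryAttempt`, `delivery_id`) are hypothetical. In a real deployment, the uniqueness rule would be a primary key or unique constraint on eventId in the subscription store, and attempts would live in a durable table keyed by delivery ID.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class DeliveryStatus(Enum):
    PENDING = "pending"      # delivery created, no attempt recorded yet
    SUCCEEDED = "succeeded"  # last attempt got a 2xx response
    FAILED = "failed"        # last attempt failed; it may still be retried


@dataclass
class DeliveryAttempt:
    attempt: int
    http_status: Optional[int]  # None if no response arrived (timeout, DNS or TLS error)
    status: DeliveryStatus


@dataclass
class SubscriptionRegistry:
    # eventId -> callback URL; one dict entry per key enforces "one URL per eventId"
    subscriptions: Dict[str, str] = field(default_factory=dict)
    # delivery_id -> ordered attempts, backing the delivery-status API (hypothetical)
    deliveries: Dict[str, List[DeliveryAttempt]] = field(default_factory=dict)

    def register(self, event_id: str, callback_url: str) -> None:
        if event_id in self.subscriptions:
            raise ValueError(f"eventId {event_id!r} already has a callback URL")
        self.subscriptions[event_id] = callback_url

    def trigger(self, event_id: str) -> str:
        """Start one delivery for an occurrence of event_id and return its delivery ID."""
        if event_id not in self.subscriptions:
            raise KeyError(f"no callback registered for eventId {event_id!r}")
        delivery_id = uuid.uuid4().hex
        self.deliveries[delivery_id] = []
        return delivery_id

    def record_attempt(self, delivery_id: str, http_status: Optional[int]) -> DeliveryAttempt:
        history = self.deliveries[delivery_id]
        ok = http_status is not None and 200 <= http_status < 300
        attempt = DeliveryAttempt(
            attempt=len(history) + 1,
            http_status=http_status,
            status=DeliveryStatus.SUCCEEDED if ok else DeliveryStatus.FAILED,
        )
        history.append(attempt)
        return attempt

    def status(self, delivery_id: str) -> DeliveryStatus:
        history = self.deliveries[delivery_id]
        return history[-1].status if history else DeliveryStatus.PENDING
```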
**Non-functional requirements**
- Scale: up to 1 billion events per day.
  - Rough ballpark: about 11.6k events per second on average (1,000,000,000 / 86,400 s), with traffic spikes well above that.
- High availability and durability of events.
- Delivery guarantees: at-least-once delivery is acceptable; effectively exactly-once processing can be achieved if receivers handle duplicates idempotently.
- Low, but not real-time, latency: it is acceptable for callbacks to be delivered within a few seconds.
- Multi-tenant and secure: only authorized clients can create or manage their subscriptions.
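Here is a back-of-the-envelope sketch of the capacity numbers above. The peak factor, average callback latency, payload size, and retention period are illustrative assumptions, not requirements from the prompt. The in-flight estimate uses Little's Law (concurrency = arrival rate × time each request stays open).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def webhook_capacity(events_per_day: int, peak_factor: float,
                     avg_callback_latency_s: float, avg_payload_bytes: int,
                     retention_days: int) -> dict:
    avg_eps = events_per_day / SECONDS_PER_DAY
    peak_eps = avg_eps * peak_factor
    return {
        "avg_events_per_sec": avg_eps,
        "peak_events_per_sec": peak_eps,
        # Little's Law: requests in flight = arrival rate x time each request is open
        "peak_inflight_requests": peak_eps * avg_callback_latency_s,
        "payload_storage_bytes": events_per_day * avg_payload_bytes * retention_days,
    }


estimate = webhook_capacity(
    events_per_day=1_000_000_000,
    peak_factor=5.0,             # assumption: peak is 5x the daily average
    avg_callback_latency_s=0.5,  # assumption: receivers answer in ~500 ms
    avg_payload_bytes=1_000,     # assumption: ~1 KB per event payload
    retention_days=7,            # assumption: keep payloads a week for retries/audit
)
```

With these assumed inputs, the sketch gives:

- Roughly 11,574 events/s on average.
- About 57,870 events/s at peak.
- About 28,935 outbound requests in flight at peak.
- 7 TB of payload storage before replication.

The in-flight figure is what sizes the dispatchers' connection pools. Slow callback endpoints, more than the raw event rate, drive how many concurrent requests the workers must hold open.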
Design the full system: APIs, storage, event ingestion, dispatch workers, retry logic, monitoring, and how you would scale this to 1B events per day.
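One common retry policy for the dispatch workers is exponential backoff with "full jitter": each delay is drawn uniformly between 0 and an exponentially growing ceiling. The sketch below shows it. The set of retryable outcomes (timeouts, 408, 429, 5xx), the 1 s base, and the 1 h cap are assumptions chosen for illustration.

```python
import random
from typing import List, Optional


def is_retryable(http_status: Optional[int]) -> bool:
    """Retry timeouts/connection errors (None), 408, 429 and 5xx; other 4xx are permanent."""
    if http_status is None:
        return True
    return http_status in (408, 429) or http_status >= 500


def backoff_schedule(max_attempts: int, base_s: float = 1.0, cap_s: float = 3600.0,
                     rng: Optional[random.Random] = None) -> List[float]:
    """Delays before retries 1..max_attempts-1, using exponential backoff with full jitter."""
    rng = rng or random.Random()
    delays = []
    for retry in range(max_attempts - 1):  # the first attempt is sent immediately
        ceiling = min(cap_s, base_s * 2 ** retry)
        delays.append(rng.uniform(0.0, ceiling))  # full jitter: uniform in [0, ceiling]
    return delays
```

Jitter spreads out retries from many failed deliveries, so an endpoint that is recovering is not hit by a synchronized burst. After the last attempt, an event would typically move to a dead-letter queue and have its delivery status reported as failed.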
---
### 2. Design a places of interest search service (Yelp- or Foursquare-like)
Design a service that manages and searches places of interest around the world.
**Data model**
- Each place has at least:
  - id
  - name
  - location: latitude and longitude
  - place type: for example, restaurant, hotel, or park
- Optional metadata such as rating, address, and opening hours.
**Functional requirements**
- Support registering, updating, and removing places.
- Support search queries:
  - Given a user location (latitude, longitude) and a place type, find the N nearest places.
  - Optionally support a radius or maximum distance.
  - Sort primarily by distance; you can mention additional ranking signals like rating.
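Below is a brute-force reference implementation of the nearest-N query. It assumes a spherical Earth and uses the haversine formula for distance. Because it scans every place, it is only useful as a correctness baseline for testing a real spatial index. The `Place` fields mirror the data model above. Treating a missing rating as 0 when breaking distance ties is an assumption.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


@dataclass(frozen=True)
class Place:
    id: str
    name: str
    lat: float
    lon: float
    place_type: str
    rating: Optional[float] = None


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_places(places: Iterable[Place], lat: float, lon: float, place_type: str,
                   n: int, max_km: Optional[float] = None) -> List[Tuple[Place, float]]:
    candidates = []
    for p in places:
        if p.place_type != place_type:  # filter by type before computing distance
            continue
        d = haversine_km(lat, lon, p.lat, p.lon)
        if max_km is not None and d > max_km:
            continue
        candidates.append((d, p))
    # rank by distance, break ties by higher rating (missing rating counts as 0)
    best = heapq.nsmallest(n, candidates, key=lambda c: (c[0], -(c[1].rating or 0.0)))
    return [(p, d) for d, p in best]
```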
**Non-functional requirements**
- Data scale: on the order of hundreds of millions of places globally.
- High read throughput with low-latency search (for example, p95 query latency under 100 ms).
- Updates to places do not need to be visible in real time; eventual consistency is acceptable.
Design the system, including:
- How you store place data.
- How you index geospatial locations to support efficient nearby search.
- How you handle filtering by place type and other attributes.
- How you shard and replicate data globally.
- How you handle hotspots (for example city centers) and caching.
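Geohash encoding is one common way to index locations. It bisects longitude and latitude alternately (longitude first) and turns every 5 bits into one base-32 character, so nearby points usually share a prefix. The sketch below implements that encoding. The `shard_for` helper and its 4-character prefix length are hypothetical. It hashes the prefix with `zlib.crc32` because Python's built-in `hash()` of strings is randomized per process.

```python
import zlib

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # 5 bits -> one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def shard_for(lat: float, lon: float, num_shards: int, prefix_len: int = 4) -> int:
    """Hypothetical sharding rule: all places in the same geohash cell share a shard."""
    prefix = geohash_encode(lat, lon, prefix_len)
    return zlib.crc32(prefix.encode("ascii")) % num_shards
```

Points near a cell boundary can have different prefixes, so a nearby search usually queries the user's cell plus its 8 neighbors. With prefix-based sharding, that query may fan out to several shards. Hotspots such as city centers can be handled by splitting dense cells into longer prefixes and by caching results for popular cells.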
---
### 3. Design a Slack-like chat system
Design a messaging system similar to Slack with support for individual and group chats.
**Functional requirements**
- Users can send messages to:
  - Another individual user (one-to-one chat).
  - A group chat with multiple members.
- Users can:
  - Create new group chats.
  - Add or remove users from group chats (assume permissions can be simplified).
- The system delivers notifications to recipients when new messages arrive.
- Messages can contain rich media such as images; large media content should be stored efficiently.
- Users can delete a message they previously sent.
  - Clarify semantics: delete for everyone in the conversation, or only hide for the deleting user.
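The two deletion semantics above can coexist on one message record, as this sketch with hypothetical names shows:

- "Delete for everyone" keeps a tombstone, so the message holds its place in the history.
- "Hide for me" adds the user to a per-message set that is checked at render time.

Allowing only the sender to delete for everyone is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

DELETED_PLACEHOLDER = "This message was deleted"


@dataclass
class Message:
    message_id: str
    sender_id: str
    text: Optional[str]
    deleted_for_everyone: bool = False
    hidden_for: Set[str] = field(default_factory=set)  # users who chose "delete for me"

    def delete_for_everyone(self, requester_id: str) -> None:
        if requester_id != self.sender_id:
            raise PermissionError("only the sender can delete a message for everyone")
        self.deleted_for_everyone = True
        self.text = None  # tombstone: keep the row and its position, drop the content

    def hide_for(self, user_id: str) -> None:
        self.hidden_for.add(user_id)

    def render_for(self, viewer_id: str) -> Optional[str]:
        if viewer_id in self.hidden_for:
            return None  # omitted from this viewer's history entirely
        if self.deleted_for_everyone:
            return DELETED_PLACEHOLDER
        return self.text
```

In a full system, the tombstone would also be pushed to online members so their clients update cached copies. Any media attachments could then be garbage-collected asynchronously.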
**Non-functional requirements**
- Support a large number of concurrent users and active conversations.
- Low-latency message delivery (near real-time for online users).
- Message history should be stored and queryable (for example loading chat history when opening a conversation).
- High availability and graceful handling of clients that disconnect and reconnect.
Design the system, including:
- Overall architecture and main services.
- How clients maintain real time connections.
- Data model for users, conversations, membership, and messages.
- Storage of message content versus media attachments.
- How to deliver messages reliably and in order per conversation.
- How to implement message deletion.
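Here is a minimal sketch of ordered, deduplicated delivery within one conversation. It assumes a single writer per conversation (for example, the shard that owns that conversation) assigns a dense sequence number. A client-generated `client_msg_id` lets a retried send return the original message instead of creating a duplicate. All names here are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StoredMessage:
    conversation_id: str
    seq: int            # per-conversation, dense, starts at 1
    sender_id: str
    body: str
    client_msg_id: str  # generated by the sending client; used to dedupe retries


class ConversationLog:
    def __init__(self) -> None:
        self._messages: Dict[str, List[StoredMessage]] = defaultdict(list)
        self._by_client_id: Dict[Tuple[str, str], StoredMessage] = {}

    def append(self, conversation_id: str, sender_id: str, body: str,
               client_msg_id: str) -> StoredMessage:
        key = (conversation_id, client_msg_id)
        if key in self._by_client_id:  # client retried after a lost acknowledgement
            return self._by_client_id[key]
        log = self._messages[conversation_id]
        msg = StoredMessage(conversation_id, len(log) + 1, sender_id, body, client_msg_id)
        log.append(msg)
        self._by_client_id[key] = msg
        return msg

    def since(self, conversation_id: str, last_seen_seq: int) -> List[StoredMessage]:
        """Messages a reconnecting client missed; seq k lives at list index k - 1."""
        return self._messages.get(conversation_id, [])[last_seen_seq:]
```

A client that reconnects sends the highest `seq` it has seen in each conversation and receives only what it missed. A gap in `seq` observed by an online client also signals a missed push and can trigger the same fetch.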
---
### 4. Design a distributed CI/CD workflow system (GitHub Actions-like)
Design a continuous integration and continuous delivery workflow engine similar to GitHub Actions.
**Scenario**
- The system is integrated with a source code hosting service.
- When repository events happen (for example git push, pull request opened, tag created), the system should automatically trigger predefined workflows.
- Workflow definitions are stored as configuration files (for example YAML) in the repository itself.
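The sketch below maps a repository event to the workflows it should trigger. It assumes the workflow files have already been fetched at the triggering commit and parsed into dictionaries. The `on` / `refs` schema is a simplified, hypothetical stand-in for a real workflow syntax. Globbing with `fnmatchcase`, where `*` also matches `/`, is also a simplification.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Any, Dict, List


@dataclass(frozen=True)
class RepoEvent:
    repo: str
    kind: str        # e.g. "push", "pull_request", "tag"
    ref: str         # branch or tag name
    commit_sha: str  # workflow files must be read at this commit


def matching_workflows(event: RepoEvent, workflows: Dict[str, Dict[str, Any]]) -> List[str]:
    """workflows: file path -> parsed definition (hypothetical schema), read at event.commit_sha."""
    matched = []
    for path, definition in sorted(workflows.items()):
        triggers = definition.get("on") or {}
        if event.kind not in triggers:
            continue
        trigger = triggers[event.kind] or {}  # a trigger with no body means "any ref"
        patterns = trigger.get("refs", ["*"])
        if any(fnmatchcase(event.ref, pattern) for pattern in patterns):
            matched.append(path)
    return matched
```

Reading the files at `commit_sha`, rather than at the head of the default branch, makes a run reproducible: the workflow that runs is the one committed alongside the code.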
**Functional requirements**
- Detect repository events (for example a git push) and map them to workflows that should run.
- For each trigger, create a workflow run that may contain multiple jobs and steps.
- Schedule jobs to run on a distributed pool of workers (for example, container-based runners or VMs).
- Manage the lifecycle of each workflow run:
  - Queueing, execution, retries, timeouts, cancellation.
  - Collection and storage of logs and artifacts.
- Report status back to the source control system for display (for example success, failure, in progress).
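Jobs with dependencies form a directed acyclic graph. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to group jobs into waves that can run in parallel. The input format, a mapping from each job name to the jobs it depends on, is an assumption.

```python
from graphlib import CycleError, TopologicalSorter  # CycleError is raised by prepare()
from typing import Dict, List


def schedule_waves(jobs: Dict[str, List[str]]) -> List[List[str]]:
    """jobs maps each job name to the jobs it depends on; returns batches runnable in parallel."""
    sorter = TopologicalSorter(jobs)
    sorter.prepare()  # raises CycleError if the dependencies contain a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every job whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

A real orchestrator would not wait for a whole wave to finish. It would call `done()` as each job completes and dispatch newly ready jobs immediately, persisting state so that a restarted orchestrator can resume. It would also mark the dependents of a failed job as skipped.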
**Non-functional requirements**
- Multi-tenant system: many organizations and repositories.
- Scalable to tens or hundreds of thousands of concurrent workflow runs.
- Fair scheduling across projects and organizations with rate limiting.
- Strong isolation between workflows for security.
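One way to combine fair scheduling with rate limiting is round-robin across tenants plus a per-tenant cap on running jobs, sketched below. The single cap value shared by all tenants and the in-memory queues are simplifying assumptions. A production scheduler would likely also weight tenants by plan and persist its queues.

```python
from collections import OrderedDict, deque
from typing import Deque, Dict, Optional, Tuple


class FairQueue:
    """Round-robin across tenants, with a cap on each tenant's concurrently running jobs."""

    def __init__(self, max_running_per_tenant: int) -> None:
        self.max_running = max_running_per_tenant
        self.queues: "OrderedDict[str, Deque[str]]" = OrderedDict()  # only non-empty queues
        self.running: Dict[str, int] = {}

    def submit(self, tenant: str, job_id: str) -> None:
        self.queues.setdefault(tenant, deque()).append(job_id)

    def next_job(self) -> Optional[Tuple[str, str]]:
        for tenant in list(self.queues):  # current rotation order
            if self.running.get(tenant, 0) >= self.max_running:
                continue  # this tenant is at its cap; others may still run
            queue = self.queues[tenant]
            job_id = queue.popleft()
            if queue:
                self.queues.move_to_end(tenant)  # served tenant goes to the back
            else:
                del self.queues[tenant]
            self.running[tenant] = self.running.get(tenant, 0) + 1
            return tenant, job_id
        return None  # no queued work, or every tenant with work is at its cap

    def finish(self, tenant: str) -> None:
        self.running[tenant] -= 1
```

A tenant with a large backlog gets at most its cap of concurrent jobs, and idle capacity goes to the next tenant in rotation rather than to the noisiest one.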
Design the architecture, including:
- Event ingestion from the git service.
- How to find and parse the workflow configuration from the repository commit that triggered it.
- The orchestration layer that manages workflow state and dependencies between jobs.
- The worker or runner infrastructure that executes jobs.
- Storage for workflow runs, logs, and artifacts.
- How you scale, ensure reliability, and handle failures.
---
### 5. Design a payment processor service
Design a simplified but realistic payment processing system.
**Functional requirements**
- Merchants integrate with your service via an API to process payments.
- For a payment request, the system should:
  - Receive a payment initiation request from the merchant (for example, to charge a card or digital wallet).
  - Forward the request to an appropriate downstream payment gateway or processor (for example, a card network, bank, or third-party provider).
  - Receive the response from the downstream processor.
  - Return a result to the merchant with a clear status such as authorized, declined, or error.
- Related operations such as refunds and captures can also be discussed.
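The operations above imply a small payment state machine. This sketch, with hypothetical state names, rejects illegal transitions such as capturing a declined payment. It treats `ERROR` as an outcome that can still resolve later (for example, after a gateway timeout), and it ignores partial captures and partial refunds.

```python
from enum import Enum
from typing import Dict, Set


class PaymentState(Enum):
    CREATED = "created"
    AUTHORIZED = "authorized"
    DECLINED = "declined"
    ERROR = "error"
    CAPTURED = "captured"
    VOIDED = "voided"
    REFUNDED = "refunded"


ALLOWED: Dict[PaymentState, Set[PaymentState]] = {
    PaymentState.CREATED: {PaymentState.AUTHORIZED, PaymentState.DECLINED, PaymentState.ERROR},
    # e.g. a gateway timeout: the true outcome is settled later by a status query
    PaymentState.ERROR: {PaymentState.AUTHORIZED, PaymentState.DECLINED},
    PaymentState.AUTHORIZED: {PaymentState.CAPTURED, PaymentState.VOIDED},
    PaymentState.CAPTURED: {PaymentState.REFUNDED},
    # DECLINED, VOIDED and REFUNDED are terminal in this sketch
}


def transition(current: PaymentState, target: PaymentState) -> PaymentState:
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```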
**Non-functional requirements**
- Strong security and compliance, including safe handling of sensitive payment data.
- High availability and low latency for the synchronous part of payment authorization.
- Exactly-once or at-least-once behavior that avoids double-charging the customer, even under retries.
- Auditability: all payment events must be logged and traceable; a ledger or transaction history must be maintained.
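Here is a minimal sketch of merchant-supplied idempotency keys, a common way to avoid double-charging under retries. A replayed request returns the stored response without calling the gateway again, and reusing a key with a different request body is rejected. The class and method names are hypothetical. The in-memory dict is not safe under concurrency. A real implementation would first insert the key in an in-progress state under a unique constraint, in the same transaction as the payment record.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class StoredResult:
    request_hash: str
    response: Dict[str, Any]


class IdempotencyStore:
    def __init__(self) -> None:
        # keys are scoped per merchant, so two merchants can reuse the same key string
        self._results: Dict[Tuple[str, str], StoredResult] = {}

    def execute(self, merchant_id: str, idempotency_key: str, request: Dict[str, Any],
                charge: Callable[[Dict[str, Any]], Dict[str, Any]]) -> Dict[str, Any]:
        # canonical JSON, so field order does not change the fingerprint
        request_hash = hashlib.sha256(
            json.dumps(request, sort_keys=True).encode("utf-8")).hexdigest()
        key = (merchant_id, idempotency_key)
        stored = self._results.get(key)
        if stored is not None:
            if stored.request_hash != request_hash:
                raise ValueError("idempotency key reused with a different request")
            return stored.response  # replay: the gateway is not called again
        response = charge(request)  # reached only when no result is stored for this key
        self._results[key] = StoredResult(request_hash, response)
        return response
```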
Design the overall architecture, including:
- External API design for merchants and how you handle authentication and idempotency.
- Core services involved in routing requests to different payment processors.
- Data model for payments, merchants, and transaction history.
- How you integrate with external payment gateways and handle their failures.
- How you maintain a reliable ledger, support reconciliation, and ensure consistency.
- How you address security and compliance concerns at a high level.
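Below is a sketch of a double-entry ledger and a simple reconciliation check. Amounts are integers in minor units to avoid floating-point rounding, every posted transaction must sum to zero, and posting is idempotent by transaction ID. The account names and the per-transaction settlement report format are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Entry:
    account: str
    amount_minor: int  # integer minor units (e.g. cents); debit > 0, credit < 0


class Ledger:
    """Append-only double-entry journal: every transaction's entries sum to zero."""

    def __init__(self) -> None:
        self.journal: List[Tuple[str, Tuple[Entry, ...]]] = []
        self._posted: Set[str] = set()

    def post(self, transaction_id: str, entries: Sequence[Entry]) -> None:
        if transaction_id in self._posted:
            return  # re-posting the same transaction (e.g. a retried consumer) is a no-op
        if sum(e.amount_minor for e in entries) != 0:
            raise ValueError(f"transaction {transaction_id} does not balance")
        self.journal.append((transaction_id, tuple(entries)))
        self._posted.add(transaction_id)

    def balance(self, account: str) -> int:
        return sum(e.amount_minor for _, entries in self.journal
                   for e in entries if e.account == account)


def reconcile(ledger: Dict[str, int], settlement: Dict[str, int]) -> Dict[str, List[str]]:
    """Compare per-transaction amounts in our ledger with a processor's settlement report."""
    shared = ledger.keys() & settlement.keys()
    return {
        "missing_in_settlement": sorted(ledger.keys() - settlement.keys()),
        "missing_in_ledger": sorted(settlement.keys() - ledger.keys()),
        "amount_mismatch": sorted(k for k in shared if ledger[k] != settlement[k]),
    }
```

In this design, a refund is posted as a new, reversing transaction rather than as an edit to an existing one. That keeps the journal append-only and auditable.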