Design a Cloud Call-Center Platform for Programmatic Outbound Calls
Company: Retell
Role: Software Engineer
Category: System Design
Difficulty: hard
Interview Round: Technical Screen
A SaaS company offers a programmatic calling platform. Business customers (enterprises) integrate through an API: they submit a request to place an outbound phone call to one of their end users — for example an appointment reminder, a verification call, or a conversation handled by an automated voice agent. Your platform receives the request, sets up the call through a telecom carrier (PSTN/SIP), rings the end user's phone, and — once the user answers — bridges the call to a media endpoint (a voice agent or an audio stream).
Design this platform end to end: the request path, how calls are placed and tracked, and how the system stays stable under load. (This is an open-ended design where the interviewer expects you to drive: clarify, sketch a high-level design, then dive deep. The hardest sub-problem is scale, so budget time for it.)
### Constraints & Assumptions
- Enterprises submit call requests over an HTTPS API; each request targets one destination phone number and references a media configuration (which voice agent / audio to play once answered).
- Calls leave the platform over one or more telecom carriers via SIP trunks. **Each carrier enforces a maximum number of simultaneous channels (concurrent calls) and a per-second call-attempt rate (CPS).** These are hard external limits you cannot exceed.
- Assume a few hundred enterprises, average call duration of 2-5 minutes, and bursty traffic: a campaign kickoff can produce thousands of call requests within a few seconds.
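- A call moves through a lifecycle: `queued → dialing → ringing → in_progress → completed`, with `failed`, `no_answer`, and `busy` as the alternative terminal states (`no_answer` and `busy` are reached before the call is answered; `failed` can occur at any stage).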
- The platform reports per-call status back to the enterprise via webhooks and a status API.
- Time from request acceptance to dial-out should be low (seconds) for immediate calls, but it is acceptable to queue/pace calls during bursts.
### Clarifying Questions to Ask
- Are calls immediate, scheduled, or both? Is there a notion of campaigns with priorities, or only one-off calls?
- How many carriers do we integrate with, and what are their concurrency and CPS limits? Single carrier, or multi-carrier with failover / least-cost routing?
- What happens on the answered leg — play static audio, bridge to a human, or connect to a real-time AI voice agent? Do we need bidirectional streaming audio?
- What per-enterprise rate limits and quotas apply, and do we throttle to protect both the carriers and ourselves?
- What delivery guarantees do enterprises expect for status — at-least-once webhooks with retries, idempotency, ordering?
- Are there compliance constraints (consent, do-not-call lists, allowed calling windows by recipient timezone)?
### Part 1 — High-level design: the outbound call lifecycle
Walk through the end-to-end path from "an enterprise calls our API to place a call" to "the end user's phone rings and the call is bridged to media." Define the major services, the data model for a call, and the API surface.
```hint Where to start
Split the **synchronous** "accept the request" path from the **asynchronous** "place and manage the call" path. The API should validate, durably enqueue a call job, and return immediately; a pool of dialer workers consumes jobs and drives the carrier integration. Don't place the call inside the request handler.
```
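The split described in this hint can be sketched as a handler that only validates, enqueues, and returns. This is a minimal illustration, not a prescribed implementation: `CallJob`, `accept_call_request`, the `202` response shape, and the in-memory `queue.Queue` (standing in for a durable queue) are all assumptions made for the example.

```python
import queue
import uuid
from dataclasses import dataclass


@dataclass
class CallJob:
    # Hypothetical job record; field names are illustrative.
    call_id: str
    tenant_id: str
    to_number: str
    media_config_id: str
    status: str = "queued"


# Stand-in for a durable queue. A real system would persist the job
# before acknowledging the request.
call_queue = queue.Queue()


def accept_call_request(tenant_id, to_number, media_config_id):
    """Validate, enqueue, and return immediately. No dialing happens here."""
    # Very rough E.164 shape check: a leading "+" followed by digits only.
    if not (to_number.startswith("+") and to_number[1:].isdigit()):
        return 400, {"error": "to_number must look like +14155550100"}
    job = CallJob(uuid.uuid4().hex, tenant_id, to_number, media_config_id)
    call_queue.put(job)
    # 202 Accepted: the call is queued, not yet placed.
    return 202, {"call_id": job.call_id, "status": job.status}
```

The caller gets a `call_id` to poll or to match against later webhooks, while dial-out happens on a separate worker path.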
```hint Components
Think: API gateway/service → durable queue + call-state store → a dialer / call-orchestrator service that speaks SIP to carriers (typically through an SBC or a telephony layer) → a media/voice-agent leg for answered calls → a webhook dispatcher that pushes status back to enterprises.
```
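One way to picture how these components hand off to each other is a toy dialer loop. Everything here is hypothetical: `FakeCarrier` stands in for the SIP/SBC layer and always answers, the state store is a plain dict, and the webhook dispatcher is reduced to an outbox list.

```python
from collections import deque


class FakeCarrier:
    """Hypothetical stand-in for the carrier integration; every call is answered."""

    def dial(self, to_number):
        # A real integration would surface these as asynchronous signaling events.
        return ["dialing", "ringing", "in_progress", "completed"]


def run_dialer(jobs, carrier, state_store, webhook_outbox):
    """Consume queued jobs, record each status change, and queue a webhook for it."""
    while jobs:
        job = jobs.popleft()
        for status in carrier.dial(job["to_number"]):
            state_store[job["call_id"]] = status
            webhook_outbox.append({"call_id": job["call_id"], "status": status})


jobs = deque([{"call_id": "c1", "to_number": "+14155550100"}])
state_store, webhook_outbox = {}, []
run_dialer(jobs, FakeCarrier(), state_store, webhook_outbox)
```

The point of the sketch is the direction of data flow: the dialer owns carrier interaction, the state store holds the current status, and webhook delivery is decoupled behind its own outbox.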
### Part 2 — Scaling a burst of concurrent call requests (the critical bottleneck)
Many enterprises launch campaigns at once and submit a large burst of call requests in a short window — far more than the carriers can dial simultaneously. How do you keep the platform stable, respect carrier limits, and stay fair across tenants?
```hint The real constraint
The bottleneck is **downstream telecom capacity** (max concurrent channels + CPS per carrier), not your application CPU. The system must *shape and pace* outflow to the carriers rather than try to dial everything at once.
```
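A quick back-of-the-envelope calculation makes this constraint concrete. In steady state, a carrier with a fixed number of channels can only start about `channels / average hold time` new calls per second (Little's law), capped by its CPS limit. The numbers below (1,000 channels, 20 CPS, a 3-minute average hold time, a 5,000-call burst) are illustrative assumptions, not figures from the problem statement.

```python
def sustained_start_rate(channels, cps_limit, avg_hold_seconds):
    """Calls per second the carrier can absorb once its channels are full."""
    return min(cps_limit, channels / avg_hold_seconds)


def burst_drain_seconds(burst_size, channels, cps_limit, avg_hold_seconds):
    """Rough steady-state estimate of how long a burst takes to dial out.

    Ignores the initial fill of empty channels, which can proceed at the CPS limit.
    """
    rate = sustained_start_rate(channels, cps_limit, avg_hold_seconds)
    return burst_size / rate


rate = sustained_start_rate(channels=1000, cps_limit=20, avg_hold_seconds=180)
drain = burst_drain_seconds(5000, channels=1000, cps_limit=20, avg_hold_seconds=180)
print(f"sustained rate: {rate:.2f} calls/s, drain time: {drain / 60:.0f} min")
```

Under these assumed numbers the channel limit, not the CPS limit, is binding: roughly 5.6 call starts per second, so a 5,000-call burst takes on the order of 15 minutes to drain regardless of how many dialer workers are running.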
```hint Mechanisms
Reach for a durable queue + per-carrier and per-tenant **rate limiting** (token/leaky bucket for CPS), a **concurrency budget** (a live active-call counter that gates new dial-outs against each carrier's channel limit), autoscaling dialer workers, **backpressure** on the API, and fair scheduling so one tenant's burst doesn't starve others.
```
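Two of these mechanisms, the CPS token bucket and the concurrency budget, can be combined into a single per-carrier gate. The sketch below is single-process and takes the clock as a parameter for clarity; `TokenBucket` and `CarrierGate` are hypothetical names, and a distributed fleet would need the same logic backed by shared, atomic state.

```python
class TokenBucket:
    """Token bucket: refills at `rate_per_sec`, holds at most `burst` tokens."""

    def __init__(self, rate_per_sec, burst, now=0.0):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = now

    def try_take(self, now):
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class CarrierGate:
    """Admits a dial-out only if both the channel budget and the CPS budget allow it."""

    def __init__(self, max_channels, cps):
        self.max_channels = max_channels
        self.active = 0
        self.bucket = TokenBucket(cps, burst=cps)

    def try_start_call(self, now):
        # Check channels first so a full carrier does not burn CPS tokens.
        if self.active >= self.max_channels:
            return False
        if not self.bucket.try_take(now):
            return False
        self.active += 1
        return True

    def end_call(self):
        self.active = max(0, self.active - 1)
```

A dialer worker would call `try_start_call` before each attempt and leave the job in the queue when it returns `False`, which is how pacing turns into queueing rather than dropped calls.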
#### Clarifying Questions for this Part
- Is it acceptable to delay/queue calls during a burst, or must latency-sensitive calls (e.g., one-time-passcode verification) preempt bulk campaign traffic?
- Do we have multiple carriers we can spread load across, and can we add carrier capacity dynamically?
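If the answer to the first question is that latency-sensitive calls must go first, one possible shape is a two-lane scheduler with round-robin across tenants inside each lane. This is a sketch under that assumption; `FairScheduler` and the `urgent` flag are hypothetical, and strict priority as written could starve bulk traffic if urgent volume is unbounded.

```python
from collections import OrderedDict, deque


class FairScheduler:
    """Urgent lane is served before bulk; tenants take turns within a lane."""

    def __init__(self):
        self.urgent = OrderedDict()  # tenant_id -> deque of call_ids
        self.bulk = OrderedDict()

    def submit(self, tenant_id, call_id, urgent=False):
        lane = self.urgent if urgent else self.bulk
        lane.setdefault(tenant_id, deque()).append(call_id)

    def next_call(self):
        for lane in (self.urgent, self.bulk):
            if lane:
                # Take the tenant at the front of the rotation.
                tenant_id, pending = lane.popitem(last=False)
                call_id = pending.popleft()
                if pending:
                    # Tenant still has work: move it to the back of the rotation.
                    lane[tenant_id] = pending
                return tenant_id, call_id
        return None


sched = FairScheduler()
for call_id in ("a1", "a2", "a3"):
    sched.submit("tenant-a", call_id)
sched.submit("tenant-b", "b1")
sched.submit("tenant-b", "otp-1", urgent=True)
order = [sched.next_call() for _ in range(5)]
```

Here `tenant-a` submitted three bulk calls before `tenant-b` submitted one, yet `tenant-b`'s bulk call is served second in the bulk lane rather than fourth, and its urgent call is served before everything else.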
### Part 3 — Reliability, call state, and observability
Calls and carriers fail in messy ways: no-answer, busy, carrier 5xx, dropped media, or a worker crashing mid-call. How do you track call state authoritatively, handle retries safely, and observe the system?
```hint State + idempotency
Treat each call as a persisted **state machine**, and make request creation **idempotent** (idempotency key) so client retries never double-dial. Carrier callbacks (ringing/answered/hangup) can arrive out of order, be duplicated, or be lost — reconcile them against your own authoritative state rather than trusting them blindly.
```
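A minimal sketch of this hint: creation is deduplicated by a per-tenant idempotency key, and carrier events are applied only when they represent a legal forward transition, so duplicates and late arrivals are ignored. The transition table is one reasonable reading of the lifecycle in the constraints, and `CallStore` with its in-memory dicts is a hypothetical stand-in for a persistent store. Lost callbacks are not handled here; they would need a separate reconciliation or timeout path.

```python
# Legal forward transitions. Terminal states have no outgoing edges.
ALLOWED = {
    "queued": {"dialing", "failed"},
    "dialing": {"ringing", "in_progress", "failed", "busy", "no_answer"},
    "ringing": {"in_progress", "failed", "busy", "no_answer"},
    "in_progress": {"completed", "failed"},
}


class CallStore:
    def __init__(self):
        self.status = {}   # call_id -> current status
        self.by_key = {}   # (tenant_id, idempotency_key) -> call_id

    def create(self, tenant_id, idempotency_key, call_id):
        """Return the existing call for a repeated key instead of creating a new one."""
        key = (tenant_id, idempotency_key)
        if key in self.by_key:
            return self.by_key[key]
        self.by_key[key] = call_id
        self.status[call_id] = "queued"
        return call_id

    def apply_event(self, call_id, new_status):
        """Apply a carrier event only if it moves the call forward; else ignore it."""
        current = self.status[call_id]
        if new_status in ALLOWED.get(current, set()):
            self.status[call_id] = new_status
            return True
        return False
```

For example, if "answered" arrives before "ringing", the call jumps from `dialing` to `in_progress`, and the late `ringing` event is dropped because `in_progress → ringing` is not a legal transition.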
### Follow-up Questions
- How would you enforce a per-carrier concurrency budget across a *distributed* dialer fleet — a shared atomic counter, leases, or sharding destination numbers to specific workers?
- A carrier starts returning elevated failures and latency. How does the system detect this and shift traffic to another carrier without a thundering herd of retries?
- How do you support scheduled campaigns and calling-window compliance (e.g., never dial before 9am in the recipient's local timezone)?
- How would you add the real-time audio leg (bridging an answered call to an AI voice agent), and what new scaling constraints does live media processing introduce?
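For the first follow-up, one of the options named there (leases) can be sketched as follows. Each active call holds a lease that its worker must renew; if the worker crashes, the lease expires and the channel slot is reclaimed. `LeaseBudget` is a hypothetical name, and the in-process dict is an assumption standing in for a shared store with atomic operations.

```python
class LeaseBudget:
    """Concurrency budget where each slot is a lease that expires unless renewed."""

    def __init__(self, max_channels, lease_seconds):
        self.max_channels = max_channels
        self.lease_seconds = lease_seconds
        self.leases = {}  # call_id -> expiry timestamp

    def _expire(self, now):
        # Drop leases whose owner stopped renewing (for example, a crashed worker).
        self.leases = {c: exp for c, exp in self.leases.items() if exp > now}

    def acquire(self, call_id, now):
        self._expire(now)
        if len(self.leases) >= self.max_channels:
            return False
        self.leases[call_id] = now + self.lease_seconds
        return True

    def renew(self, call_id, now):
        self._expire(now)
        if call_id not in self.leases:
            return False
        self.leases[call_id] = now + self.lease_seconds
        return True

    def release(self, call_id):
        self.leases.pop(call_id, None)
```

Compared with a bare atomic counter, leases trade a little renewal traffic for self-healing: a counter that is incremented by a worker that then dies stays too high until something corrects it.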
Quick Answer: This question evaluates a candidate's ability to design a distributed, asynchronous system under hard external rate and capacity constraints. It tests system design fundamentals such as queueing, backpressure, state-machine modeling, and multi-tenant fairness, and is commonly asked to assess practical architectural reasoning rather than pure theory.