What difficulty level is this interview question?

This is a hard-difficulty System Design question, commonly asked during technical screen rounds at Retell.

What role is this question designed for?

This question is typically asked of Software Engineer candidates at Retell during technical interviews.

Design a Cloud Call-Center Platform for Programmatic Outbound Calls

This question evaluates a candidate's ability to design a distributed, asynchronous system under hard external rate and capacity constraints. It tests system design fundamentals such as queueing, backpressure, state-machine modeling, and multi-tenant fairness, and it is commonly used to assess architectural reasoning at a practical, applied level rather than in pure theory.

A SaaS company offers a programmatic calling platform. Business customers (enterprises) integrate through an API: they submit a request to place an outbound phone call to one of their end users — for example an appointment reminder, a verification call, or a conversation handled by an automated voice agent. Your platform receives the request, sets up the call through a telecom carrier (PSTN/SIP), rings the end user's phone, and — once the user answers — bridges the call to a media endpoint (a voice agent or an audio stream).

Design this platform end to end: the request path, how calls are placed and tracked, and how the system stays stable under load. (This is an open-ended design where the interviewer expects you to drive: clarify, sketch a high-level design, then dive deep. The hardest sub-problem is scale, so budget time for it.)

Constraints & Assumptions

Enterprises submit call requests over an HTTPS API; each request targets one destination phone number and references a media configuration (which voice agent / audio to play once answered).
Calls leave the platform over one or more telecom carriers via SIP trunks. Each carrier enforces a maximum number of simultaneous channels (concurrent calls) and a per-second call-attempt rate (CPS). These are hard external limits you cannot exceed.
Assume a few hundred enterprises, average call duration of 2-5 minutes, and bursty traffic: a campaign kickoff can produce thousands of call requests within a few seconds.
A call moves through a lifecycle: queued → dialing → ringing → in_progress → completed | failed | no_answer | busy.
The platform reports per-call status back to the enterprise via webhooks and a status API.
Time from request acceptance to dial-out should be low (seconds) for immediate calls, but it is acceptable to queue/pace calls during bursts.
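A quick back-of-the-envelope estimate makes these constraints concrete. The sketch below is only an illustration: the carrier limits (500 channels, 20 CPS) and the 5,000-call burst are assumed numbers that do not appear in the problem, and the model ignores ring time and unanswered calls, which free channels sooner. It applies Little's law: once every channel is busy, new calls can start only as fast as old ones end, that is at channels / average call duration.

```python
def sustained_cps(channels: int, avg_call_seconds: float) -> float:
    """Little's law: call starts per second that a full set of channels can sustain."""
    return channels / avg_call_seconds


def drain_seconds(burst: int, channels: int, cps_limit: float,
                  avg_call_seconds: float) -> float:
    """Rough time to dial every call in a burst (assumed, simplified model).

    Phase 1: fill the free channels as fast as the CPS limit allows.
    Phase 2: dial the rest only as channels free up, still capped by CPS.
    """
    first_wave = min(burst, channels)
    fill_time = first_wave / cps_limit
    remaining = burst - first_wave
    steady_rate = min(cps_limit, sustained_cps(channels, avg_call_seconds))
    return fill_time + remaining / steady_rate


# Assumed numbers for illustration only: 500 channels, 20 CPS,
# 3.5-minute average call, 5,000-call burst.
ASSUMED_DRAIN = drain_seconds(burst=5000, channels=500, cps_limit=20,
                              avg_call_seconds=210)
```

Under these assumed numbers, 500 channels sustain only about 2.4 new calls per second, so the burst takes roughly 32 minutes to drain. The channel limit, not the CPS limit, dominates once calls last minutes, which is why queueing and pacing are unavoidable during a campaign kickoff.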

Clarifying Questions to Ask

Are calls immediate, scheduled, or both? Is there a notion of campaigns with priorities, or only one-off calls?
How many carriers do we integrate with, and what are their concurrency and CPS limits? Single carrier, or multi-carrier with failover / least-cost routing?
What happens on the answered leg — play static audio, bridge to a human, or connect to a real-time AI voice agent? Do we need bidirectional streaming audio?
What per-enterprise rate limits and quotas apply, and do we throttle to protect both the carriers and ourselves?
What delivery guarantees do enterprises expect for status — at-least-once webhooks with retries, idempotency, ordering?
Are there compliance constraints (consent, do-not-call lists, allowed calling windows by recipient timezone)?

Part 1 — High-level design: the outbound call lifecycle

Walk through the end-to-end path from "an enterprise calls our API to place a call" to "the end user's phone rings and the call is bridged to media." Define the major services, the data model for a call, and the API surface.
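One possible starting point for the call data model is an explicit state machine over the lifecycle given in the constraints. The sketch below is an assumption-laden illustration, not a reference answer: the field names (call_id, enterprise_id, media_config_id, version) and the exact set of allowed transitions are hypothetical, since the problem only lists the states themselves.

```python
from dataclasses import dataclass, field
from enum import Enum


class CallStatus(str, Enum):
    QUEUED = "queued"
    DIALING = "dialing"
    RINGING = "ringing"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"
    NO_ANSWER = "no_answer"
    BUSY = "busy"


S = CallStatus
# Assumed transition table; states missing from the keys are terminal.
ALLOWED = {
    S.QUEUED: {S.DIALING, S.FAILED},
    S.DIALING: {S.RINGING, S.BUSY, S.FAILED},
    S.RINGING: {S.IN_PROGRESS, S.NO_ANSWER, S.BUSY, S.FAILED},
    S.IN_PROGRESS: {S.COMPLETED, S.FAILED},
}


@dataclass
class Call:
    call_id: str
    enterprise_id: str
    to_number: str
    media_config_id: str
    status: CallStatus = CallStatus.QUEUED
    version: int = 0  # bumped on every change; usable for optimistic locking
    history: list = field(default_factory=lambda: [CallStatus.QUEUED])

    def transition(self, new_status: CallStatus) -> bool:
        """Apply a status change. Returns False for a duplicate event."""
        if new_status == self.status:
            return False  # idempotent: a repeated carrier event is a no-op
        if new_status not in ALLOWED.get(self.status, set()):
            raise ValueError(f"illegal transition {self.status.value} -> {new_status.value}")
        self.status = new_status
        self.version += 1
        self.history.append(new_status)
        return True
```

Rejecting illegal transitions in one place means late or out-of-order carrier events cannot move a finished call back to an earlier state, and the version counter gives the status API and webhooks a monotonic value to order updates by.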

Part 2 — Scaling a burst of concurrent call requests (the critical bottleneck)

Many enterprises launch campaigns at once and submit a large burst of call requests in a short window — far more than the carriers can dial simultaneously. How do you keep the platform stable, respect carrier limits, and stay fair across tenants?
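One common way to respect a carrier's CPS limit is a token bucket in front of the dialer. The sketch below is a minimal, single-process illustration with an injected clock; the class and parameter names are hypothetical, and a real dialer fleet would need the bucket state in a shared store and updated atomically.

```python
class TokenBucket:
    """Paces call attempts: one token is spent per dial attempt."""

    def __init__(self, rate_per_sec: float, burst: float, now: float = 0.0):
        self.rate = rate_per_sec   # tokens added per second (the CPS budget)
        self.capacity = burst      # maximum tokens that can accumulate
        self.tokens = burst
        self.last = now

    def try_acquire(self, now: float) -> bool:
        """Return True if a dial attempt may start at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller leaves the call queued and tries again later
```

Because the carrier limit is hard, the burst size deserves care: a full bucket plus one second of refill can release up to burst + rate attempts within a single second, so a small burst (possibly 1) is the safer choice, depending on how the carrier measures its window. A denied attempt is simply left in the queue, which is the backpressure the burst scenario calls for.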

Clarifying Questions for this Part

Is it acceptable to delay/queue calls during a burst, or must latency-sensitive calls (e.g., one-time-passcode verification) preempt bulk campaign traffic?
Do we have multiple carriers we can spread load across, and can we add carrier capacity dynamically?
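If the answer to the first question is that latency-sensitive calls should go ahead of bulk traffic, one simple scheme is a strict-priority lane on top of per-tenant round-robin. The sketch below is an in-memory illustration under that assumption; the lane names and the FairCallQueue class are hypothetical, and a production version would live in a durable queue.

```python
from collections import OrderedDict, deque


class FairCallQueue:
    """Two lanes (urgent before bulk); round-robin across tenants within a lane."""

    def __init__(self):
        # Each lane maps tenant -> FIFO of call ids; key order is the rotation order.
        self._lanes = {"urgent": OrderedDict(), "bulk": OrderedDict()}

    def enqueue(self, tenant: str, call_id: str, urgent: bool = False) -> None:
        lane = self._lanes["urgent" if urgent else "bulk"]
        lane.setdefault(tenant, deque()).append(call_id)

    def dequeue(self):
        """Return (tenant, call_id) for the next call to dial, or None if empty."""
        for name in ("urgent", "bulk"):
            lane = self._lanes[name]
            if lane:
                tenant, calls = lane.popitem(last=False)  # tenant at head of rotation
                call_id = calls.popleft()
                if calls:
                    lane[tenant] = calls  # move tenant to the back of the rotation
                return tenant, call_id
        return None
```

With this rotation, a tenant that submits 10,000 calls gets one dial slot per round, the same as a tenant that submitted ten, so a large campaign cannot starve smaller tenants. Strict priority can starve the bulk lane if urgent traffic is unbounded, so a per-tenant cap on urgent calls would be a sensible companion.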

Part 3 — Reliability, call state, and observability

Calls and carriers fail in messy ways: no-answer, busy, carrier 5xx, dropped media, or a worker crashing mid-call. How do you track call state authoritatively, handle retries safely, and observe the system?
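Two building blocks that often come up in answers here are deduplicating status events and retrying webhook deliveries with backoff. The sketch below illustrates both under stated assumptions: the event_id field, the per-call seq counter, and the 1-second base / 300-second cap are hypothetical choices, and the in-memory set and list stand in for a durable store and outbox.

```python
import random


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 300.0, rng=None):
    """Exponential backoff with full jitter: delay n is uniform in [0, min(cap, base * 2**n)]."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** n)) for n in range(attempts)]


class StatusEventLog:
    """Records each carrier event once and queues a webhook payload for it."""

    def __init__(self):
        self._seen = set()   # (call_id, event_id) pairs already processed
        self._seq = {}       # per-call sequence counter
        self.outbox = []     # payloads awaiting at-least-once delivery

    def record(self, call_id: str, event_id: str, status: str) -> bool:
        """Return False if this event was already recorded (safe to redeliver)."""
        key = (call_id, event_id)
        if key in self._seen:
            return False
        self._seen.add(key)
        seq = self._seq.get(call_id, 0) + 1
        self._seq[call_id] = seq
        self.outbox.append({"call_id": call_id, "event_id": event_id,
                            "status": status, "seq": seq})
        return True
```

The per-call sequence number lets an enterprise's webhook receiver drop duplicates and reorder late arrivals, which is what at-least-once delivery requires of the consumer. Jitter spreads retries out so that a recovering endpoint is not hit by every pending delivery at the same instant.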

Follow-up Questions

How would you enforce a per-carrier concurrency budget across a distributed dialer fleet — a shared atomic counter, leases, or sharding destination numbers to specific workers?
A carrier starts returning elevated failures and latency. How does the system detect this and shift traffic to another carrier without a thundering-herd of retries?
How do you support scheduled campaigns and calling-window compliance (e.g., never dial before 9am in the recipient's local timezone)?
How would you add the real-time audio leg (bridging an answered call to an AI voice agent), and what new scaling constraints does live media processing introduce?
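For the first follow-up, the lease option can be sketched as follows. This is a single-process illustration only: the ChannelLeases class is hypothetical, the dictionary stands in for a shared store, and in a real fleet the check-and-set in acquire would have to be atomic in that store. The 30-second TTL in the usage note is likewise an assumed value.

```python
class ChannelLeases:
    """Per-carrier concurrency budget where each in-flight call holds an expiring lease."""

    def __init__(self, max_channels: int, ttl: float):
        self.max_channels = max_channels
        self.ttl = ttl
        self._expiry = {}  # call_id -> time at which the lease lapses

    def _reap(self, now: float) -> None:
        # Drop leases whose holders stopped renewing (e.g. a crashed worker).
        for call_id in [c for c, exp in self._expiry.items() if exp <= now]:
            del self._expiry[call_id]

    def acquire(self, call_id: str, now: float) -> bool:
        """Take a channel for call_id; re-acquiring a held lease just extends it."""
        self._reap(now)
        if call_id not in self._expiry and len(self._expiry) >= self.max_channels:
            return False
        self._expiry[call_id] = now + self.ttl
        return True

    def renew(self, call_id: str, now: float) -> bool:
        """Heartbeat from the worker handling the call; False if the lease lapsed."""
        self._reap(now)
        if call_id not in self._expiry:
            return False
        self._expiry[call_id] = now + self.ttl
        return True

    def release(self, call_id: str) -> None:
        self._expiry.pop(call_id, None)

    def in_use(self, now: float) -> int:
        self._reap(now)
        return len(self._expiry)
```

Compared with a bare atomic counter, leases self-heal: if a worker dies mid-call and never decrements, its channels return to the pool once the TTL lapses. Since calls last 2-5 minutes, a TTL such as 30 seconds implies workers must renew throughout the call, and a lease that lapses too early could let the fleet briefly exceed the carrier's channel limit, so the TTL and heartbeat interval need a safety margin.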
