Design a Cloud Call-Center Platform for Programmatic Outbound Calls

Last updated: Jul 1, 2026

Quick Overview

This question evaluates a candidate's ability to design a distributed, asynchronous system under hard external rate and capacity constraints. It tests system design fundamentals such as queueing, backpressure, state-machine modeling, and multi-tenant fairness, and is commonly asked to assess practical, applied architectural reasoning rather than pure theory.

Company: Retell

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

A SaaS company offers a programmatic calling platform. Business customers (enterprises) integrate through an API: they submit a request to place an outbound phone call to one of their end users, for example an appointment reminder, a verification call, or a conversation handled by an automated voice agent. Your platform receives the request, sets up the call through a telecom carrier (PSTN/SIP), rings the end user's phone, and, once the user answers, bridges the call to a media endpoint (a voice agent or an audio stream).

Design this platform end to end: the request path, how calls are placed and tracked, and how the system stays stable under load. (This is an open-ended design where the interviewer expects you to drive: clarify, sketch a high-level design, then dive deep. The hardest sub-problem is scale, so budget time for it.)

### Constraints & Assumptions

- Enterprises submit call requests over an HTTPS API; each request targets one destination phone number and references a media configuration (which voice agent / audio to play once answered).
- Calls leave the platform over one or more telecom carriers via SIP trunks. **Each carrier enforces a maximum number of simultaneous channels (concurrent calls) and a per-second call-attempt rate (CPS).** These are hard external limits you cannot exceed.
- Assume a few hundred enterprises, average call duration of 2-5 minutes, and bursty traffic: a campaign kickoff can produce thousands of call requests within a few seconds.
- A call moves through a lifecycle: `queued → dialing → ringing → in_progress → completed | failed | no_answer | busy`.
- The platform reports per-call status back to the enterprise via webhooks and a status API.
- Time from request acceptance to dial-out should be low (seconds) for immediate calls, but it is acceptable to queue/pace calls during bursts.

### Clarifying Questions to Ask

- Are calls immediate, scheduled, or both? Is there a notion of campaigns with priorities, or only one-off calls?
- How many carriers do we integrate with, and what are their concurrency and CPS limits? Single carrier, or multi-carrier with failover / least-cost routing?
- What happens on the answered leg: play static audio, bridge to a human, or connect to a real-time AI voice agent? Do we need bidirectional streaming audio?
- What per-enterprise rate limits and quotas apply, and do we throttle to protect both the carriers and ourselves?
- What delivery guarantees do enterprises expect for status: at-least-once webhooks with retries, idempotency, ordering?
- Are there compliance constraints (consent, do-not-call lists, allowed calling windows by recipient timezone)?

### Part 1 - High-level design: the outbound call lifecycle

Walk through the end-to-end path from "an enterprise calls our API to place a call" to "the end user's phone rings and the call is bridged to media." Define the major services, the data model for a call, and the API surface.

```hint Where to start
Split the **synchronous** "accept the request" path from the **asynchronous** "place and manage the call" path. The API should validate, durably enqueue a call job, and return immediately; a pool of dialer workers consumes jobs and drives the carrier integration. Don't place the call inside the request handler.
```

```hint Components
Think: API gateway/service → durable queue + call-state store → a dialer / call-orchestrator service that speaks SIP to carriers (typically through an SBC or a telephony layer) → a media/voice-agent leg for answered calls → a webhook dispatcher that pushes status back to enterprises.
```

### Part 2 - Scaling a burst of concurrent call requests (the critical bottleneck)

Many enterprises launch campaigns at once and submit a large burst of call requests in a short window, far more than the carriers can dial simultaneously. How do you keep the platform stable, respect carrier limits, and stay fair across tenants?

```hint The real constraint
The bottleneck is **downstream telecom capacity** (max concurrent channels + CPS per carrier), not your application CPU. The system must *shape and pace* outflow to the carriers rather than try to dial everything at once.
```

```hint Mechanisms
Reach for a durable queue + per-carrier and per-tenant **rate limiting** (token/leaky bucket for CPS), a **concurrency budget** (a live active-call counter that gates new dial-outs against each carrier's channel limit), autoscaling dialer workers, **backpressure** on the API, and fair scheduling so one tenant's burst doesn't starve others.
```

#### Clarifying Questions for this Part

- Is it acceptable to delay/queue calls during a burst, or must latency-sensitive calls (e.g., one-time-passcode verification) preempt bulk campaign traffic?
- Do we have multiple carriers we can spread load across, and can we add carrier capacity dynamically?

### Part 3 - Reliability, call state, and observability

Calls and carriers fail in messy ways: no-answer, busy, carrier 5xx, dropped media, or a worker crashing mid-call. How do you track call state authoritatively, handle retries safely, and observe the system?

```hint State + idempotency
Treat each call as a persisted **state machine**, and make request creation **idempotent** (idempotency key) so client retries never double-dial. Carrier callbacks (ringing/answered/hangup) can arrive out of order, be duplicated, or be lost; reconcile them against your own authoritative state rather than trusting them blindly.
```

### Follow-up Questions

- How would you enforce a per-carrier concurrency budget across a *distributed* dialer fleet: a shared atomic counter, leases, or sharding destination numbers to specific workers?
- A carrier starts returning elevated failures and latency. How does the system detect this and shift traffic to another carrier without a thundering-herd of retries?
- How do you support scheduled campaigns and calling-window compliance (e.g., never dial before 9am in the recipient's local timezone)?
- How would you add the real-time audio leg (bridging an answered call to an AI voice agent), and what new scaling constraints does live media processing introduce?
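
The Part 1 hint (validate, durably enqueue, return immediately) and the Part 3 hint (idempotent request creation) can be illustrated together. The following is a minimal in-memory sketch, not a production design: `CallRequestStore`, `create_call`, and the fields `to_number` and `media_config_id` are hypothetical names, and a real system would presumably back the key lookup and the queue with durable storage.

```python
import uuid
from typing import Dict, List, Tuple


class CallRequestStore:
    """Hypothetical in-memory stand-in for a durable call store plus job queue."""

    def __init__(self) -> None:
        # (tenant_id, idempotency_key) -> call_id; keys are scoped per tenant.
        self._by_key: Dict[Tuple[str, str], str] = {}
        # Jobs waiting for a dialer worker (assumed durable in a real system).
        self.queue: List[dict] = []

    def create_call(self, tenant_id: str, idempotency_key: str,
                    to_number: str, media_config_id: str) -> Tuple[str, bool]:
        """Accept a call request; return (call_id, created).

        A retry with the same tenant and idempotency key returns the original
        call_id and enqueues nothing, so the destination is never double-dialed.
        """
        key = (tenant_id, idempotency_key)
        existing = self._by_key.get(key)
        if existing is not None:
            return existing, False
        call_id = uuid.uuid4().hex
        self._by_key[key] = call_id
        # The handler only enqueues; dialing happens asynchronously elsewhere.
        self.queue.append({
            "call_id": call_id,
            "tenant_id": tenant_id,
            "to_number": to_number,
            "media_config_id": media_config_id,
            "state": "queued",
        })
        return call_id, True
```

In this sketch the request handler never talks to a carrier: it records the job in the `queued` state and returns, which keeps API latency independent of carrier behavior.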
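
The Part 2 "Mechanisms" hint pairs a token bucket for CPS with a live active-call counter for channels. A single-process sketch of that combination might look like the following; `CarrierGate`, `max_cps`, and `max_channels` are illustrative names, and the clock is passed in explicitly so the behavior is deterministic. A distributed dialer fleet would need a shared counter or leases instead of local fields, which is exactly what the first follow-up question probes.

```python
from dataclasses import dataclass


@dataclass
class CarrierGate:
    """Admission gate for one carrier: CPS token bucket plus channel budget."""

    max_cps: float        # assumed carrier limit on call attempts per second
    max_channels: int     # assumed carrier limit on concurrent calls
    tokens: float = 0.0
    last_refill: float = 0.0
    active_calls: int = 0

    def __post_init__(self) -> None:
        # Start full: allow up to one second's worth of attempts immediately.
        self.tokens = self.max_cps

    def _refill(self, now: float) -> None:
        # Add tokens in proportion to elapsed time, capped at max_cps.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.max_cps, self.tokens + elapsed * self.max_cps)
        self.last_refill = now

    def try_acquire(self, now: float) -> bool:
        """Return True if a dial-out may start now; otherwise leave it queued."""
        self._refill(now)
        if self.active_calls >= self.max_channels or self.tokens < 1.0:
            return False
        self.tokens -= 1.0
        self.active_calls += 1
        return True

    def release(self) -> None:
        """Call when a call reaches a terminal state to free its channel."""
        self.active_calls = max(0, self.active_calls - 1)
```

A dialer worker would call `try_acquire` before each attempt and leave the job queued on `False`. Note that a rejected attempt consumes no token, so a full channel budget does not also drain the CPS allowance.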
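
Part 2 also asks for fair scheduling so that one tenant's burst does not starve others. One simple option, shown here only as an assumption-laden sketch, is round-robin over per-tenant FIFO queues; `FairCallQueue` is a hypothetical name, and weighted or priority-aware schemes (for example, letting verification calls preempt bulk campaigns) are deliberately left out.

```python
from collections import OrderedDict, deque
from typing import Deque, Optional, Tuple


class FairCallQueue:
    """Round-robin across tenants; FIFO within each tenant."""

    def __init__(self) -> None:
        # Insertion order of the dict doubles as the round-robin order.
        self._queues: "OrderedDict[str, Deque[str]]" = OrderedDict()

    def enqueue(self, tenant_id: str, call_id: str) -> None:
        self._queues.setdefault(tenant_id, deque()).append(call_id)

    def next_call(self) -> Optional[Tuple[str, str]]:
        """Pop one call from the tenant at the head of the rotation."""
        if not self._queues:
            return None
        tenant_id, queue = next(iter(self._queues.items()))
        call_id = queue.popleft()
        if queue:
            # Tenant still has work: send it to the back of the rotation.
            self._queues.move_to_end(tenant_id)
        else:
            # Drop empty queues so idle tenants cost nothing.
            del self._queues[tenant_id]
        return tenant_id, call_id
```

With tenant `a` holding three queued calls and tenant `b` holding one, this sketch serves `a`, then `b`, then the rest of `a`, so a small tenant waits behind at most one call per competing tenant rather than behind an entire campaign.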
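
For Part 3, the state-machine hint can be made concrete with a transition table that rejects duplicate, stale, or illegal carrier events. The state names below come from the lifecycle in the constraints; the specific set of allowed transitions (for instance, letting `dialing` jump straight to `in_progress` when the ringing callback was lost) is an assumption of this sketch, as are the names `ALLOWED`, `TERMINAL`, and `apply_event`.

```python
from typing import Dict, Set

# Terminal states from the lifecycle; once reached, a call never moves again.
TERMINAL: Set[str] = {"completed", "failed", "no_answer", "busy"}

# Assumed legal forward transitions; skipping "ringing" is tolerated so that
# a lost ringing callback does not strand the call.
ALLOWED: Dict[str, Set[str]] = {
    "queued": {"dialing", "failed"},
    "dialing": {"ringing", "in_progress", "failed", "busy", "no_answer"},
    "ringing": {"in_progress", "failed", "busy", "no_answer"},
    "in_progress": {"completed", "failed"},
}


def apply_event(current: str, event: str) -> str:
    """Return the new state, ignoring duplicate, stale, or illegal events."""
    if current in TERMINAL:
        return current  # late callbacks after hangup change nothing
    if event in ALLOWED.get(current, set()):
        return event
    return current      # duplicate or out-of-order: keep authoritative state
```

Because the function only ever moves forward, replaying the same callback twice or receiving `ringing` after `in_progress` leaves the stored state unchanged. Lost callbacks would still need a separate reconciliation path, such as a timeout sweep, which this sketch does not cover.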

Jan 10, 2026, 12:00 AM