How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

What difficulty level is this interview question?

This is a medium difficulty System Design question, commonly asked during Onsite rounds at Google.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Google during technical interviews.

Design an Online Coding Judge Platform | Google Interview Question

Q: Design an Online Coding Judge Platform

This question evaluates a candidate's competencies in large-scale system design, secure multi-tenant execution and sandboxing, capacity planning, queuing and autoscaling, and trade-off reasoning across performance, cost, and operational complexity.

Design an Online Coding Judge Platform

Design an online coding-practice and judging platform (similar to LeetCode or Codeforces).

Users should be able to browse a catalog of programming problems, write and submit source code in multiple languages, and have that code compiled and executed against both sample (visible) and hidden test cases. For each submission the system returns a verdict — Accepted, Wrong Answer, Compile Error, Runtime Error, Memory Limit Exceeded, or Time Limit Exceeded — along with runtime and memory usage, and users can review their full submission history.

The submitted code is untrusted, so secure isolation of execution is a first-class requirement, not an afterthought. Produce an end-to-end design that covers: functional and non-functional requirements, an MVP, capacity estimates with a bottleneck call-out, the API surface, the data model, the high-level architecture and submission flow, the sandboxing/security model, how judging scales from one machine to many, and the key trade-offs (consistency, latency, cost, operational complexity). Expect the interviewer to keep changing requirements and to probe where the bottleneck is — be ready to argue whether the system is CPU-, memory-, disk-, network-, or queue-bound.

Constraints & Assumptions

State your own numbers, but a reasonable working set is:

~1M registered users; ~100K daily active users.
Steady-state submissions are modest, but contest / recruiting spikes drive peaks of ~2,000 submissions/sec.
Per-problem time limit (e.g. 1–10 s wall/CPU) and memory limit (e.g. 256 MB) are configurable; the judge must enforce both.
Supported languages include at least C++, Java, Python, and Go (compiled and interpreted).
Submitted code is untrusted and adversarial — assume it will attempt to fork-bomb, fill the disk, open sockets, read another problem's hidden tests, or escape the sandbox.
Submission verdicts may be eventually consistent — a few seconds of latency to a final verdict is acceptable; strict synchronous completion is not required.
Problem statements are read-heavy and highly cacheable; submissions are write/compute-heavy.

Clarifying Questions to Ask

Is this for everyday practice, timed competitive contests, or recruiting assessments? (Determines the spike profile and fairness/anti-cheat needs.)
Do we need interactive/streaming problems and custom/special judges (floating-point tolerance, multiple correct answers), or only batch "run all test cases" problems?
How large can a single problem's hidden test set be (KB vs. hundreds of MB)? This drives the test-data distribution strategy.
Are partial scores per test group required, or is it all-or-nothing per submission?
What languages and runtime versions must we support on day one?
What is the acceptable latency from "submit" to "final verdict" under normal load vs. peak, and must sample runs feel near-instant?

What a Strong Answer Covers

Requirements framing : clear functional vs. non-functional split, with untrusted-code isolation called out as a top non-functional driver.
Async architecture : submission API → queue → judge worker pool, with status polling/push and a clear submission flow.
Back-of-envelope sizing : numeric estimates for both the compute side and the storage/write side, leading to a defended call on which resource is the binding constraint — and the awareness to re-argue it when load shifts to memory, disk, or network.
Sandboxing depth : concrete isolation mechanisms (no network, read-only FS, cgroup CPU/memory caps, process/thread limits, wall + CPU timeouts, unprivileged user, ephemeral environment) and the trade-off between plain containers and stronger boundaries (microVM / syscall-intercepting runtime).
Data model & API that separate problem metadata, large test data (object storage), and submission records (partitionable).
Scaling strategy per plane: cache/CDN for reads; queue-depth autoscaling, test-data caching, and per-resource-class worker pools for judging.
Reliability : idempotent jobs, worker heartbeats/requeue, a submission state machine, retry-on-infra-failure-only.
Explicit trade-offs with a defended default choice, plus what to observe (queue depth, judge latency, sandbox-escape alerts).
Communication : scopes quickly, surfaces 2–3 alternatives with pros/cons, and adapts as requirements change instead of monologuing.

Follow-up Questions

How do you distribute and cache hundreds of MB of hidden test data so workers don't re-download it on every run, while still allowing test-set updates?
A submission gets Accepted on a worker but the result write fails before it's persisted. Walk through how your design avoids both a lost verdict and a double-judge.
During a contest, one problem with a 10-second time limit starves all the fast jobs in a shared queue. How do you keep short submissions responsive?
How would you make judging deterministic and fair (consistent CPU-time measurement across heterogeneous hardware, anti-cheat, reproducible verdicts)?

Design an Online Coding Judge Platform

Design an online coding-practice and judging platform (similar to LeetCode or Codeforces).

Constraints & Assumptions

State your own numbers, but a reasonable working set is:

~1M registered users; ~100K daily active users.
Steady-state submissions are modest, but contest / recruiting spikes drive peaks of ~2,000 submissions/sec.
Per-problem time limit (e.g. 1–10 s wall/CPU) and memory limit (e.g. 256 MB) are configurable; the judge must enforce both.
Supported languages include at least C++, Java, Python, and Go (compiled and interpreted).
Submitted code is untrusted and adversarial — assume it will attempt to fork-bomb, fill the disk, open sockets, read another problem's hidden tests, or escape the sandbox.
Submission verdicts may be eventually consistent — a few seconds of latency to a final verdict is acceptable; strict synchronous completion is not required.
Problem statements are read-heavy and highly cacheable; submissions are write/compute-heavy.

Clarifying Questions to Ask

Is this for everyday practice, timed competitive contests, or recruiting assessments? (Determines the spike profile and fairness/anti-cheat needs.)
Do we need interactive/streaming problems and custom/special judges (floating-point tolerance, multiple correct answers), or only batch "run all test cases" problems?
How large can a single problem's hidden test set be (KB vs. hundreds of MB)? This drives the test-data distribution strategy.
Are partial scores per test group required, or is it all-or-nothing per submission?
What languages and runtime versions must we support on day one?
What is the acceptable latency from "submit" to "final verdict" under normal load vs. peak, and must sample runs feel near-instant?

What a Strong Answer Covers

Requirements framing : clear functional vs. non-functional split, with untrusted-code isolation called out as a top non-functional driver.
Async architecture : submission API → queue → judge worker pool, with status polling/push and a clear submission flow.
Back-of-envelope sizing : numeric estimates for both the compute side and the storage/write side, leading to a defended call on which resource is the binding constraint — and the awareness to re-argue it when load shifts to memory, disk, or network.
Sandboxing depth : concrete isolation mechanisms (no network, read-only FS, cgroup CPU/memory caps, process/thread limits, wall + CPU timeouts, unprivileged user, ephemeral environment) and the trade-off between plain containers and stronger boundaries (microVM / syscall-intercepting runtime).
Data model & API that separate problem metadata, large test data (object storage), and submission records (partitionable).
Scaling strategy per plane: cache/CDN for reads; queue-depth autoscaling, test-data caching, and per-resource-class worker pools for judging.
Reliability : idempotent jobs, worker heartbeats/requeue, a submission state machine, retry-on-infra-failure-only.
Explicit trade-offs with a defended default choice, plus what to observe (queue depth, judge latency, sandbox-escape alerts).
Communication : scopes quickly, surfaces 2–3 alternatives with pros/cons, and adapts as requirements change instead of monologuing.

Follow-up Questions

How do you distribute and cache hundreds of MB of hidden test data so workers don't re-download it on every run, while still allowing test-set updates?
A submission gets Accepted on a worker but the result write fails before it's persisted. Walk through how your design avoids both a lost verdict and a double-judge.
During a contest, one problem with a 10-second time limit starves all the fast jobs in a shared queue. How do you keep short submissions responsive?
How would you make judging deterministic and fair (consistent CPU-time measurement across heterogeneous hardware, anti-cheat, reproducible verdicts)?

Design an Online Coding Judge Platform

Quick Overview

Design an Online Coding Judge Platform

Constraints & Assumptions

Clarifying Questions to Ask

What a Strong Answer Covers

Follow-up Questions

Solution

Submit Your Answer to Earn 20XP

Design an Online Coding Judge Platform

Quick Overview

Design an Online Coding Judge Platform

Constraints & Assumptions

Clarifying Questions to Ask

What a Strong Answer Covers

Follow-up Questions

Solution

Submit Your Answer to Earn 20XP