What difficulty level is this interview question?

This is a medium difficulty System Design question, commonly asked during Onsite rounds at Apple.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Apple during technical interviews.

Design a Centralized Logging System | Apple Interview Question

Q: Design a Centralized Logging System

This question evaluates a candidate's ability to design a large-scale, distributed logging system covering ingestion, storage, indexing, and query serving. It tests system design fundamentals such as decoupling producers from consumers, partitioning and tiered storage, and reliability trade-offs under high write throughput. This type of prompt is common in system design interviews to assess practical, application-level architectural reasoning rather than purely conceptual knowledge.

Design a Centralized Logging System

Design a system that collects application logs from a large fleet of microservice instances and makes them durable and searchable for debugging and operational monitoring. Engineers should be able to filter a service's logs by time range and search the message text, and each log line should become searchable within seconds of being emitted.

The interviewer will deep-dive three areas: the ingestion pipeline, the storage and query design, and how the system stays reliable and scalable under load and failure.

Constraints & Assumptions

State your own numbers; reasonable starting assumptions:

~10,000 service instances across many services.
Aggregate write volume of about 1,000,000 log lines/sec at peak, averaging ~500 bytes/line, so roughly 500 MB/s of raw logs (~43 TB/day if the peak rate were sustained). (Pick numbers and let them drive the design.)
Logs are mostly write-once, read-rarely; reads are bursty during incidents.
Retention: a few days to weeks "hot" (fast search), older logs archived cheaply.
Query patterns: filter by service / host / level, restrict to a time range, and full-text search on the message; target p99 search latency of a few seconds.
Near-real-time: a log should be queryable within a few seconds of emission.
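
The arithmetic behind these starting numbers can be checked with a short script. This is only a back-of-envelope sketch: the instance count, line rate, and line size come from the list above, while the compression ratio, hot retention window, and replica count are hypothetical planning knobs that the prompt does not specify.

```python
# Back-of-envelope sizing from the starting assumptions listed above.
INSTANCES = 10_000
PEAK_LINES_PER_SEC = 1_000_000
AVG_BYTES_PER_LINE = 500
SECONDS_PER_DAY = 86_400

# Hypothetical planning knobs (NOT from the prompt): replace with your own answers.
COMPRESSION_RATIO = 8.0   # assumed compression ratio for text logs
HOT_RETENTION_DAYS = 7    # assumed length of the "hot" search window
REPLICAS = 3              # assumed number of copies kept in the hot tier


def peak_bytes_per_sec() -> int:
    """Raw ingest bandwidth at peak, in bytes per second."""
    return PEAK_LINES_PER_SEC * AVG_BYTES_PER_LINE


def raw_tb_per_day() -> float:
    """Raw volume per day (decimal TB) if the peak rate were sustained."""
    return peak_bytes_per_sec() * SECONDS_PER_DAY / 1e12


def hot_tier_tb() -> float:
    """Hot-tier footprint after compression and replication."""
    return raw_tb_per_day() * HOT_RETENTION_DAYS * REPLICAS / COMPRESSION_RATIO


def per_instance_lines_per_sec() -> float:
    """Average share of the peak rate produced by one instance."""
    return PEAK_LINES_PER_SEC / INSTANCES


if __name__ == "__main__":
    print(f"peak ingest:  {peak_bytes_per_sec() / 1e6:.0f} MB/s")
    print(f"raw per day:  {raw_tb_per_day():.1f} TB")
    print(f"hot tier:     {hot_tier_tb():.1f} TB")
    print(f"per instance: {per_instance_lines_per_sec():.0f} lines/s")
```

Under these assumed knobs the sustained-peak volume comes to 43.2 TB/day and the hot tier to roughly 113 TB. Each instance averages only 100 lines/sec, which suggests the hard part is the aggregate rate rather than any single producer.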

Clarifying Questions to Ask

What is the expected log volume and average line size, and how spiky is it?
Are logs structured (JSON with fields) or free-form text, or a mix?
What retention is required, and is there a compliance/PII constraint on storage and access?
What are the dominant query patterns — full-text search, field filters, metrics/aggregations, or alerting?
How strict are ordering and delivery guarantees — is at-least-once with possible duplicates acceptable, or is exactly-once required?
What is the acceptable end-to-end ingestion latency?

Part 1 — Ingestion pipeline

Design how logs get from each service instance into the system reliably and at high throughput, with minimal impact on the services themselves.
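
One way to keep the impact on services minimal is to hand log lines to a bounded in-memory buffer that never blocks the caller, and let a separate shipper drain it in batches. The sketch below is an illustration under that assumption, not a prescribed answer: `LogBuffer`, `partition_for`, and all sizes are hypothetical names and values introduced here.

```python
import queue
import zlib
from typing import List


class LogBuffer:
    """Bounded buffer between application threads and a log-shipping agent."""

    def __init__(self, max_lines: int = 10_000) -> None:
        self._q: "queue.Queue[str]" = queue.Queue(maxsize=max_lines)
        self.dropped = 0  # exported as a metric so loss is visible, not silent

    def emit(self, line: str) -> bool:
        """Enqueue without blocking; when full, drop the line and count it."""
        try:
            self._q.put_nowait(line)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain_batch(self, max_batch: int = 500) -> List[str]:
        """Remove up to max_batch lines in FIFO order for one network send."""
        batch: List[str] = []
        while len(batch) < max_batch:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch


def partition_for(service: str, host: str, num_partitions: int) -> int:
    """Stable partition choice so one host's lines stay in order downstream."""
    key = f"{service}/{host}".encode("utf-8")
    return zlib.crc32(key) % num_partitions


if __name__ == "__main__":
    buf = LogBuffer(max_lines=3)
    for i in range(5):
        buf.emit(f"line {i}")
    print(buf.drain_batch(2), "dropped:", buf.dropped)
    print("partition:", partition_for("checkout", "host-17", 64))
```

Dropping on overflow trades bounded loss for zero back-pressure on the service; a candidate who needs stronger durability could instead spill to local disk, at the cost of disk I/O on the host.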

Part 2 — Storage and query design

Design how logs are stored so they are both cheap to retain and fast to search for the required query patterns.
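
The query patterns listed earlier (service filter, time range, full-text term) map naturally onto time-bucketed partitions with a per-partition inverted index. The toy in-memory sketch below only illustrates that idea; `LogStore`, the one-hour bucket width, and whitespace tokenization are assumptions made here, and a real system would use an on-disk index or search engine instead.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set, Tuple

BUCKET_SECONDS = 3600  # assumed partition width: one hour

PartitionKey = Tuple[str, int]  # (service, bucket start timestamp)


class LogStore:
    """Toy store: partitions by (service, hour), token index per partition."""

    def __init__(self) -> None:
        self._rows: DefaultDict[PartitionKey, List[Tuple[int, str]]] = defaultdict(list)
        self._index: DefaultDict[PartitionKey, Dict[str, Set[int]]] = defaultdict(
            lambda: defaultdict(set)
        )

    @staticmethod
    def _bucket(ts: int) -> int:
        """Round a Unix timestamp down to the start of its bucket."""
        return ts - ts % BUCKET_SECONDS

    def append(self, service: str, ts: int, message: str) -> None:
        key = (service, self._bucket(ts))
        rows = self._rows[key]
        pos = len(rows)
        rows.append((ts, message))
        # Naive tokenizer: lowercase and split on whitespace.
        for token in set(message.lower().split()):
            self._index[key][token].add(pos)

    def search(self, service: str, start_ts: int, end_ts: int, term: str) -> List[Tuple[int, str]]:
        """Return (ts, message) with start_ts <= ts < end_ts containing term."""
        hits: List[Tuple[int, str]] = []
        bucket = self._bucket(start_ts)
        while bucket < end_ts:  # visit only partitions overlapping the range
            key = (service, bucket)
            if key in self._rows:
                for pos in self._index[key].get(term.lower(), ()):
                    ts, msg = self._rows[key][pos]
                    if start_ts <= ts < end_ts:
                        hits.append((ts, msg))
            bucket += BUCKET_SECONDS
        return sorted(hits)


if __name__ == "__main__":
    store = LogStore()
    store.append("checkout", 1000, "payment timeout for order 42")
    store.append("checkout", 4000, "payment ok")
    print(store.search("checkout", 0, 7200, "timeout"))
```

Partition pruning by service and time keeps a query from touching most of the data, and whole partitions can later be moved to cheaper storage or deleted once they age out of the hot window.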

Part 3 — Reliability and scalability

Design for no (or bounded) data loss, horizontal scaling of every stage, and graceful behavior under failure and overload.
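
A common building block for graceful behavior under failure is retrying a batch send with capped exponential backoff and jitter, then handing the batch to a fallback (for example a local spill file) when retries run out. The sketch below assumes a hypothetical `send` callable that raises `ConnectionError` on failure; the base delay, cap, and attempt count are illustrative values only.

```python
import random
import time
from typing import Callable, List


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Upper bound on the wait before retry number `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def send_with_retry(
    send: Callable[[List[str]], None],
    batch: List[str],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Try to deliver a batch; return False if every attempt failed.

    Retrying after an ambiguous failure can deliver the same batch twice,
    so this gives at-least-once delivery, not exactly-once.
    """
    for attempt in range(max_attempts):
        try:
            send(batch)
            return True
        except ConnectionError:
            if attempt == max_attempts - 1:
                break
            # Full jitter: wait a random fraction of the capped exponential
            # delay so that many agents do not retry in lockstep.
            sleep(rng() * backoff_delay(attempt))
    return False


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_send(batch: List[str]) -> None:
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("collector unavailable")

    ok = send_with_retry(flaky_send, ["line a", "line b"], sleep=lambda s: None)
    print("delivered:", ok, "after", calls["n"], "attempts")
```

When `send_with_retry` returns False, the caller decides how loss is bounded: spill the batch to disk for later replay, or drop it and increment a loss counter.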

Follow-up Questions

Is exactly-once worth the cost for logs, or is at-least-once plus best-effort dedup sufficient? How would you implement dedup?
How do you handle multi-line entries (stack traces) and a mix of structured JSON and plain text during parsing?
How would you support near-real-time alerting on log patterns (e.g., a sudden spike in error rate)?
How would you run this across multiple regions — ingest locally but query globally?
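
For the dedup follow-up, one possible shape of a best-effort answer is to derive a deterministic ID for each log event and drop IDs seen recently. The sketch below is an assumption-laden illustration: the ID fields (`service`, `host`, `ts_ns`, `seq`), the `RecentDeduper` class, and its capacity are all hypothetical, and the bounded window means duplicates that arrive after eviction still get through.

```python
import hashlib
from collections import OrderedDict


def event_id(service: str, host: str, ts_ns: int, seq: int, message: str) -> str:
    """Deterministic ID: a retried send of the same event yields the same ID."""
    raw = f"{service}|{host}|{ts_ns}|{seq}|{message}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class RecentDeduper:
    """Remembers the most recently seen IDs, evicting the least recent."""

    def __init__(self, capacity: int = 100_000) -> None:
        self._seen: "OrderedDict[str, None]" = OrderedDict()
        self._capacity = capacity

    def is_new(self, eid: str) -> bool:
        """Return True the first time an ID is seen within the window."""
        if eid in self._seen:
            self._seen.move_to_end(eid)  # refresh recency on a duplicate
            return False
        self._seen[eid] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # evict the least recently seen ID
        return True


if __name__ == "__main__":
    dedup = RecentDeduper(capacity=1000)
    eid = event_id("checkout", "host-17", 1_700_000_000_000_000_000, 42, "payment timeout")
    print(dedup.is_new(eid))  # first delivery
    print(dedup.is_new(eid))  # retried delivery of the same event
```

An alternative with the same effect is to use the deterministic ID as the document key in the index, so a re-delivered event overwrites itself instead of creating a second copy.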
