Design a Centralized Logging System

Last updated: Jul 1, 2026

Quick Overview

This question evaluates a candidate's ability to design a large-scale, distributed logging system covering ingestion, storage, indexing, and query serving. It tests system design fundamentals such as decoupling producers from consumers, partitioning and tiered storage, and reliability trade-offs under high write throughput. This type of prompt is common in system design interviews to assess practical, application-level architectural reasoning rather than purely conceptual knowledge.

Company: Apple

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Onsite

# Design a Centralized Logging System

Design a system that collects application logs from a large fleet of microservice instances and makes them durable and searchable for debugging and operational monitoring. Engineers should be able to find a service's logs by time range and search the log message, within seconds of the log being emitted.

The interviewer will deep-dive four areas: the **ingestion pipeline**, the **storage and query design**, and how the system stays **reliable** and **scalable** under load and failure.

### Constraints & Assumptions

State your own numbers; reasonable starting assumptions:

- ~10,000 service instances across many services.
- Aggregate write volume on the order of ~1,000,000 log lines/sec at peak, average ~500 bytes/line, so roughly **~500 MB/s (~40 TB/day)** of raw logs. (Pick numbers and let them drive the design.)
- Logs are mostly write-once, read-rarely; reads are bursty during incidents.
- Retention: a few days to weeks "hot" (fast search), older logs archived cheaply.
- Query patterns: filter by service / host / level, restrict to a time range, and full-text search on the message; target p99 search latency of a few seconds.
- Near-real-time: a log should be queryable within a few seconds of emission.

### Clarifying Questions to Ask

- What is the expected log volume and average line size, and how spiky is it?
- Are logs structured (JSON with fields) or free-form text, or a mix?
- What retention is required, and is there a compliance/PII constraint on storage and access?
- What are the dominant query patterns — full-text search, field filters, metrics/aggregations, or alerting?
- How strict are ordering and delivery guarantees — is at-least-once with possible duplicates acceptable, or is exactly-once required?
- What is the acceptable end-to-end ingestion latency?

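The capacity figures in the constraints can be checked with a few lines of arithmetic. The sketch below reproduces the ~500 MB/s figure and shows that sustaining the peak rate for a full day gives about 43 TB, consistent with the rounded ~40 TB/day. The function names and the per-partition throughput passed to `min_partitions` are illustrative assumptions, not values from the prompt; substitute whatever your transport actually sustains.

```python
import math

SECONDS_PER_DAY = 86_400

def ingest_rate_mb_per_s(lines_per_sec: int, bytes_per_line: int) -> float:
    """Raw ingest rate in decimal megabytes per second."""
    return lines_per_sec * bytes_per_line / 1e6

def daily_volume_tb(lines_per_sec: int, bytes_per_line: int) -> float:
    """Raw volume in decimal terabytes if the given rate held for a full day."""
    return lines_per_sec * bytes_per_line * SECONDS_PER_DAY / 1e12

def min_partitions(rate_mb_per_s: float, per_partition_mb_per_s: float) -> int:
    """Lower bound on partition count for an assumed per-partition throughput."""
    return math.ceil(rate_mb_per_s / per_partition_mb_per_s)

# Numbers from the prompt: 1,000,000 lines/sec at ~500 bytes/line.
rate = ingest_rate_mb_per_s(1_000_000, 500)
per_day = daily_volume_tb(1_000_000, 500)

# Assumption for illustration only: each partition sustains about 10 MB/s.
partitions = min_partitions(rate, 10)
```

Because the prompt gives a peak rate, the daily figure is an upper bound; the average rate, and therefore storage cost, is likely lower.
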
### Part 1 — Ingestion pipeline

Design how logs get from each service instance into the system reliably and at high throughput, with minimal impact on the services themselves.

```hint Decouple producers from consumers
Put a durable, partitioned transport log (e.g., Kafka) between the collection agents and the downstream processors so producers never block on slow consumers and traffic spikes are absorbed by the buffer.
```

```hint At the edge
Run a lightweight agent/sidecar per host that tails log files, batches and compresses lines, and buffers to local disk so a transient outage downstream does not drop logs or block the app.
```

### Part 2 — Storage and query design

Design how logs are stored so they are both cheap to retain and fast to search for the required query patterns.

```hint Two stores, two jobs
Separate the cheap, immutable raw store (object storage like S3/GCS) from a query index. Index only the fields you actually filter/search on rather than indexing everything.
```

```hint Partition by time
Time-partition indices (e.g., per-hour/day, per-service) so old data rolls off cheaply and queries prune to the relevant shards. Use hot/warm/cold tiers to balance cost vs latency.
```

### Part 3 — Reliability and scalability

Design for no (or bounded) data loss, horizontal scaling of every stage, and graceful behavior under failure and overload.

```hint Durability and dedup
Replicate the buffer (replication factor, producer acks) for durability; with at-least-once delivery, make writes idempotent or dedup on a stable event id so retries do not double-count.
```

```hint Degrade, do not collapse
Under overload, shed load deliberately — sample or drop low-severity logs and apply backpressure — rather than letting the pipeline cascade into failure. Monitor consumer lag as your primary health signal.
```

### Follow-up Questions

- Is exactly-once worth the cost for logs, or is at-least-once plus best-effort dedup sufficient? How would you implement dedup?
- How do you handle multi-line entries (stack traces) and a mix of structured JSON and plain text during parsing?
- How would you support near-real-time alerting on log patterns (e.g., a sudden spike in error rate)?
- How would you run this across multiple regions — ingest locally but query globally?

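For the dedup follow-up, one possible approach is best-effort dedup on a stable event id, as the Part 3 hint suggests. The sketch below is a minimal illustration, not a reference design: the `Deduplicator` class, the `event_id` helper, and the choice of host, file path, and byte offset as the identity of a log line are all assumptions made for the example. It keeps a bounded, least-recently-seen-evicted set of ids in memory, so a duplicate that arrives after its id has been evicted will pass through.

```python
import hashlib
from collections import OrderedDict
from typing import Iterable, List, Tuple

# Hypothetical record identity: (host, file path, byte offset of the line).
Record = Tuple[str, str, int]

def event_id(host: str, file_path: str, offset: int) -> str:
    """Derive a stable id so a retried send of the same line maps to the same key."""
    raw = f"{host}|{file_path}|{offset}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()

class Deduplicator:
    """Bounded, best-effort duplicate filter keyed on event id."""

    def __init__(self, capacity: int = 100_000) -> None:
        self._capacity = capacity
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def is_new(self, eid: str) -> bool:
        if eid in self._seen:
            # Refresh recency so recently retried ids stay in the window.
            self._seen.move_to_end(eid)
            return False
        self._seen[eid] = None
        if len(self._seen) > self._capacity:
            # Evict the least recently seen id to keep memory bounded.
            self._seen.popitem(last=False)
        return True

def filter_new(dedup: Deduplicator, records: Iterable[Record]) -> List[Record]:
    """Keep only records whose event id has not been seen within the window."""
    return [rec for rec in records if dedup.is_new(event_id(*rec))]
```

A consumer would call `filter_new` on each batch before writing to the index. If the index store lets the writer supply the document id, using the same event id there makes retried writes overwrite rather than duplicate, which covers duplicates that slip past the in-memory window.
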

