Problem
Design a system to collect, transmit, store, and query application/data logs from many services/hosts into a central data center database.
Requirements
Functional
- Services/hosts continuously generate log events.
- Logs must be transported to a central data center and persisted.
- Users must be able to query logs from a centralized store.
- Support:
  - Near-real-time “log check” (debugging/triage shortly after events happen).
  - Historical querying for up to 3 months of retained logs.
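To make the two query modes concrete, the following is a minimal sketch of a log event record and a rule that decides which storage tier an event belongs to by age. Everything specific here is an assumption for illustration: the field names, the `tier_for` helper, the 24-hour hot window, and treating "3 months" as 90 days are not given by the problem.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumptions for illustration only: the problem states "3 months" of
# retention; the 90-day approximation and 24-hour hot window are hypothetical.
RETENTION = timedelta(days=90)
HOT_WINDOW = timedelta(hours=24)


@dataclass(frozen=True)
class LogEvent:
    event_id: str        # hypothetical unique ID, useful for deduplication
    tenant: str          # supports multi-tenant access control
    service: str
    host: str
    timestamp: datetime  # timezone-aware, UTC
    level: str
    message: str


def tier_for(ts: datetime, now: datetime) -> str:
    """Return the tier ('hot', 'cold', or 'expired') for an event time."""
    age = now - ts
    if age > RETENTION:
        return "expired"   # past retention, eligible for deletion
    if age <= HOT_WINDOW:
        return "hot"       # serves near-real-time log checks
    return "cold"          # serves historical queries


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
event = LogEvent("e-1", "team-a", "checkout", "host-17",
                 now - timedelta(minutes=5), "ERROR", "payment timeout")
print(tier_for(event.timestamp, now))
```

A query planner could apply the same age boundaries to a requested time range to decide whether to read the hot store, the cold store, or both.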
Non-functional (clarify/assume if not given)
- High write throughput and burst handling.
- Query performance for both recent and 3-month historical data.
- Reliability (no/limited data loss), ordering not strictly required unless specified.
- Multi-tenant access control.
Deliverables
Describe:
- End-to-end architecture (agents, transport, ingestion, storage, query).
- Data model and indexing strategy.
- Retention/TTL and tiered storage approach.
- Failure handling (backpressure, retries, deduplication).
- Monitoring/operational considerations.
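As one possible starting point for the failure-handling deliverable, the sketch below shows the three named mechanisms in isolation: a bounded agent-side buffer that signals backpressure, exponential backoff with jitter for retries, and ID-based deduplication at ingestion. The class and function names, capacities, and delay parameters are all hypothetical, and a real design would also need durable buffering and a shared dedup store.

```python
import random
from collections import OrderedDict, deque


class BoundedBuffer:
    """Agent-side queue. offer() returns False when full, so the caller
    can apply backpressure (block, sample, or spill to disk)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._q = deque()

    def offer(self, item) -> bool:
        if len(self._q) >= self.capacity:
            return False
        self._q.append(item)
        return True

    def drain(self, n: int) -> list:
        """Remove and return up to n items in arrival order."""
        batch = []
        while self._q and len(batch) < n:
            batch.append(self._q.popleft())
        return batch


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 5.0,
                   rng=random.random) -> list:
    """Exponential backoff with full jitter: the i-th delay is drawn
    from [0, min(cap, base * 2**i))."""
    return [rng() * min(cap, base * (2 ** i)) for i in range(attempts)]


class Deduplicator:
    """Remembers the most recent max_ids event IDs. With at-least-once
    delivery, retried batches may resend events already stored."""

    def __init__(self, max_ids: int):
        self.max_ids = max_ids
        self._seen = OrderedDict()

    def is_new(self, event_id: str) -> bool:
        if event_id in self._seen:
            return False
        self._seen[event_id] = None
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)  # evict the oldest ID
        return True
```

The dedup window is bounded by count here for simplicity; bounding it by time, matched to the maximum retry duration, is an equally reasonable choice.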