Design an encrypted log collection system
Company: Cloudflare
Role: Software Engineer
Category: System Design
Difficulty: medium
Interview Round: Onsite
## Scenario
Design a **log collection system** that gathers logs/events from services running in **multiple edge data centers** and delivers them to a central platform.
### Requirements
- **Multi-region/edge ingestion:** Logs originate from many edge data centers.
- **Secure transport:** Logs must be **encrypted in transit** while being collected and forwarded.
- **Secure storage (typically expected):** Logs should be encrypted at rest, with access control over who can read them.
- The initial prompt may be underspecified; you should clarify:
  - Is this primarily **real-time streaming** (e.g., alerting), **batch analytics** (warehouse/data lake), or both?
  - Expected throughput (events/sec, peak factors), average event size, and retention period.
  - SLA/SLOs: end-to-end latency, durability (acceptable loss), and availability.
  - Compliance constraints (PII, GDPR/CCPA, data residency).
  - Query patterns (search by trace ID, full-text search, aggregations).
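Once the throughput, event size, and retention questions are answered, a quick back-of-envelope calculation helps size the transport and storage tiers. The sketch below is one way to do that arithmetic; the function name `estimate_capacity` and every input value (1M events/sec, 500-byte events, 3x peak factor, 30-day retention, 5x compression) are illustrative assumptions, not figures from the prompt.

```python
SECONDS_PER_DAY = 86_400


def estimate_capacity(events_per_sec, avg_event_bytes, peak_factor,
                      retention_days, compression_ratio=1.0):
    """Rough sizing from the clarifying answers (all inputs are assumptions)."""
    # Sustained ingest bandwidth in bytes/sec.
    avg_bytes_per_sec = events_per_sec * avg_event_bytes
    # Bandwidth the transport and buffers must absorb at peak.
    peak_bytes_per_sec = avg_bytes_per_sec * peak_factor
    # Raw (uncompressed) volume produced per day.
    daily_bytes = avg_bytes_per_sec * SECONDS_PER_DAY
    # Stored volume over the retention window, after compression.
    retained_bytes = daily_bytes * retention_days / compression_ratio
    return {
        "avg_bytes_per_sec": avg_bytes_per_sec,
        "peak_bytes_per_sec": peak_bytes_per_sec,
        "daily_bytes": daily_bytes,
        "retained_bytes": retained_bytes,
    }


# Hypothetical inputs chosen only to illustrate the arithmetic.
example = estimate_capacity(
    events_per_sec=1_000_000,
    avg_event_bytes=500,
    peak_factor=3,
    retention_days=30,
    compression_ratio=5,
)
```

With these assumed inputs, the sustained rate is 500 MB/s, the peak is 1.5 GB/s, raw volume is 43.2 TB/day, and about 259.2 TB is retained after compression. Replication overhead and index size are left out and would need to be added on top.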
### Deliverable
Propose an end-to-end architecture including:
- Edge collection/agent design
- Cross-DC transport
- Central ingestion, buffering, storage
- Encryption/key management approach
- Failure handling, backpressure, and observability
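For the failure-handling and backpressure item, one common edge-agent pattern is a bounded in-memory buffer that sheds the oldest events when the uplink stalls and counts every drop so the loss is observable. The following is a minimal sketch of that idea, not a prescribed answer; the class name `BoundedLogBuffer` and the drop-oldest policy are assumptions, and a real agent would likely spill to local disk before dropping anything.

```python
from collections import deque


class BoundedLogBuffer:
    """Hypothetical edge-agent buffer: bounded, drop-oldest, with a drop counter."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._events = deque()
        self.dropped = 0  # export as a metric so data loss is visible

    def offer(self, event):
        """Add an event; evict the oldest one if the buffer is full."""
        if len(self._events) >= self._capacity:
            self._events.popleft()
            self.dropped += 1
        self._events.append(event)

    def drain(self, max_batch):
        """Remove and return up to max_batch events, oldest first."""
        batch = []
        while self._events and len(batch) < max_batch:
            batch.append(self._events.popleft())
        return batch

    def __len__(self):
        return len(self._events)


# Usage: the uplink is stalled while five events arrive at a capacity of three.
buf = BoundedLogBuffer(capacity=3)
for i in range(5):
    buf.offer(f"event-{i}")
batch = buf.drain(max_batch=10)
```

Drop-oldest favors fresh data, which suits alerting; if durability matters more than freshness, the alternatives are blocking the producer or spilling to disk, and the choice depends on the acceptable-loss answer from the requirements.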
Quick Answer: This question evaluates competence in distributed systems design, secure transport and storage, encryption and key management, scalability, and observability for log/event collection across multiple edge data centers.