This CoreWeave system design question asks candidates to design a website monitoring service. It prepares candidates to discuss scheduling, probes, alerting, data retention, reliability, and the operational trade-offs behind a production monitoring workflow.
Design a website monitoring service that checks customer websites, detects outages or high latency, and alerts users when their sites fail health checks.
### Constraints & Assumptions
- Customers register URLs and alert destinations.
- Checks should run periodically from multiple regions.
- The service should track uptime and latency.
- False positives should be minimized.
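One way to reason about the multi-region and false-positive constraints is a quorum rule: a single check round counts as "down" only when several regions agree. The sketch below is illustrative only. The names `ProbeResult`, `round_is_down`, and `uptime_percent`, the region labels, and the quorum value are all assumptions, not part of the original question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """Outcome of one check from one region (hypothetical shape)."""
    region: str        # assumed region label, e.g. "us-east"
    ok: bool           # True if the health check passed
    latency_ms: float  # measured response time in milliseconds


def round_is_down(results: list[ProbeResult], quorum: int) -> bool:
    """Treat the site as down only if at least `quorum` regions failed.

    A failure seen from a single region is more likely a network issue
    near that checker than a real outage, so it is ignored when quorum > 1.
    """
    failures = sum(1 for r in results if not r.ok)
    return failures >= quorum


def uptime_percent(down_flags: list[bool]) -> float:
    """Percentage of rounds that were up; each flag is True for a down round."""
    if not down_flags:
        raise ValueError("no rounds recorded")
    up_rounds = sum(1 for down in down_flags if not down)
    return 100.0 * up_rounds / len(down_flags)


if __name__ == "__main__":
    round_results = [
        ProbeResult("us-east", ok=False, latency_ms=0.0),
        ProbeResult("eu-west", ok=True, latency_ms=120.0),
        ProbeResult("ap-south", ok=True, latency_ms=200.0),
    ]
    print(round_is_down(round_results, quorum=2))
    print(uptime_percent([False, False, True, False]))
```

The quorum size is a tuning knob: a higher value reduces false positives but can hide outages that affect only part of the world.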
### Clarifying Questions to Ask
- What check intervals and latency targets are required?
- Do we need HTTP checks only, or TCP and DNS checks too?
- How many URLs and regions must the service support?
- What alert channels are needed?
- How many consecutive failures should trigger an alert?
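The consecutive-failure question above maps naturally onto a small state machine per monitored URL. The following is a minimal sketch under assumed names (`FailureTracker`, `record`, and the event strings "alert" and "recovered" are hypothetical); a real answer would also need to persist this state and handle missed rounds.

```python
from typing import Optional


class FailureTracker:
    """Tracks consecutive down rounds for one URL (illustrative only)."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0
        self.alerting = False

    def record(self, is_down: bool) -> Optional[str]:
        """Record one round; return "alert", "recovered", or None."""
        if is_down:
            self.consecutive_failures += 1
            # Fire once when the threshold is first reached, not every round.
            if not self.alerting and self.consecutive_failures >= self.threshold:
                self.alerting = True
                return "alert"
            return None
        # Any successful round resets the streak.
        self.consecutive_failures = 0
        if self.alerting:
            self.alerting = False
            return "recovered"
        return None


if __name__ == "__main__":
    tracker = FailureTracker(threshold=3)
    for down in [True, True, False, True, True, True, True, False]:
        print(tracker.record(down))
```

Emitting an event only on state transitions keeps a long outage from producing one notification per check interval.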
### Follow-up Questions
- How would you handle checker region outages?
- How would you prevent alert storms?
- How would customers configure maintenance windows?
- How would you support SSL certificate expiry monitoring?
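Two of the follow-ups above (alert storms and maintenance windows) can be discussed as a suppression check that runs before any notification is sent. This is a sketch under stated assumptions: `MaintenanceWindow`, `should_notify`, and the per-URL cooldown are hypothetical names and choices, and timestamps are assumed to be timezone-aware UTC values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class MaintenanceWindow:
    """Customer-configured period during which alerts are muted (assumed shape)."""
    start: datetime
    end: datetime

    def contains(self, moment: datetime) -> bool:
        # Half-open interval: the start is included, the end is not.
        return self.start <= moment < self.end


def should_notify(
    now: datetime,
    windows: list[MaintenanceWindow],
    last_sent: Optional[datetime],
    cooldown: timedelta,
) -> bool:
    """Return True if an alert may be sent for this URL right now."""
    if any(w.contains(now) for w in windows):
        return False  # muted by a maintenance window
    if last_sent is not None and now - last_sent < cooldown:
        return False  # still inside the cooldown since the last alert
    return True


if __name__ == "__main__":
    utc = timezone.utc
    window = MaintenanceWindow(
        start=datetime(2024, 1, 1, 2, 0, tzinfo=utc),
        end=datetime(2024, 1, 1, 4, 0, tzinfo=utc),
    )
    print(should_notify(datetime(2024, 1, 1, 3, 0, tzinfo=utc), [window], None, timedelta(minutes=15)))
```

A cooldown handles repeats for a single URL; grouping alerts across many URLs that fail together would need a separate aggregation step that is not shown here.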