This question evaluates a candidate's competency in service reliability and observability: distinguishing SLIs, SLOs, and SLAs, planning error budgets, and designing monitoring and alerting for a production web API.
Context: You are designing reliability goals and on-call policies for a production web API that serves JSON over HTTPS through a mix of GET and POST endpoints. You need to define what you measure (SLIs), the targets you set for those measurements (SLOs), and the contractual promise made to customers (SLA); plan an error budget for a quarterly SLO; and design monitoring and alerting that minimizes alert fatigue.
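To make the error-budget part of the task concrete, here is a minimal Python sketch of the arithmetic involved. The question itself gives no numbers, so everything here is an assumption for illustration: the 99.9% availability target, the 90-day quarter, the request-based SLI (good requests divided by total requests), and the names `ErrorBudget`, `burn_rate`, `burn_rate_threshold`, and `should_page`. The paging rule follows the general multi-window burn-rate idea (alert only when both a long and a short window are burning fast), which is one common way to reduce alert fatigue, not the only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Error budget for a request-based availability SLO (illustrative only)."""

    slo_target: float = 0.999  # ASSUMED target: 99.9% of requests are good
    window_days: int = 90      # ASSUMED length of the quarterly SLO window

    @property
    def budget_fraction(self) -> float:
        # Fraction of requests (or of time) that is allowed to be bad.
        return 1.0 - self.slo_target

    @property
    def window_hours(self) -> float:
        return self.window_days * 24.0

    def allowed_bad_requests(self, total_requests: int) -> float:
        # How many failed requests the quarter can absorb at this volume.
        return total_requests * self.budget_fraction

    def allowed_downtime_minutes(self) -> float:
        # Equivalent budget expressed as minutes of full outage.
        return self.window_hours * 60.0 * self.budget_fraction

    def burn_rate(self, bad: int, total: int) -> float:
        # Observed error rate relative to the budgeted error rate.
        # 1.0 means the budget would last exactly the whole window.
        if total == 0:
            return 0.0
        return (bad / total) / self.budget_fraction

    def burn_rate_threshold(self, budget_share: float, alert_hours: float) -> float:
        # Burn rate at which `budget_share` of the whole budget would be
        # consumed within `alert_hours`.
        return budget_share * self.window_hours / alert_hours


def should_page(long_rate: float, short_rate: float, threshold: float) -> bool:
    # Page only if the long window shows a sustained problem AND the short
    # window shows it is still happening, so recovered incidents stay quiet.
    return long_rate >= threshold and short_rate >= threshold


if __name__ == "__main__":
    budget = ErrorBudget()
    print(f"Downtime budget: {budget.allowed_downtime_minutes():.1f} minutes")
    print(f"Bad requests allowed per 100M: {budget.allowed_bad_requests(100_000_000):.0f}")

    # ASSUMED policy: page when 2% of the quarterly budget burns in one hour.
    threshold = budget.burn_rate_threshold(budget_share=0.02, alert_hours=1.0)
    long_rate = budget.burn_rate(bad=5_000, total=100_000)  # hypothetical last hour
    short_rate = budget.burn_rate(bad=450, total=9_000)     # hypothetical last 5 minutes
    print(f"Threshold {threshold:.1f}, page: {should_page(long_rate, short_rate, threshold)}")
```

Under these assumed numbers, a 99.9% target over a 90-day quarter leaves roughly 129.6 minutes of full-outage budget. In an interview answer, the chosen target, window, and paging thresholds would need to be justified against the API's actual traffic and the SLA's penalties.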