Triage and Validation Plan: US SMS Delivery-Rate Drop
Context
On March 10, between 20:00 and 22:00 ET, the US SMS delivery rate (Delivered/Attempted) dropped by 6.5 percentage points, with larger drops on two major US carriers. Canada and the UK were unaffected. On March 9, a new link-shortener domain was rolled out and 10DLC registrations were updated.
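To make the headline metric concrete, here is a minimal sketch of the delivery-rate calculation (Delivered/Attempted) and the percentage-point change between a baseline window and the incident window. The `WindowCounts` class, the `pp_change` helper, and all counts are hypothetical and chosen only to reproduce a 6.5-point drop; they are not taken from incident data.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Message counts for one time window (hypothetical structure)."""
    attempted: int
    delivered: int

    @property
    def delivery_rate(self) -> float:
        # Delivery rate as defined in the context: Delivered / Attempted.
        if self.attempted == 0:
            raise ValueError("no attempted messages in window")
        return self.delivered / self.attempted


def pp_change(baseline: WindowCounts, incident: WindowCounts) -> float:
    """Absolute change in delivery rate, in percentage points."""
    return (incident.delivery_rate - baseline.delivery_rate) * 100


# Illustrative counts only (assumed, not observed).
baseline = WindowCounts(attempted=200_000, delivered=190_000)  # 95.0%
incident = WindowCounts(attempted=200_000, delivered=177_000)  # 88.5%

drop_pp = pp_change(baseline, incident)
# Relative change is a different quantity and is larger in magnitude.
relative_pct = drop_pp / (baseline.delivery_rate * 100) * 100

print(f"change: {drop_pp:.1f} pp, relative: {relative_pct:.2f}%")
```

Reporting the change in percentage points rather than as a relative percentage keeps it comparable across carriers with different baselines.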
Build a triage and validation plan that:
- Defines delivery, acceptance, and failure metrics using carrier error codes, and specifies guardrails (CTR, unsubscribe rate, complaint rate).
- Lists the precise analysis slices: by carrier, sender type (short code/toll-free/10DLC), campaign/template, message length/character set, URL domain, vertical (SHAFT-sensitive), send time, client, region, and new vs. existing senders.
- Describes artifact checks: duplicate sends, clock skew/timezone, logging gaps, retry policy changes, throughput throttles, MPS caps, carrier queue backoffs.
- Prioritizes hypotheses and how to test each: (a) carrier filtering: use error-code mix shift and acceptance vs. delivery delta; (b) link-domain reputation: A/B old vs. new domain, stratified by carrier and sender; (c) 10DLC registration/brand-score issues: audit TCR status and vetting outcomes; (d) content/template changes: token-level diff and spam-score models.
- Specifies the experiment design: randomization unit, sample sizing/MDE for delivery-rate uplift, sequential monitoring, holdouts; and the statistical model for attribution (hierarchical logistic regression controlling for carrier×sender×template×hour).
- Provides an action and rollback plan, escalation criteria to carriers, and evidence sufficient to declare the incident resolved.
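The sample sizing/MDE item can be sketched with the standard two-proportion z-test approximation. This is a minimal sketch under stated assumptions, not a prescribed design: it assumes per-message randomization with independent outcomes, a two-sided test, and a fixed-horizon analysis. The function name `messages_per_arm` and the 88.5% baseline are hypothetical.

```python
import math
from statistics import NormalDist


def messages_per_arm(p_control: float, mde: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate messages needed per arm to detect an absolute
    delivery-rate uplift of `mde` (two-sided two-proportion z-test)."""
    p_treat = p_control + mde
    if not (0 < p_control < 1 and 0 < p_treat < 1) or mde <= 0:
        raise ValueError("rates must lie in (0, 1) and mde must be positive")
    p_bar = (p_control + p_treat) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control)
                             + p_treat * (1 - p_treat))
    ) ** 2
    return math.ceil(numerator / mde ** 2)


# Illustrative inputs only: assumed 88.5% baseline, 1 pp and 2 pp MDE.
n_1pp = messages_per_arm(0.885, 0.01)
n_2pp = messages_per_arm(0.885, 0.02)
print(n_1pp, n_2pp)
```

If the randomization unit is the sender or campaign rather than the message, outcomes within a unit are correlated and the required volume is larger than this estimate. Sequential monitoring also calls for an adjusted significance threshold rather than repeated checks at a fixed `alpha`.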