Diagnose High Latency End-to-End (Browser → Database)
You are asked to analyze and reduce high request latency observed by users of a web application. Provide a structured, end-to-end approach that covers both measurement and remediation.
1) Define Key Latency Metrics
- Client-side (RUM/web vitals):
  - DNS lookup time, TCP connect, TLS handshake
  - TTFB (Time To First Byte)
  - FCP (First Contentful Paint), LCP (Largest Contentful Paint)
  - CLS (Cumulative Layout Shift), INP (Interaction to Next Paint) if relevant
  - Navigation timing total, resource timing waterfall
  - Percentiles: p50, p90, p95, p99
- Server-side:
  - Request latency (end-to-end), upstream call latencies
  - Queue wait time, application processing time
  - DB query times (avg, p95, slow queries), lock contention
  - Cache hit ratio, eviction rate, serialization time
  - System: CPU, memory, GC, I/O, network RTT
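Because latency distributions are long-tailed, averages hide the problem; percentiles are the metric that matters. A minimal sketch of the nearest-rank percentile calculation (the sample values are illustrative):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample that is >= p% of all samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest-rank index: ceil(p/100 * n), converted to 0-based.
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 15, 18, 22, 30, 45, 80, 120, 400, 950]
for p in (50, 90, 95, 99):
    print(f"p{p} = {percentile(latencies_ms, p)} ms")
# Note how p50 (30 ms) looks healthy while p95/p99 expose the slow tail.
```

In production you would use your metrics backend's quantile support (histograms/sketches) rather than raw samples, but the interpretation is the same: track p95/p99 per endpoint, not just the mean.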
2) Instrumentation and Tracing Plan
- Client (RUM): use the Performance API (Navigation/Resource/Long Tasks timing) to capture DNS/TCP/TLS/TTFB/FCP/LCP/INP per page and endpoint; report via the Beacon API.
- Synthetic: schedule tests from multiple geos/ISPs to isolate network vs. application latency.
- Server/APM: export request span timings, thread/connection pool metrics, upstream call latencies, and error/timeout rates.
- Distributed tracing: use W3C Trace Context (traceparent/tracestate); propagate across CDN → edge → gateway → services → DB/cache. Add span attributes (endpoint, user agent, region, cache status).
- Logging: correlate logs with trace_id/span_id. Sample intelligently (e.g., tail-based sampling that keeps high-latency traces).
- CDN/Edge: collect cache hit ratio, origin RTT, TTFB by POP, TLS session resumption rate.
- Database: enable the slow query log; collect per-statement stats, lock/row contention, and pool usage.
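To make trace propagation concrete, here is a minimal sketch of generating and forwarding a W3C `traceparent` header (`version-trace_id-parent_id-flags`) by hand. In practice an OpenTelemetry SDK does this for you; this only illustrates the mechanics:

```python
import os
import re

def make_traceparent(sampled=True):
    """Build a W3C traceparent header: 00-<32 hex trace_id>-<16 hex span_id>-<flags>."""
    trace_id = os.urandom(16).hex()   # 32 hex chars, shared by the whole request
    span_id = os.urandom(8).hex()     # 16 hex chars, unique per hop
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def child_headers(incoming_traceparent):
    """Keep the upstream trace_id, mint a new span_id for the outgoing call."""
    m = TRACEPARENT_RE.match(incoming_traceparent)
    if not m:
        return {"traceparent": make_traceparent()}  # malformed: start a new trace
    trace_id, _parent, flags = m.groups()
    return {"traceparent": f"00-{trace_id}-{os.urandom(8).hex()}-{flags}"}

hdr = make_traceparent()
print(child_headers(hdr))  # same trace_id as hdr, fresh span_id
```

The key property is that the `trace_id` survives every hop (CDN → gateway → services → DB/cache spans), so one slow request can be reassembled end-to-end.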
3) Likely Bottlenecks to Check
- CDN/Edge: cache misses, short TTLs, large objects, no gzip/Brotli compression, cold POPs.
- Network: high RTT (especially mobile), packet loss, TLS 1.2 full-handshake cost, lack of connection reuse/HTTP/2/3.
- API Gateway: auth/JWT decode cost, rate limiting, request/response transformations, cold starts (serverless), logging overhead.
- Application: thread pool saturation, blocking I/O, N+1 queries, synchronous external calls, heavy serialization, GC pauses.
- Cache: low hit ratio, stampedes, oversized values, hot-key contention, network latency to the cache tier.
- Database: missing/ineffective indexes, full scans, locks, slow joins, high connection churn, inefficient ORM patterns, large payloads.
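The N+1 pattern deserves a concrete illustration, since it is one of the most common latency bugs and is easy to spot in a trace (a fan of identical short DB spans under one request). A self-contained sketch using SQLite, with hypothetical table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 9.5), (2, 1, 3.0), (3, 2, 7.25);
""")

# N+1 pattern: one query for the users, then one query PER user.
# With a networked database, each loop iteration pays a full round trip.
users = conn.execute("SELECT id, name FROM users").fetchall()
for user_id, _name in users:
    conn.execute("SELECT id, total FROM orders WHERE user_id = ?", (user_id,)).fetchall()

# Batched alternative: a single IN query (or a JOIN) -- one round trip total.
ids = [user_id for user_id, _name in users]
placeholders = ",".join("?" * len(ids))
orders = conn.execute(
    f"SELECT user_id, id, total FROM orders WHERE user_id IN ({placeholders})", ids
).fetchall()
print(len(orders))  # 3
```

Most ORMs offer an eager-loading option (e.g., a join or prefetch) that produces the batched form; tracing makes it obvious when it is missing.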
4) Concrete Optimizations
- Frontend:
  - Resource delivery: HTTP/2 or HTTP/3, Brotli, long-lived Cache-Control + immutable, ETag, service worker for offline/cache.
  - Reduce bytes/requests: code-splitting, tree-shaking, minification, image optimization (AVIF/WebP, responsive), font-display: swap, inline critical CSS, defer/async non-critical JS, remove redirects, trim cookies.
  - Connection optimizations: preconnect/dns-prefetch/preload for critical origins; consolidate origins where possible; enable TLS session resumption, and 0-RTT with care (it is replay-prone, so restrict it to idempotent requests).
  - Perceived performance: prioritize the LCP resource, server-side render or edge-render critical HTML, hydrate progressively, avoid long main-thread tasks.
- Backend:
  - Connection/threading: right-size HTTP/thread pools; use async I/O for high concurrency; enable keep-alive and connection pooling to DB/cache/services.
  - Caching: introduce/expand CDN and reverse-proxy caching; application-level caches with TTL + jitter; stale-while-revalidate; request coalescing to avoid stampedes.
  - Database: create/tune indexes; EXPLAIN and rewrite queries; avoid SELECT *; paginate with keyset where possible; batch queries; tune pool size; consider read replicas/partitioning.
  - Services: timeouts, retries with jitter, circuit breakers, bulkheads; reduce fan-out; parallelize independent calls; compress payloads; use gRPC/HTTP/2 where suitable.
  - Tail latency: focus on p95/p99; mitigate with hedged requests (idempotent operations only), load shedding, and backpressure on queues.
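Two of the caching ideas above (TTL + jitter, request coalescing) can be shown in one small in-process sketch. This is an illustrative toy, not a production cache; the class and parameter names are my own:

```python
import random
import threading
import time

class CoalescingCache:
    """Tiny in-process cache: TTL + random jitter spreads expiries so keys
    don't all expire together, and a per-key lock makes concurrent misses
    trigger only one backend fetch (single-flight / request coalescing)."""

    def __init__(self, ttl_s=60.0, jitter_s=10.0):
        self.ttl_s, self.jitter_s = ttl_s, jitter_s
        self._data = {}                 # key -> (expires_at, value)
        self._locks = {}                # key -> Lock guarding the fetch
        self._meta = threading.Lock()   # guards the _locks dict itself

    def get(self, key, fetch):
        entry = self._data.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]                          # fresh hit, no lock needed
        with self._meta:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                                   # only one thread fetches
            entry = self._data.get(key)              # re-check: another thread
            if entry and entry[0] > time.monotonic():  # may have filled it
                return entry[1]
            value = fetch(key)
            expires = time.monotonic() + self.ttl_s + random.uniform(0, self.jitter_s)
            self._data[key] = (expires, value)
            return value

calls = []
cache = CoalescingCache(ttl_s=5.0)
fetch = lambda k: calls.append(k) or f"value:{k}"    # records each backend hit

threads = [threading.Thread(target=cache.get, args=("user:1", fetch)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(len(calls))  # 1 -- eight concurrent misses, one backend fetch
```

The same coalescing idea applies at the reverse proxy (e.g., request collapsing in CDNs) and protects the database from thundering-herd load when a hot key expires.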
Provide a step-by-step plan, example metrics, and how you’d validate improvements.