Diagnose High Latency End-to-End (Browser → Database)
You are asked to analyze and reduce high request latency observed by users of a web application. Provide a structured, end-to-end approach that covers both measurement and remediation.
1) Define Key Latency Metrics
- Client-side (RUM/web vitals):
  - DNS lookup time, TCP connect, TLS handshake
  - TTFB (Time To First Byte)
  - FCP (First Contentful Paint), LCP (Largest Contentful Paint)
  - CLS (Cumulative Layout Shift), INP (Interaction to Next Paint) if relevant
  - Navigation timing total, resource timing waterfall
  - Percentiles: p50, p90, p95, p99
- Server-side:
  - Request latency (end-to-end), upstream call latencies
  - Queue wait time, application processing time
  - DB query times (avg, p95, slow queries), lock contention
  - Cache hit ratio, eviction rate, serialization time
  - System: CPU, memory, GC, I/O, network RTT
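Because latency distributions are long-tailed, averages hide the problem; percentiles are the metric that matters. A minimal sketch of the nearest-rank percentile calculation (the sample values are illustrative):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample that is >= p% of all samples."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest-rank index: ceil(p/100 * n), converted to 0-based.
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 15, 18, 22, 30, 45, 80, 120, 400, 950]
for p in (50, 90, 95, 99):
    print(f"p{p} = {percentile(latencies_ms, p)} ms")
# Note how p50 (30 ms) looks healthy while p95/p99 expose the slow tail.
```

In production you would use your metrics backend's quantile support (histograms/sketches) rather than raw samples, but the interpretation is the same: track p95/p99 per endpoint, not just the mean.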
2) Instrumentation and Tracing Plan
- Client (RUM): use the Performance API (Navigation/Resource/Long Tasks timing) to capture DNS/TCP/TLS/TTFB/FCP/LCP/INP per page and endpoint; report via the Beacon API.
- Synthetic: schedule tests from multiple geos/ISPs to isolate network vs. application latency.
- Server/APM: export request span timings, thread/connection pool metrics, upstream call latencies, and error/timeout rates.
- Distributed tracing: use W3C Trace Context (traceparent/tracestate); propagate across CDN → edge → gateway → services → DB/cache. Add span attributes (endpoint, user agent, region, cache status).
- Logging: correlate logs with trace_id/span_id. Sample intelligently (e.g., tail-based sampling that keeps high-latency traces).
- CDN/Edge: collect cache hit ratio, origin RTT, TTFB by POP, TLS session resumption rate.
- Database: enable the slow query log; collect per-statement stats, lock/row contention, and pool usage.
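To make trace propagation concrete, here is a minimal sketch of generating and forwarding a W3C `traceparent` header (`version-trace_id-parent_id-flags`) by hand. In practice an OpenTelemetry SDK does this for you; this only illustrates the mechanics:

```python
import os
import re

def make_traceparent(sampled=True):
    """Build a W3C traceparent header: 00-<32 hex trace_id>-<16 hex span_id>-<flags>."""
    trace_id = os.urandom(16).hex()   # 32 hex chars, shared by the whole request
    span_id = os.urandom(8).hex()     # 16 hex chars, unique per hop
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def child_headers(incoming_traceparent):
    """Keep the upstream trace_id, mint a new span_id for the outgoing call."""
    m = TRACEPARENT_RE.match(incoming_traceparent)
    if not m:
        return {"traceparent": make_traceparent()}  # malformed: start a new trace
    trace_id, _parent, flags = m.groups()
    return {"traceparent": f"00-{trace_id}-{os.urandom(8).hex()}-{flags}"}

hdr = make_traceparent()
print(child_headers(hdr))  # same trace_id as hdr, fresh span_id
```

The key property is that the `trace_id` survives every hop (CDN → gateway → services → DB/cache spans), so one slow request can be reassembled end-to-end.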
3) Likely Bottlenecks to Check
- CDN/Edge: cache misses, short TTLs, large objects, no gzip/Brotli compression, cold POPs.
- Network: high RTT (especially mobile), packet loss, TLS 1.2 full-handshake cost, lack of connection reuse/HTTP/2/3.
- API Gateway: auth/JWT decode cost, rate limiting, request/response transformations, cold starts (serverless), logging overhead.
- Application: thread pool saturation, blocking I/O, N+1 queries, synchronous external calls, heavy serialization, GC pauses.
- Cache: low hit ratio, stampedes, oversized values, hot-key contention, network latency to the cache tier.
- Database: missing/ineffective indexes, full scans, locks, slow joins, high connection churn, inefficient ORM patterns, large payloads.
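The N+1 pattern deserves a concrete illustration, since it is one of the most common latency bugs and is easy to spot in a trace (a fan of identical short DB spans under one request). A self-contained sketch using SQLite, with hypothetical table names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 9.5), (2, 1, 3.0), (3, 2, 7.25);
""")

# N+1 pattern: one query for the users, then one query PER user.
# With a networked database, each loop iteration pays a full round trip.
users = conn.execute("SELECT id, name FROM users").fetchall()
for user_id, _name in users:
    conn.execute("SELECT id, total FROM orders WHERE user_id = ?", (user_id,)).fetchall()

# Batched alternative: a single IN query (or a JOIN) -- one round trip total.
ids = [user_id for user_id, _name in users]
placeholders = ",".join("?" * len(ids))
orders = conn.execute(
    f"SELECT user_id, id, total FROM orders WHERE user_id IN ({placeholders})", ids
).fetchall()
print(len(orders))  # 3
```

Most ORMs offer an eager-loading option (e.g., a join or prefetch) that produces the batched form; tracing makes it obvious when it is missing.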
4) Concrete Optimizations
- Frontend:
  - Resource delivery: HTTP/2 or HTTP/3, Brotli, long-lived Cache-Control + immutable, ETag, service worker for offline/cache.
  - Reduce bytes/requests: code-splitting, tree-shaking, minification, image optimization (AVIF/WebP, responsive), font-display: swap, inline critical CSS, defer/async non-critical JS, remove redirects, trim cookies.
  - Connection optimizations: preconnect/dns-prefetch/preload for critical origins; consolidate origins where possible; enable TLS session resumption, and 0-RTT with care (it is replay-prone, so restrict it to idempotent requests).
  - Perceived performance: prioritize the LCP resource, server-side render or edge-render critical HTML, hydrate progressively, avoid long main-thread tasks.
- Backend:
  - Connection/threading: right-size HTTP/thread pools; use async I/O for high concurrency; enable keep-alive and connection pooling to DB/cache/services.
  - Caching: introduce/expand CDN and reverse-proxy caching; application-level caches with TTL + jitter; stale-while-revalidate; request coalescing to avoid stampedes.
  - Database: create/tune indexes; EXPLAIN and rewrite queries; avoid SELECT *; paginate with keyset where possible; batch queries; tune pool size; consider read replicas/partitioning.
  - Services: timeouts, retries with jitter, circuit breakers, bulkheads; reduce fan-out; parallelize independent calls; compress payloads; use gRPC/HTTP/2 where suitable.
  - Tail latency: focus on p95/p99; mitigate with hedged requests (idempotent operations only), load shedding, and backpressure on queues.
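Two of the caching ideas above (TTL + jitter, request coalescing) can be shown in one small in-process sketch. This is an illustrative toy, not a production cache; the class and parameter names are my own:

```python
import random
import threading
import time

class CoalescingCache:
    """Tiny in-process cache: TTL + random jitter spreads expiries so keys
    don't all expire together, and a per-key lock makes concurrent misses
    trigger only one backend fetch (single-flight / request coalescing)."""

    def __init__(self, ttl_s=60.0, jitter_s=10.0):
        self.ttl_s, self.jitter_s = ttl_s, jitter_s
        self._data = {}                 # key -> (expires_at, value)
        self._locks = {}                # key -> Lock guarding the fetch
        self._meta = threading.Lock()   # guards the _locks dict itself

    def get(self, key, fetch):
        entry = self._data.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]                          # fresh hit, no lock needed
        with self._meta:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                                   # only one thread fetches
            entry = self._data.get(key)              # re-check: another thread
            if entry and entry[0] > time.monotonic():  # may have filled it
                return entry[1]
            value = fetch(key)
            expires = time.monotonic() + self.ttl_s + random.uniform(0, self.jitter_s)
            self._data[key] = (expires, value)
            return value

calls = []
cache = CoalescingCache(ttl_s=5.0)
fetch = lambda k: calls.append(k) or f"value:{k}"    # records each backend hit

threads = [threading.Thread(target=cache.get, args=("user:1", fetch)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(len(calls))  # 1 -- eight concurrent misses, one backend fetch
```

The same coalescing idea applies at the reverse proxy (e.g., request collapsing in CDNs) and protects the database from thundering-herd load when a hot key expires.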
Provide a step-by-step plan, example metrics, and how you’d validate improvements.