Technical Fundamentals for Non-Technical Product Managers
Asked of: Product Manager

What's being tested
Interviewers are probing whether you can translate technical constraints into product tradeoffs: prioritize features against reliability, estimate user impact from performance metrics, and communicate clear success criteria to engineers and stakeholders. They want a PM who understands observability, common scalability patterns, rollout safety (feature flags/canaries), and how those choices affect metrics like p95 or DAU. At DoorDash this matters because small latency or reliability regressions directly affect conversion, retention, and operations cost.
Core knowledge
- Latency vs Throughput: Latency is the time a single request takes (report percentiles such as p50, p90, and p99); throughput is requests per second. Optimizing one can worsen the other, so first decide which metric maps to user experience.
- SLI / SLO / SLA: An SLI is a measured signal (e.g., the ratio of successful checkouts), an SLO is an internal target for that signal (e.g., 99.9% success), and an SLA is a contractual commitment that carries penalties if missed. PMs set SLOs tied to business impact.
- Error budget: Translate the SLO into an error budget (e.g., a 99.9% availability SLO leaves 0.1%, about 43 minutes of downtime in a 30-day month) and use it to prioritize incidents vs launches; spend it consciously during aggressive rollouts.
- Observability: Instrumentation must include metrics, logs, and traces. Metrics detect issues, traces show where latency comes from, and logs carry the context needed for root cause analysis.
- Caching & CDNs: Use cache layers (an edge CDN or an app-level Redis cache) to reduce origin load and latency for read-heavy endpoints; be explicit about TTL, invalidation, and staleness tolerance.
- Datastore tradeoffs: Postgres (ACID) for strong consistency and complex queries; NoSQL for high-scale, partition-tolerant needs. Declare consistency needs before choosing storage patterns.
- Retries and idempotency: Retries should use exponential backoff and require idempotent endpoints (or idempotency keys) to prevent duplicate side effects like double charges.
- Rate limiting & throttling: Protect core systems by setting per-client and global limits; as PM, choose the user-facing behavior (reject vs queue) and the fallback UX messaging.
- Feature flags & rollout strategies: Use feature flag targeting, percentage rollouts, and canary deployments; pair them with automatic rollback triggers tied to SLIs.
- Incident & postmortem discipline: Track mean time to detect (MTTD) and mean time to recover (MTTR); PMs should own customer communication, prioritization, and follow-through on action items.
- Cost vs performance: Quantify the tradeoff. Caching reduces compute and origin load but adds memory cost; adding replicas horizontally increases throughput roughly linearly until downstream bottlenecks appear.
- Security & privacy basics: Classify data sensitivity, prefer encryption in transit and at rest, and require least-privilege access; PMs must specify compliance constraints early.
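To make the retries-and-idempotency point concrete, here is a minimal sketch of a client that retries a charge with exponential backoff while reusing one idempotency key. Everything in it is an assumption for illustration: `FakePaymentService`, `charge_with_retries`, the key format, and the delay values are hypothetical and do not describe any real payment API.

```python
import time


class FakePaymentService:
    """Hypothetical payment backend that deduplicates charges by idempotency key."""

    def __init__(self, lost_responses):
        self.lost_responses = lost_responses  # how many replies to "lose" (simulated)
        self.charges = {}  # idempotency_key -> amount_cents

    def charge(self, idempotency_key, amount_cents):
        # Record the charge only the first time this key is seen.
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = amount_cents
        # Simulate a timeout that happens after the charge was recorded.
        if self.lost_responses > 0:
            self.lost_responses -= 1
            raise TimeoutError("response lost in transit")
        return {"status": "ok", "charged_cents": self.charges[idempotency_key]}


def backoff_delays(max_attempts, base_seconds=0.1, cap_seconds=2.0):
    """Delays between attempts: base * 2**i, capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * 2 ** i) for i in range(max_attempts - 1)]


def charge_with_retries(service, idempotency_key, amount_cents,
                        max_attempts=4, sleep=time.sleep):
    """Retry on timeout, reusing the same key so the charge happens at most once."""
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        try:
            return service.charge(idempotency_key, amount_cents)
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(delays[attempt])
```

With `FakePaymentService(lost_responses=2)`, the call succeeds on the third attempt and the service still holds exactly one charge for the key. Without the key check, each retry would have charged the customer again, which is the product risk to raise with engineers.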
Worked example
(Example problem: "Reduce p90 checkout latency by 30% for high-traffic markets")
Start by clarifying scope: define the exact metric (p90 over 7 days), segmentation (logged-in users vs guest), and acceptable customer impact during rollout. Frame the answer around three pillars: measurement, quick wins, and medium-term architecture changes.
Measurement: add instrumentation to break checkout into sub-spans (gateway, payment, inventory) so you can attribute latency.
Quick wins: enable a short cache for product availability and defer nonessential network calls (analytics) from the critical path.
Medium-term: consider asynchronous payment confirmation with an optimistic UI, and use the SLO's error budget to bound how much risk controlled experiments may take.
Tradeoff to flag: an optimistic UI reduces perceived latency but adds reconciliation complexity and potential support load.
Guardrails: close by proposing a percentage rollout behind a feature flag, an automated rollback threshold on p90, and a 2-week monitoring window. With more time, you'd run A/B tests measuring conversion lift and customer support volume.
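The guardrails in this answer can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not how any particular flagging system works: the function names, the flag name, the hash-based bucketing, and the 5% regression tolerance are all hypothetical choices.

```python
import hashlib
import math


def in_rollout(user_id, flag_name, percent):
    """Deterministically map a user to a bucket 0..99; enable if bucket < percent."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def should_roll_back(baseline_ms, treatment_ms, max_regression=0.05):
    """Trigger rollback if treatment p90 exceeds baseline p90 by more than the tolerance."""
    return percentile(treatment_ms, 90) > percentile(baseline_ms, 90) * (1 + max_regression)


# Made-up latency samples in milliseconds.
baseline = [100] * 90 + [400] * 10    # p90 = 100
regressed = [100] * 85 + [400] * 15   # p90 = 400
print(should_roll_back(baseline, regressed))  # True
print(should_roll_back(baseline, baseline))   # False
```

Because the bucket depends only on the flag name and user ID, a user who sees the feature at 10% keeps seeing it at 50%, which keeps the experience stable as the rollout widens.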
A second angle
(Example problem: "Design a safe rollout plan for a new driver-tracking feature")
Here the same concepts apply, but the emphasis shifts to privacy, telemetry volume, and real-time constraints. Start with instrumentation and an SLI (e.g., successful location updates per minute) and an SLO tied to restaurant ETA accuracy. Use a canary rollout (first internal drivers, then a small percentage of production) while monitoring p99 location-processing latency and storage costs. Because telemetry volume can balloon, make sampling or downsampling decisions upfront and define retention policies. The core tradeoff is privacy vs fidelity: more frequent updates give better ETAs but raise cost and widen the privacy surface. As PM, decide the acceptable resolution and communicate it to legal and ops.
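A back-of-envelope estimate helps when arguing about update frequency. The sketch below is hypothetical throughout: the driver count, active hours, and intervals are made-up inputs, and `downsample` is just one simple way to thin a stream of `(timestamp_s, lat, lon)` points.

```python
def daily_location_updates(active_drivers, active_hours_per_day, update_interval_s):
    """Approximate number of location updates produced per day."""
    return active_drivers * active_hours_per_day * 3600 // update_interval_s


def downsample(points, min_interval_s):
    """Keep the first point, then only points at least min_interval_s after the last kept one."""
    kept = []
    for ts, lat, lon in points:
        if not kept or ts - kept[-1][0] >= min_interval_s:
            kept.append((ts, lat, lon))
    return kept


# Illustrative inputs: 10,000 drivers active 6 hours a day.
every_5s = daily_location_updates(10_000, 6, 5)    # 43,200,000 updates/day
every_15s = daily_location_updates(10_000, 6, 15)  # 14,400,000 updates/day

# One driver reporting every 5 seconds, thinned to one point per 15 seconds.
raw = [(t, 37.77, -122.42) for t in range(0, 35, 5)]
thinned = downsample(raw, 15)  # keeps timestamps 0, 15, 30
```

Under these assumed inputs, moving from 5-second to 15-second updates cuts volume to a third, which is the kind of number to bring to the fidelity vs cost discussion.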
Common pitfalls
Pitfall: Confusing symptom with cause. Blaming "API slowness" when the real issue is downstream DB contention leads to the wrong fix; always instrument to isolate the layer before prescribing fixes.
Pitfall: Overfocusing on averages. Mean latency hides tail latency; communicate p90/p99 and other user-facing percentiles tied to experience.
Pitfall: Launching without rollback criteria. Shipping a change with no SLO-based rollback rule forces firefighting; define automated thresholds and ownership before rollout.
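The averages pitfall is easy to show with numbers. The following sketch uses made-up latencies and a simple nearest-rank percentile helper (one of several common percentile definitions) to show how far the mean can sit from what users actually experience.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up latencies in milliseconds: 85 fast requests and 15 slow ones.
latencies_ms = [200] * 85 + [3000] * 15

summary = {
    "mean": statistics.mean(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p90": percentile(latencies_ms, 90),
    "p99": percentile(latencies_ms, 99),
}
print(summary)
```

Here the mean is 620 ms, yet no request took 620 ms: the median user waited 200 ms while the slowest 15% waited 3,000 ms. Reporting p50 alongside p90 and p99 makes that split visible.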
Connections
Interviewers may pivot into experimentation (how you'd A/B test a performance optimization), analytics (metric definitions and invariants), or ML product tradeoffs (latency vs model complexity for real-time recommendations). Be ready to connect SLOs to business metrics like conversion rate or retention.
Further reading
- Site Reliability Engineering (Google): canonical SLI/SLO best practices and incident management.
- Feature Toggles (Martin Fowler): practical patterns for safe rollouts and technical debt management.
Related concepts
- PM Technical Fundamentals for Growth Experimentation
- Experimentation, Diagnostics, and Growth Infrastructure for Non-Technical PMs
- A/B Testing and Growth Infrastructure for Non-Technical PMs
- Technical Leadership, Impact, and Trade-Offs (Behavioral & Leadership)
- Diagnostics, A/B Testing, Estimation, and Growth Infrastructure Fundamentals