Explain your tech stack choices
Company: Stripe
Role: Software Engineer
Category: Behavioral & Leadership
Difficulty: easy
Interview Round: HR Screen
What is your primary tech stack (languages, frameworks, services, data stores, and infrastructure)? Why did you choose these technologies, and at what scale and performance requirements have you used them?
Quick Answer: This question evaluates a candidate's ability to articulate technology selection, operational experience, and trade-off reasoning across languages, frameworks, services, data stores, and infrastructure.
Solution
How to answer (60–90 seconds)
- Headline your stack: Name the core languages, frameworks, data stores, and infra in one sentence.
- Give 2–3 reasons for your choices: performance, reliability, team expertise, ecosystem maturity, cost.
- Quantify scale: RPS/throughput, latency percentiles, data sizes, concurrency, availability/SLOs, regions.
- Close with trade-offs and what you would change next.
Answer template
- Stack summary: "Primary stack: [Languages] for backend, [Frontend stack], [Messaging], [Data stores], on [Cloud/K8s], IaC with [tool], CI/CD with [tool], observability via [tools]."
- Rationale: "We chose [X] for [reason], [Y] for [reason], and [Z] for [reason]."
- Scale: "Handled ~[RPS] peak, p95 ~[ms] for reads and ~[ms] for writes; ~[events/day] through [queue/stream]; DB ~[TB]/~[TPS]; cache ~[ops/s] with ~[hit rate]; uptime ~[SLO]."
- Trade-offs: "Trade-offs were [A vs B]. If I were to change something, I'd [next step]."
Example answer (high-scale payments-style environment)
- Stack summary: Primary stack: Go for backend services (gRPC/HTTP), TypeScript/React on the frontend, Kafka for async messaging, Redis for caching and rate limiting, Postgres for OLTP with read replicas, S3/Parquet + a warehouse for analytics. Deployed on Kubernetes in a major cloud provider using Terraform for IaC, GitHub Actions + ArgoCD for CI/CD, and Datadog + OpenTelemetry for observability.
- Rationale: Go gives us lightweight concurrency (goroutines and channels), garbage-collected memory safety, and fast startup; TypeScript improves frontend correctness at scale; Postgres provides strong consistency and rich SQL for transactional workloads; Kafka provides durable, replayable streams that decouple producers from consumers and buffer bursts; Redis handles low-latency reads and idempotency tokens; Kubernetes standardizes deployments and autoscaling across services.
- Scale: We served ~12–18k peak RPS across the fleet; p95 latencies ~90–130 ms for read-heavy APIs and ~220–350 ms for write paths with external calls. Kafka handled ~250M messages/day (~3k messages/s on average, with higher peaks) across 6–8 brokers. Postgres primary ~1.2 TB with ~7–9k TPS and logical replication to read replicas; Redis ~80–120k ops/s with ~95% hit rate. Data lake ~10+ TB in S3, ~2–3 TB/day ingestion. Availability SLO 99.95%; multi-AZ deployment, multi-region failover tested quarterly; blue/green deploys with automated canaries.
- Trade-offs: We traded some developer velocity for stricter schema/versioning on event streams and consistent API contracts. Next, I would consolidate services with overlapping domains and move a few CPU-heavy workers to Rust for better perf/cost.
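Figures like the ones in the example answer are worth cross-checking with quick arithmetic before you quote them, since interviewers often do the same conversion in their heads. The sketch below turns a daily message count into an average per-second rate and an availability SLO into allowed downtime. The function names are illustrative, and the 30-day window is an assumption rather than anything the example specifies.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def avg_rate_per_second(events_per_day: float) -> float:
    """Average events per second implied by a daily total."""
    return events_per_day / SECONDS_PER_DAY


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability SLO permits over the window.

    The 30-day default is an assumed reporting window, not a standard.
    """
    return (1.0 - slo) * window_days * 24 * 60


if __name__ == "__main__":
    # ~250M messages/day from the example answer
    print(f"{avg_rate_per_second(250_000_000):,.0f} messages/s on average")
    # 99.95% availability SLO from the example answer
    print(f"{allowed_downtime_minutes(0.9995):.1f} minutes of downtime per 30 days")
```

An average rate says nothing about peaks, so state peak figures separately and only if you measured them.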
If your experience is earlier-stage or smaller scale
- Be concrete about what you do know: "~300 RPS peak, p95 ~180 ms, Postgres ~150 GB, Redis ~20k ops/s, single-region with multi-AZ."
- Add how you designed for growth: connection pooling, backpressure, idempotency keys, dead-letter queues, pagination, circuit breakers, rollbacks.
- State what you'd adjust for higher scale: sharding/partitioning, read replicas, async write-behind, schema evolution, regionalization.
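If the interviewer asks you to go one level deeper on any of these, it helps to be able to describe the mechanism in a few lines. As one example, idempotency keys can be sketched as below. This is a minimal in-memory illustration, not a production design: the class and method names are hypothetical, it is not thread-safe, and a real service would likely keep the keys in a shared store (such as Redis or the database) with an expiry.

```python
from typing import Callable, Dict


class IdempotentHandler:
    """Runs an operation at most once per idempotency key (in-memory sketch)."""

    def __init__(self, operation: Callable[[dict], dict]) -> None:
        self._operation = operation
        self._results: Dict[str, dict] = {}  # assumed store: a plain dict

    def handle(self, idempotency_key: str, payload: dict) -> dict:
        # A retry carrying a key we have already seen returns the stored
        # result instead of running the operation a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = self._operation(payload)
        self._results[idempotency_key] = result
        return result


if __name__ == "__main__":
    charges = []

    def charge(payload: dict) -> dict:
        charges.append(payload)  # stands in for a side effect such as a payment
        return {"charged": payload["amount"]}

    handler = IdempotentHandler(charge)
    handler.handle("key-1", {"amount": 500})
    handler.handle("key-1", {"amount": 500})  # client retry, same key
    print(len(charges))  # the side effect ran once
```

The point to make in the interview is the design choice: the client generates the key, so a retry after a timeout cannot double-apply a write.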
Metrics cheat sheet to mention
- Throughput: requests/sec, messages/day, MB/s or GB/s.
- Latency: p50/p95/p99 for key endpoints.
- Data: DB size (GB/TB), TPS, cache hit rate, warehouse/lake ingest per day.
- Reliability: SLO/uptime, error budget burn, incident frequency, rollback time.
- Deployment: deploys/day, canary/blue-green usage, mean time to restore.
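When quoting latency percentiles, be ready to say what they mean. The sketch below computes p50/p95/p99 with the nearest-rank method over a list of request latencies. The sample values are made up for illustration, and APM tools often use interpolation or approximate sketches instead, so their numbers can differ slightly from this calculation.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds; two slow outliers in the tail.
    latencies_ms = [80, 85, 90, 95, 100, 110, 120, 130, 250, 900]
    mean = sum(latencies_ms) / len(latencies_ms)
    print("mean:", mean)
    for pct in (50, 95, 99):
        print(f"p{pct}:", percentile(latencies_ms, pct))
```

With a skewed distribution like this one, the mean sits well above the median, which is why percentiles are the better headline figure for latency.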
Common pitfalls
- Just listing technologies without rationale or numbers.
- Claiming scale you cannot back up with concrete metrics.
- Ignoring reliability, security, or cost trade-offs.
- Omitting async/messaging when relevant to resilience and throughput.
Guardrails
- If you lack exact numbers, give ranges and say how you measured them (APM dashboards, load tests).
- Tie choices to business needs (latency budgets, compliance, on-call load, team skills).