PracHub

Diagnose why a scaled system became slow

Last updated: Mar 29, 2026

Quick Overview

This question evaluates proficiency in diagnosing performance regressions in scaled production services, emphasizing observability, bottleneck identification across components (compute, memory, I/O, network, caches, databases), and incident triage skills.

Company: Atlassian

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Technical Screen

Jan 22, 2026, 12:00 AM
You are on-call for a production service that recently scaled up (more instances, more users/traffic). After the scale-up, users report the system is “much slower” (higher latency, timeouts), even though the service is still functional.

Design a practical, step-by-step troubleshooting approach to identify the bottleneck(s) and stabilize the system.

Cover at least:

  • What metrics and dashboards you would check first (client, load balancer, service, downstream dependencies).
  • How you would isolate whether the issue is CPU, memory/GC, disk I/O, network, database, cache, or a specific dependency.
  • How you would use logs, tracing, and profiling to narrow it down.
  • Immediate mitigations vs. longer-term fixes.
  • Common “scaled-up system got slower” root causes (e.g., thundering herd, connection pool saturation, cache miss storms, lock contention, hot partitions).
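One of the root causes listed above, connection pool saturation, can be checked with back-of-the-envelope arithmetic before opening a profiler. The sketch below applies Little's Law (average connections in use = arrival rate × mean time a connection is held) to each instance and compares the fleet-wide connection ceiling with the database's limit. All function names, parameters, and numbers are hypothetical illustrations, not values from the question.

```python
def required_connections(requests_per_second: float, mean_hold_seconds: float) -> float:
    """Little's Law: average connections in use = arrival rate * mean hold time."""
    return requests_per_second * mean_hold_seconds


def pool_report(instances: int, pool_size_per_instance: int, total_rps: float,
                mean_hold_seconds: float, db_max_connections: int) -> dict:
    """Summarize pool pressure per instance and across the fleet.

    Assumes traffic is spread evenly across instances (a simplification;
    a skewed load balancer or hot partition would make some instances worse).
    """
    per_instance_rps = total_rps / instances
    in_use = required_connections(per_instance_rps, mean_hold_seconds)
    fleet_max = instances * pool_size_per_instance
    return {
        "in_use_per_instance": in_use,
        # A value above 1.0 means requests queue waiting for a connection.
        "pool_utilization": in_use / pool_size_per_instance,
        "fleet_max_connections": fleet_max,
        # More instances can exceed the database's connection limit
        # even when each individual pool looks modest.
        "exceeds_db_limit": fleet_max > db_max_connections,
    }


if __name__ == "__main__":
    # Hypothetical "before" and "after" scale-up scenarios.
    before = pool_report(instances=10, pool_size_per_instance=20,
                         total_rps=2000, mean_hold_seconds=0.05,
                         db_max_connections=500)
    after = pool_report(instances=40, pool_size_per_instance=20,
                        total_rps=8000, mean_hold_seconds=0.2,
                        db_max_connections=500)
    print("before:", before)
    print("after: ", after)
```

In this made-up scenario, per-instance traffic is unchanged by the scale-up, yet a slower database (longer hold time) pushes pool utilization past 1.0 while the larger fleet can also open more connections than the database allows. That pattern points toward the database or pool configuration rather than service CPU, which is the kind of isolation the question asks for.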
