What difficulty level is this interview question?

This is a hard-difficulty System Design question, commonly asked during Technical Screen rounds at Snowflake.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Snowflake during technical interviews.

Design an Automated Jira-Ticket-to-PR System | Snowflake Interview Question

Q: Design an Automated Jira-Ticket-to-PR System

This system design question evaluates expertise in distributed systems, asynchronous job processing, and queue-based architecture. It tests practical understanding of durable buffering, state machine design, idempotency, and horizontal scalability — core competencies assessed in senior software engineer interviews.

You are building an internal developer-productivity tool. Your company already runs a CI pipeline that produces failing tests; every test failure automatically files a Jira bug ticket. The volume is continuous — new bug tickets keep arriving throughout the day.

You are given an AI agent as a black box: hand it a single Jira ticket and, roughly 30+ minutes later, it returns a pull request that attempts to fix the bug described by that ticket. The agent is slow, can fail, and has limited concurrency.

Design the end-to-end system that continuously pulls newly created Jira bug tickets, feeds them to the AI agent, tracks the state of every ticket through its lifecycle, and surfaces the generated PRs back to engineers for review. The interviewer specifically wants to see the queueing/buffering design, the data model that records each ticket's current state, and how the system shards and scales as ticket volume grows.

Constraints & Assumptions

Ingest rate: continuous; assume on the order of hundreds to low-thousands of new bug tickets per day (peaky — a bad deploy can file hundreds in minutes).
AI agent: ~30+ minutes per ticket, bounded concurrency (e.g. a fixed pool of agent slots), non-zero failure rate, no strong latency SLA on any single ticket.
Correctness over speed: it is fine for a ticket to wait in a backlog; it is not fine to lose a ticket, double-process it, or open duplicate PRs.
Sources/sinks: Jira REST API for reading tickets and writing status comments; a Git host (GitHub/GitLab) where the agent opens PRs.
Internal tool: modest number of human reviewers; no public traffic; auth is SSO/internal.
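The constraints above imply a simple capacity check that is worth doing before drawing any boxes. The sketch below is only back-of-the-envelope arithmetic; the slot count of 10 and the flat 30 minutes per ticket are illustrative assumptions, not figures fixed by the problem statement, and retries are ignored.

```python
import math

def daily_capacity(slots: int, minutes_per_ticket: float) -> float:
    """Tickets the agent pool can finish in 24 hours at full utilization."""
    return slots * (24 * 60) / minutes_per_ticket

def drain_hours(backlog: int, slots: int, minutes_per_ticket: float) -> float:
    """Hours to clear a backlog, assuming no new arrivals and no retries."""
    waves = math.ceil(backlog / slots)  # each wave fills every slot once
    return waves * minutes_per_ticket / 60

# Assumed numbers, for illustration only.
SLOTS = 10
MINUTES_PER_TICKET = 30

print(daily_capacity(SLOTS, MINUTES_PER_TICKET))    # 480.0
print(drain_hours(500, SLOTS, MINUTES_PER_TICKET))  # 25.0
```

Under these assumptions the pool finishes about 480 tickets per day, which is below the "low-thousands" end of the stated ingest range. That gap is the reason the design needs a durable backlog plus some form of prioritization or admission control rather than a plain in-memory queue.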

Clarifying Questions to Ask

How do we discover new tickets — does Jira push webhooks, or do we poll the Jira API on an interval? Are webhooks reliable enough to be the only source?
What is the exact AI-agent interface — synchronous call that blocks 30 min, or submit-job + poll/callback? What is its max concurrency, and does it expose a job id we can reconcile against?
What is the terminal state of a ticket from our system's view — PR opened? PR merged? Or do we stop at "PR opened and assigned to a reviewer"?
What should happen on agent failure — retry automatically, give up after N attempts, or escalate to a human queue? Is there a class of tickets we should never auto-attempt?
What are the dedup rules — if the same bug is filed twice, or a ticket is reopened, should we generate a new PR or reuse the old one?
Do we need observability/SLOs (backlog depth, success rate, time-to-PR), and is there an audit requirement for every action taken on a ticket?
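The answers to the questions above largely determine the ticket state machine. As a minimal sketch, assuming the system stops at "PR opened", retries a bounded number of times, and escalates to a human afterwards, the per-ticket record might look like the following. The state names, MAX_ATTEMPTS, and the TicketRecord fields are all hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple

class State(str, Enum):
    DISCOVERED = "DISCOVERED"    # seen in Jira, not yet enqueued
    QUEUED = "QUEUED"            # waiting for an agent slot
    RUNNING = "RUNNING"          # an agent is working on it
    FAILED = "FAILED"            # last attempt failed, decision pending
    PR_OPENED = "PR_OPENED"      # terminal: PR handed to reviewers
    NEEDS_HUMAN = "NEEDS_HUMAN"  # terminal: retries exhausted
    SKIPPED = "SKIPPED"          # terminal: never auto-attempted

# Allowed transitions; anything else is rejected.
ALLOWED = {
    State.DISCOVERED: {State.QUEUED, State.SKIPPED},
    State.QUEUED: {State.RUNNING},
    # RUNNING -> QUEUED covers a worker that died and lost its claim.
    State.RUNNING: {State.PR_OPENED, State.FAILED, State.QUEUED},
    State.FAILED: {State.QUEUED, State.NEEDS_HUMAN},
    State.PR_OPENED: set(),
    State.NEEDS_HUMAN: set(),
    State.SKIPPED: set(),
}

MAX_ATTEMPTS = 3  # assumed retry budget

@dataclass
class TicketRecord:
    jira_key: str
    state: State = State.DISCOVERED
    attempts: int = 0
    pr_url: Optional[str] = None
    history: List[Tuple[State, State]] = field(default_factory=list)

    def transition(self, new_state: State) -> None:
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.history.append((self.state, new_state))  # audit trail
        if new_state is State.RUNNING:
            self.attempts += 1
        self.state = new_state

    def on_failure(self) -> None:
        """Requeue while the retry budget lasts, otherwise escalate."""
        self.transition(State.FAILED)
        if self.attempts < MAX_ATTEMPTS:
            self.transition(State.QUEUED)
        else:
            self.transition(State.NEEDS_HUMAN)
```

Keeping the transition table explicit makes illegal moves (for example, re-running a ticket that already has a PR) fail loudly, and the history list doubles as the audit log asked about in the last question.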

Follow-up Questions

A bad deploy files 500 tickets in 5 minutes, but you only have 10 agent slots. Walk through what the system does over the next few hours, and how you keep the backlog bounded and observable.
A worker pulls a ticket, the agent runs for 25 minutes, then the worker pod is killed by a rolling deploy. Trace exactly how the ticket gets reprocessed without opening two PRs.
Engineers complain the AI opens PRs for flaky tests that fix themselves. How would you add a gate (dedup, flakiness detection, or confidence threshold) without redesigning the pipeline?
The AI agent's concurrency just doubled. Which component(s) do you scale, and what is the next bottleneck (DB writes? Jira API rate limits? reviewer throughput?)?
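For the killed-worker follow-up, one common approach is a time-bounded lease on the ticket row combined with a guarded write when recording the PR. The sketch below uses an in-memory SQLite table purely to show the conditional updates; the schema, the 45-minute lease, and the autofix/ branch naming convention are assumptions for illustration, not part of the question.

```python
import sqlite3

LEASE_SECONDS = 45 * 60  # assumed: comfortably longer than one agent run

def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE tickets (
               jira_key TEXT PRIMARY KEY,
               state TEXT NOT NULL,
               lease_owner TEXT,
               lease_expires_at REAL,
               pr_url TEXT)"""
    )
    return db

def claim(db: sqlite3.Connection, jira_key: str, worker: str, now: float) -> bool:
    """Atomically claim a ticket that is queued or whose lease has expired."""
    cur = db.execute(
        """UPDATE tickets
              SET state = 'RUNNING', lease_owner = ?, lease_expires_at = ?
            WHERE jira_key = ? AND pr_url IS NULL
              AND (state = 'QUEUED'
                   OR (state = 'RUNNING' AND lease_expires_at < ?))""",
        (worker, now + LEASE_SECONDS, jira_key, now),
    )
    db.commit()
    return cur.rowcount == 1  # exactly one winner per claim

def record_pr(db: sqlite3.Connection, jira_key: str, worker: str, pr_url: str) -> bool:
    """Record a PR only if this worker still owns the lease and none exists."""
    cur = db.execute(
        """UPDATE tickets
              SET state = 'PR_OPENED', pr_url = ?
            WHERE jira_key = ? AND lease_owner = ? AND pr_url IS NULL""",
        (pr_url, jira_key, worker),
    )
    db.commit()
    return cur.rowcount == 1

def branch_name(jira_key: str) -> str:
    """Deterministic branch per ticket, so a retry can find earlier work."""
    return f"autofix/{jira_key}"
```

In this sketch, a second worker can only claim the ticket after the first worker's lease expires, and a late result from the original worker is rejected because it no longer owns the lease. The deterministic branch name is what would let a retry check the Git host for an already-open PR before starting the agent again; that lookup is left out here because it depends on the Git host's API.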
