PracHub

Design an Automated Jira-Ticket-to-PR System

Last updated: Jun 24, 2026

Quick Overview

This system design question evaluates expertise in distributed systems, asynchronous job processing, and queue-based architecture. It tests practical understanding of durable buffering, state machine design, idempotency, and horizontal scalability — core competencies assessed in senior software engineer interviews.

Company: Snowflake

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

You are building an internal developer-productivity tool. Your company already runs a CI pipeline that produces failing tests; every test failure automatically files a Jira bug ticket. The volume is continuous: new bug tickets keep arriving throughout the day.

You are given an AI agent as a black box: hand it a single Jira ticket and, roughly 30+ minutes later, it returns a pull request that attempts to fix the bug described by that ticket. The agent is slow, can fail, and has limited concurrency.

Design the end-to-end system that continuously pulls newly created Jira bug tickets, feeds them to the AI agent, tracks the state of every ticket through its lifecycle, and surfaces the generated PRs back to engineers for review. The interviewer specifically wants to see the queueing/buffering design, the data model that records each ticket's current state, and how the system shards and scales as ticket volume grows.

Hint: Where to start

This is a long-running asynchronous job system, not a request/response service. Separate ingestion (discover new Jira tickets) from processing (run the 30-minute AI agent) with a durable buffer in between so a burst of tickets never overwhelms the limited agent capacity.

Hint: The hard part is the 30-minute job

A worker that holds a ticket for 30+ minutes will crash, redeploy, or time out. Design for at-least-once delivery: persist a per-ticket state machine in a DB, use a visibility-timeout/lease on the queue, and make the agent invocation idempotent (don't open two PRs for the same ticket).

Hint: Sharding / scaling axis

Throughput is bounded by AI-agent concurrency, not CPU. Scale workers horizontally and shard the work + state by `ticket_id` (or by Jira project) so hot projects don't starve others, and so the state table partitions cleanly.
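
The lease idea from the hints (a per-ticket state machine plus a visibility timeout) can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not a definitive implementation: the state names, the `TicketRow` fields, the 45-minute lease, the retry budget of 3, and the `claim`/`complete` helpers are all hypothetical. A real system would keep the row in a database and perform each check-and-set as one conditional update.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    PR_OPENED = "pr_opened"
    FAILED = "failed"


LEASE_SECONDS = 45 * 60  # assumption: comfortably longer than a ~30 min agent run
MAX_ATTEMPTS = 3         # assumption: retry budget before giving up


@dataclass
class TicketRow:
    ticket_id: str
    state: State = State.QUEUED
    attempts: int = 0
    lease_owner: Optional[str] = None
    lease_expires_at: float = 0.0
    pr_url: Optional[str] = None


def claim(row: TicketRow, worker: str, now: float) -> bool:
    """Try to take the lease; a DB would do this as one conditional UPDATE."""
    claimable = row.state is State.QUEUED or (
        row.state is State.RUNNING and row.lease_expires_at <= now
    )
    if not claimable:
        return False
    if row.attempts >= MAX_ATTEMPTS:
        # Retry budget exhausted: park the ticket instead of looping forever.
        row.state = State.FAILED
        row.lease_owner = None
        return False
    row.state = State.RUNNING
    row.attempts += 1
    row.lease_owner = worker
    row.lease_expires_at = now + LEASE_SECONDS
    return True


def complete(row: TicketRow, worker: str, pr_url: str, now: float) -> bool:
    """Record the PR only if this worker still holds a live lease."""
    if row.state is not State.RUNNING or row.lease_owner != worker:
        return False
    if row.lease_expires_at <= now:
        return False
    row.state = State.PR_OPENED
    row.pr_url = pr_url
    row.lease_owner = None
    return True
```

With this sketch, a second worker calling `claim` while the lease is live is refused, and once the lease expires it can take over; a late `complete` from the original worker is then rejected. The lease alone does not stop a stale worker from having already invoked the agent, which is why the hint also asks for an idempotent invocation.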

Snowflake · Software Engineer · Technical Screen · System Design · Jun 15, 2026, 12:00 AM

Constraints & Assumptions

  • Ingest rate: continuous; assume on the order of hundreds to low-thousands of new bug tickets per day (peaky — a bad deploy can file hundreds in minutes).
  • AI agent: ~30+ minutes per ticket, bounded concurrency (e.g. a fixed pool of agent slots), non-zero failure rate, no strong latency SLA on any single ticket.
  • Correctness over speed: it is fine for a ticket to wait in a backlog; it is not fine to lose a ticket, double-process it, or open duplicate PRs.
  • Sources/sinks: Jira REST API for reading tickets and writing status comments; a Git host (GitHub/GitLab) where the agent opens PRs.
  • Internal tool: modest number of human reviewers; no public traffic; auth is SSO/internal.
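
A quick back-of-the-envelope calculation helps frame these constraints. The sketch below assumes a pool of 10 agent slots and exactly 30 minutes per ticket; both numbers are illustrative assumptions (the prompt only says "bounded concurrency" and "~30+ minutes"), and the helper names are hypothetical.

```python
def daily_capacity(slots: int, minutes_per_ticket: float) -> float:
    """Tickets the agent pool can finish in 24 hours at full utilization."""
    return slots * (24 * 60) / minutes_per_ticket


def drain_hours(backlog: int, slots: int, minutes_per_ticket: float) -> float:
    """Hours to clear a backlog if no new tickets arrive in the meantime."""
    tickets_per_hour = slots * 60 / minutes_per_ticket
    return backlog / tickets_per_hour


# Assumed pool: 10 slots, 30 minutes per ticket.
capacity = daily_capacity(slots=10, minutes_per_ticket=30)         # 480.0 per day
burst = drain_hours(backlog=500, slots=10, minutes_per_ticket=30)  # 25.0 hours
```

Under these assumed numbers the pool finishes about 480 tickets per day, which is below a "low-thousands per day" ingest rate. That suggests the backlog would grow without more slots or some filtering and dedup ahead of the agent, and it is one reason the buffer has to be durable rather than in-memory.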

Clarifying Questions to Ask

  • How do we discover new tickets — does Jira push webhooks, or do we poll the Jira API on an interval? Are webhooks reliable enough to be the only source?
  • What is the exact AI-agent interface — synchronous call that blocks 30 min, or submit-job + poll/callback? What is its max concurrency, and does it expose a job id we can reconcile against?
  • What is the terminal state of a ticket from our system's view — PR opened? PR merged? Or do we stop at "PR opened and assigned to a reviewer"?
  • What should happen on agent failure — retry automatically, give up after N attempts, or escalate to a human queue? Is there a class of tickets we should never auto-attempt?
  • What are the dedup rules — if the same bug is filed twice, or a ticket is reopened, should we generate a new PR or reuse the old one?
  • Do we need observability/SLOs (backlog depth, success rate, time-to-PR), and is there an audit requirement for every action taken on a ticket?
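
The dedup question above ties directly to idempotency. One possible approach, sketched here under assumptions, is to derive a deterministic branch name from the ticket so that a retried run finds the existing PR instead of opening a second one. `FakeGitHost` is an in-memory stand-in invented for illustration, not a real GitHub or GitLab client, and treating a reopened ticket as a new generation (`reopen_count`) is a policy assumption that the interviewer's answer may overturn.

```python
import hashlib
from typing import Dict


def idempotency_key(ticket_id: str, reopen_count: int = 0) -> str:
    """Stable key: same ticket and same reopen generation give the same key."""
    raw = f"{ticket_id}:{reopen_count}".encode()
    return hashlib.sha256(raw).hexdigest()[:12]


class FakeGitHost:
    """In-memory stand-in for a Git host (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.prs_by_branch: Dict[str, str] = {}

    def open_pr_once(self, branch: str) -> str:
        # Return the existing PR if this branch already has one.
        if branch in self.prs_by_branch:
            return self.prs_by_branch[branch]
        url = f"pr/{len(self.prs_by_branch) + 1}"
        self.prs_by_branch[branch] = url
        return url


def publish_fix(host: FakeGitHost, ticket_id: str, reopen_count: int = 0) -> str:
    """Open (or look up) the PR for a ticket using a deterministic branch."""
    key = idempotency_key(ticket_id, reopen_count)
    branch = f"autofix/{ticket_id}-{key}"
    return host.open_pr_once(branch)
```

Calling `publish_fix` twice for the same ticket returns the same PR, while a higher `reopen_count` yields a fresh branch and therefore a new PR.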

Follow-up Questions

  • A bad deploy files 500 tickets in 5 minutes, but you only have 10 agent slots. Walk through what the system does over the next few hours, and how you keep the backlog bounded and observable.
  • A worker pulls a ticket, the agent runs for 25 minutes, then the worker pod is killed by a rolling deploy. Trace exactly how the ticket gets reprocessed without opening two PRs.
  • Engineers complain the AI opens PRs for flaky tests that fix themselves. How would you add a gate (dedup, flakiness detection, or confidence threshold) without redesigning the pipeline?
  • The AI agent's concurrency just doubled. Which component(s) do you scale, and what is the next bottleneck (DB writes? Jira API rate limits? reviewer throughput?)?
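
For the burst scenario in the first follow-up, one way to keep a single noisy project from starving the rest is to hold a queue per Jira project and serve projects round-robin. The sketch below is an in-memory illustration only; `FairQueue` and its methods are hypothetical names, and a production design would back this with a durable queue or a database table.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class FairQueue:
    """Per-project FIFO queues, served round-robin across projects."""

    def __init__(self) -> None:
        # Plain dicts preserve insertion order, which gives the rotation order.
        self._queues: Dict[str, Deque[str]] = {}

    def push(self, project: str, ticket_id: str) -> None:
        self._queues.setdefault(project, deque()).append(ticket_id)

    def pop(self) -> Optional[Tuple[str, str]]:
        if not self._queues:
            return None
        project, queue = next(iter(self._queues.items()))
        ticket_id = queue.popleft()
        del self._queues[project]
        if queue:
            # Project still has work: move it to the back of the rotation.
            self._queues[project] = queue
        return project, ticket_id

    def depth(self) -> int:
        """Total backlog size, a natural metric to export and alert on."""
        return sum(len(q) for q in self._queues.values())
```

If project `PAY` files three tickets and project `WEB` files one, `pop` alternates between the two projects, so the `WEB` ticket is served second rather than last.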
