What difficulty level is this interview question?

This is a medium difficulty ML System Design question, commonly asked during Technical Screen rounds at Datadog.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Datadog during technical interviews.

Design an LLM Agent System That Automatically Resolves Jira Tickets and Opens Pull Requests

This ML system design question evaluates the ability to architect an autonomous LLM-agent pipeline spanning retrieval-augmented generation, tool integration, and sandboxed code execution. It tests conceptual understanding of how to ground agent reasoning in real codebases, structure permissioned tool access, and safely isolate execution environments — core competencies for senior roles building AI-powered developer tooling.

You are asked to design an autonomous LLM-agent system that ingests an engineering Jira ticket (bug report or small feature request), understands the relevant codebase, implements a fix, and opens a pull request (PR) for human review. The agent operates over a real production code repository, so correctness, safety, and the ability to ground its reasoning in the actual code are paramount.

The interview emphasizes three subsystems in depth:

Retrieval-Augmented Generation (RAG) over the codebase and supporting context (the ticket, related tickets, docs, prior PRs), so the agent's edits are grounded in the real repository rather than hallucinated.
Tool access via a Model Context Protocol (MCP)–style integration layer — the structured, permissioned interface through which the agent reads files, runs searches, calls the Jira API, runs tests, and creates the PR.
A sandboxed execution environment in which the agent can edit code, run builds/tests, and iterate, without endangering production systems or leaking secrets.

Design the end-to-end system, then go deep on these three subsystems: how you build and query the RAG index, how you structure and secure the MCP tool layer, and how you isolate and control the sandbox.

Constraints & Assumptions

Repository scale: a mid-to-large monorepo, on the order of $10^4$ – $10^6$ files and tens of millions of lines, multiple languages. Full source cannot fit in a model context window.
Ticket volume: assume a few hundred eligible tickets per day; latency target is minutes-to-tens-of-minutes per ticket (asynchronous, not interactive), not sub-second.
Eligible tickets: scoped bugs and small features where a fix is plausibly a localized diff (a handful of files). Large refactors / architectural changes are routed to humans.
Human-in-the-loop: the agent never merges. It opens a PR; a human reviews and merges. The agent may push follow-up commits in response to review or CI feedback.
Safety: the agent must never touch production infrastructure, must never exfiltrate secrets, and all code execution happens in an isolated sandbox. Build/test must pass before a PR is opened.
Models: assume access to a strong general-purpose LLM with tool-calling, plus a smaller/cheaper model and an embedding model. Token budgets and per-ticket cost matter.

Clarifying Questions to Ask

Scope of autonomy: Should the agent only open PRs for a triaged subset of tickets (e.g., labeled agent-eligible, low-risk), or attempt everything and self-abstain? Who owns the abstain/escalation decision?
Definition of success: Is the target metric "PR opened," "PR that passes CI," "PR merged by a human with minimal edits," or "ticket actually resolved in production"? This changes evaluation and gating.
Repository access model: Do we get a full clone per ticket, a persistent indexed mirror, or read-only API access? How fresh must the index be relative to main?
Languages and build systems: One language/build or many? Are there reliable test suites and a deterministic build we can run in the sandbox?
Tool surface and permissions: Which external systems may the agent call (Jira read/write, GitHub/GitLab, CI, internal services), and what is explicitly forbidden (deploys, prod DBs, secret stores)?
Secrets and data sensitivity: Does the repo contain secrets or regulated data? Can code/snippets be sent to the model provider, or must we use a self-hosted / VPC model?

Part 1 — End-to-End Architecture and the Agentic Loop

Lay out the full system from "a Jira ticket arrives" to "a PR is open and linked back on the ticket." Define the major components and how a single ticket flows through them, including where the agent decides to abstain/escalate.
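One way to make the ticket flow concrete is an explicit state machine in which every non-terminal stage can exit to an abstain/escalate state. The sketch below is only illustrative: the stage names and the transition table are assumptions, not something the question prescribes.

```python
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names for one ticket run.
    RECEIVED = "received"
    ELIGIBILITY = "eligibility"
    RETRIEVAL = "retrieval"
    EDITING = "editing"
    VERIFYING = "verifying"
    PR_OPENED = "pr_opened"    # terminal: PR is open and linked on the ticket
    ESCALATED = "escalated"    # terminal: handed to a human


# Forward transitions; VERIFYING may loop back to EDITING on red tests.
_FORWARD = {
    Stage.RECEIVED: {Stage.ELIGIBILITY},
    Stage.ELIGIBILITY: {Stage.RETRIEVAL},
    Stage.RETRIEVAL: {Stage.EDITING},
    Stage.EDITING: {Stage.VERIFYING},
    Stage.VERIFYING: {Stage.EDITING, Stage.PR_OPENED},
}

TERMINAL = {Stage.PR_OPENED, Stage.ESCALATED}


def allowed_next(stage: Stage) -> set:
    """Return the stages reachable from `stage` in one step."""
    if stage in TERMINAL:
        return set()
    # Every non-terminal stage may abstain and escalate to a human.
    return _FORWARD[stage] | {Stage.ESCALATED}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to `target`, rejecting transitions the flow does not allow."""
    if target not in allowed_next(current):
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Making the transitions explicit keeps the abstain path visible at every stage and gives the audit trail a small, fixed vocabulary of states.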

Clarifying Questions for this Part

What is the eligibility gate — labels, a classifier, heuristics on ticket size/risk — and is it a hard filter or a soft prior the agent can override?
What are the termination conditions: max loop iterations, wall-clock/cost budget, tests-green, or low-confidence abstain?
How is per-run state persisted so a long run can resume and so we have an audit trail of every action the agent took?
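The termination and persistence questions above can be sketched as a small per-run state object with a single stop check. All limits, field names, and the confidence threshold here are placeholder assumptions; a real system would tune them and persist the JSON to durable storage.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RunBudget:
    # Placeholder limits, to be tuned per deployment.
    max_iterations: int = 8
    max_cost_usd: float = 5.0
    max_wall_seconds: float = 1800.0


@dataclass
class RunState:
    ticket_id: str
    iterations: int = 0
    cost_usd: float = 0.0
    elapsed_seconds: float = 0.0
    tests_green: bool = False
    confidence: float = 1.0
    audit_log: list = field(default_factory=list)

    def record(self, action: str, **details) -> None:
        """Append one agent action to the audit trail."""
        self.audit_log.append({"step": len(self.audit_log), "action": action, **details})

    def to_json(self) -> str:
        """Serialize the whole state so a long run can be checkpointed."""
        return json.dumps(asdict(self))


def stop_reason(state: RunState, budget: RunBudget,
                min_confidence: float = 0.5) -> Optional[str]:
    """Return why the loop should stop, or None to keep iterating."""
    if state.tests_green:
        return "tests_green"
    if state.confidence < min_confidence:
        return "abstain_low_confidence"
    if state.iterations >= budget.max_iterations:
        return "abstain_max_iterations"
    if state.cost_usd >= budget.max_cost_usd:
        return "abstain_cost_budget"
    if state.elapsed_seconds >= budget.max_wall_seconds:
        return "abstain_wall_clock"
    return None
```

Because the state round-trips through JSON, a resumed run can be rebuilt with `RunState(**json.loads(saved))` and continue from the same iteration count and audit log.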

Part 2 — RAG Over the Codebase

Design the retrieval layer that grounds the agent in the actual repository. Cover what you index, how you chunk and embed it, how you keep it fresh against a moving main, and how you retrieve and assemble context for a given ticket so the agent edits the right files with the right symbols.
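As a rough illustration of chunking and freshness, the sketch below splits a file at symbol boundaries and keys each chunk by a content hash, so that only chunks changed since the last indexed commit need re-embedding. It uses Python's standard `ast` module purely for demonstration; a multi-language monorepo would need a language-agnostic parser, and the `Chunk` fields are assumptions.

```python
import ast
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    path: str
    symbol: str
    start_line: int
    end_line: int
    text: str
    content_hash: str  # lookup key for "is this chunk already embedded?"


def chunk_python_source(path: str, source: str) -> list:
    """Split one Python file into top-level function and class chunks."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based and inclusive.
            text = "\n".join(lines[node.lineno - 1:node.end_lineno])
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            chunks.append(Chunk(path, node.name, node.lineno, node.end_lineno, text, digest))
    return chunks


def chunks_to_embed(chunks: list, already_indexed: set) -> list:
    """Return only the chunks whose content hash is not in the index yet."""
    return [c for c in chunks if c.content_hash not in already_indexed]
```

Hash-keyed chunks let a shared index follow a moving main incrementally: unchanged symbols keep their existing embeddings regardless of which commit they are read from.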

Clarifying Questions for this Part

How fresh must retrieval be — is per-ticket indexing at the base commit acceptable, or do we need a continuously updated shared index?
Are stack traces, failing test names, or error logs reliably present on tickets? Those are the strongest retrieval anchors.
What is the token budget for assembled context, and what is the precision target (editing the wrong file produces a wasted or incorrect PR)?
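To illustrate the last two questions, here is a minimal sketch that pulls file/line anchors out of a traceback pasted into a ticket and then packs scored snippets into a fixed token budget. The regex covers only Python-style tracebacks, the 4-characters-per-token estimate is a crude assumption, and the scores are assumed to come from an upstream retriever.

```python
import re

# Matches Python-style traceback frames: File "path", line N, in func
_FRAME_RE = re.compile(r'File "([^"]+)", line (\d+), in (\S+)')


def extract_anchors(ticket_text: str) -> list:
    """Pull (path, line, function) anchors out of a pasted traceback."""
    return [(path, int(line), func) for path, line, func in _FRAME_RE.findall(ticket_text)]


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def pack_context(candidates: list, token_budget: int) -> list:
    """Greedily keep the highest-scoring (score, snippet) pairs that fit the budget."""
    chosen, used = [], 0
    for score, snippet in sorted(candidates, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(snippet)
        if used + cost <= token_budget:
            chosen.append(snippet)
            used += cost
    return chosen
```

Anchors extracted this way can be used to boost retrieval scores for the named files before packing, which is one plausible way to trade recall for the precision the question asks about.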

Part 3 — MCP Tool Layer (the Agent's Action Interface)

Design the structured tool interface (an MCP-style server/protocol) through which the agent takes all of its actions: reading/searching code, editing files, running builds/tests, calling Jira, and opening the PR. Specify the tool catalog, the schemas, and — critically — the permissioning, sandboxing, and observability of every tool call.

Clarifying Questions for this Part

Which write-capable tools require human approval vs. run autonomously (e.g., open_pr and comment_ticket are external side effects)?
How are credentials injected — does the MCP server hold tokens and the model only sees opaque tool calls, never the secrets?
What are the rate/cost ceilings per tool to bound runaway loops (e.g., max run_tests invocations per run)?
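The three concerns above (approval for external writes, server-held credentials, and per-tool ceilings) can be sketched in one small dispatcher. This is not the MCP SDK or its wire format; `ToolSpec`, `ToolServer`, and the handler signature are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolSpec:
    name: str
    handler: Callable[..., object]
    writes_externally: bool = False   # e.g. open_pr, comment_ticket
    max_calls_per_run: int = 50       # placeholder ceiling


class ToolError(Exception):
    """Raised when a tool call is rejected by policy."""


@dataclass
class ToolServer:
    # Credentials live on the server and are handed to handlers only;
    # the model sees tool names, arguments, and results.
    credentials: dict
    tools: dict = field(default_factory=dict)
    call_counts: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def register(self, spec: ToolSpec) -> None:
        self.tools[spec.name] = spec

    def call(self, name: str, args: dict, approved: bool = False):
        spec = self.tools.get(name)
        if spec is None:
            raise ToolError(f"unknown tool: {name}")
        if spec.writes_externally and not approved:
            raise ToolError(f"{name} requires approval")
        used = self.call_counts.get(name, 0)
        if used >= spec.max_calls_per_run:
            raise ToolError(f"{name} exceeded {spec.max_calls_per_run} calls")
        self.call_counts[name] = used + 1
        # Log the accepted call; arguments only, never credentials.
        self.audit_log.append({"tool": name, "args": args})
        return spec.handler(self.credentials, **args)
```

A usage sketch: register `run_tests` with `max_calls_per_run=10` and `open_pr` with `writes_externally=True`; the agent's `open_pr` call then fails until the orchestrator passes `approved=True`.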

Part 4 — Sandboxed Execution and Verification

Design the isolated environment where the agent applies edits, builds, and runs tests, and define how test/build results gate whether a PR is opened. Cover isolation, reproducibility, secret handling, resource limits, and the loop that turns red tests into revised edits.

Clarifying Questions for this Part

Is there a deterministic, reproducible build/test setup we can containerize, and how long do full test runs take (affects budget and whether we run targeted subsets)?
What network egress, if any, does the build legitimately need (package mirrors), and how do we allowlist it without enabling exfiltration?
What is the maximum iteration/cost budget before the run abstains and escalates to a human?
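The egress and gating questions above reduce to two small policy checks, sketched below. The mirror hostnames are hypothetical, and this shows only the decision logic: real enforcement would sit at the network layer (a proxy or firewall around the sandbox), not in agent-visible code.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Hypothetical internal package mirrors; real hostnames depend on the deployment.
EGRESS_ALLOWLIST = frozenset({"pypi.mirror.internal", "npm.mirror.internal"})


def egress_allowed(url: str, allowlist=EGRESS_ALLOWLIST) -> bool:
    """Allow only HTTPS requests to exactly matching allowlisted hosts."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in allowlist


@dataclass
class VerifyResult:
    build_ok: bool
    tests_failed: int
    iterations_used: int


def next_action(result: VerifyResult, max_iterations: int) -> str:
    """Decide what the run does after one build/test cycle."""
    if result.build_ok and result.tests_failed == 0:
        return "open_pr"          # green build and tests gate the PR
    if result.iterations_used >= max_iterations:
        return "escalate"         # budget exhausted: hand off to a human
    return "revise_edits"         # feed failures back into the edit loop
```

Exact hostname matching (rather than suffix or substring matching) is a deliberate choice here, since a lookalike such as `pypi.mirror.internal.evil.example` would otherwise pass the check.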
