Design NL-to-SQL performance optimization assistant

Q: Design NL-to-SQL performance optimization assistant

This is a System Design interview question from Startups.Com for Machine Learning Engineer roles. View the full question and solution on PracHub.

Q: How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

Question

Loading...

Scenario

You are building an internal tool for engineers/analysts to ask natural-language questions about database performance (e.g., “Why is the orders dashboard slow today?” or “Which queries are causing the most CPU?”).

The system must:

Interpret the natural-language question.
Determine which diagnostic data to query (e.g., slow query logs, query stats, table/index metadata).
Generate and run the relevant SQL (and/or EXPLAIN plans) safely.
Analyze the results to identify root causes.
Produce actionable optimization recommendations (e.g., index suggestions, query rewrites, schema changes, config tuning), with evidence.

Assume the database is a relational OLTP system (e.g., PostgreSQL/MySQL) with access to:

Query statistics (e.g., pg_stat_statements -like), slow logs
System metrics (CPU/IO), table/index metadata
Ability to run read-only SQL and optionally EXPLAIN (ANALYZE) in controlled environments

Requirements

Functional

NL question → structured intent (what to investigate, which time window, which service/table/query)
Generate safe SQL diagnostics queries
Execute queries with guardrails (timeouts, read-only, row limits)
Analyze results and produce ranked hypotheses
Output recommendations with confidence and supporting data
Ask clarifying questions when the NL query is ambiguous

Non-functional

Security: least privilege, tenant isolation, PII protection
Reliability: handle partial data sources; graceful degradation
Latency: interactive (e.g., p95 < 5–10s for common questions)
Auditability: log prompts, SQL, results summaries, and decisions
Cost controls if using LLMs (caching, batching, smaller models)

Deliverables

Describe the end-to-end system design: architecture, main components, data flow, safety mechanisms, and how you would evaluate quality.