##### Question Design an ML-powered assistant for a no-code table product like Airtable that converts a user's **natural-language request** into a **formula expression** in the product's formula language, grounded in the live table's schema. You are given an **Airtable API key** (read/write access to a base/table) and an **LLM API key** (e.g., Claude) you can call. Users type requests such as: - “Add 2 to `salary`.” - “If `Status` is "active", set `bonus` to 10% of `salary`.” - “Create a column that is `salary` * 1.1.” - “If `Status` is "Won" and `Amount` > 1000, return `Amount` * 0.1, else 0.” - “Extract the domain from the `Email` field.” - “Label a row "Late" if `Due Date` is before today and `Completed` is false.” Walk through the design covering: 1. **End-to-end architecture.** Components and data flow across the client/UI, an orchestrator API, the LLM layer, the Airtable connector, validators/tools, and observability. 2. **NL → formula generation.** Prompting and tool/structured-output strategy that grounds generation in the actual field names and types. Discuss using a typed intermediate representation (IR) that you compile deterministically into the formula string to cut down syntax errors. 3. **Output contract.** Beyond the proposed formula, return a short **explanation** and optionally a few **test cases / example evaluations** the user can verify. 4. **Schema grounding and type checking.** Pass field names and types (number, currency, date, bool, single-select, text, plus null/blank) into the model; forbid invented fields and functions; require references to match the real schema. 5. **Ambiguity handling.** Ask clarifying questions when the target field is unclear (e.g. `Base Salary` vs `Salary (USD)`), the business logic is ambiguous (what does “late” mean — timezone, inclusivity?), or the output type is unclear. Support multi-turn refinement. 6. **Validation before any write.** Static checks (parse to AST, grammar validation, referenced-field existence, operator/argument type-checking) and dynamic checks (sandbox-evaluate on a small sample of rows to catch divide-by-zero, null/blank handling, runtime errors). On failure, feed the error back to the LLM for a bounded repair loop. 7. **Applying changes to the live sheet via the Airtable API.** Support two execution modes: (a) **create/update a formula field** (the preferred, non-destructive path), and (b) **materialize computed values** into a target field when formula fields aren't allowed. Show a preview/diff on sample rows and require user confirmation before applying to the full dataset. 8. **Safety.** Treat user text as untrusted; never hand the model credentials — the model only requests actions a server-side tool executes. Defend against prompt injection and data exfiltration, redact PII in samples, enforce a capability/operation whitelist, and gate destructive actions (overwriting non-formula fields, bulk updates above a threshold, schema changes) behind explicit confirmation. 9. **Latency and reliability.** Target an interactive P95 of ~2–3s: cache schema summaries by schema hash, use a small/fast model on the first pass and escalate to a larger model on failure, cap repair iterations, and stream the explanation while the formula finalizes. For writes, use idempotency keys, exponential backoff on 429/5xx, batched writes, and partial-failure reporting (failed record IDs). 10. **Evaluation.** Offline: a labeled set of (schema, request) → expected formula scored by AST equivalence and execution accuracy on test rows, plus validation pass-rate. Online: acceptance/insert-without-edit rate, edit distance from suggestion to final, clarification rate, preview-to-apply conversion, turns-to-success, and post-apply undo/correction rate. 11. **Edge cases and failure modes.** Hallucinated fields/functions, wrong operator precedence, null/blank values (`IF({Salary}, {Salary}+2, BLANK())`), type mismatches, locale/percent/currency parsing (“10%” → 0.1), timezone handling, and schema changing between preview and apply (prefer field IDs over names).

Airtable software-engineer onsite ML system design question: design an assistant that turns natural-language requests into the product's formula language, grounded in the live table schema, and optionally applies them to the base via the Airtable API. It tests LLM orchestration with structured outputs / a typed IR, schema grounding and type checking, ambiguity handling, static + dynamic formula validation, prompt-injection safety, confirmation-gated writes, latency trade-offs, and offline/online evaluation.

How do I approach ML System Design interview questions?

ML System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master ml system design interviews.

What difficulty level is this interview question?

This is a medium difficulty ML System Design question, commonly asked during Onsite rounds at Airtable.

What role is this question designed for?

This question is commonly asked for Software Engineer candidates at Airtable during technical interviews.

Design NL-to-Formula assistant for Airtable | Airtable Interview Question

Question

Design an ML-powered assistant for a no-code table product like Airtable that converts a user's natural-language request into a formula expression in the product's formula language, grounded in the live table's schema. You are given an Airtable API key (read/write access to a base/table) and an LLM API key (e.g., Claude) you can call.

Users type requests such as:

“Add 2 to salary .”
“If Status is "active", set bonus to 10% of salary .”
“Create a column that is salary * 1.1.”
“If Status is "Won" and Amount > 1000, return Amount * 0.1, else 0.”
“Extract the domain from the Email field.”
“Label a row "Late" if Due Date is before today and Completed is false.”

Walk through the design covering:

End-to-end architecture. Components and data flow across the client/UI, an orchestrator API, the LLM layer, the Airtable connector, validators/tools, and observability.
NL → formula generation. Prompting and tool/structured-output strategy that grounds generation in the actual field names and types. Discuss using a typed intermediate representation (IR) that you compile deterministically into the formula string to cut down syntax errors.
Output contract. Beyond the proposed formula, return a short explanation and optionally a few test cases / example evaluations the user can verify.
Schema grounding and type checking. Pass field names and types (number, currency, date, bool, single-select, text, plus null/blank) into the model; forbid invented fields and functions; require references to match the real schema.
Ambiguity handling. Ask clarifying questions when the target field is unclear (e.g. Base Salary vs Salary (USD) ), the business logic is ambiguous (what does “late” mean — timezone, inclusivity?), or the output type is unclear. Support multi-turn refinement.
Validation before any write. Static checks (parse to AST, grammar validation, referenced-field existence, operator/argument type-checking) and dynamic checks (sandbox-evaluate on a small sample of rows to catch divide-by-zero, null/blank handling, runtime errors). On failure, feed the error back to the LLM for a bounded repair loop.
Applying changes to the live sheet via the Airtable API. Support two execution modes: (a) create/update a formula field (the preferred, non-destructive path), and (b) materialize computed values into a target field when formula fields aren't allowed. Show a preview/diff on sample rows and require user confirmation before applying to the full dataset.
Safety. Treat user text as untrusted; never hand the model credentials — the model only requests actions a server-side tool executes. Defend against prompt injection and data exfiltration, redact PII in samples, enforce a capability/operation whitelist, and gate destructive actions (overwriting non-formula fields, bulk updates above a threshold, schema changes) behind explicit confirmation.
Latency and reliability. Target an interactive P95 of ~2–3s: cache schema summaries by schema hash, use a small/fast model on the first pass and escalate to a larger model on failure, cap repair iterations, and stream the explanation while the formula finalizes. For writes, use idempotency keys, exponential backoff on 429/5xx, batched writes, and partial-failure reporting (failed record IDs).
Evaluation. Offline: a labeled set of (schema, request) → expected formula scored by AST equivalence and execution accuracy on test rows, plus validation pass-rate. Online: acceptance/insert-without-edit rate, edit distance from suggestion to final, clarification rate, preview-to-apply conversion, turns-to-success, and post-apply undo/correction rate.
Edge cases and failure modes. Hallucinated fields/functions, wrong operator precedence, null/blank values ( IF({Salary}, {Salary}+2, BLANK()) ), type mismatches, locale/percent/currency parsing (“10%” → 0.1), timezone handling, and schema changing between preview and apply (prefer field IDs over names).

Question

Users type requests such as:

“Add 2 to salary .”
“If Status is "active", set bonus to 10% of salary .”
“Create a column that is salary * 1.1.”
“If Status is "Won" and Amount > 1000, return Amount * 0.1, else 0.”
“Extract the domain from the Email field.”
“Label a row "Late" if Due Date is before today and Completed is false.”

Walk through the design covering:

End-to-end architecture. Components and data flow across the client/UI, an orchestrator API, the LLM layer, the Airtable connector, validators/tools, and observability.
NL → formula generation. Prompting and tool/structured-output strategy that grounds generation in the actual field names and types. Discuss using a typed intermediate representation (IR) that you compile deterministically into the formula string to cut down syntax errors.
Output contract. Beyond the proposed formula, return a short explanation and optionally a few test cases / example evaluations the user can verify.
Schema grounding and type checking. Pass field names and types (number, currency, date, bool, single-select, text, plus null/blank) into the model; forbid invented fields and functions; require references to match the real schema.
Ambiguity handling. Ask clarifying questions when the target field is unclear (e.g. Base Salary vs Salary (USD) ), the business logic is ambiguous (what does “late” mean — timezone, inclusivity?), or the output type is unclear. Support multi-turn refinement.
Validation before any write. Static checks (parse to AST, grammar validation, referenced-field existence, operator/argument type-checking) and dynamic checks (sandbox-evaluate on a small sample of rows to catch divide-by-zero, null/blank handling, runtime errors). On failure, feed the error back to the LLM for a bounded repair loop.
Applying changes to the live sheet via the Airtable API. Support two execution modes: (a) create/update a formula field (the preferred, non-destructive path), and (b) materialize computed values into a target field when formula fields aren't allowed. Show a preview/diff on sample rows and require user confirmation before applying to the full dataset.
Safety. Treat user text as untrusted; never hand the model credentials — the model only requests actions a server-side tool executes. Defend against prompt injection and data exfiltration, redact PII in samples, enforce a capability/operation whitelist, and gate destructive actions (overwriting non-formula fields, bulk updates above a threshold, schema changes) behind explicit confirmation.
Latency and reliability. Target an interactive P95 of ~2–3s: cache schema summaries by schema hash, use a small/fast model on the first pass and escalate to a larger model on failure, cap repair iterations, and stream the explanation while the formula finalizes. For writes, use idempotency keys, exponential backoff on 429/5xx, batched writes, and partial-failure reporting (failed record IDs).
Evaluation. Offline: a labeled set of (schema, request) → expected formula scored by AST equivalence and execution accuracy on test rows, plus validation pass-rate. Online: acceptance/insert-without-edit rate, edit distance from suggestion to final, clarification rate, preview-to-apply conversion, turns-to-success, and post-apply undo/correction rate.
Edge cases and failure modes. Hallucinated fields/functions, wrong operator precedence, null/blank values ( IF({Salary}, {Salary}+2, BLANK()) ), type mismatches, locale/percent/currency parsing (“10%” → 0.1), timezone handling, and schema changing between preview and apply (prefer field IDs over names).

Design NL-to-Formula assistant for Airtable

Quick Overview

Design NL-to-Formula assistant for Airtable

Question

Submit Your Answer to Earn 20XP

Design NL-to-Formula assistant for Airtable

Quick Overview

Design NL-to-Formula assistant for Airtable

Question

Submit Your Answer to Earn 20XP