Design NL-to-Formula assistant for Airtable
Company: Airtable
Role: Software Engineer
Category: ML System Design
Difficulty: medium
Interview Round: Onsite
##### Question
Design an ML-powered assistant for a no-code table product like Airtable that converts a user's **natural-language request** into a **formula expression** in the product's formula language, grounded in the live table's schema. You are given an **Airtable API key** (read/write access to a base/table) and an **LLM API key** (e.g., Claude) you can call.
Users type requests such as:
- “Add 2 to `salary`.”
- “If `Status` is "active", set `bonus` to 10% of `salary`.”
- “Create a column that is `salary` * 1.1.”
- “If `Status` is "Won" and `Amount` > 1000, return `Amount` * 0.1, else 0.”
- “Extract the domain from the `Email` field.”
- “Label a row "Late" if `Due Date` is before today and `Completed` is false.”
Walk through the design covering:
1. **End-to-end architecture.** Components and data flow across the client/UI, an orchestrator API, the LLM layer, the Airtable connector, validators/tools, and observability.
2. **NL → formula generation.** Prompting and tool/structured-output strategy that grounds generation in the actual field names and types. Discuss using a typed intermediate representation (IR) that you compile deterministically into the formula string to cut down syntax errors.
3. **Output contract.** Beyond the proposed formula, return a short **explanation** and optionally a few **test cases / example evaluations** the user can verify.
4. **Schema grounding and type checking.** Pass field names and types (number, currency, date, bool, single-select, text, plus null/blank) into the model; forbid invented fields and functions; require references to match the real schema.
5. **Ambiguity handling.** Ask clarifying questions when the target field is unclear (e.g. `Base Salary` vs `Salary (USD)`), the business logic is ambiguous (what does “late” mean — timezone, inclusivity?), or the output type is unclear. Support multi-turn refinement.
6. **Validation before any write.** Static checks (parse to AST, grammar validation, referenced-field existence, operator/argument type-checking) and dynamic checks (sandbox-evaluate on a small sample of rows to catch divide-by-zero, null/blank handling, runtime errors). On failure, feed the error back to the LLM for a bounded repair loop.
7. **Applying changes to the live sheet via the Airtable API.** Support two execution modes: (a) **create/update a formula field** (the preferred, non-destructive path), and (b) **materialize computed values** into a target field when formula fields aren't allowed. Show a preview/diff on sample rows and require user confirmation before applying to the full dataset.
8. **Safety.** Treat user text as untrusted; never hand the model credentials — the model only requests actions a server-side tool executes. Defend against prompt injection and data exfiltration, redact PII in samples, enforce a capability/operation whitelist, and gate destructive actions (overwriting non-formula fields, bulk updates above a threshold, schema changes) behind explicit confirmation.
9. **Latency and reliability.** Target an interactive P95 of ~2–3s: cache schema summaries by schema hash, use a small/fast model on the first pass and escalate to a larger model on failure, cap repair iterations, and stream the explanation while the formula finalizes. For writes, use idempotency keys, exponential backoff on 429/5xx, batched writes, and partial-failure reporting (failed record IDs).
10. **Evaluation.** Offline: a labeled set of (schema, request) → expected formula scored by AST equivalence and execution accuracy on test rows, plus validation pass-rate. Online: acceptance/insert-without-edit rate, edit distance from suggestion to final, clarification rate, preview-to-apply conversion, turns-to-success, and post-apply undo/correction rate.
11. **Edge cases and failure modes.** Hallucinated fields/functions, wrong operator precedence, null/blank values (`IF({Salary}, {Salary}+2, BLANK())`), type mismatches, locale/percent/currency parsing (“10%” → 0.1), timezone handling, and schema changing between preview and apply (prefer field IDs over names).
Quick Answer: Airtable software-engineer onsite ML system design question: design an assistant that turns natural-language requests into the product's formula language, grounded in the live table schema, and optionally applies them to the base via the Airtable API. It tests LLM orchestration with structured outputs / a typed IR, schema grounding and type checking, ambiguity handling, static + dynamic formula validation, prompt-injection safety, confirmation-gated writes, latency trade-offs, and offline/online evaluation.