Design an LLM-Based Arithmetic Solver
Context
You are building an LLM-driven service that answers arithmetic questions, ranging from simple expressions (e.g., 4 + 5) to worded queries such as "compute the sum from 1 to 100". For each question, the system must choose among answering directly, applying a closed-form formula, or executing code in a sandbox.
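For reference, the closed-form option for the second example can be written out with the standard arithmetic-series identity. The symbols n and k are introduced here only for illustration and are not part of the task statement.

```latex
\sum_{k=1}^{n} k = \frac{n(n+1)}{2},
\qquad
\sum_{k=1}^{100} k = \frac{100 \cdot 101}{2} = 5050
```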
Requirements
- Decision policy: when to answer directly vs use formulas vs run code (e.g., calculator, Python). Explain latency and accuracy trade-offs.
- Chain logic: orchestration steps, routing, verification, and fallbacks.
- Tool usage: which tools are available, how they are called, and how results are validated.
- Prompting strategy: router prompts, solver prompts, output schema, and how to avoid leaking chain-of-thought.
- Guardrails: safety, precision, ambiguity handling, timeouts, and injection defenses.
- Evaluation: how to measure accuracy, test coverage, and reliability; how to handle errors and retries.
- Experiment tracking: how you will version datasets, prompts, policies, and record metrics.
Provide concrete examples using 4 + 5 and sum from 1 to 100.
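
As one possible starting point, the two examples can be run through a minimal routing sketch. Everything named here (`route`, `solve`, `eval_simple`, `sum_range`, the route labels, and the regex patterns) is hypothetical and chosen for illustration. The pattern checks stand in for an LLM router, the small AST evaluator stands in for a direct answer, and the sandbox route is left as a stub.

```python
import ast
import operator
import re

# Hypothetical route labels; the task names the three options but not these identifiers.
DIRECT, FORMULA, CODE = "direct", "formula", "code"

# Assumed patterns: a real router would likely be an LLM call or a trained classifier.
_SUM_RANGE = re.compile(r"sum from (-?\d+) to (-?\d+)", re.IGNORECASE)
_SIMPLE_EXPR = re.compile(r"\s*\d+\s*[-+*]\s*\d+\s*")

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def route(query: str) -> str:
    """Pick a route with cheap pattern checks (a stand-in for an LLM router)."""
    if _SUM_RANGE.search(query):
        return FORMULA
    if _SIMPLE_EXPR.fullmatch(query):
        return DIRECT
    return CODE


def eval_simple(expr: str) -> int:
    """Evaluate 'a op b' on integer literals by walking the AST, never eval()."""
    node = ast.parse(expr.strip(), mode="eval").body
    if not (isinstance(node, ast.BinOp) and type(node.op) in _OPS):
        raise ValueError("unsupported expression")
    operands = (node.left, node.right)
    if not all(isinstance(n, ast.Constant) and isinstance(n.value, int) for n in operands):
        raise ValueError("operands must be integer literals")
    return _OPS[type(node.op)](node.left.value, node.right.value)


def sum_range(lo: int, hi: int) -> int:
    """Closed form for lo + (lo + 1) + ... + hi, assuming lo <= hi."""
    return (lo + hi) * (hi - lo + 1) // 2


def solve(query: str) -> dict:
    """Route the query, compute an answer, and record whether it was cross-checked."""
    chosen = route(query)
    if chosen == FORMULA:
        lo, hi = map(int, _SUM_RANGE.search(query).groups())
        answer = sum_range(lo, hi)
        # Verification step: compare the closed form against a brute-force sum.
        verified = answer == sum(range(lo, hi + 1))
        return {"route": chosen, "answer": answer, "verified": verified}
    if chosen == DIRECT:
        # Evaluated deterministically here; in the real system the model would answer.
        return {"route": chosen, "answer": eval_simple(query), "verified": True}
    # Fallback: a real system would hand the query to sandboxed code execution.
    return {"route": chosen, "answer": None, "verified": False}


print(solve("4 + 5"))
print(solve("compute the sum from 1 to 100"))
```

Under these assumptions, 4 + 5 takes the direct route and returns 9, while the sum query takes the formula route and returns 5050 with a brute-force cross-check. Anything that matches neither pattern falls through to the code route, which mirrors the latency trade-off in the decision policy: cheap routes first, sandbox execution only when needed.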