Expense Rules Engines

What's being tested

A strong answer shows you can design a policy-driven backend system where expense decisions are configurable, explainable, versioned, and scalable across many companies. Rippling cares because expense policies are customer-specific, change frequently, and affect money movement, employee experience, compliance, and auditability. The interviewer is probing whether you can move beyond hardcoded if/else logic into a flexible rules engine with clear data models, deterministic evaluation, conflict resolution, immutable history, and aggregate calculations like trip-level or category-level caps. They will also test whether you can design APIs and return types that are backward-compatible and useful to product surfaces, approvals, and audits.

Core knowledge

Rule representation is the center of the design. A good model separates condition, scope, action, priority, effective_time, and version. Example: “For employees in US, meal expenses over $75 require manager approval” should be data/config, not deployed code.
DSL versus configuration is a key tradeoff. A JSON config such as { field: "category", op: "eq", value: "meal" } is safer and easier to validate, while a full domain-specific language is more expressive but harder to secure, test, migrate, and explain.
Rule evaluation should be deterministic and side-effect-free. Given the same expense, policy version, company, employee attributes, and relevant aggregates, evaluate(input) -> result should always return the same output. This enables replay, debugging, audit trails, and idempotent retries.
Return types matter as much as rule matching. A strong response includes decision, violations, required_actions, reimbursable_amount, explanations, matched_rule_ids, and policy_version. Avoid returning only true/false; product and support teams need to know why a claim failed or needs review.
Conflict resolution must be explicit. If one rule says “auto-approve under $100” and another says “alcohol is non-reimbursable,” define precedence using priority, deny-over-approve semantics, specificity, or ordered evaluation. For financial systems, conservative defaults like “hard violation beats auto-approval” are easier to defend.
Aggregate rules require grouping and snapshot semantics. Trip-level limits need records grouped by keys like (company_id, employee_id, trip_id, category) and time windows. For example, compute total_meals_for_trip = sum(amount where category = meal) before applying “meals over $300 per trip require approval.”
Performance depends on candidate rule selection before evaluation. Do not scan every company’s rules. Partition by tenant_id, filter by effective date, employee country, expense category, and status, then evaluate a small candidate set. Complexity should trend toward $O(R_c + A)$, where $R_c$ is the number of candidate rules and $A$ is the number of relevant aggregate records.
Versioning and immutability are non-negotiable. Store policy versions as immutable records, e.g. policy_id, version, effective_from, created_by, created_at. An expense submitted on Monday should be evaluated under the policy active on Monday, even if the company changes its rules on Tuesday.
Explainability should be first-class. Persist an evaluation trace containing matched rules, failed predicates, input facts, aggregate values, and final decision. This helps customer support answer, “Why was this rejected?” and helps engineers replay production bugs without guessing.
Multi-tenancy affects data access and caching. Rules should be keyed by company_id or tenant, with strong isolation in queries and cache keys. A cache like Redis can hold active policy versions, but cache invalidation must respect publication events, effective dates, and policy version IDs.
API design should support both synchronous and asynchronous paths. A single card swipe may need low-latency evaluation, while bulk reimbursement or backfills can run asynchronously. Expose an endpoint like POST /expense-evaluations that accepts an idempotency key and returns evaluation_id, decision, policy_version, and explanations.
Testing strategy should include golden cases and property-style checks (for example, re-evaluating the same input always yields the same result). Store fixtures such as “meal under limit,” “trip total exceeds cap,” and “policy changed after submission.” For a rules engine, regression tests over these fixtures are often more valuable than isolated unit tests because customers depend on exact historical behavior.
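The points above about config-style predicates, deterministic evaluation, and explicit conflict resolution can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the Rule class, the OPS table, the rule IDs, and the "most restrictive effect wins" ordering are all assumptions made for the example, and amounts are plain integers for brevity.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical operator whitelist; keeping it small makes rule configs easy to validate.
OPS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda a, b: a == b,
    "gt": lambda a, b: a > b,
    "lte": lambda a, b: a <= b,
}

# Assumed precedence: the most restrictive effect wins (deny over approve).
SEVERITY = {"APPROVED": 0, "NEEDS_REVIEW": 1, "REJECTED": 2}


@dataclass(frozen=True)
class Rule:
    rule_id: str
    conditions: tuple[dict, ...]  # every condition must hold (implicit AND)
    effect: str                   # one of the SEVERITY keys


def matches(rule: Rule, facts: dict) -> bool:
    return all(
        c["field"] in facts and OPS[c["op"]](facts[c["field"]], c["value"])
        for c in rule.conditions
    )


def evaluate(rules: list[Rule], facts: dict) -> tuple[str, list[str]]:
    """Pure function: same rules and facts always give the same result."""
    matched = [r for r in rules if matches(r, facts)]
    if not matched:
        # Assumed conservative default when no rule applies.
        return "NEEDS_REVIEW", []
    decision = max((r.effect for r in matched), key=SEVERITY.__getitem__)
    return decision, sorted(r.rule_id for r in matched)


RULES = [
    Rule("auto-approve-small", ({"field": "amount", "op": "lte", "value": 100},), "APPROVED"),
    Rule("alcohol-deny", ({"field": "category", "op": "eq", "value": "alcohol"},), "REJECTED"),
    Rule(
        "us-meal-over-75",
        (
            {"field": "country", "op": "eq", "value": "US"},
            {"field": "category", "op": "eq", "value": "meal"},
            {"field": "amount", "op": "gt", "value": 75},
        ),
        "NEEDS_REVIEW",
    ),
]

# A $40 alcohol purchase matches both the auto-approve and the deny rule; deny wins.
print(evaluate(RULES, {"country": "US", "category": "alcohol", "amount": 40}))
```

Because evaluate reads only its arguments, the same call can be replayed later for audits or retried safely after a timeout.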

Worked example

For “Design expense rules engine and return type,” a strong candidate starts by clarifying whether the engine is evaluating corporate card swipes, reimbursement submissions, or both; whether decisions must be real-time; and whether rules are per-company, per-employee group, or global templates. They should state assumptions: multi-tenant SaaS, company-specific policies, expenses have fields like amount, currency, merchant, category, employee_id, submitted_at, and some rules need aggregate context.

The answer can be organized around four pillars: data model, evaluation flow, response contract, and operational concerns. For the data model, define PolicyVersion, Rule, Predicate, Action, and EvaluationRecord; each rule has scope, condition tree, effect, priority, and effective dates. For evaluation, load the policy version for (company_id, submitted_at), gather expense facts and aggregate facts, evaluate predicates deterministically, resolve conflicts, and produce a final decision. For the return type, include decision = APPROVED | NEEDS_REVIEW | REJECTED, violations, warnings, required_approvals, reimbursable_amount, matched_rules, and human-readable explanations.
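One possible shape for the response contract described above is sketched below. The field names follow the list in the text; the Violation fields, the to_json_dict helper, and the choice to serialize money as a string are assumptions for illustration only.

```python
import json
from dataclasses import asdict, dataclass
from decimal import Decimal
from enum import Enum


class Decision(str, Enum):
    APPROVED = "APPROVED"
    NEEDS_REVIEW = "NEEDS_REVIEW"
    REJECTED = "REJECTED"


@dataclass(frozen=True)
class Violation:
    rule_id: str
    code: str      # hypothetical machine-readable code for UI and support tooling
    message: str   # human-readable explanation


@dataclass(frozen=True)
class EvaluationResult:
    evaluation_id: str
    policy_version: int
    decision: Decision
    reimbursable_amount: Decimal
    violations: tuple[Violation, ...] = ()
    warnings: tuple[str, ...] = ()
    required_approvals: tuple[str, ...] = ()
    matched_rules: tuple[str, ...] = ()
    explanations: tuple[str, ...] = ()

    def to_json_dict(self) -> dict:
        """Convert to JSON-friendly values; money is sent as a string to avoid float rounding."""
        payload = asdict(self)
        payload["decision"] = self.decision.value
        payload["reimbursable_amount"] = str(self.reimbursable_amount)
        return payload


result = EvaluationResult(
    evaluation_id="ev_1",
    policy_version=3,
    decision=Decision.NEEDS_REVIEW,
    reimbursable_amount=Decimal("55.00"),
    violations=(Violation("us-meal-over-75", "MEAL_OVER_LIMIT", "Meal exceeds $75"),),
    required_approvals=("manager",),
    matched_rules=("us-meal-over-75",),
)
print(json.dumps(result.to_json_dict(), indent=2))
```

New optional fields with defaults can be appended later without breaking existing consumers, which is one way to keep the contract backward-compatible.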

One design decision to flag explicitly is whether to implement a custom JSON rule format or embed a general expression language. A constrained JSON AST is less expressive but safer: it is easier to validate, migrate, index, explain, and expose in an admin UI. The candidate can close by saying that, with more time, they would cover policy publishing workflows, audit permissions, bulk re-evaluation, and how to simulate the impact of a draft policy before activating it.
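To illustrate why a constrained JSON AST is easier to validate than a general expression language, here is a small validator sketch. The node shapes ("all", "any", and a field/op/value leaf), the allowed field list, and the operator list are assumptions chosen for the example rather than a prescribed schema.

```python
# Hypothetical whitelists; a real system would derive these from the expense schema.
ALLOWED_FIELDS = {"category", "amount", "currency", "merchant"}
ALLOWED_OPS = {"eq", "neq", "gt", "gte", "lt", "lte"}


def validate(node: object, path: str = "$") -> list[str]:
    """Return a list of error strings; an empty list means the tree is valid."""
    if not isinstance(node, dict):
        return [f"{path}: expected an object"]

    # Combinator nodes: {"all": [...]} or {"any": [...]}
    for combinator in ("all", "any"):
        if combinator in node:
            if set(node) != {combinator}:
                return [f"{path}: '{combinator}' node must have no other keys"]
            children = node[combinator]
            if not isinstance(children, list) or not children:
                return [f"{path}.{combinator}: expected a non-empty list"]
            errors: list[str] = []
            for i, child in enumerate(children):
                errors.extend(validate(child, f"{path}.{combinator}[{i}]"))
            return errors

    # Leaf nodes: {"field": ..., "op": ..., "value": ...}
    if set(node) != {"field", "op", "value"}:
        return [f"{path}: leaf must have exactly field, op, value"]
    errors = []
    if not isinstance(node["field"], str) or node["field"] not in ALLOWED_FIELDS:
        errors.append(f"{path}.field: unknown field {node['field']!r}")
    if not isinstance(node["op"], str) or node["op"] not in ALLOWED_OPS:
        errors.append(f"{path}.op: unsupported operator {node['op']!r}")
    return errors


good = {"all": [
    {"field": "category", "op": "eq", "value": "meal"},
    {"field": "amount", "op": "gt", "value": 75},
]}
bad = {"all": [{"field": "category", "op": "regex", "value": ".*"}]}

print(validate(good))
print(validate(bad))
```

Because every node is plain data, the same tree can be rejected at publish time, rendered in an admin UI, or migrated with an ordinary script, none of which is as simple with free-form expressions.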

A second angle

For “Extend rules for trip-level aggregates and outputs,” the same engine design applies, but the hard part shifts from single-record predicates to aggregate fact computation. Instead of evaluating only expense.amount > 75, the engine may need sum(meal.amount) for trip_id = X, count(hotel_nights), or max(daily_transport_total). A good design introduces a fact provider abstraction: the rule engine asks for named facts like trip.meal_total, while a separate aggregation layer computes them from expense records. The candidate should be careful about timing: do aggregates include pending expenses, rejected expenses, card authorizations, or only submitted claims? The output also becomes richer because the employee needs to know not just “violated trip cap,” but “trip meal total is $340; policy limit is $300; this expense contributes $55.”
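The fact provider idea can be sketched as follows. Everything here is illustrative: the FactProvider protocol, the InMemoryTripFacts class, and the choice to count only submitted and approved expenses are assumptions, and the sketch assumes the expense under evaluation is already present in the records being summed.

```python
from decimal import Decimal
from typing import Optional, Protocol


class FactProvider(Protocol):
    def get(self, name: str) -> Decimal: ...


class InMemoryTripFacts:
    """Computes named facts for one trip from a list of expense dicts."""

    def __init__(self, expenses, trip_id, counted_statuses=frozenset({"submitted", "approved"})):
        self._expenses = expenses
        self._trip_id = trip_id
        self._counted = counted_statuses  # the timing decision is made explicit here

    def get(self, name: str) -> Decimal:
        if name == "trip.meal_total":
            return sum(
                (
                    e["amount"]
                    for e in self._expenses
                    if e["trip_id"] == self._trip_id
                    and e["category"] == "meal"
                    and e["status"] in self._counted
                ),
                Decimal("0"),
            )
        raise KeyError(name)


def check_trip_meal_cap(facts: FactProvider, expense_amount: Decimal, limit: Decimal) -> Optional[str]:
    """Return an explanation string if the trip cap is exceeded, else None."""
    total = facts.get("trip.meal_total")
    if total <= limit:
        return None
    return (
        f"trip meal total is ${total}; policy limit is ${limit}; "
        f"this expense contributes ${expense_amount}"
    )


expenses = [
    {"trip_id": "T1", "category": "meal", "amount": Decimal("150"), "status": "approved"},
    {"trip_id": "T1", "category": "meal", "amount": Decimal("135"), "status": "approved"},
    {"trip_id": "T1", "category": "meal", "amount": Decimal("40"), "status": "rejected"},
    {"trip_id": "T1", "category": "meal", "amount": Decimal("55"), "status": "submitted"},
    {"trip_id": "T2", "category": "meal", "amount": Decimal("90"), "status": "approved"},
]
facts = InMemoryTripFacts(expenses, trip_id="T1")
print(check_trip_meal_cap(facts, Decimal("55"), Decimal("300")))
```

Keeping the rule check behind the get(name) interface means the in-memory provider can later be swapped for a database-backed or precomputed one without touching rule logic.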

Common pitfalls

Pitfall: Treating the system as a chain of hardcoded if/else statements.

This may work for three policies but fails when every customer has different limits, exceptions, approval paths, and effective dates. A better answer defines a configurable rule model, a deterministic evaluator, and a versioned publishing workflow.

Pitfall: Ignoring historical correctness.

A tempting but wrong design always reads the latest company policy at evaluation time. That breaks audits and creates inconsistent outcomes after policy edits; instead, persist policy_version on each evaluation and make policies immutable once published.
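A minimal sketch of resolving the policy version that was in force at submission time, rather than the latest one, might look like this. The PolicyVersion fields and the version_at helper are hypothetical, and the sketch assumes versions are kept sorted by effective_from with each version staying in force until the next one starts.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: a published version is never mutated
class PolicyVersion:
    policy_id: str
    version: int
    effective_from: datetime


def version_at(versions: list[PolicyVersion], at: datetime) -> Optional[PolicyVersion]:
    """Return the version in force at `at`; `versions` must be sorted by effective_from."""
    starts = [v.effective_from for v in versions]
    idx = bisect_right(starts, at)  # number of versions that started at or before `at`
    return versions[idx - 1] if idx else None


versions = [
    PolicyVersion("travel", 1, datetime(2024, 1, 1, tzinfo=timezone.utc)),
    PolicyVersion("travel", 2, datetime(2024, 6, 4, tzinfo=timezone.utc)),  # published Tuesday
]

submitted_monday = datetime(2024, 6, 3, 15, 30, tzinfo=timezone.utc)
chosen = version_at(versions, submitted_monday)
print(chosen.version)  # the Monday submission still resolves to version 1
```

The resolved version number is what gets persisted on the evaluation record, so a replay months later loads the same rules.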

Pitfall: Returning only a binary approval result.

A boolean response forces every downstream system to reverse-engineer intent. Strong answers model decisions, violations, required actions, and explanations separately so UI, approvals, accounting, and support can consume the same evaluation safely.

Connections

The interviewer may pivot from this topic into workflow orchestration for approvals, idempotency for reimbursement submissions, ledger design for money movement, or schema evolution for backward-compatible APIs. They may also ask for a coding-oriented version, such as aggregating expenses by person, trip, and category using hash maps with $O(n)$ time and $O(k)$ space, where $n$ is the number of expenses and $k$ is the number of distinct groups.
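The coding-oriented variant can be sketched with a single pass over the expenses and a dictionary keyed by the grouping tuple. The record field names (employee_id, trip_id, category, amount) are assumed for the example.

```python
from collections import defaultdict
from decimal import Decimal


def aggregate(expenses: list[dict]) -> dict[tuple[str, str, str], Decimal]:
    """Sum amounts per (employee_id, trip_id, category) in one pass.

    Time is O(n) in the number of expenses; space is O(k) in the number of distinct groups.
    """
    totals: dict[tuple[str, str, str], Decimal] = defaultdict(lambda: Decimal("0"))
    for e in expenses:
        key = (e["employee_id"], e["trip_id"], e["category"])
        totals[key] += e["amount"]
    return dict(totals)


sample = [
    {"employee_id": "E1", "trip_id": "T1", "category": "meal", "amount": Decimal("20.50")},
    {"employee_id": "E1", "trip_id": "T1", "category": "meal", "amount": Decimal("30.00")},
    {"employee_id": "E1", "trip_id": "T1", "category": "hotel", "amount": Decimal("180.00")},
    {"employee_id": "E2", "trip_id": "T1", "category": "meal", "amount": Decimal("12.00")},
]
for key, total in aggregate(sample).items():
    print(key, total)
```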
