Implement R dplyr simulation and left join

This is a Data Manipulation (SQL/Python) interview question from Google for Data Scientist roles. View the full question and solution on PracHub.

Question

Using R and dplyr, run a simulation and a join.

Data:

prices
item_id | price_usd
1 | 10.00
2 | 20.00
3 | 30.00
4 | 40.00

catalog
item_id | category
1 | A
2 | B
3 | A
4 | C

Tasks:

With set.seed(2025), perform 1,000 simulations. In each simulation: randomly select half of the rows in prices to keep the same price; increase the other half by 10%. Then left join to catalog and compute: (i) overall mean price, and (ii) mean price by category A/B/C.
Return a data frame with one row per simulation containing overall_mean and category means. Also return the empirical mean and SD across simulations for each statistic.
Constraints: use dplyr verbs (e.g., slice_sample, mutate, case_when, left_join, group_by, summarise). Avoid for-loops; use vectorized operations or map-style iteration while ensuring no accidental reuse of mutated state across iterations. Your solution must be memory-safe for 1e6 items (outline changes needed).
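
One possible way to approach the tasks above is sketched below. It is a minimal sketch rather than a reference solution. It assumes that dplyr, purrr and tidyr are installed, that "half" means `nrow(prices) %/% 2` rows (rounded down for odd counts), and that the helper name `simulate_once` and the output column names `sim`, `mean_A`, `mean_B`, `mean_C` are illustrative choices, not part of the question.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Input tables from the question.
prices  <- tibble(item_id = 1:4, price_usd = c(10, 20, 30, 40))
catalog <- tibble(item_id = 1:4, category = c("A", "B", "A", "C"))

# One simulation (hypothetical helper). It only reads its arguments and
# returns a new one-row tibble, so no mutated state carries over between runs.
simulate_once <- function(prices, catalog) {
  # Rows whose price stays unchanged: half of the table, rounded down.
  kept <- slice_sample(prices, n = nrow(prices) %/% 2)

  joined <- prices %>%
    mutate(price_usd = case_when(
      item_id %in% kept$item_id ~ price_usd,   # unchanged half
      TRUE ~ price_usd * 1.10                  # other half: +10%
    )) %>%
    left_join(catalog, by = "item_id")

  # Category means, spread into columns mean_A, mean_B, mean_C.
  by_category <- joined %>%
    group_by(category) %>%
    summarise(mean_price = mean(price_usd), .groups = "drop") %>%
    pivot_wider(names_from = category, values_from = mean_price,
                names_prefix = "mean_")

  bind_cols(summarise(joined, overall_mean = mean(price_usd)), by_category)
}

set.seed(2025)
n_sims <- 1000

# Map-style iteration: one row per simulation, tagged with its index.
sims <- map(seq_len(n_sims), function(i) {
  mutate(simulate_once(prices, catalog), sim = i, .before = 1)
}) %>%
  bind_rows()

# Empirical mean and SD of each statistic across simulations.
sim_summary <- sims %>%
  pivot_longer(-sim, names_to = "statistic", values_to = "value") %>%
  group_by(statistic) %>%
  summarise(mean = mean(value), sd = sd(value), .groups = "drop")
```

As a sanity check, each item is increased with probability 1/2, so its expected price is 1.05 times the base price and the empirical overall mean should land near 25 × 1.05 = 26.25. For 1e6 items, one plausible outline is to join `prices` to `catalog` once outside the loop, since categories never change, then draw only a vector of row indices per simulation and store just the four summary numbers each time. That avoids materializing 1,000 joined copies of the table.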
