Using R and dplyr, run a simulation and a join.
Data:
prices
item_id | price_usd
1 | 10.00
2 | 20.00
3 | 30.00
4 | 40.00
catalog
item_id | category
1 | A
2 | B
3 | A
4 | C
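
The two tables above can be entered as tibbles. This is a minimal sketch; the column types (integer item_id, numeric price_usd, character category) are assumptions, since the task does not state them.

```r
library(dplyr)

# Input tables from the task statement (column types are assumed)
prices <- tibble(
  item_id   = 1:4,
  price_usd = c(10, 20, 30, 40)
)

catalog <- tibble(
  item_id  = 1:4,
  category = c("A", "B", "A", "C")
)
```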
Tasks:
- With set.seed(2025), perform 1,000 simulations. In each simulation, randomly select half of the rows in prices to keep at their original price and increase the other half by 10%. Then left join to catalog and compute (i) the overall mean price and (ii) the mean price for each category (A, B, C).
- Return a data frame with one row per simulation containing overall_mean and the category means. Also return the empirical mean and SD across simulations for each statistic.
- Constraints: use dplyr verbs (e.g., slice_sample, mutate, case_when, left_join, group_by, summarise). Avoid for-loops; use vectorized operations or map-style iteration, and make sure no mutated state is accidentally reused across iterations. Your solution must be memory-safe for 1e6 items; outline the changes needed to scale to that size.