This question evaluates a candidate's ability to implement synthetic data generation and probabilistic event simulation in Python using libraries like pandas and numpy, emphasizing timestamp handling, random sampling, and feature engineering for clickstream records.
The analytics team needs to generate synthetic clickstream records to test a new reporting pipeline before real traffic arrives.
Write a Python function simulate_clickstream(num_users: int, days: int) that returns a pandas DataFrame of simulated events with columns [user_id, event_ts, page, clicked]. Every event_ts should fall within the past <days> days, each user should visit between 1 and 10 randomly chosen pages per day (one event per page visit), and each event should have clicked set to True with probability 0.15.
Use numpy.random to draw the per-day page counts and the click outcomes; collect the events as a list of dicts, then convert that list to a DataFrame.
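
One possible solution is sketched below; it is an illustration under stated assumptions, not a reference answer. The prompt does not name any pages, so the PAGES list is hypothetical. The optional seed argument is also an addition for reproducibility and is not part of the requested signature. A "day" is interpreted here as a rolling 24-hour window counted back from the moment of the call, and user IDs are assumed to be the integers 1 through num_users.

```python
from datetime import datetime, timedelta
from typing import Optional

import numpy as np
import pandas as pd

# Hypothetical page catalogue: the prompt does not specify page names.
PAGES = ["/home", "/search", "/product", "/cart", "/checkout"]
CLICK_PROBABILITY = 0.15
SECONDS_PER_DAY = 24 * 60 * 60


def simulate_clickstream(
    num_users: int, days: int, seed: Optional[int] = None
) -> pd.DataFrame:
    """Return synthetic events with columns user_id, event_ts, page, clicked."""
    # `seed` is an assumed extra parameter so that runs can be reproduced.
    rng = np.random.default_rng(seed)
    now = datetime.now().replace(microsecond=0)
    window_start = now - timedelta(days=days)

    records = []
    for user_id in range(1, num_users + 1):
        for day in range(days):
            # Start of this 24-hour slice of the look-back window.
            day_start = window_start + timedelta(days=day)
            # integers() excludes the upper bound, so this yields 1..10.
            n_pages = int(rng.integers(1, 11))
            offsets = rng.integers(0, SECONDS_PER_DAY, size=n_pages)
            pages = rng.choice(PAGES, size=n_pages)
            clicks = rng.random(n_pages) < CLICK_PROBABILITY
            for offset, page, clicked in zip(offsets, pages, clicks):
                records.append(
                    {
                        "user_id": user_id,
                        "event_ts": day_start + timedelta(seconds=int(offset)),
                        "page": str(page),
                        "clicked": bool(clicked),
                    }
                )

    # Passing `columns` keeps the schema stable even when no events exist.
    df = pd.DataFrame(records, columns=["user_id", "event_ts", "page", "clicked"])
    return df.sort_values(["user_id", "event_ts"]).reset_index(drop=True)
```

Calling simulate_clickstream(100, 7, seed=42) would produce between 700 and 7,000 rows, since each of the 100 users contributes 1 to 10 events on each of the 7 days. The nested loops follow the list-of-dicts hint directly; for very large inputs a vectorized construction would likely be faster, at some cost in readability.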