PracHub

Generate Synthetic Clickstream Data with Python Function

Last updated: Mar 29, 2026

Quick Overview

This question tests a candidate's ability to generate synthetic data and simulate probabilistic events in Python with pandas and NumPy, with emphasis on timestamp handling, random sampling, and feature engineering for clickstream records.


Company: Amazon

Role: Data Scientist

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Aug 4, 2025, 10:55 AM
Scenario

The analytics team needs to generate synthetic clickstream records to test a new reporting pipeline before real traffic arrives.

Question

Write a Python function simulate_clickstream(num_users: int, days: int) that returns a pandas DataFrame of simulated events with columns [user_id, event_ts, page, clicked]. Timestamp each event within the past <days> days, have each user visit 1–10 random pages per day, and set clicked to True with probability 0.15.

Hints

Use numpy.random for page counts and click probabilities; build a list of dicts, then convert it to a DataFrame.
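
One way to follow the hint is sketched below. It is a minimal illustration, not a reference solution, and it rests on several assumptions the question leaves open: the PAGES list is a hypothetical page catalogue, the optional seed parameter is added only for reproducibility, user IDs are consecutive integers starting at 1, and a "day" is read as a rolling 24-hour window counted back from the current time.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

# Hypothetical page catalogue; the question does not name any pages.
PAGES = ["home", "search", "product", "cart", "checkout"]


def simulate_clickstream(num_users: int, days: int, seed=None) -> pd.DataFrame:
    """Return simulated events with columns user_id, event_ts, page, clicked."""
    # `seed` is an assumption added for reproducible runs.
    rng = np.random.default_rng(seed)
    now = datetime.now().replace(microsecond=0)
    records = []
    for user_id in range(1, num_users + 1):
        for day in range(days):
            # Assumed reading of "per day": a 24-hour window that starts
            # (day + 1) days before now.
            window_start = now - timedelta(days=day + 1)
            n_visits = int(rng.integers(1, 11))  # upper bound is exclusive: 1..10
            for _ in range(n_visits):
                offset = int(rng.integers(0, 86_400))  # seconds into the window
                records.append({
                    "user_id": user_id,
                    "event_ts": window_start + timedelta(seconds=offset),
                    "page": str(rng.choice(PAGES)),
                    "clicked": bool(rng.random() < 0.15),
                })
    columns = ["user_id", "event_ts", "page", "clicked"]
    df = pd.DataFrame(records, columns=columns)
    return df.sort_values(["user_id", "event_ts"]).reset_index(drop=True)
```

Calling simulate_clickstream(100, 7, seed=42) would produce between 700 and 7,000 rows, since each of the 100 users contributes 1 to 10 events on each of the 7 days. The nested loops keep the per-user, per-day logic easy to read; for very large inputs a vectorized version that draws all counts and offsets at once would likely be faster.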
