PracHub
QuestionsPremiumLearningGuidesCheatsheetNEWCoaches

Quick Overview

This question evaluates data manipulation and analytical competencies—particularly deduplication, joins, aggregation, co-occurrence counting, and metric definition—applied to CSV access logs and user metadata.

  • easy
  • Notion
  • Data Manipulation (SQL/Python)
  • Data Scientist

Analyze document collaboration from CSV logs

Company: Notion

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: easy

Interview Round: Technical Screen

You are given two CSV files exported from a document collaboration product. ## Data ### 1) `doc_access.csv` Each row represents a user accessing (viewing/editing) a document page. | column | type | description | |---|---|---| | `page_id` | STRING/INT | Document/page identifier | | `user_id` | STRING/INT | User who accessed the document | | `event_ts` | TIMESTAMP | When the access happened (assume UTC) | | `is_creator` | BOOLEAN | `TRUE` if this user is the creator/owner of the document; otherwise the user is a collaborator | | `collab_source` | STRING | For collaborators only (`is_creator=FALSE`): how the user became a collaborator (e.g., invited, link share, org default, etc.) | Notes/assumptions: - A “collaborator” is any record with `is_creator = FALSE`. - A user may access the same `page_id` multiple times; unless otherwise specified, treat collaboration at the **document level** (i.e., dedupe by `(page_id, user_id)` where appropriate). ### 2) `users.csv` | column | type | description | |---|---|---| | `user_id` | STRING/INT | User identifier | | `country` | STRING | User’s country | ## Tasks ### Q1) Distribution of collaborator sources Compute the distribution of how collaborators got onto documents. **Output:** - `collab_source` - `num_collaborators` (count of collaborator users; specify whether you count unique users overall or unique `(page_id, user_id)` pairs) - `pct_of_collaborators` ### Q2) For a given user, who did they collaborate with most? Given an input parameter `target_user_id`, find the other user they collaborated with the most. Define “collaborated with” as: the two users both appear on the same `page_id` (after deduping to unique `(page_id, user_id)` memberships). **Output:** - `target_user_id` - `top_collaborator_user_id` - `num_shared_pages` (number of distinct `page_id` where both users appear) Also describe how you would break ties (e.g., smallest user_id, most recent shared collaboration, etc.). ### Q3) Which country collaborates the most? Determine which country’s users are most likely to collaborate with others. Because “collaboration rate” can be defined multiple ways, do the following: 1. Propose a clear numerator and denominator for a per-country collaboration rate. - Example definition (acceptable): - Numerator: # of users in the country who have collaborated with at least one other distinct user on at least one page. - Denominator: total # of users in the country. 2. Compute the collaboration rate by country and return the top country. **Output:** - `country` - `collaboration_rate` - any intermediate counts you used (e.g., `num_users_in_country`, `num_users_who_collaborated`).

Quick Answer: This question evaluates data manipulation and analytical competencies—particularly deduplication, joins, aggregation, co-occurrence counting, and metric definition—applied to CSV access logs and user metadata.

Last updated: Mar 29, 2026

Related Coding Questions

  • Analyze document collaboration patterns - Notion (medium)
  • Analyze Document Collaboration CSVs - Notion (medium)

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.