PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep
|Home/Data Manipulation (SQL/Python)/Notion

Analyze Document Collaboration CSVs

Last updated: Apr 2, 2026

Quick Overview

This question evaluates data manipulation and analytical skills, including joining user and activity records, aggregating and deduplicating events, and computing collaboration metrics across CSV datasets.

  • medium
  • Notion
  • Data Manipulation (SQL/Python)
  • Data Scientist

Analyze Document Collaboration CSVs

Company: Notion

Role: Data Scientist

Category: Data Manipulation (SQL/Python)

Difficulty: medium

Interview Round: Technical Screen

You are given two CSV files. `document_activity.csv` - `page_id` STRING — document identifier - `user_id` STRING — user who viewed or worked on the document - `viewed_at` TIMESTAMP — event time in UTC - `is_creator` BOOLEAN — `true` if the user created the document - `collaborator_source` STRING NULL — for non-creators only, the channel by which the collaborator reached the document (for example: invite, shared link, search). This field is `NULL` for creators. `users.csv` - `user_id` STRING — user identifier - `country` STRING — user's country Assumptions: - `document_activity.user_id` joins to `users.user_id`. - Each row represents one user-page event. - A document is considered collaborative if at least two distinct users appear on the same `page_id`. - No date filtering is required; use the full dataset. - Treat all timestamps as UTC. Tasks: 1. **Distribution of collaborator acquisition channels** - Consider only rows where `is_creator = false`. - Compute the distribution of `collaborator_source`. - Return: `collaborator_source`, `collaborator_count`, `collaborator_share`. - Define `collaborator_share` as `collaborator_count / total_non_creator_rows`. 2. **Find who collaborated most with a given user** - Given an input `user_id`, find the other user or users who collaborated most often with that user. - Define collaboration as appearing on the same `page_id`. - Count distinct shared `page_id` values, not raw events. - Return: `input_user_id`, `other_user_id`, `shared_page_count`. - If multiple users tie for the maximum, return all tied users ordered by `other_user_id`. 3. **Identify which country has the highest collaboration rate** - Join to `users.csv` to get each user's country. - For each country, define: - `total_active_users` = number of distinct users from that country who appear in `document_activity` - `collaborating_users` = number of those users who appear on at least one collaborative document - `collaboration_rate` = `collaborating_users / total_active_users` - Return: `country`, `collaborating_users`, `total_active_users`, `collaboration_rate`. - Rank countries by `collaboration_rate` descending.

Quick Answer: This question evaluates data manipulation and analytical skills, including joining user and activity records, aggregating and deduplicating events, and computing collaboration metrics across CSV datasets.

Related Interview Questions

  • Analyze document collaboration from CSV logs - Notion (easy)
  • Analyze document collaboration patterns - Notion (medium)
Notion logo
Notion
Jul 1, 2025, 12:00 AM
Data Scientist
Technical Screen
Data Manipulation (SQL/Python)
4
0
Loading...

You are given two CSV files.

document_activity.csv

  • page_id STRING — document identifier
  • user_id STRING — user who viewed or worked on the document
  • viewed_at TIMESTAMP — event time in UTC
  • is_creator BOOLEAN — true if the user created the document
  • collaborator_source STRING NULL — for non-creators only, the channel by which the collaborator reached the document (for example: invite, shared link, search). This field is NULL for creators.

users.csv

  • user_id STRING — user identifier
  • country STRING — user's country

Assumptions:

  • document_activity.user_id joins to users.user_id .
  • Each row represents one user-page event.
  • A document is considered collaborative if at least two distinct users appear on the same page_id .
  • No date filtering is required; use the full dataset.
  • Treat all timestamps as UTC.

Tasks:

  1. Distribution of collaborator acquisition channels
    • Consider only rows where is_creator = false .
    • Compute the distribution of collaborator_source .
    • Return: collaborator_source , collaborator_count , collaborator_share .
    • Define collaborator_share as collaborator_count / total_non_creator_rows .
  2. Find who collaborated most with a given user
    • Given an input user_id , find the other user or users who collaborated most often with that user.
    • Define collaboration as appearing on the same page_id .
    • Count distinct shared page_id values, not raw events.
    • Return: input_user_id , other_user_id , shared_page_count .
    • If multiple users tie for the maximum, return all tied users ordered by other_user_id .
  3. Identify which country has the highest collaboration rate
    • Join to users.csv to get each user's country.
    • For each country, define:
      • total_active_users = number of distinct users from that country who appear in document_activity
      • collaborating_users = number of those users who appear on at least one collaborative document
      • collaboration_rate = collaborating_users / total_active_users
    • Return: country , collaborating_users , total_active_users , collaboration_rate .
    • Rank countries by collaboration_rate descending.

Submit Your Answer to Earn 20XP

Sign in to leave a comment

Loading comments...

Browse More Questions

More Data Manipulation (SQL/Python)•More Notion•More Data Scientist•Notion Data Scientist•Notion Data Manipulation (SQL/Python)•Data Scientist Data Manipulation (SQL/Python)
PracHub

Master your tech interviews with 8,000+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.