You are given event data about users viewing design files.
Assume the following tables:
-
files(file_id, creator_id)
-
creator_id
is a VARCHAR user identifier.
-
file_views(file_id, user_id, viewed_at)
-
user_id
is an INTEGER user identifier.
-
Each row means
user_id
viewed
file_id
at time
viewed_at
.
A “share” is inferred when someone other than the creator views a file.
-
First share time per creator
Write a SQL query that returns, for each creator, the first time they ever shared a file they created (i.e., the earliest viewed_at where a non-creator viewed one of their files).
Output columns:
-
creator_id
-
first_shared_at
Notes:
-
A creator viewing their own file should not count.
-
creator_id
(VARCHAR) and
user_id
(INTEGER) are different types; handle casting appropriately.
-
Closest collaborator model
For each file, define the “first person it was shared with” as the earliest non-creator viewer of that file (based on viewed_at).
Build a query/model that, for each creator, finds their closest collaborator, defined as the user who was the “first person shared with” for the creator’s files the most times.
Output columns:
-
creator_id
-
closest_collaborator_user_id
-
collaborator_value
(the number of files for which this collaborator was the first non-creator viewer)
Tie-breaking: if multiple collaborators have the same collaborator_value, choose the one with the earliest first-share timestamp across those tied records (and if still tied, the smallest user id).