Write SQL for character safety and engagement metrics
Company: Character
Role: Data Scientist
Category: Data Manipulation (SQL/Python)
Difficulty: Medium
Interview Round: Technical Screen
You are working with a conversational product that has **users**, **AI characters**, and **conversations**. Write SQL queries for each of the tasks below.
Assume **UTC** timestamps and that `day` means `DATE(created_at)`.
## Tables
### `users`
- `user_id` BIGINT PRIMARY KEY
- `created_at` TIMESTAMP
### `characters`
- `character_id` BIGINT PRIMARY KEY
- `created_by_user_id` BIGINT REFERENCES `users(user_id)`
- `created_at` TIMESTAMP
- `safety_flag` BOOLEAN
- `TRUE` = safe, `FALSE` = unsafe
### `conversations`
Each row represents a user engaging in a conversation associated with a character.
- `conversation_id` BIGINT
- `character_id` BIGINT REFERENCES `characters(character_id)`
- `user_id` BIGINT REFERENCES `users(user_id)`
- `created_at` TIMESTAMP
- `safety_flag` BOOLEAN
- `TRUE` = safe, `FALSE` = unsafe
Notes:
- A single `conversation_id` may appear multiple times if multiple users engaged.
---
## Task 1: Find the “top 100 characters”
Define “top” as the characters with the **largest number of conversation engagements** (rows in `conversations`) across all time.
Return:
- `character_id`
- `engagement_count`
Order by `engagement_count` DESC and return only the top 100.
---
## Task 2: Compute the unsafe character ratio
Compute the overall ratio of **unsafe characters** among all characters.
Return:
- `unsafe_character_ratio` (a decimal between 0 and 1)
Where:
- unsafe character = `characters.safety_flag = FALSE`
---
## Task 3: 7-day moving average of daily unsafe-character percentage
For each `day` (based on `characters.created_at`), compute:
1) `daily_unsafe_pct` = (# characters created that day that are unsafe) / (total # characters created that day)
2) A **7-day trailing moving average** of `daily_unsafe_pct` (including the current day).
Return:
- `day`
- `daily_unsafe_pct`
- `daily_unsafe_pct_ma7`
Order by `day` ascending.
---
## Task 4 (open-ended, but answer with SQL outputs)
### 4a) Relationship between unsafe characters and unsafe conversations
Create a 2×2 breakdown of conversation engagements by:
- character safety (`characters.safety_flag`)
- conversation safety (`conversations.safety_flag`)
Return:
- `character_safety_flag`
- `conversation_safety_flag`
- `engagement_count`
### 4b) By day, unsafe-user engagement in unsafe conversations
For each `day` (based on `conversations.created_at`), compute metrics related to **users who engaged in unsafe conversations** (`conversations.safety_flag = FALSE`).
Return:
- `day`
- `unsafe_conversation_engagements` (count of rows where `conversations.safety_flag = FALSE`)
- `unsafe_users` (distinct users engaging in unsafe conversations)
- `total_engagements` (all conversation rows that day)
- `total_users` (distinct users that day)
- `unsafe_engagement_ratio` = `unsafe_conversation_engagements / total_engagements`
- `unsafe_user_ratio` = `unsafe_users / total_users`
Order by `day` ascending.
Quick Answer: This question evaluates proficiency in SQL-based data manipulation and analytics, specifically aggregations, joins, date-based grouping, ratio calculations, and windowed moving averages to compute safety and engagement metrics.