Design a scalable video platform database

Q: What difficulty level is this coding question?

This is a medium difficulty Data Manipulation (SQL/Python) question, commonly asked during Technical Screen rounds at Google.

Q: What role is this question designed for?

This question is commonly asked for Data Scientist candidates at Google during technical interviews.

Question

Design the relational database for a YouTube-like video company. Deliverables:

1) List the core tables with key columns, types, and constraints (users, channels, videos, video_transcodes/qualities, captions, tags, video_tags, views, likes, comments, subscriptions, playlists, playlist_videos, ad_impressions, daily_video_metrics).
2) Define primary/foreign keys, uniqueness constraints, and soft-delete and GDPR-compliant deletion strategies.
3) Model many-to-many relationships (e.g., videos↔tags, playlists↔videos) and idempotent ingest (avoid duplicate views/likes).
4) Include indexing and partitioning (e.g., views partitioned by event_date, video_id; clustered indexes for hot queries), and explain how you’d support both OLTP and analytics (star schema or read-optimized warehouse tables) without blocking writes.
5) Show sample CREATE TABLE DDL for 3–4 critical tables (videos, views, comments, ad_impressions) and explain how you’d query: a) watch-time per video per day, b) top N videos by unique viewers in the last 7 days, c) comments pagination with anti-abuse flags.
6) Describe how you’d store multiple renditions (1080p, 4K, HDR) and A/B test assignments for thumbnails.
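As a minimal sketch of deliverable 5, the snippet below uses Python's standard-library `sqlite3` module so it runs anywhere. SQLite stands in for a production engine, and every column, index, and query name here (`WATCH_TIME`, `TOP_N_UNIQUE`, `COMMENTS_PAGE`, `open_db`) is hypothetical. It shows simplified DDL for `videos`, `views`, `comments`, and `ad_impressions`, plus one query each for a) daily watch time, b) top N videos by unique viewers over a trailing 7-day window, and c) keyset pagination that skips comments carrying an anti-abuse flag.

```python
import sqlite3

# Hypothetical, simplified DDL. SQLite has no declarative partitioning, so the
# production plan (views partitioned by event_date) appears only as a comment.
SCHEMA = """
CREATE TABLE videos (
    video_id     INTEGER PRIMARY KEY,
    channel_id   INTEGER NOT NULL,
    title        TEXT    NOT NULL,
    duration_sec INTEGER NOT NULL CHECK (duration_sec >= 0),
    created_at   TEXT    NOT NULL,
    deleted_at   TEXT                       -- soft delete: NULL means live
);
-- Raw view events (in production, partition by event_date)
CREATE TABLE views (
    event_id   TEXT    PRIMARY KEY,         -- client-generated dedup key
    video_id   INTEGER NOT NULL REFERENCES videos(video_id),
    viewer_id  INTEGER NOT NULL,
    event_date TEXT    NOT NULL,            -- 'YYYY-MM-DD'
    watch_sec  INTEGER NOT NULL CHECK (watch_sec >= 0)
);
CREATE INDEX idx_views_date_video ON views (event_date, video_id);
CREATE TABLE comments (
    comment_id INTEGER PRIMARY KEY,
    video_id   INTEGER NOT NULL REFERENCES videos(video_id),
    user_id    INTEGER NOT NULL,
    body       TEXT    NOT NULL,
    created_at TEXT    NOT NULL,
    is_flagged INTEGER NOT NULL DEFAULT 0   -- anti-abuse flag (0 or 1)
);
CREATE INDEX idx_comments_page ON comments (video_id, created_at, comment_id);
CREATE TABLE ad_impressions (
    impression_id TEXT    PRIMARY KEY,      -- dedup key, like views.event_id
    video_id      INTEGER NOT NULL REFERENCES videos(video_id),
    ad_id         INTEGER NOT NULL,
    viewer_id     INTEGER NOT NULL,
    event_date    TEXT    NOT NULL
);
"""

# a) Watch time per video per day.
WATCH_TIME = """
SELECT video_id, event_date, SUM(watch_sec) AS watch_sec
FROM views
GROUP BY video_id, event_date
ORDER BY video_id, event_date
"""

# b) Top N videos by unique viewers over the 7 days ending on :as_of (inclusive).
TOP_N_UNIQUE = """
SELECT video_id, COUNT(DISTINCT viewer_id) AS unique_viewers
FROM views
WHERE event_date > date(:as_of, '-7 days') AND event_date <= :as_of
GROUP BY video_id
ORDER BY unique_viewers DESC, video_id
LIMIT :n
"""

# c) Keyset pagination over unflagged comments, newest first. The cursor is
# the (created_at, comment_id) of the last row on the previous page.
COMMENTS_PAGE = """
SELECT comment_id, body, created_at
FROM comments
WHERE video_id = :video_id
  AND is_flagged = 0
  AND (created_at, comment_id) < (:cursor_ts, :cursor_id)
ORDER BY created_at DESC, comment_id DESC
LIMIT :page_size
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    return conn
```

For the first comments page, pass a cursor that sorts after every real row (for example `cursor_ts='9999-12-31'` with a very large `cursor_id`); for later pages, pass the `created_at` and `comment_id` of the last row returned. Keyset pagination lets the engine walk the `(video_id, created_at, comment_id)` index instead of scanning and discarding an ever-growing `OFFSET`. On an engine such as PostgreSQL, `views` would typically be declared with `PARTITION BY RANGE (event_date)` so the 7-day query only touches the matching partitions.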

PracHub · Accepted Answer

This question evaluates relational database design and data engineering skills within the Data Manipulation (SQL/Python) domain for a Data Scientist role: schema modeling, many-to-many relationships, idempotent ingest, indexing and partitioning, integration of OLTP and analytics workloads, GDPR-compliant deletion, and query writing. It tests both conceptual understanding and practical application, in particular the ability to justify schema choices, optimize for reads and writes, tune performance, and weigh compliance trade-offs in a scalable data architecture.
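A minimal sketch of the idempotent-ingest, many-to-many, and deletion ideas, again using Python's built-in `sqlite3`. The tables and the helper functions (`ingest_view`, `like`, `tag_video`, `soft_delete_user`, `gdpr_erase_user`) are hypothetical. Composite primary keys on `video_tags` and `likes` make repeated writes no-ops, and a client-supplied `event_id` deduplicates retried view events. Erasure is assumed here to mean deleting a user's personal rows while keeping view events with `viewer_id` set to NULL so aggregate counts survive; whether that pseudonymization satisfies GDPR for a given product is a legal question, not something the schema settles.

```python
import sqlite3

# Hypothetical mini-schema for ingest and deletion strategies.
SCHEMA = """
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    email      TEXT UNIQUE,               -- personal data
    deleted_at TEXT                       -- soft-delete marker
);
CREATE TABLE videos (video_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE video_tags (                 -- junction table for videos<->tags
    video_id INTEGER NOT NULL REFERENCES videos(video_id),
    tag_id   INTEGER NOT NULL REFERENCES tags(tag_id),
    PRIMARY KEY (video_id, tag_id)        -- each pair stored at most once
);
CREATE TABLE likes (
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    video_id INTEGER NOT NULL REFERENCES videos(video_id),
    PRIMARY KEY (user_id, video_id)       -- a user likes a video at most once
);
CREATE TABLE views (
    event_id  TEXT PRIMARY KEY,           -- dedup key supplied by the client
    video_id  INTEGER NOT NULL REFERENCES videos(video_id),
    viewer_id INTEGER REFERENCES users(user_id),  -- NULL after erasure
    watch_sec INTEGER NOT NULL
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def ingest_view(conn, event_id, video_id, viewer_id, watch_sec):
    """Idempotent insert: a retried event with the same event_id is a no-op."""
    conn.execute("INSERT OR IGNORE INTO views VALUES (?, ?, ?, ?)",
                 (event_id, video_id, viewer_id, watch_sec))


def like(conn, user_id, video_id):
    """Idempotent like: the composite primary key absorbs duplicates."""
    conn.execute("INSERT OR IGNORE INTO likes VALUES (?, ?)", (user_id, video_id))


def tag_video(conn, video_id, tag_name):
    """Create the tag if missing, then link it; repeated links are ignored."""
    conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag_name,))
    (tag_id,) = conn.execute("SELECT tag_id FROM tags WHERE name = ?",
                             (tag_name,)).fetchone()
    conn.execute("INSERT OR IGNORE INTO video_tags VALUES (?, ?)", (video_id, tag_id))


def soft_delete_user(conn, user_id):
    """Hide the account but keep its rows (reversible)."""
    conn.execute("UPDATE users SET deleted_at = datetime('now') WHERE user_id = ?",
                 (user_id,))


def gdpr_erase_user(conn, user_id):
    """Hard erasure in one transaction (assumed policy, see lead-in)."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM likes WHERE user_id = ?", (user_id,))
        # Assumption: view events are pseudonymized, not deleted.
        conn.execute("UPDATE views SET viewer_id = NULL WHERE viewer_id = ?", (user_id,))
        conn.execute("DELETE FROM users WHERE user_id = ?", (user_id,))
```

`INSERT OR IGNORE` is SQLite syntax; PostgreSQL expresses the same idempotent write as `INSERT ... ON CONFLICT DO NOTHING`. Other user-owned tables (comments, subscriptions, playlists) would need the same delete-or-pseudonymize decision inside the erasure transaction.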

Quick Overview
