PracHub
QuestionsPremiumLearningGuidesCheatsheetNEW

Quick Overview

This question evaluates SQL proficiency and data engineering competencies, including deduplication, joins, time-series aggregation, rolling-window calculations, calendar generation, and URL query-parameter extraction.

  • easy
  • Shopify
  • Coding & Algorithms
  • Data Engineer

Write SQL for session analytics

Company: Shopify

Role: Data Engineer

Category: Coding & Algorithms

Difficulty: easy

Interview Round: Technical Screen

You are given two tables for an e-commerce product. ## Tables ### 1) `shops` A shop dimension table that contains **duplicate rows for the same `shop_id`**. Columns: - `shop_id` (INT) - `shop_name` (STRING) - `created_at` (TIMESTAMP) Notes: - `shops` may contain multiple rows per `shop_id`. ### 2) `sessions` A session fact table. Columns: - `session_id` (STRING) - `shop_id` (INT) - `started_at` (TIMESTAMP) - `url` (STRING) Assume: - A “daily” metric is based on `DATE(started_at)`. - You may assume a reasonable reporting window exists in the data (or you can derive it from `sessions`). ## Tasks (write one SQL query per task) 1) **Daily total sessions** - Return one row per day with the total number of sessions. - Output: `day`, `session_cnt`. 2) **Daily sessions by shop** - Return one row per day per shop. - Because `shops` has duplicates, you must **deduplicate it before joining**. - Output: `day`, `shop_id`, `shop_name`, `session_cnt`. 3) **Daily sessions by shop with a 7-day rolling average (excluding today)** - For each shop and each calendar day in the reporting window, return: - `day` - `shop_id` - `session_cnt` (0 if there were no sessions that day) - `rolling_7d_avg_excl_today`: average of the previous 7 days’ `session_cnt`, i.e. days `[day-7, day-1]`. - Requirements: - You must **generate a calendar** so that missing days appear. - Days with no sessions must be treated as **0** and included in the rolling average. 4) **Extract information from a URL** - From `sessions.url`, extract the value of the query parameter `utm_campaign`. - Examples: - `https://a.com/?utm_source=google&utm_campaign=summer_sale` → `summer_sale` - `https://a.com/path?x=1` → `NULL` - Output: `session_id`, `utm_campaign`. Use standard SQL (any common dialect is fine), and ensure each query can be run independently.

Quick Answer: This question evaluates SQL proficiency and data engineering competencies, including deduplication, joins, time-series aggregation, rolling-window calculations, calendar generation, and URL query-parameter extraction.

Part 1: Daily Total Sessions

Given a list of session records, each record is a tuple `(session_id, shop_id, started_at, url)`. Count how many sessions happened on each calendar day, where the day is the `YYYY-MM-DD` part of `started_at`. Return one row per day sorted by day ascending.

Constraints

  • 0 <= len(sessions) <= 200000
  • `started_at` is always a valid timestamp string in `YYYY-MM-DD HH:MM:SS` format
  • Multiple sessions may occur on the same day

Examples

Input: ([('s1', 1, '2024-01-01 09:00:00', '/a'), ('s2', 1, '2024-01-01 10:15:00', '/b'), ('s3', 2, '2024-01-02 08:00:00', '/c')],)

Expected Output: [('2024-01-01', 2), ('2024-01-02', 1)]

Explanation: Two sessions fall on 2024-01-01 and one on 2024-01-02.

Input: ([('s1', 2, '2024-03-10 23:59:59', '/'), ('s2', 2, '2024-03-09 00:00:00', '/'), ('s3', 3, '2024-03-10 00:00:00', '/')],)

Expected Output: [('2024-03-09', 1), ('2024-03-10', 2)]

Explanation: The input is unsorted, but the result must be sorted by day.

Input: ([('only', 7, '2024-05-05 12:30:00', '/x')],)

Expected Output: [('2024-05-05', 1)]

Explanation: Single-session edge case.

Input: ([],)

Expected Output: []

Explanation: No sessions means no output rows.

Hints

  1. You only need the date part of `started_at`, not the time.
  2. A hash map keyed by day is enough to build the counts efficiently.

Part 2: Daily Sessions by Shop

You are given a shop dimension table and a session table. The `shops` table may contain duplicate rows for the same `shop_id`. Before joining, deduplicate `shops` by keeping the row with the earliest `created_at`; if there is still a tie, keep the lexicographically smallest `shop_name`. Then count sessions per calendar day and `shop_id`, and attach the deduplicated `shop_name`. If a session references a `shop_id` not present in `shops`, use `None` for `shop_name`. Return the result sorted by `day`, then `shop_id`.

Constraints

  • 0 <= len(shops), len(sessions) <= 200000
  • `created_at` and `started_at` are valid ISO-like timestamps in `YYYY-MM-DD HH:MM:SS` format
  • Use the deduplication rule exactly as stated before counting or joining

Examples

Input: ([(1, 'Alpha', '2023-01-02 00:00:00'), (1, 'Alpha Old', '2023-01-01 00:00:00'), (2, 'Beta', '2023-01-01 00:00:00')], [('s1', 1, '2024-02-01 09:00:00', '/'), ('s2', 1, '2024-02-01 11:00:00', '/'), ('s3', 2, '2024-02-01 12:00:00', '/'), ('s4', 1, '2024-02-02 10:00:00', '/')])

Expected Output: [('2024-02-01', 1, 'Alpha Old', 2), ('2024-02-01', 2, 'Beta', 1), ('2024-02-02', 1, 'Alpha Old', 1)]

Explanation: For `shop_id = 1`, the earliest `created_at` row is kept, so the attached name is `Alpha Old`.

Input: ([(1, 'Zoo', '2023-01-01 00:00:00'), (1, 'Ace', '2023-01-01 00:00:00')], [('s1', 1, '2024-01-01 00:00:00', '/')])

Expected Output: [('2024-01-01', 1, 'Ace', 1)]

Explanation: The timestamps tie, so the lexicographically smaller name `Ace` wins.

Input: ([(1, 'Only Shop', '2023-01-01 00:00:00')], [('s1', 2, '2024-01-01 10:00:00', '/')])

Expected Output: [('2024-01-01', 2, None, 1)]

Explanation: A session can reference a missing shop; keep the row and use `None` as the name.

Input: ([(1, 'A', '2023-01-01 00:00:00')], [])

Expected Output: []

Explanation: No sessions means no daily shop counts.

Hints

  1. Solve this in two phases: first build one canonical row per `shop_id`, then aggregate sessions by `(day, shop_id)`.
  2. Because the timestamp format is fixed, comparing `created_at` strings lexicographically gives the same order as comparing actual timestamps.

Part 3: Daily Sessions by Shop with 7-Day Rolling Average Excluding Today

Given session records as tuples `(session_id, shop_id, started_at, url)`, derive the reporting window from the minimum session day to the maximum session day. For every distinct `shop_id` in the input and every day in that window, return the day's session count and the 7-day rolling average excluding the current day. Missing days must appear with `session_cnt = 0`. The rolling average is always computed over exactly 7 calendar days, so days before the window starts count as 0. Round the rolling average to 6 decimal places. Return rows sorted by `day`, then `shop_id`.

Constraints

  • 0 <= len(sessions) <= 100000
  • `started_at` is always a valid timestamp string in `YYYY-MM-DD HH:MM:SS` format
  • The product of distinct shop count and number of days in the reporting window does not exceed 200000
  • Round each rolling average to 6 decimal places

Examples

Input: ([('a', 1, '2024-01-01 09:00:00', '/'), ('b', 1, '2024-01-03 10:00:00', '/'), ('c', 1, '2024-01-03 12:00:00', '/'), ('d', 2, '2024-01-02 08:00:00', '/')],)

Expected Output: [('2024-01-01', 1, 1, 0.0), ('2024-01-01', 2, 0, 0.0), ('2024-01-02', 1, 0, 0.142857), ('2024-01-02', 2, 1, 0.0), ('2024-01-03', 1, 2, 0.142857), ('2024-01-03', 2, 0, 0.142857)]

Explanation: The calendar spans 2024-01-01 through 2024-01-03, and missing shop/day combinations must be filled with 0.

Input: ([('x', 5, '2024-05-01 00:00:00', '/')],)

Expected Output: [('2024-05-01', 5, 1, 0.0)]

Explanation: On the first day there are no previous days inside the 7-day lookback, so the average is 0.

Input: ([('s1', 1, '2024-01-01 00:00:00', '/'), ('s2', 1, '2024-01-02 00:00:00', '/'), ('s3', 1, '2024-01-03 00:00:00', '/'), ('s4', 1, '2024-01-04 00:00:00', '/'), ('s5', 1, '2024-01-05 00:00:00', '/'), ('s6', 1, '2024-01-06 00:00:00', '/'), ('s7', 1, '2024-01-07 00:00:00', '/'), ('s8', 1, '2024-01-08 00:00:00', '/')],)

Expected Output: [('2024-01-01', 1, 1, 0.0), ('2024-01-02', 1, 1, 0.142857), ('2024-01-03', 1, 1, 0.285714), ('2024-01-04', 1, 1, 0.428571), ('2024-01-05', 1, 1, 0.571429), ('2024-01-06', 1, 1, 0.714286), ('2024-01-07', 1, 1, 0.857143), ('2024-01-08', 1, 1, 1.0)]

Explanation: By 2024-01-08, all seven previous days each had one session, so the rolling average is 1.0.

Input: ([],)

Expected Output: []

Explanation: Empty input produces no rows.

Hints

  1. First count sessions by `(shop_id, day)`, then generate every calendar day between the minimum and maximum session date.
  2. For each shop, a sliding window over daily counts lets you compute the rolling average in linear time over the date range.

Part 4: Extract utm_campaign from a URL

Given a simplified list of sessions as tuples `(session_id, url)`, extract the value of the exact query parameter `utm_campaign` from each URL. If the parameter is missing or all of its occurrences are empty, return `None`. If it appears multiple times, return the first non-empty value. Preserve the input order.

Constraints

  • 0 <= len(sessions) <= 200000
  • Each URL is a valid string
  • The parameter name match is case-sensitive and must be exactly `utm_campaign`

Examples

Input: ([('s1', 'https://a.com/?utm_source=google&utm_campaign=summer_sale'), ('s2', 'https://a.com/path?x=1')],)

Expected Output: [('s1', 'summer_sale'), ('s2', None)]

Explanation: The first URL contains `utm_campaign`; the second does not.

Input: ([('s1', 'https://a.com/p?x=1&utm_campaign=winter&y=2#frag')],)

Expected Output: [('s1', 'winter')]

Explanation: The target parameter is present in the query string even though it is not first.

Input: ([('s1', 'https://a.com/?utm_campaign=&utm_source=x&utm_campaign=backup'), ('s2', 'https://a.com/?utm_campaign=')],)

Expected Output: [('s1', 'backup'), ('s2', None)]

Explanation: Use the first non-empty occurrence if the key repeats; an empty-only parameter returns `None`.

Input: ([],)

Expected Output: []

Explanation: Empty input yields empty output.

Hints

  1. Split the URL into its query string before looking for parameters.
  2. Be careful with repeated keys and blank values.
Last updated: Apr 23, 2026

Loading coding console...

PracHub

Master your tech interviews with 7,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Compute Jaccard Similarity for Lists - Shopify (medium)
  • Implement URL Shortening Codec - Shopify (medium)
  • Simulate a rover fleet - Shopify (medium)
  • Simulate robot moves on a grid - Shopify (medium)
  • Implement a Capacity-Bounded Cache - Shopify (medium)