Write SQL for session analytics
Company: Shopify
Role: Data Engineer
Category: Coding & Algorithms
Difficulty: easy
Interview Round: Technical Screen
Quick Answer: This question evaluates SQL proficiency and data engineering competencies, including deduplication, joins, time-series aggregation, rolling-window calculations, calendar generation, and URL query-parameter extraction.
Part 1: Daily Total Sessions
Constraints
- 0 <= len(sessions) <= 200000
- `started_at` is always a valid timestamp string in `YYYY-MM-DD HH:MM:SS` format
- Multiple sessions may occur on the same day
Examples
Input: ([('s1', 1, '2024-01-01 09:00:00', '/a'), ('s2', 1, '2024-01-01 10:15:00', '/b'), ('s3', 2, '2024-01-02 08:00:00', '/c')],)
Expected Output: [('2024-01-01', 2), ('2024-01-02', 1)]
Explanation: Two sessions fall on 2024-01-01 and one on 2024-01-02.
Input: ([('s1', 2, '2024-03-10 23:59:59', '/'), ('s2', 2, '2024-03-09 00:00:00', '/'), ('s3', 3, '2024-03-10 00:00:00', '/')],)
Expected Output: [('2024-03-09', 1), ('2024-03-10', 2)]
Explanation: The input is unsorted, but the result must be sorted by day.
Input: ([('only', 7, '2024-05-05 12:30:00', '/x')],)
Expected Output: [('2024-05-05', 1)]
Explanation: Single-session edge case.
Input: ([],)
Expected Output: []
Explanation: No sessions means no output rows.
Hints
- You only need the date part of `started_at`, not the time.
- A hash map keyed by day is enough to build the counts efficiently.
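Following those hints, here is a minimal Python reference sketch of the SQL logic (the function name `daily_total_sessions` is ours; a real answer would be a `GROUP BY` on the date part of `started_at`):

```python
from collections import defaultdict

def daily_total_sessions(sessions):
    """Count sessions per calendar day, sorted by day.

    Each session is (session_id, shop_id, started_at, path); because
    started_at is always 'YYYY-MM-DD HH:MM:SS', the day is simply the
    first 10 characters of the string.
    """
    counts = defaultdict(int)
    for _sid, _shop_id, started_at, _path in sessions:
        counts[started_at[:10]] += 1
    # Sorting the (day, count) pairs lexicographically sorts by day,
    # since the dates share a fixed zero-padded format.
    return sorted(counts.items())
```

The fixed timestamp format is what makes the string slice and the lexicographic sort safe; with free-form dates you would parse to `datetime.date` first.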
Part 2: Daily Sessions by Shop
Constraints
- 0 <= len(shops), len(sessions) <= 200000
- `created_at` and `started_at` are valid ISO-like timestamps in `YYYY-MM-DD HH:MM:SS` format
- Deduplicate shops before counting or joining: for each `shop_id`, keep the row with the earliest `created_at`, breaking ties by the lexicographically smaller name
Examples
Input: ([(1, 'Alpha', '2023-01-02 00:00:00'), (1, 'Alpha Old', '2023-01-01 00:00:00'), (2, 'Beta', '2023-01-01 00:00:00')], [('s1', 1, '2024-02-01 09:00:00', '/'), ('s2', 1, '2024-02-01 11:00:00', '/'), ('s3', 2, '2024-02-01 12:00:00', '/'), ('s4', 1, '2024-02-02 10:00:00', '/')])
Expected Output: [('2024-02-01', 1, 'Alpha Old', 2), ('2024-02-01', 2, 'Beta', 1), ('2024-02-02', 1, 'Alpha Old', 1)]
Explanation: For `shop_id = 1`, the earliest `created_at` row is kept, so the attached name is `Alpha Old`.
Input: ([(1, 'Zoo', '2023-01-01 00:00:00'), (1, 'Ace', '2023-01-01 00:00:00')], [('s1', 1, '2024-01-01 00:00:00', '/')])
Expected Output: [('2024-01-01', 1, 'Ace', 1)]
Explanation: The timestamps tie, so the lexicographically smaller name `Ace` wins.
Input: ([(1, 'Only Shop', '2023-01-01 00:00:00')], [('s1', 2, '2024-01-01 10:00:00', '/')])
Expected Output: [('2024-01-01', 2, None, 1)]
Explanation: A session can reference a missing shop; keep the row and use `None` as the name.
Input: ([(1, 'A', '2023-01-01 00:00:00')], [])
Expected Output: []
Explanation: No sessions means no daily shop counts.
Hints
- Solve this in two phases: first build one canonical row per `shop_id`, then aggregate sessions by `(day, shop_id)`.
- Because the timestamp format is fixed, comparing `created_at` strings lexicographically gives the same order as comparing actual timestamps.
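The two-phase approach from the hints can be sketched in Python as follows (the function name `daily_sessions_by_shop` is ours; in SQL this would be a window-function or `GROUP BY` dedup CTE followed by a `LEFT JOIN`):

```python
from collections import defaultdict

def daily_sessions_by_shop(shops, sessions):
    # Phase 1: one canonical row per shop_id. Comparing
    # (created_at, name) tuples keeps the earliest created_at and
    # breaks ties by the lexicographically smaller name.
    canonical = {}
    for shop_id, name, created_at in shops:
        key = (created_at, name)
        if shop_id not in canonical or key < canonical[shop_id]:
            canonical[shop_id] = key
    names = {shop_id: name for shop_id, (_, name) in canonical.items()}

    # Phase 2: count sessions per (day, shop_id), then attach the
    # canonical name -- None when the shop_id has no shops row
    # (the LEFT JOIN semantics from the examples).
    counts = defaultdict(int)
    for _sid, shop_id, started_at, _path in sessions:
        counts[(started_at[:10], shop_id)] += 1
    return [(day, shop_id, names.get(shop_id), n)
            for (day, shop_id), n in sorted(counts.items())]
```

Note that phase 1 never materializes a sorted copy of `shops`; a single pass with tuple comparison is enough.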
Part 3: Daily Sessions by Shop with 7-Day Rolling Average Excluding Today
Constraints
- 0 <= len(sessions) <= 100000
- `started_at` is always a valid timestamp string in `YYYY-MM-DD HH:MM:SS` format
- The product of distinct shop count and number of days in the reporting window does not exceed 200000
- Round each rolling average to 6 decimal places
Examples
Input: ([('a', 1, '2024-01-01 09:00:00', '/'), ('b', 1, '2024-01-03 10:00:00', '/'), ('c', 1, '2024-01-03 12:00:00', '/'), ('d', 2, '2024-01-02 08:00:00', '/')],)
Expected Output: [('2024-01-01', 1, 1, 0.0), ('2024-01-01', 2, 0, 0.0), ('2024-01-02', 1, 0, 0.142857), ('2024-01-02', 2, 1, 0.0), ('2024-01-03', 1, 2, 0.142857), ('2024-01-03', 2, 0, 0.142857)]
Explanation: The calendar spans 2024-01-01 through 2024-01-03, and missing shop/day combinations must be filled with 0.
Input: ([('x', 5, '2024-05-01 00:00:00', '/')],)
Expected Output: [('2024-05-01', 5, 1, 0.0)]
Explanation: On the first day there are no previous days inside the 7-day lookback, so the average is 0.
Input: ([('s1', 1, '2024-01-01 00:00:00', '/'), ('s2', 1, '2024-01-02 00:00:00', '/'), ('s3', 1, '2024-01-03 00:00:00', '/'), ('s4', 1, '2024-01-04 00:00:00', '/'), ('s5', 1, '2024-01-05 00:00:00', '/'), ('s6', 1, '2024-01-06 00:00:00', '/'), ('s7', 1, '2024-01-07 00:00:00', '/'), ('s8', 1, '2024-01-08 00:00:00', '/')],)
Expected Output: [('2024-01-01', 1, 1, 0.0), ('2024-01-02', 1, 1, 0.142857), ('2024-01-03', 1, 1, 0.285714), ('2024-01-04', 1, 1, 0.428571), ('2024-01-05', 1, 1, 0.571429), ('2024-01-06', 1, 1, 0.714286), ('2024-01-07', 1, 1, 0.857143), ('2024-01-08', 1, 1, 1.0)]
Explanation: By 2024-01-08, all seven previous days each had one session, so the rolling average is 1.0.
Input: ([],)
Expected Output: []
Explanation: Empty input produces no rows.
Hints
- First count sessions by `(shop_id, day)`, then generate every calendar day between the minimum and maximum session date.
- For each shop, a sliding window over daily counts lets you compute the rolling average in linear time over the date range.
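A Python reference sketch of the calendar fill plus sliding window (the function name `rolling_daily_sessions` is ours; in SQL this would be a date spine cross-joined with shops, then a `ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING` window). Note the examples show the sum is always divided by 7, even when fewer than 7 prior days exist in the window:

```python
from collections import defaultdict
from datetime import date, timedelta

def rolling_daily_sessions(sessions):
    if not sessions:
        return []
    # Daily session counts keyed by (shop_id, date).
    counts = defaultdict(int)
    for _sid, shop_id, started_at, _path in sessions:
        counts[(shop_id, date.fromisoformat(started_at[:10]))] += 1

    # Calendar spine: every day from the min to the max session date.
    days = [d for _, d in counts]
    start, end = min(days), max(days)
    calendar = [start + timedelta(days=i)
                for i in range((end - start).days + 1)]
    shops = sorted({shop_id for shop_id, _ in counts})

    out = []
    window = {s: [] for s in shops}  # up to 7 most recent daily counts
    running = {s: 0 for s in shops}  # sum of the counts in window[s]
    for d in calendar:               # outer loop on day keeps output
        for s in shops:              # sorted by (day, shop_id)
            # Average over the previous 7 days, today excluded;
            # the denominator is a fixed 7 per the examples.
            avg = round(running[s] / 7, 6)
            today = counts.get((s, d), 0)
            out.append((d.isoformat(), s, today, avg))
            window[s].append(today)
            running[s] += today
            if len(window[s]) > 7:
                running[s] -= window[s].pop(0)
    return out
```

The sliding window keeps the whole pass linear in `shops × days`, which the 200000 bound on that product makes tractable.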
Part 4: Extract utm_campaign from a URL
Constraints
- 0 <= len(sessions) <= 200000
- Each URL is a valid string
- The parameter name match is case-sensitive and must be exactly `utm_campaign`
Examples
Input: ([('s1', 'https://a.com/?utm_source=google&utm_campaign=summer_sale'), ('s2', 'https://a.com/path?x=1')],)
Expected Output: [('s1', 'summer_sale'), ('s2', None)]
Explanation: The first URL contains `utm_campaign`; the second does not.
Input: ([('s1', 'https://a.com/p?x=1&utm_campaign=winter&y=2#frag')],)
Expected Output: [('s1', 'winter')]
Explanation: The target parameter is present in the query string even though it is not first.
Input: ([('s1', 'https://a.com/?utm_campaign=&utm_source=x&utm_campaign=backup'), ('s2', 'https://a.com/?utm_campaign=')],)
Expected Output: [('s1', 'backup'), ('s2', None)]
Explanation: Use the first non-empty occurrence if the key repeats; an empty-only parameter returns `None`.
Input: ([],)
Expected Output: []
Explanation: Empty input yields empty output.
Hints
- Isolate the query string (between `?` and `#`) before looking for parameters.
- Be careful with repeated keys and blank values.
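A minimal Python sketch of the extraction (the function name `extract_utm_campaign` is ours). It parses the query string by hand to make the repeated-key and empty-value rules explicit; under different dedup rules `urllib.parse.parse_qs` would be the idiomatic choice:

```python
def extract_utm_campaign(sessions):
    """Map (session_id, url) pairs to (session_id, campaign-or-None)."""
    out = []
    for sid, url in sessions:
        campaign = None
        if '?' in url:
            # Query string sits between the first '?' and any '#fragment'.
            query = url.split('?', 1)[1].split('#', 1)[0]
            for pair in query.split('&'):
                key, _, value = pair.partition('=')
                # Exact, case-sensitive key match; the first non-empty
                # value wins, so 'utm_campaign=' alone yields None.
                if key == 'utm_campaign' and value:
                    campaign = value
                    break
        out.append((sid, campaign))
    return out
```

In SQL this typically becomes a dialect-specific expression (e.g. a regex or string-split function), so interviewers usually accept any approach that handles the fragment, repeated keys, and blank values correctly.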