Design Tic-Tac-Toe and QPS data structures
Company: Databricks
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: medium
Interview Round: Onsite
You are given two independent coding problems that focus on data structure and API design.
---
## Problem 1: Generalized Tic-Tac-Toe Game with Simple AI
Design a Tic-Tac-Toe game that supports a generalized board size and a customizable win condition.
Implement a class `TicTacToe` with the following behavior:
- The game is played by two players, labeled `1` and `2`.
- The board has `rows` rows and `cols` columns.
- A player wins if they have **K** of their marks in a line, where a line can be:
- A horizontal line
- A vertical line
- A main diagonal (top-left to bottom-right)
- An anti-diagonal (top-right to bottom-left)
### API
```text
TicTacToe(int rows, int cols, int K)
```
- Initializes the game board with the given dimensions and win condition `K`.
- All cells are initially empty.
```text
int move(int row, int col, int player)
```
- `row`, `col` are 0-indexed coordinates on the board.
- `player` is either `1` or `2`.
- Places the player's mark at `(row, col)`.
- Returns:
- `0` if no one has won after this move,
- `1` if player 1 wins as a result of this move,
- `2` if player 2 wins as a result of this move,
- `-1` if the move is invalid (out of bounds or the cell is already occupied).
You should design the data structures so that each `move` call is efficient even when the board is large.
**Constraints (you may assume):**
- `1 <= rows, cols <= 1000`
- `1 <= K <= max(rows, cols)`
- Total number of moves ≤ `rows * cols`.
### Follow-up: Random-Move AI
Extend the design with a very simple AI that chooses a random legal move for a given player.
You are given a helper function:
```text
int randInt(int low, int high)
```
which returns a uniform random integer in the inclusive range `[low, high]`.
Add a method:
```text
pair<int, int> getNextMove(int player)
```
- Returns a pair `(row, col)` corresponding to an **empty** cell where `player` can play next.
- The AI should pick **uniformly at random** among all currently empty cells.
- If there is no legal move (the board is full or the game is already over), you may return a special value, such as `(-1, -1)`.
Design this so that `getNextMove` runs efficiently, even on a large board with many moves already played.
---
## Problem 2: Query QPS from KV Store Request Logs
A key-value (KV) store receives a large number of requests. Each request is logged with a timestamp (in seconds). You want to support efficient queries about the **queries per second (QPS)** over arbitrary time ranges.
You are given an initially empty log. Implement a data structure that supports the following operations:
```text
void record(long timestamp)
```
- Called once for each request handled by the KV store.
- `timestamp` is an integer representing seconds since some fixed epoch.
- Timestamps are not guaranteed to be contiguous but you may assume they are non-decreasing (each new call has `timestamp >=` previous `timestamp`).
```text
double getQPS(long startTime, long endTime)
```
- Returns the **average QPS** in the inclusive time interval `[startTime, endTime]`.
- Formally, if `count` requests have timestamps `t` such that `startTime <= t <= endTime`, then:
`QPS = count / (endTime - startTime + 1)`
- You should support many `getQPS` queries efficiently after recording a large volume of data.
**Constraints (you may assume):**
- Up to `N = 10^6` calls to `record`.
- Up to `Q = 10^6` calls to `getQPS`.
- Timestamps fit in a 64-bit signed integer.
### Follow-up 1: Efficient Arbitrary-Time Queries
The naive implementation might scan all timestamps within `[startTime, endTime]` for each query, which is too slow when `Q` is large.
Design and describe a more efficient approach that:
- Uses reasonable additional memory.
- Supports `getQPS(startTime, endTime)` in **sublinear** time in `N` (for example, `O(log N)` per query).
Your design should be implementable using common data structures (arrays, lists, hash maps, trees, etc.).
### Follow-up 2: Trade Accuracy for Less Memory
Now suppose memory is tight and you are allowed to **sacrifice precision** to save space. Instead of storing every individual timestamp, you can approximate the QPS.
Design a modified scheme that:
- Uses significantly less memory than the exact solution.
- Returns an approximate QPS for any `[startTime, endTime]` query.
- Clearly states what is being approximated (for example, grouping multiple seconds into a time range / bucket) and what kind of error might be introduced.
You should:
- Describe the data you would store (e.g., buckets or ranges instead of individual timestamps).
- Explain how `getQPS` will be computed from this aggregated data.
- Reason about the trade-off between memory usage, accuracy, and query performance.
Quick Answer: This question evaluates data-structure and API design skills, covering state management and efficient update/query algorithms for a generalized Tic-Tac-Toe with uniform-random move selection, as well as time-series aggregation and streaming-log techniques for computing QPS from non-decreasing timestamps.
Part 1: Generalized Tic-Tac-Toe Move Engine
Implement the core move engine for a **generalized Tic-Tac-Toe** game played on an arbitrary board with a configurable winning run length.
## Task
Write a function `solution(rows, cols, k, moves)` that replays a sequence of move attempts on an empty `rows` x `cols` board and returns the result of each attempt.
## Inputs
- `rows`, `cols` — the board dimensions. Cells are addressed by **0-indexed** coordinates: valid rows are `0 .. rows - 1` and valid columns are `0 .. cols - 1`.
- `k` — the number of **consecutive marks** a player must line up to win.
- `moves` — a list of move attempts processed **in order**. Each attempt is a triple `(row, col, player)`, where `player` is intended to be `1` or `2`.
## Output
Return a list of integers, **one per move attempt**, in the same order as `moves`. For each attempt, append:
- `0` — the move is valid and placed, but **no one has won yet**.
- `1` or `2` — the move is valid and causes **that player to win** on this move.
- `-1` — the move is **invalid** (see below) and is **not** placed on the board.
## Rules
Process each attempt `(row, col, player)` one at a time:
1. **Validate the move.** The result is `-1` if **any** of these holds:
- the **game has already ended** (a previous move won), or
- `player` is **not** `1` or `2`, or
- the cell is **out of bounds** (i.e. `row` or `col` falls outside the board), or
- the cell is **already occupied** by an earlier mark.
An invalid move leaves the board unchanged and the game state untouched.
2. **Place the mark.** For a valid move, record `player`'s mark at `(row, col)`.
3. **Check for a win.** The player wins if placing this mark forms a run of **at least `k` consecutive marks of theirs** along any of the four axes through the just-placed cell:
- **horizontal** (left–right),
- **vertical** (up–down),
- **diagonal**, or
- **anti-diagonal**.
The run is counted in both directions from the new cell. If the run reaches length `k` on any axis, the player **wins** and the game ends (all later attempts then return `-1`). Otherwise the result is `0`.
## Notes
- You do **not** need to enforce alternating turns — the same player may move twice in a row, and an out-of-turn move is still valid as long as it passes the checks above.
- Once a player has won, the game is over: every remaining attempt yields `-1`.
## Constraints
- `1 <= rows, cols <= 1000`
- `1 <= k <= max(rows, cols)`
- `0 <= len(moves) <= rows * cols`
- Each `player` value is intended to be `1` or `2`; any other value makes that move invalid (`-1`).
Constraints
- 1 <= rows, cols <= 1000
- 1 <= k <= max(rows, cols)
- 0 <= len(moves) <= rows * cols
- Each player value is intended to be 1 or 2, but invalid values should return -1 for that move
Examples
Input: (3, 3, 3, [(0, 0, 1), (1, 0, 2), (0, 1, 1), (1, 1, 2), (0, 2, 1)])
Expected Output: [0, 0, 0, 0, 1]
Explanation: Player 1 wins horizontally on the top row with the last move.
Input: (3, 4, 3, [(0, 1, 2), (0, 0, 1), (1, 1, 1), (2, 2, 1)])
Expected Output: [0, 0, 0, 1]
Explanation: Player 1 forms a main diagonal of length 3.
Input: (2, 2, 2, [(0, 0, 1), (0, 0, 2), (2, 1, 1)])
Expected Output: [0, -1, -1]
Explanation: The second move uses an occupied cell, and the third is out of bounds.
Input: (2, 3, 1, [(1, 2, 2), (0, 0, 1)])
Expected Output: [2, -1]
Explanation: With k = 1, the first valid move wins immediately, so later moves are invalid.
Hints
- Only the most recent move can create a new win, so you never need to rescan the whole board.
- For each move, check four direction pairs: horizontal, vertical, main diagonal, and anti-diagonal.
Part 2: Random Legal Move Selector for Generalized Tic-Tac-Toe
Implement `solution(rows, cols, k, operations)` for **generalized Tic-Tac-Toe** on a `rows × cols` board, where a player wins by getting **`k` of their marks in a row**. This part adds a deterministic "AI move suggestion" query on top of the normal move logic.
You process the `operations` list **in order** and return a **list of results**, one entry per operation.
## Board and Players
- The board has `rows` rows and `cols` columns; cells are indexed by `(row, col)` with `0 <= row < rows` and `0 <= col < cols`.
- There are two players, identified by the integers **`1`** and **`2`**.
- A player wins immediately when, after placing a mark, they have **`k` consecutive marks** along any of the four axes: **horizontal**, **vertical**, or either **diagonal**. Once a player wins, the **game is over** and every subsequent operation is rejected.
## Operations
Each operation is a tuple. There are two kinds:
### `('move', row, col, player)`
Attempt to place `player`'s mark at cell `(row, col)`.
- The move is **rejected** if any of the following holds — append `-1`:
- the game is already over,
- `player` is not `1` or `2`,
- `(row, col)` is out of bounds,
- the cell `(row, col)` is already occupied.
- Otherwise the mark is placed, and:
- if this move makes `player` win (reaches `k` in a row), append **`player`** and mark the game as over;
- otherwise append **`0`**.
### `('ai', player, rank)`
Ask for a candidate empty cell, **without placing any mark** (board state is unchanged).
- Let `E` be the number of currently empty cells. Conceptually, the AI picks uniformly at random among empty cells; `rank` is supplied as the index of that draw so the result is **deterministic and testable**.
- If the query is valid, return the **`rank`-th empty cell in row-major order** among the currently empty cells, as a `(row, col)` pair. Row-major order means cells are ordered by `row` first, then `col`. `rank` is **0-indexed** (so `rank = 0` is the first empty cell).
- The query **fails** — append `(-1, -1)` — if any of the following holds:
- the game is already over,
- `player` is not `1` or `2`,
- there are no empty cells (`E == 0`),
- `rank` is out of range, i.e. not `0 <= rank < E`.
### Any other operation type
If an operation's type is neither `'move'` nor `'ai'`, append **`-1`** for it.
## Return Value
Return the list of per-operation results described above:
- a `move` yields `player`, `0`, or `-1`;
- an `ai` query yields a `(row, col)` pair or `(-1, -1)`.
## Constraints
- `1 <= rows, cols`
- `1 <= rows * cols <= 200000`
- `1 <= k <= max(rows, cols)`
- `0 <= len(operations) <= 200000`
Constraints
- 1 <= rows, cols
- 1 <= rows * cols <= 200000
- 1 <= k <= max(rows, cols)
- 0 <= len(operations) <= 200000
Examples
Input: (2, 2, 2, [('ai', 1, 2), ('move', 0, 1, 1), ('ai', 2, 1)])
Expected Output: [(1, 0), 0, (1, 0)]
Explanation: Initially the 3rd empty cell in row-major order is `(1, 0)`. After occupying `(0, 1)`, the 2nd empty cell is still `(1, 0)`.
Input: (1, 3, 1, [('ai', 1, 1), ('move', 0, 2, 2), ('ai', 1, 0), ('move', 0, 1, 1)])
Expected Output: [(0, 1), 2, (-1, -1), -1]
Explanation: The move by player 2 wins immediately because `k = 1`, so later AI and move operations are invalid.
Input: (1, 2, 2, [('move', 0, 0, 1), ('move', 0, 0, 2), ('move', 0, 1, 2), ('ai', 1, 0)])
Expected Output: [0, -1, 0, (-1, -1)]
Explanation: The board becomes full after the third operation, so the AI has no legal move.
Input: (2, 3, 3, [('move', 0, 0, 1), ('ai', 2, 5), ('ai', 2, 4)])
Expected Output: [0, (-1, -1), (1, 2)]
Explanation: After one move there are 5 empty cells, so rank 5 is invalid but rank 4 points to the last empty cell in row-major order.
Hints
- If `rank` tells you which empty cell to choose, the AI query becomes a k-th available-position query.
- A Fenwick tree or segment tree can track which cells are still empty in row-major order.
Part 3: Online Exact QPS Tracker
Process a stream of interleaved log and query operations for a key-value store and return the answer to each query.
Implement:
```python
def solution(operations):
...
```
## Input
`operations` is a list where each entry is one of two operations:
- **`('record', timestamp)`** — log one request that arrived at integer `timestamp`. Record timestamps appear in **non-decreasing** order across the stream.
- **`('getQPS', startTime, endTime)`** — query the **average QPS** (queries per second) over the **inclusive** time window `[startTime, endTime]`, using only the requests recorded **so far** (i.e. by operations that appear before this query in the list).
The two kinds of operations are interleaved in any order.
## Output
Return a list of floats containing one answer **for each `getQPS` operation only**, in the order those queries appear. `record` operations produce no output.
## How to compute a query answer
For a `getQPS(startTime, endTime)` query:
1. Let **`count`** = the number of requests recorded so far whose `timestamp` lies in the inclusive interval `[startTime, endTime]`.
2. The interval spans `endTime - startTime + 1` seconds (inclusive of both endpoints).
3. The average QPS is `count / (endTime - startTime + 1)`.
Multiple requests may share the same timestamp; each one counts individually.
## Edge cases
- If `startTime > endTime`, the answer for that query is `0.0`.
- If no requests have been recorded yet (or no recorded request falls inside the window), the answer is `0.0`.
## Examples
**Example 1**
```
operations = [('record', 1), ('record', 1), ('record', 3), ('getQPS', 1, 3), ('getQPS', 2, 2)]
returns [1.0, 0.0]
```
- `getQPS(1, 3)`: 3 requests in `[1, 3]` over a 3-second span → `3 / 3 = 1.0`.
- `getQPS(2, 2)`: no request at timestamp 2 → `0 / 1 = 0.0`.
**Example 2**
```
operations = [('record', 10), ('record', 13), ('record', 13), ('getQPS', 10, 13), ('getQPS', 11, 12)]
returns [0.75, 0.0]
```
- `getQPS(10, 13)`: 3 requests in `[10, 13]` over a 4-second span → `3 / 4 = 0.75`.
- `getQPS(11, 12)`: no request in `[11, 12]` → `0.0`.
**Example 3**
```
operations = [('getQPS', 5, 7), ('record', 6), ('getQPS', 6, 6)]
returns [0.0, 1.0]
```
- `getQPS(5, 7)`: nothing recorded yet → `0.0`.
- `getQPS(6, 6)`: 1 request at timestamp 6 over a 1-second span → `1.0`.
**Example 4**
```
operations = [('record', 2), ('getQPS', 5, 3), ('getQPS', 0, 2)]
returns [0.0, 0.3333333333333333]
```
- `getQPS(5, 3)`: `startTime > endTime` → `0.0`.
- `getQPS(0, 2)`: 1 request in `[0, 2]` over a 3-second span → `1 / 3 = 0.3333333333333333`.
## Constraints
- Up to `10^6` operations.
- Record timestamps are non-decreasing.
- Timestamps fit in a signed 64-bit integer.
Constraints
- Up to 10^6 operations
- Record timestamps are non-decreasing
- Timestamps fit in a signed 64-bit integer
Examples
Input: ([('record', 1), ('record', 1), ('record', 3), ('getQPS', 1, 3), ('getQPS', 2, 2)])
Expected Output: [1.0, 0.0]
Explanation: There are 3 requests in `[1, 3]` over 3 seconds, and none in `[2, 2]`.
Input: ([('record', 10), ('record', 13), ('record', 13), ('getQPS', 10, 13), ('getQPS', 11, 12)])
Expected Output: [0.75, 0.0]
Explanation: The first query sees 3 requests over 4 seconds.
Input: ([('getQPS', 5, 7), ('record', 6), ('getQPS', 6, 6)])
Expected Output: [0.0, 1.0]
Explanation: The first query is on an empty log. After recording one request at time 6, QPS over `[6, 6]` is 1.0.
Input: ([('record', 2), ('getQPS', 5, 3), ('getQPS', 0, 2)])
Expected Output: [0.0, 0.3333333333333333]
Explanation: An invalid time range returns 0.0. The second query has 1 request over 3 seconds.
Hints
- Many consecutive `record` calls may share the same timestamp, so storing one count per distinct time can save space.
- Use binary search on the distinct timestamps to count how many requests fall inside a query range.
Part 4: Fast Arbitrary-Range QPS Queries from Sorted Logs
Answer many **average-QPS (queries-per-second) range queries** over a static, sorted request log — each query in **sublinear time**, without scanning the whole log.
## What to implement
```
solution(timestamps, queries)
```
- **`timestamps`** — a list of request times, sorted in **non-decreasing** order (duplicates allowed). This log is fixed; it never changes between queries.
- **`queries`** — a list of `(startTime, endTime)` pairs.
Return a **list of floats**, one per query, in the same order as the input queries.
## What each query asks
For a query `(startTime, endTime)`, compute the **average QPS over the inclusive interval `[startTime, endTime]`**, defined as:
```
(number of timestamps t with startTime <= t <= endTime)
-----------------------------------------------------------
endTime - startTime + 1
```
- The **numerator** is how many log entries fall inside the closed interval — both endpoints are included, and duplicate timestamps each count.
- The **denominator** is the number of integer time units the interval spans. The `+ 1` makes the interval inclusive of both endpoints (for example, `[5, 5]` spans **1** unit, not 0).
**Example:** with three entries at time `5`, the query `(5, 5)` returns `3 / (5 - 5 + 1) = 3.0`.
## Rules and edge cases
- **Inverted range:** if `startTime > endTime`, return `0.0` for that query.
- **Empty log:** if `timestamps` is empty, every query returns `0.0`.
- **No matches:** if no timestamp falls in `[startTime, endTime]`, the numerator is `0`, so the result is `0.0`.
- **Negative times are valid.** Timestamps and query bounds may be negative; only the ordering matters.
- Each query is independent — answering one does not affect any other.
## Constraints
- `0 <= len(timestamps) <= 10^6`
- `0 <= len(queries) <= 10^6`
- `timestamps` is sorted in non-decreasing order.
- Every timestamp fits in a signed 64-bit integer.
Because the log is pre-sorted, aim to answer each query in **O(log N)** rather than scanning all `N` timestamps.
Constraints
- 0 <= len(timestamps) <= 10^6
- 0 <= len(queries) <= 10^6
- Timestamps are sorted in non-decreasing order
- Timestamps fit in a signed 64-bit integer
Examples
Input: ([1, 1, 2, 4, 10], [(1, 2), (3, 10)])
Expected Output: [1.5, 0.25]
Explanation: The first query counts 3 requests over 2 seconds; the second counts 2 requests over 8 seconds.
Input: ([], [(0, 0), (5, 7)])
Expected Output: [0.0, 0.0]
Explanation: An empty log yields zero QPS for every query.
Input: ([5, 5, 5], [(5, 5), (4, 6), (6, 10)])
Expected Output: [3.0, 1.0, 0.0]
Explanation: All requests happen at time 5.
Input: ([-3, -1, 0, 2], [(-2, 0), (-5, -4), (3, 1)])
Expected Output: [0.6666666666666666, 0.0, 0.0]
Explanation: The first range contains 2 requests over 3 seconds; the last query is an invalid interval.
Hints
- In a sorted array, the number of values inside a range can be found with two binary searches.
- You only need the first index >= startTime and the first index > endTime.
Part 5: Approximate QPS with Fixed-Width Time Buckets
Estimate the **approximate QPS (queries per second)** over time windows using a low-memory bucketed counting scheme — without storing every individual request timestamp.
Implement:
```python
def solution(bucket_size, timestamps, queries):
...
```
## Idea
Instead of keeping every timestamp, group the timeline into **fixed-width buckets** of `bucket_size` consecutive seconds and store only the **request count** per bucket. A timestamp `t` belongs to bucket `b = t // bucket_size` (floor division), so bucket `b` covers the inclusive integer-second range:
```
[b * bucket_size, b * bucket_size + bucket_size - 1]
```
To answer a query, assume requests are **uniformly distributed within each bucket**, so a bucket contributes a fraction of its count proportional to how much of it the query window covers.
## Input
- `bucket_size` (int): the width of each bucket, in seconds.
- `timestamps` (list of ints): the second at which each request occurred. Timestamps may repeat and may be **negative**; floor division still applies.
- `queries` (list of `[startTime, endTime]` pairs): each query asks for the approximate QPS over the **inclusive** time window `[startTime, endTime]` (both endpoints included).
## Output
Return a **list of floats**, one per query (in the same order), where each value is the approximate QPS for that query's window.
## How to answer one query `[start, end]`
The window spans `duration = end - start + 1` inclusive seconds.
1. **Approximate request count.** Sum, over every bucket that the window touches:
```
bucket_count * (overlap_len / bucket_size)
```
where `overlap_len` is the **inclusive integer overlap** between the window `[start, end]` and the bucket's range, i.e.
```
overlap_len = max(0, min(end, bucket_end) - max(start, bucket_start) + 1)
```
- A bucket **fully inside** the window has `overlap_len == bucket_size`, so it contributes its entire count.
- The two boundary buckets (those containing `start` and `end`) are typically only **partially** overlapped, so they contribute a prorated fraction of their count.
2. **Approximate QPS.** Divide the approximate request count by the window length:
```
qps = approx_count / duration
```
## Rules / edge cases
- **Invalid window:** if `startTime > endTime`, the result for that query is `0.0`.
- An **empty** `timestamps` list (or a window touching no requests) yields `0.0` for the affected queries.
- Negative timestamps and negative query endpoints are valid and handled by floor division.
## Examples
- `bucket_size = 10`, `timestamps = [0, 1, 9, 10, 11, 19]`, `queries = [[0, 9], [5, 14]]` → `[0.3, 0.3]`.
- `bucket_size = 4`, `timestamps = [0, 0, 0, 3, 4, 7]`, `queries = [[1, 4]]` → `[0.875]`. Bucket `0 = [0,3]` has count `4` and overlaps `[1,3]` (length `3`), contributing `4 * 3/4 = 3`; bucket `1 = [4,7]` has count `2` and overlaps `[4,4]` (length `1`), contributing `2 * 1/4 = 0.5`; total `3.5`, divided by `duration = 4`, gives `0.875`.
- `bucket_size = 5`, `timestamps = []`, `queries = [[0, 4], [3, 2]]` → `[0.0, 0.0]` (no data; second query is invalid).
- `bucket_size = 3`, `timestamps = [-3, -2, 0, 2, 3]`, `queries = [[-3, -1], [0, 5]]` → `[0.6666666666666666, 0.5]` (negative timestamps via floor division).
## Constraints
- `1 <= bucket_size <= 10^9`
- `0 <= len(timestamps) <= 200000`
- `0 <= len(queries) <= 200000`
- Timestamps fit in a signed 64-bit integer.
Constraints
- 1 <= bucket_size <= 10^9
- 0 <= len(timestamps) <= 200000
- 0 <= len(queries) <= 200000
- Timestamps fit in a signed 64-bit integer
Examples
Input: (10, [0, 1, 9, 10, 11, 19], [(0, 9), (5, 14)])
Expected Output: [0.3, 0.3]
Explanation: Each 10-second bucket contains 3 requests. The second query uses 5 seconds from each of two buckets.
Input: (5, [], [(0, 4), (3, 2)])
Expected Output: [0.0, 0.0]
Explanation: No timestamps means zero approximate QPS, and an invalid interval also returns 0.0.
Input: (4, [0, 0, 0, 3, 4, 7], [(1, 4)])
Expected Output: [0.875]
Explanation: Bucket `[0, 3]` contributes 3/4 of its 4 requests and bucket `[4, 7]` contributes 1/4 of its 2 requests.
Input: (3, [-3, -2, 0, 2, 3], [(-3, -1), (0, 5)])
Expected Output: [0.6666666666666666, 0.5]
Explanation: The first query is entirely inside one bucket; the second spans two full buckets.
Hints
- Map each timestamp `t` to bucket `t // bucket_size` and count how many records fall in each bucket.
- For a partial bucket, multiply that bucket's count by `overlap_length / bucket_size`.