Compute sliding window sums by tag
Company: Datadog
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
##### Question
Given a list of datapoints where each datapoint has tags, a timestamp, and a value, write a function that, for a specified tag t and window size k, returns the sums of every consecutive window of size k over the datapoints that contain tag t.
Quick Answer: This question evaluates understanding of sequence processing, sliding-window aggregation, filtering by tag, and efficient state management in tagged time-series or event data, testing algorithmic efficiency and correctness.
You are given a list of datapoints, each with fields: tags (list of strings), ts (integer timestamp), and value (integer). Implement sliding_window_sums_by_tag(datapoints, tag, k) that: (1) selects only datapoints whose tags contain the exact string tag, (2) orders the selected datapoints by ascending ts and preserves original input order when ts is equal, and (3) returns a list of sums of every consecutive window of size k over the ordered values. If fewer than k datapoints match, return an empty list. The input list may be in any order.
Constraints
- 1 <= n <= 200000, where n is the number of datapoints
- Each datapoint is a dict with keys: 'tags' (list[str]), 'ts' (int), 'value' (int)
- Tag strings are non-empty; tags list size 0..50
- Timestamps ts are integers and may repeat
- Values are integers in range [-1e9, 1e9]
- 1 <= k <= n; if the number of matching datapoints m < k, return []
- Order matching datapoints by ascending ts; if ts ties, preserve original input order
- Input datapoints are not guaranteed to be pre-sorted
Solution
def sliding_window_sums_by_tag(datapoints, tag, k):
# Filter datapoints containing the tag, keep original index for stable tie-breaking
filtered = []
for idx, dp in enumerate(datapoints):
tags = dp.get("tags", [])
if isinstance(tags, (list, tuple)) and tag in tags:
ts = dp.get("ts")
val = dp.get("value")
filtered.append((ts, val, idx))
# Edge cases
if k <= 0:
return []
# Sort by timestamp ascending; break ties by original index to preserve input order
filtered.sort(key=lambda x: (x[0], x[2]))
values = [v for _, v, _ in filtered]
m = len(values)
if k > m:
return []
# Sliding window sums
res = []
window_sum = sum(values[:k])
res.append(window_sum)
for i in range(k, m):
window_sum += values[i] - values[i - k]
res.append(window_sum)
return res
Explanation
Filter datapoints to those containing the target tag, then sort by timestamp ascending while preserving input order for equal timestamps. Extract the values and compute the sums of every consecutive window of size k using a sliding window: initialize with the first k values, then for each step add the entering value and subtract the leaving value. If fewer than k datapoints match, return an empty list.
Time complexity: O(m log m), where m is the number of matching datapoints. Space complexity: O(m).
Hints
- First filter datapoints whose tags contain the target tag.
- Sort the filtered list by timestamp ascending; preserve input order for equal timestamps.
- Maintain a running sum for the sliding window and update it in O(1) per step.