Compute sliding window sums by tag
Company: Datadog
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Onsite
##### Question
Given a list of datapoints where each datapoint has tags, a timestamp, and a value, write a function that, for a specified tag t and window size k, returns the sums of every consecutive window of size k over the datapoints that contain tag t.
Quick Answer: This question evaluates understanding of sequence processing, sliding-window aggregation, filtering by tag, and efficient state management in tagged time-series or event data, testing algorithmic efficiency and correctness.
You are given a list of datapoints, each with fields: tags (list of strings), ts (integer timestamp), and value (integer). Implement sliding_window_sums_by_tag(datapoints, tag, k) that: (1) selects only datapoints whose tags contain the exact string tag, (2) orders the selected datapoints by ascending ts and preserves original input order when ts is equal, and (3) returns a list of sums of every consecutive window of size k over the ordered values. If fewer than k datapoints match, return an empty list. The input list may be in any order.
Constraints
- 1 <= n <= 200000, where n is the number of datapoints
- Each datapoint is a dict with keys: 'tags' (list[str]), 'ts' (int), 'value' (int)
- Tag strings are non-empty; tags list size 0..50
- Timestamps ts are integers and may repeat
- Values are integers in range [-1e9, 1e9]
- 1 <= k <= n; if the number of matching datapoints m < k, return []
- Order matching datapoints by ascending ts; if ts ties, preserve original input order
- Input datapoints are not guaranteed to be pre-sorted
Solution
def sliding_window_sums_by_tag(datapoints, tag, k):
# Filter datapoints containing the tag, keep original index for stable tie-breaking
filtered = []
for idx, dp in enumerate(datapoints):
tags = dp.get("tags", [])
if isinstance(tags, (list, tuple)) and tag in tags:
ts = dp.get("ts")
val = dp.get("value")
filtered.append((ts, val, idx))
# Edge cases
if k <= 0:
return []
# Sort by timestamp ascending; break ties by original index to preserve input order
filtered.sort(key=lambda x: (x[0], x[2]))
values = [v for _, v, _ in filtered]
m = len(values)
if k > m:
return []
# Sliding window sums
res = []
window_sum = sum(values[:k])
res.append(window_sum)
for i in range(k, m):
window_sum += values[i] - values[i - k]
res.append(window_sum)
return res