Match logs to prior queries
Company: Datadog
Role: Software Engineer
Category: Coding & Algorithms
Difficulty: Medium
Interview Round: Technical Screen
Quick Answer: This question evaluates understanding of streaming algorithms, string indexing, and set-based matching, focusing on designing efficient online data structures to match stored queries against incoming logs within the Coding & Algorithms domain.
Constraints
- 1 <= len(lines) <= 200000
- Each line starts with exactly "Q: " or "L: "
- Each query and log contains at least one word after the prefix
- Words are separated by single spaces and contain no spaces themselves
- Matching is case-sensitive
- Let T be the total number of words across all lines; T <= 2,000,000
Solution
from typing import List, Dict
from collections import defaultdict
def match_logs_to_queries(lines: List[str]) -> List[List[str]]:
query_texts: List[str] = []
need: List[int] = [] # number of distinct words required per query
postings: Dict[str, List[int]] = defaultdict(list) # word -> list of query IDs
results: List[List[str]] = []
for line in lines:
if line.startswith("Q:"):
content = line[2:].strip()
qid = len(query_texts)
query_texts.append(content)
words = set(content.split()) # distinct words per query
need.append(len(words))
for w in words:
postings[w].append(qid)
elif line.startswith("L:"):
content = line[2:].strip()
lwords = set(content.split()) # distinct words in the log
seen = defaultdict(int) # qid -> count of matched distinct words
for w in lwords:
for qid in postings.get(w, []):
seen[qid] += 1
matched_ids = [qid for qid, cnt in seen.items() if cnt == need[qid]]
matched_ids.sort() # arrival order
results.append([query_texts[qid] for qid in matched_ids])
else:
# According to constraints, this should not happen.
continue
return results