PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates understanding of streaming algorithms, string indexing, and set-based matching, focusing on designing efficient online data structures to match stored queries against incoming logs within the Coding & Algorithms domain.

  • Medium
  • Datadog
  • Coding & Algorithms
  • Software Engineer

Match logs to prior queries

Company: Datadog

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

##### Question You receive a stream of strings, each beginning with either "Q:" (query) or "L:" (log). A query consists of space-separated words and should be indexed when it arrives. For every subsequent log line, output all previously seen queries for which every word in the query appears at least once in the log (case-sensitive, word boundaries by space). Design and implement an efficient algorithm/data structure to support this online matching and output.

Quick Answer: This question evaluates understanding of streaming algorithms, string indexing, and set-based matching, focusing on designing efficient online data structures to match stored queries against incoming logs within the Coding & Algorithms domain.

You receive a list of lines, each beginning with either "Q:" (query) or "L:" (log). A query line has space-separated words and is indexed when it appears. For every subsequent log line, output all previously seen queries for which every distinct word in the query appears at least once in the log. Matching is case-sensitive and words are split by spaces. Return, for each log line in input order, the list of matching queries in the order the queries were received. Duplicate query strings are treated as distinct entries.

Constraints

  • 1 <= len(lines) <= 200000
  • Each line starts with exactly "Q: " or "L: "
  • Each query and log contains at least one word after the prefix
  • Words are separated by single spaces and contain no spaces themselves
  • Matching is case-sensitive
  • Let T be the total number of words across all lines; T <= 2,000,000

Solution

from typing import List, Dict
from collections import defaultdict

def match_logs_to_queries(lines: List[str]) -> List[List[str]]:
    query_texts: List[str] = []
    need: List[int] = []  # number of distinct words required per query
    postings: Dict[str, List[int]] = defaultdict(list)  # word -> list of query IDs
    results: List[List[str]] = []

    for line in lines:
        if line.startswith("Q:"):
            content = line[2:].strip()
            qid = len(query_texts)
            query_texts.append(content)
            words = set(content.split())  # distinct words per query
            need.append(len(words))
            for w in words:
                postings[w].append(qid)
        elif line.startswith("L:"):
            content = line[2:].strip()
            lwords = set(content.split())  # distinct words in the log
            seen = defaultdict(int)  # qid -> count of matched distinct words
            for w in lwords:
                for qid in postings.get(w, []):
                    seen[qid] += 1
            matched_ids = [qid for qid, cnt in seen.items() if cnt == need[qid]]
            matched_ids.sort()  # arrival order
            results.append([query_texts[qid] for qid in matched_ids])
        else:
            # According to constraints, this should not happen.
            continue

    return results
Explanation
Maintain an inverted index from word to the list of query IDs that contain it, and track for each query the number of distinct words it requires. When a log arrives, convert its words to a set to ignore repeats, then for each word, increment a counter for every query that includes that word. A query matches the log exactly when the number of its distinct words seen equals its required count. Output the matched queries in arrival order by sorting their IDs.

Time complexity: Building the index over all queries is O(total distinct query words). For a log with U distinct words, time is O(sum over its words of postings size) to count, plus O(K log K) to output K matches in arrival order.. Space complexity: O(Q + P), where Q is the number of queries (to store texts and counts) and P is the total size of the inverted index postings across all words..

Hints

  1. Index each query by its set of distinct words and store the required count per query.
  2. Build an inverted index: word -> list of query IDs containing that word.
  3. For each log, take its distinct words and accumulate counts per candidate query ID using the inverted index; a query matches if its seen count equals its required count.
Last updated: May 17, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement a Snowflake Query Client - Datadog (medium)
  • Implement Prefix Match Filter - Datadog (hard)
  • Build span trees from unordered trace spans - Datadog (medium)
  • Implement buffered file writer with concurrency support - Datadog (easy)
  • Implement write with internal buffer - Datadog (Medium)