PracHub

Design log-query stream processor

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in stateful stream processing, pattern matching, and incremental metadata management for real-time log tagging.

Company: Datadog

Role: Software Engineer

Category: System Design

Difficulty: hard

Interview Round: Technical Screen

Question

Design and implement a function that processes a mixed input stream of queries (prefix "Q:") and logs (prefix "L:"), assigns incremental IDs to new queries, outputs an acknowledgment for each query, and tags each log line with the IDs of matching queries as shown in the example. How would you modify your design to efficiently handle a very large volume of data? How would you support deletion of queries in your current implementation, and what inefficiencies need to be addressed?

Jul 29, 2025, 8:05 AM

Stream Processor: Query Registration and Log Tagging

Context

You are designing a streaming component that ingests a single mixed stream of messages. Each message is either:

  • A query registration (prefix "Q:"), which defines a string pattern to search for in future log lines.
  • A log line (prefix "L:"), which must be tagged with the IDs of all queries whose pattern appears in the log text.

Assume a query is a case-sensitive substring pattern (extendable to regex later). Query IDs are assigned incrementally starting at 1, in the order queries arrive. The system outputs an acknowledgment when a query is registered, and for each log it emits the list of matching query IDs (sorted ascending).

Example

Input stream:

  1. Q: error
  2. Q: timeout
  3. L: database timeout after 5s
  4. L: ERROR: connection reset
  5. Q: reset
  6. L: timeout reset error

Expected outputs:

  • For 1) → ACK 1
  • For 2) → ACK 2
  • For 3) → L: database timeout after 5s | matches: [2]
  • For 4) → L: ERROR: connection reset | matches: [] (note: matching is case-sensitive, so "ERROR" does not match "error", and query 3 "reset" is not registered until message 5)
  • For 5) → ACK 3
  • For 6) → L: timeout reset error | matches: [1, 2, 3]
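
A minimal baseline for this example can be sketched in Python as follows. It is only one possible reading of the problem: the function name `process_stream` is hypothetical, and the sketch assumes that whitespace around the message body is not significant. Each log is checked against every registered pattern, so the cost per log grows linearly with the number of queries. Note that the fourth message yields an empty match list, because "ERROR" differs in case from "error" and the "reset" query is registered only afterwards.

```python
from typing import Iterable, Iterator


def process_stream(lines: Iterable[str]) -> Iterator[str]:
    """Yield one output line per input message (naive scan of all queries per log)."""
    queries: dict[int, str] = {}  # query ID -> pattern, in registration order
    next_id = 1
    for raw in lines:
        kind, _, body = raw.partition(":")
        body = body.strip()  # assumption: surrounding whitespace is not significant
        if kind == "Q":
            queries[next_id] = body
            yield f"ACK {next_id}"
            next_id += 1
        elif kind == "L":
            # dicts preserve insertion order, so IDs come out in ascending order
            matches = [qid for qid, pattern in queries.items() if pattern in body]
            yield f"L: {body} | matches: {matches}"
        else:
            raise ValueError(f"unknown message type: {raw!r}")


stream = [
    "Q: error",
    "Q: timeout",
    "L: database timeout after 5s",
    "L: ERROR: connection reset",
    "Q: reset",
    "L: timeout reset error",
]
outputs = list(process_stream(stream))
for line in outputs:
    print(line)
```

Because a query is stored only when its message is read, it cannot tag logs that arrived earlier, which gives the "no retroactive tagging" behavior without extra bookkeeping.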

Tasks

  1. Design and implement a function/process that:
    • Assigns incremental IDs to queries as they arrive and outputs an acknowledgment (e.g., "ACK <id>").
    • For each log line, emits the log plus the list of matching query IDs, using the matching semantics defined above.
  2. How would you modify your design to efficiently handle a very large volume of data (both queries and logs)?
  3. How would you support deletion of queries in your current implementation, and what inefficiencies need to be addressed?
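
For tasks 2 and 3, one common direction (not necessarily the answer the interviewer expects) is to replace the per-query scan with a multi-pattern automaton such as Aho-Corasick, so that each log is scanned once regardless of how many queries exist. The sketch below is an illustration under assumptions: the class name `QueryIndex` and its methods are hypothetical, and the automaton is simply rebuilt lazily after any add or delete. That rebuild costs time proportional to the total length of all patterns, which is the main inefficiency to address when queries change often (for example by batching changes, or by tombstoning deleted IDs and compacting periodically).

```python
from collections import deque


class QueryIndex:
    """Aho-Corasick index over live patterns, rebuilt lazily after changes."""

    def __init__(self) -> None:
        self.patterns: dict[int, str] = {}  # live query ID -> pattern
        self.next_id = 1
        self._dirty = True  # automaton must be rebuilt before the next match

    def add(self, pattern: str) -> int:
        qid = self.next_id
        self.next_id += 1
        self.patterns[qid] = pattern
        self._dirty = True
        return qid

    def delete(self, qid: int) -> bool:
        """Remove a query; IDs are never reused. Returns False if unknown."""
        if self.patterns.pop(qid, None) is None:
            return False
        self._dirty = True
        return True

    def _build(self) -> None:
        # Trie as parallel lists: child edges, failure link, IDs ending at each node.
        self._goto = [{}]
        self._fail = [0]
        self._out = [[]]
        for qid, pattern in self.patterns.items():
            node = 0
            for ch in pattern:
                nxt = self._goto[node].get(ch)
                if nxt is None:
                    nxt = len(self._goto)
                    self._goto[node][ch] = nxt
                    self._goto.append({})
                    self._fail.append(0)
                    self._out.append([])
                node = nxt
            self._out[node].append(qid)
        # Breadth-first pass: set failure links and inherit outputs of suffix states.
        queue = deque(self._goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self._goto[node].items():
                f = self._fail[node]
                while f and ch not in self._goto[f]:
                    f = self._fail[f]
                self._fail[child] = self._goto[f].get(ch, 0)
                self._out[child] = self._out[child] + self._out[self._fail[child]]
                queue.append(child)
        self._dirty = False

    def match(self, text: str) -> list[int]:
        if self._dirty:
            self._build()
        found = set(self._out[0])  # non-empty only if an empty pattern is registered
        node = 0
        for ch in text:
            while node and ch not in self._goto[node]:
                node = self._fail[node]
            node = self._goto[node].get(ch, 0)
            found.update(self._out[node])
        return sorted(found)


index = QueryIndex()
index.add("error")    # ID 1
index.add("timeout")  # ID 2
index.add("reset")    # ID 3
before = index.match("timeout reset error")
index.delete(2)
after = index.match("timeout reset error")
print(before, after)
```

Keeping IDs monotonically increasing after a deletion avoids ambiguity in downstream consumers that may have cached an ID. Beyond a single process, the same index could be replicated to workers that each handle a partition of the log stream, at the cost of having to broadcast query changes in a consistent order.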

Assumptions

  • Matching is case-sensitive substring search; you may note how to extend to case-insensitive or regex.
  • Queries apply to logs arriving after their registration (no retroactive tagging).
  • In-order processing of the single input stream is sufficient for correctness (you may discuss scaling beyond one process).
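
The extension to case-insensitive or regex matching mentioned in the first assumption could be sketched by turning each query into a predicate at registration time. The helper `make_matcher` and its `mode` names ("substring", "icase", "regex") are hypothetical and not part of the original problem statement.

```python
import re
from typing import Callable

Matcher = Callable[[str], bool]


def make_matcher(pattern: str, mode: str = "substring") -> Matcher:
    """Return a predicate for one query; the mode names are illustrative only."""
    if mode == "substring":
        return lambda text: pattern in text
    if mode == "icase":
        folded = pattern.casefold()  # fold once at registration time
        return lambda text: folded in text.casefold()
    if mode == "regex":
        compiled = re.compile(pattern)  # compile once, reuse for every log
        return lambda text: compiled.search(text) is not None
    raise ValueError(f"unsupported mode: {mode}")


log = "ERROR: connection reset"
results = {
    "substring": make_matcher("error")(log),
    "icase": make_matcher("error", "icase")(log),
    "regex": make_matcher(r"reset$", "regex")(log),
}
print(results)
```

Case-insensitive substring queries can usually still share a multi-pattern index by folding both patterns and logs, whereas general regex queries typically need to be evaluated separately or pre-filtered by a literal substring they must contain.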

