Design log-query stream processor

Q: Design log-query stream processor

This question evaluates a candidate's competency in stateful stream processing, pattern-matching and incremental metadata management for real-time log tagging.

Q: How do I approach System Design interview questions?

System Design questions require understanding of core concepts and practice. PracHub provides solutions with explanations to help you master system design interviews.

Question

Stream Processor: Query Registration and Log Tagging

Context

You are designing a streaming component that ingests a single mixed stream of messages. Each message is either:

A query registration (prefix "Q:"), which defines a string pattern to search for in future log lines.
A log line (prefix "L:"), which must be tagged with the IDs of all queries whose pattern appears in the log text.

Assume a query is a case-sensitive substring pattern (extendable to regex later). Query IDs are assigned incrementally starting at 1, in the order queries arrive. The system outputs an acknowledgment when a query is registered, and for each log it emits the list of matching query IDs (sorted ascending).

Example

Input stream:

Q: error
Q: timeout
L: database timeout after 5s
L: ERROR: connection reset
Q: reset
L: timeout reset error

Expected outputs:

For 1) → ACK 1
For 2) → ACK 2
For 3) → L: database timeout after 5s | matches: [2]
For 4) → L: ERROR: connection reset | matches: [3] (note: case-sensitive, so "ERROR" doesn't match "error")
For 5) → ACK 3
For 6) → L: timeout reset error | matches: [1, 2, 3]

Tasks

Design and implement a function/process that:
- Assigns incremental IDs to queries as they arrive and outputs an acknowledgment (e.g., "ACK <id> ").
- For each log line, emits the log plus the list of matching query IDs, using the matching semantics defined above.
How would you modify your design to efficiently handle a very large volume of data (both queries and logs)?
How would you support deletion of queries in your current implementation, and what inefficiencies need to be addressed?

Assumptions

Matching is case-sensitive substring search; you may note how to extend to case-insensitive or regex.
Queries apply to logs arriving after their registration (no retroactive tagging).
In-order processing of the single input stream is sufficient for correctness (you may discuss scaling beyond one process).

Design log-query stream processor

Quick Overview

Stream Processor: Query Registration and Log Tagging

Context

Example

Tasks

Assumptions

Solution

Comments (0)