PracHub
QuestionsPremiumCoachesLearningGuidesInterview Prep

Quick Overview

This question evaluates a candidate's skill in designing and analyzing log storage and query systems, covering data layout, indexing strategies, time-range scans with optional filters, and time/space complexity reasoning.

  • Medium
  • Datadog
  • Coding & Algorithms
  • Software Engineer

Implement log storage and querying

Company: Datadog

Role: Software Engineer

Category: Coding & Algorithms

Difficulty: Medium

Interview Round: Technical Screen

Design a data structure to record log entries and support efficient queries. Each log has a timestamp (milliseconds), severity (INFO/WARN/ERROR), serviceId (string), and message (string). Support: 1) append(log), 2) count(startTime, endTime, filters), and 3) get(startTime, endTime, filters, orderBy, limit, offset). Describe your data layout, indexing strategy, and how you handle time-range scans plus optional filters on severity and serviceId. Provide expected time and space complexity for each operation and discuss trade-offs between write amplification and query latency.

Quick Answer: This question evaluates a candidate's skill in designing and analyzing log storage and query systems, covering data layout, indexing strategies, time-range scans with optional filters, and time/space complexity reasoning.

Implement process_logs(ops) to record logs and answer queries. Each log has fields: timestamp (int, milliseconds), severity (INFO/WARN/ERROR), serviceId (string), and message (string). Process a list of operations in order. Supported operations: 1) {"op":"append","timestamp":t,"severity":s,"serviceId":id,"message":m} appends a log; 2) {"op":"count","start":a,"end":b,"filters":{optional severity, serviceId}} returns the number of logs with a <= timestamp <= b matching all provided filters; 3) {"op":"get","start":a,"end":b,"filters":{optional severity, serviceId},"orderBy":"timestamp_asc"|"timestamp_desc","limit":L,"offset":O} returns up to L matching logs after skipping O matches, ordered by timestamp as specified. Return a list containing outputs only for count and get operations, in their occurrence order. Assume append operations are given in non-decreasing timestamp order.

Constraints

  • 1 <= len(ops) <= 50000
  • Append operations arrive with non-decreasing timestamp
  • 0 <= timestamp <= 10^13
  • severity ∈ {"INFO","WARN","ERROR"}
  • 0 <= length(serviceId) <= 64
  • 0 <= length(message) <= 512
  • Queries use inclusive time range: start <= timestamp <= end
  • orderBy ∈ {"timestamp_asc","timestamp_desc"}
  • 0 <= limit <= 100000, 0 <= offset
  • Return outputs only for count/get; append produces no output

Solution

from bisect import bisect_left, bisect_right

def process_logs(ops: list) -> list:
    ts = []
    severities = []
    services = []
    messages = []
    outputs = []

    for op in ops:
        typ = op.get('op')
        if typ == 'append':
            ts.append(op['timestamp'])
            severities.append(op['severity'])
            services.append(op['serviceId'])
            messages.append(op['message'])
        elif typ == 'count' or typ == 'get':
            start = op['start']
            end = op['end']
            if start > end or not ts:
                l = 0
                r = 0
            else:
                l = bisect_left(ts, start)
                r = bisect_right(ts, end)

            filters = op.get('filters') or {}
            fsev = filters.get('severity')
            fsvc = filters.get('serviceId')

            if typ == 'count':
                c = 0
                for i in range(l, r):
                    if (fsev is None or severities[i] == fsev) and (fsvc is None or services[i] == fsvc):
                        c += 1
                outputs.append(c)
            else:  # get
                orderBy = op.get('orderBy', 'timestamp_asc')
                reverse = (orderBy == 'timestamp_desc')
                limit = op.get('limit', r - l)
                offset = op.get('offset', 0)
                if limit < 0:
                    limit = 0
                if offset < 0:
                    offset = 0

                res = []
                skipped = 0
                if reverse:
                    i = r - 1
                    while i >= l and len(res) < limit:
                        if (fsev is None or severities[i] == fsev) and (fsvc is None or services[i] == fsvc):
                            if skipped < offset:
                                skipped += 1
                            else:
                                res.append({'timestamp': ts[i], 'severity': severities[i], 'serviceId': services[i], 'message': messages[i]})
                        i -= 1
                else:
                    i = l
                    while i < r and len(res) < limit:
                        if (fsev is None or severities[i] == fsev) and (fsvc is None or services[i] == fsvc):
                            if skipped < offset:
                                skipped += 1
                            else:
                                res.append({'timestamp': ts[i], 'severity': severities[i], 'serviceId': services[i], 'message': messages[i]})
                        i += 1
                outputs.append(res)
        else:
            # Unknown operations are ignored per spec (not expected)
            pass

    return outputs
Explanation
Maintain parallel arrays for timestamps and fields. Because appends are in non-decreasing order, timestamps remain sorted. Use binary search to find the window [l, r) of logs whose timestamp is within [start, end]. For count, scan only this slice and apply optional severity/serviceId filters. For get, iterate the slice in ascending or descending order based on orderBy, skip 'offset' matches, then collect up to 'limit' logs into the result. This avoids building large intermediate lists and only touches the relevant time range. Trade-off: Writes are O(1) and simple (append-only), while queries use O(log n) to locate the time window and O(k) to scan matches within that window; this design favors query latency with minimal write amplification.

Time complexity: append: O(1); count: O(log n + k); get: O(log n + s) where s is the number scanned to skip offset and collect up to limit within the time window. Space complexity: O(n).

Hints

  1. Maintain a time-sorted array of timestamps and binary search the [start,end] window.
  2. Filter severity and/or serviceId while scanning only the time-window slice.
  3. For get with limit/offset, skip then collect to avoid building full intermediate lists.
  4. Stable append order within equal timestamps naturally preserves order in timestamp_asc.
Last updated: Mar 29, 2026

Loading coding console...

PracHub

Master your tech interviews with 8,500+ real questions from top companies.

Product

  • Questions
  • Learning Tracks
  • Interview Guides
  • Resources
  • Premium
  • For Universities
  • Student Access

Browse

  • By Company
  • By Role
  • By Category
  • Topic Hubs
  • SQL Questions
  • Compare Platforms
  • Discord Community

Support

  • support@prachub.com
  • (916) 541-4762

Legal

  • Privacy Policy
  • Terms of Service
  • About Us

© 2026 PracHub. All rights reserved.

Related Coding Questions

  • Implement a Snowflake Query Client - Datadog (medium)
  • Implement Prefix Match Filter - Datadog (hard)
  • Build span trees from unordered trace spans - Datadog (medium)
  • Implement buffered file writer with concurrency support - Datadog (easy)
  • Implement write with internal buffer - Datadog (Medium)