Design a production workflow that uses an LLM, optionally combined with deterministic parsers, to convert heterogeneous raw log messages into structured JSON fields.
The system must support multiple log formats whose schemas may be very different.
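One way to frame the multi-format requirement is a two-tier router: cheap deterministic parsers claim the lines they recognize, and only unclaimed lines are deferred to the LLM stage. A minimal sketch, assuming Python and illustrative regex patterns (the parser names, patterns, and the "llm_pending" marker are ours, not part of any library):

```python
import re

# Illustrative first-tier patterns; each named group becomes a structured field.
PARSERS = [
    ("access_log", re.compile(
        r'^(?P<src_ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)"')),
    ("error_log", re.compile(
        r'^\[(?P<timestamp>[^\]]+)\] \[(?P<level>\w+)\]')),
]

def route(line):
    """Try deterministic parsers first; fall back to the LLM stage."""
    for name, pattern in PARSERS:
        m = pattern.match(line)
        if m:
            return {"format": name, "fields": m.groupdict(), "source": "regex"}
    # No parser claimed the line: queue it for (more expensive) LLM extraction.
    return {"format": "unknown", "fields": {}, "source": "llm_pending"}
```

This keeps LLM cost proportional to the long tail of unrecognized formats rather than to total log volume.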
Example 1: access log input (Apache/Nginx combined log format):
192.168.1.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 1024 "http://example.com/start.html" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"
Expected structured output:
{
  "src_ip": "192.168.1.1",
  "time": "10/Oct/2023:13:55:36 +0000",
  "http_method": "GET",
  "path": "/index.html",
  "protocol": "HTTP/1.1",
  "response_code": 200,
  "bytes_sent": 1024,
  "referer": "http://example.com/start.html",
  "user_agent": "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"
}
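A deterministic fast path for this format can be a single regex with named capture groups; a sketch assuming Python's re module (the pattern, helper name, and snake_case field names are ours; the numeric field after the status code is the response size in bytes, named bytes_sent here):

```python
import re
from typing import Optional

# Combined log format: ip ident authuser [time] "request" status bytes
# "referer" "user-agent"
ACCESS_RE = re.compile(
    r'^(?P<src_ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<http_method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<response_code>\d{3}) (?P<bytes_sent>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_access_line(line: str) -> Optional[dict]:
    """Return structured fields, or None so the line can fall back to the LLM."""
    m = ACCESS_RE.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["response_code"] = int(rec["response_code"])
    # A "-" byte count means no body was sent.
    rec["bytes_sent"] = 0 if rec["bytes_sent"] == "-" else int(rec["bytes_sent"])
    return rec
```

Returning None on a miss gives the router an unambiguous signal to escalate the line to the LLM tier instead of emitting a partial record.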
Example 2: error log input:
[Tue Oct 10 13:55:36 2023] [error] [pid 12345] [client 192.168.1.1:12345] File does not exist: /var/www/html/favicon.ico
This log should map to a different schema, with fields such as timestamp, level, pid, client_ip, client_port, and error_message.
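The error-log schema above admits the same deterministic treatment; a sketch under the assumption that lines follow the bracketed Apache-style layout shown (pattern and helper name are illustrative):

```python
import re
from typing import Optional

# [timestamp] [level] [pid N] [client ip:port] message
ERROR_RE = re.compile(
    r'^\[(?P<timestamp>[^\]]+)\] \[(?P<level>\w+)\] '
    r'\[pid (?P<pid>\d+)\] \[client (?P<client_ip>[\d.]+):(?P<client_port>\d+)\] '
    r'(?P<error_message>.*)$'
)

def parse_error_line(line: str) -> Optional[dict]:
    """Return structured fields, or None if the line does not match."""
    m = ERROR_RE.match(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["pid"] = int(rec["pid"])
    rec["client_port"] = int(rec["client_port"])
    return rec
```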
Discuss the architecture, data flow, schema inference, extraction strategy, validation, scaling, reliability, monitoring, privacy, and how you would evaluate quality.
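For the validation point in particular, one common pattern is to gate every LLM-extracted record through a per-format schema check before it reaches downstream consumers, since LLM output can drop fields, invent fields, or return numbers as strings. A stdlib-only sketch (the schema contents and function name are assumptions for illustration):

```python
# Required fields and expected Python types for one format; illustrative only.
ACCESS_SCHEMA = {
    "src_ip": str, "time": str, "http_method": str, "path": str,
    "protocol": str, "response_code": int, "bytes_sent": int,
    "referer": str, "user_agent": str,
}

def validate_record(record, schema):
    """Return a list of violations; an empty list means the record passes."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"missing: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"bad type: {field}")
    # Flag hallucinated fields the schema does not define.
    for field in record.keys() - schema.keys():
        errors.append(f"unexpected: {field}")
    return errors
```

Records that fail the gate can be retried with a corrective prompt, routed to a dead-letter queue, or surfaced in monitoring as an extraction-quality metric.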