Diagnose failures via SSH and large logs

Last updated: Mar 29, 2026

Quick Overview

This question evaluates on-call troubleshooting and observability competencies, including command-line log triage on Linux, handling rotated or compressed logs, isolating error time windows, and high-level design of centralized logging and search architectures.

Company: Box

Role: Software Engineer

Category: System Design

Difficulty: medium

Interview Round: Onsite

A production service is failing. You have only SSH access to a host, and the log files are very large. How would you efficiently locate the relevant errors or time ranges, and which specific commands, filters, or strategies would you use? If the issue recurs, how would you design a centralized logging and search solution, and what trade-offs would you consider?

Aug 1, 2025, 12:00 AM

Troubleshooting Large Logs Over SSH and Designing Centralized Logging

Context

You are on-call for a production service that is failing. You have SSH access to a Linux host, but the application log files are very large (and may be rotated/compressed). You need to quickly locate relevant errors and determine the problematic time window. If this problem recurs, you should outline a centralized logging/search solution and discuss trade-offs.

Assume:

  • You can use common CLI tools available on most Linux hosts (e.g., journalctl, grep/awk/sed, less, zgrep, lsof).
  • Logs may be written to systemd-journald or to files in /var/log or an app directory, with rotation (e.g., .gz files).
  • Network bandwidth is limited; avoid transferring large files.

Tasks

  1. On-host triage: Describe how you would efficiently find relevant errors/time ranges in very large logs. Specify concrete commands, filters, and strategies (including for rotated/compressed logs, multiline stack traces, and time filtering).
  2. If this issue recurs: Propose a centralized logging and search architecture. Include ingestion, processing, storage, and query/visualization. Discuss trade-offs among common choices (e.g., Elasticsearch/OpenSearch, Loki, ClickHouse, object storage + query engines, managed services), including cost, scale, performance, and operability.
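
As one possible starting point for task 1, the sketch below streams log lines (plain or gzip-rotated) without loading whole files into memory, keeps multiline stack traces attached to the entry that produced them, and yields only entries inside a time window that match a pattern. It is an illustration under stated assumptions, not a reference answer: the timestamp prefix format and the helper names `open_log`, `entries`, and `search` are hypothetical, and on a real host the same idea is usually expressed with tools such as zgrep and awk.

```python
import gzip
import re
from datetime import datetime
from typing import Iterable, Iterator, List

# Assumption: each new log entry starts with a timestamp such as
# "2025-08-01 12:34:56" or "2025-08-01T12:34:56". Adjust for the real format.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2})")


def open_log(path: str):
    """Open a plain or gzip-rotated log as a text stream (read lazily)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "rt", encoding="utf-8", errors="replace")


def entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into entries. Lines without a timestamp prefix (for
    example stack trace frames) are attached to the preceding entry."""
    current: List[str] = []
    for line in lines:
        if TS_RE.match(line) and current:
            yield current
            current = []
        current.append(line.rstrip("\n"))
    if current:
        yield current


def search(lines: Iterable[str], start: datetime, end: datetime,
           pattern: str) -> Iterator[List[str]]:
    """Yield entries whose timestamp is within [start, end] and that
    contain a line matching the regular expression `pattern`."""
    rx = re.compile(pattern)
    for entry in entries(lines):
        m = TS_RE.match(entry[0])
        if not m:
            continue  # lines that appear before the first timestamp
        ts = datetime.fromisoformat(m.group(1))
        # No early exit on ts > end: the input is not assumed to be sorted.
        if start <= ts <= end and any(rx.search(l) for l in entry):
            yield entry
```

A typical call would pass an open file from `open_log` (or several, chained oldest first with `itertools.chain`) as `lines`, so only one entry at a time is held in memory and nothing has to be copied off the host.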
