
Design anomaly detection and response platform

Last updated: Mar 29, 2026

Quick Overview

This question evaluates a candidate's competency in designing end-to-end ML-driven systems that handle large-scale, low-latency OS snapshot ingestion, binary anomaly detection, decisioning and action orchestration with multi-tenant security, auditability, and failure rollback.

Company: Google

Role: Software Engineer

Category: ML System Design

Difficulty: hard

Interview Round: Technical Screen

A cloud service ingests operating system snapshots and runs an AI-based anomaly detector that outputs Normal or Abnormal. For Abnormal, the system can shut down or quarantine the machine and send warning emails; clients can query the history of warnings. Design the detection service: data collection, model serving, decisioning, action orchestration, audit and history storage, access controls, and failure rollback. Discuss latency, throughput, false positives, and safeguards for automated actions.

Sep 6, 2025, 12:00 AM
Design an AI-Driven OS Snapshot Anomaly Detection Service

Context

You are designing a cloud service that ingests operating system (OS) snapshots from client machines, runs an AI-based anomaly detector that outputs either Normal or Abnormal, and triggers automated actions for Abnormal cases. Clients must be able to query the history of warnings and actions.

Assume a multi-tenant environment with per-tenant isolation and compliance needs. Snapshots may arrive as a continuous stream or batched. End-to-end detection should be near real-time for operational usefulness.
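Because the same snapshot may arrive both on the stream and in a later batch, one possible ingestion sketch is shown here. It assumes a tenant-scoped deduplication key and per-tenant queues; the names `Snapshot`, `Ingestor`, and `dedupe_key` are hypothetical, and the in-memory containers only stand in for whatever durable queue and key-value store a real design would pick.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    tenant_id: str
    machine_id: str
    captured_at: int  # Unix seconds when the agent took the snapshot
    payload: bytes

    def dedupe_key(self) -> str:
        """Tenant-scoped key so a snapshot seen via stream and batch is processed once."""
        digest = hashlib.sha256(self.payload).hexdigest()
        return f"{self.tenant_id}/{self.machine_id}/{self.captured_at}/{digest}"


class Ingestor:
    """In-memory stand-in (assumption) for a durable queue plus a dedupe store."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.queues: dict[str, list[Snapshot]] = {}  # one queue per tenant

    def accept(self, snap: Snapshot) -> bool:
        """Enqueue the snapshot; return False if it was already ingested."""
        key = snap.dedupe_key()
        if key in self._seen:
            return False
        self._seen.add(key)
        self.queues.setdefault(snap.tenant_id, []).append(snap)
        return True
```

Keeping the tenant ID inside the key and the queue name means identical payloads from two tenants never collide, which is one simple way to preserve per-tenant isolation at the ingestion edge.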

Requirements

  • Functional
    1. Data collection: ingest OS snapshots reliably and at scale.
    2. Model serving: run an AI anomaly detector (binary classification).
    3. Decisioning: translate model outputs into actions with policies.
    4. Action orchestration: shut down or quarantine machines and send warning emails, with retries and rollbacks.
    5. Audit and history storage: immutable log of detections, actions, and operator overrides; queryable by clients.
    6. Access controls: strong authn/authz for agents, services, and users; per-tenant data isolation.
    7. Failure rollback: safe defaults, kill switches, undo for actions, and disaster recovery.
  • Non-functional
    • Low end-to-end latency (seconds) for high-priority events.
    • High throughput and elasticity.
    • Low false positive rate with safeguards against destructive automated actions.
    • Observability, compliance-grade audit trails, and multi-region resilience.
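The decisioning and safeguard requirements above could be sketched roughly as follows. This is a minimal illustration, not a reference design: it assumes the detector exposes a confidence score in [0, 1] alongside its Normal/Abnormal label, and every threshold, field name, and default in `TenantPolicy` is a hypothetical placeholder.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    EMAIL = "email"            # warning email only
    QUARANTINE = "quarantine"  # isolation that can be undone
    SHUTDOWN = "shutdown"      # most disruptive option


@dataclass(frozen=True)
class TenantPolicy:
    # All fields are hypothetical knobs, not taken from the question.
    warn_threshold: float = 0.80
    quarantine_threshold: float = 0.95
    shutdown_threshold: float = 0.99
    allow_shutdown: bool = False      # safe default: no automatic power-off
    max_disruptive_per_hour: int = 5  # per-tenant cap on disruptive actions
    kill_switch: bool = False         # operator override: warnings only


def decide(score: float, policy: TenantPolicy, disruptive_last_hour: int) -> Action:
    """Map an anomaly score to the least disruptive action the policy allows."""
    if score < policy.warn_threshold:
        return Action.NONE
    # A tripped kill switch or an exhausted budget degrades to a warning email.
    if policy.kill_switch or disruptive_last_hour >= policy.max_disruptive_per_hour:
        return Action.EMAIL
    if score >= policy.shutdown_threshold and policy.allow_shutdown:
        return Action.SHUTDOWN
    if score >= policy.quarantine_threshold:
        return Action.QUARANTINE
    return Action.EMAIL
```

The ordering is deliberate: the safeguards are checked before any disruptive branch, so a misbehaving model can at worst flood a tenant with emails rather than take machines offline.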

Deliverable

Design the detection service covering: data collection, model serving, decisioning, action orchestration, audit and history storage, access controls, and failure rollback. Discuss latency targets, throughput planning, false-positive trade-offs, and safeguards for automated actions.
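When discussing false positives, a quick base-rate calculation helps show why automated actions need safeguards. The sketch below uses standard definitions of precision and false-alarm volume; the function names and all input figures are illustrative assumptions, not numbers from the question.

```python
def alert_precision(prevalence: float, tpr: float, fpr: float) -> float:
    """Fraction of Abnormal verdicts that are truly abnormal.

    prevalence: share of snapshots that are really abnormal
    tpr: true positive rate (recall) of the detector
    fpr: false positive rate of the detector
    """
    true_pos = prevalence * tpr
    false_pos = (1.0 - prevalence) * fpr
    if true_pos + false_pos == 0.0:
        raise ValueError("detector never fires for these inputs")
    return true_pos / (true_pos + false_pos)


def false_alarms_per_day(snapshots_per_day: float, prevalence: float, fpr: float) -> float:
    """Expected number of normal snapshots flagged Abnormal each day."""
    return snapshots_per_day * (1.0 - prevalence) * fpr


# Hypothetical inputs: 0.1% of snapshots abnormal, 95% recall, 1% FPR.
precision = alert_precision(prevalence=0.001, tpr=0.95, fpr=0.01)
alarms = false_alarms_per_day(snapshots_per_day=1_000_000, prevalence=0.001, fpr=0.01)
```

With these assumed inputs, fewer than one in ten Abnormal verdicts would be genuine, and a fleet producing a million snapshots a day would see close to ten thousand false alarms daily. That is the kind of figure that motivates staged actions, rate limits, and human approval before shutdown.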
