PracHub

Design a computer-use agent end-to-end

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competency in designing end-to-end multimodal interactive ML systems, including perception from pixels and accessibility trees, sequential decision-making and planning, action policy design, robustness to UI changes, and safety-aware behavior.

Company: Amazon

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: medium

Interview Round: Onsite

Jan 22, 2026, 12:00 AM

Scenario

You are designing a computer-use agent that can complete user tasks on a standard desktop environment by observing the screen and issuing actions (mouse/keyboard). Examples: “Find my last invoice in Gmail and download it”, “Book a flight with these constraints”, “Open a spreadsheet, add a pivot table, and export a PDF”.

Requirements

  • Inputs (observations): screen pixels (and optionally accessibility tree / DOM if available), plus the user’s natural-language instruction.
  • Outputs (actions): mouse move/click/drag, scroll, key presses, and short text input.
  • Must support multi-step planning, error recovery, and working across many websites/apps.
  • Provide a design covering the full lifecycle:
    1. Pretraining (what data, objective, and model components)
    2. Post-training / supervised finetuning (what demonstrations, labeling strategy)
    3. RL stage (what reward, what algorithm family, how to stabilize training)
    4. Inference (latency, context/memory, safety, monitoring)
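
One way to make the input/output requirements above concrete is to write down the observation and action types before discussing any model. The sketch below is illustrative only: the names `Observation`, `Action`, `ActionType`, `validate`, and the `MAX_TEXT_LEN` limit are assumptions introduced here, not part of the question.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple

MAX_TEXT_LEN = 256  # assumed cap for "short text input"; the question gives no number


class ActionType(Enum):
    MOVE = "move"
    CLICK = "click"
    DRAG = "drag"
    SCROLL = "scroll"
    KEY = "key"
    TYPE = "type"


@dataclass(frozen=True)
class Observation:
    instruction: str                          # user's natural-language task
    screenshot_png: bytes                     # raw screen pixels
    accessibility_tree: Optional[str] = None  # optional; absent in many apps


@dataclass(frozen=True)
class Action:
    kind: ActionType
    xy: Optional[Tuple[int, int]] = None      # pointer target
    end_xy: Optional[Tuple[int, int]] = None  # drag destination
    scroll_dy: int = 0
    keys: Tuple[str, ...] = ()
    text: str = ""


def _in_bounds(xy: Tuple[int, int], w: int, h: int) -> bool:
    return 0 <= xy[0] < w and 0 <= xy[1] < h


def validate(action: Action, screen_w: int, screen_h: int) -> List[str]:
    """Return a list of problems; an empty list means the action is well-formed."""
    errors: List[str] = []
    if action.kind in (ActionType.MOVE, ActionType.CLICK, ActionType.DRAG):
        if action.xy is None:
            errors.append("missing xy")
        elif not _in_bounds(action.xy, screen_w, screen_h):
            errors.append("xy out of bounds")
    if action.kind is ActionType.DRAG:
        if action.end_xy is None:
            errors.append("missing end_xy")
        elif not _in_bounds(action.end_xy, screen_w, screen_h):
            errors.append("end_xy out of bounds")
    if action.kind is ActionType.KEY and not action.keys:
        errors.append("missing keys")
    if action.kind is ActionType.TYPE and not (0 < len(action.text) <= MAX_TEXT_LEN):
        errors.append("text empty or too long")
    return errors
```

Validating model output against a fixed schema before execution gives the executor a cheap place to reject malformed actions, which can feed the error-recovery loop rather than reaching the desktop.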

Constraints (assume)

  • Latency target: ~1–2 seconds per action decision.
  • Must be robust to UI changes.
  • Must minimize unsafe actions (e.g., sending emails, purchasing) and require confirmation for high-risk steps.
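
The confirmation requirement can be sketched as a gate that sits between the policy and the executor. This is a minimal sketch under stated assumptions: `ProposedStep`, `is_high_risk`, `gate`, and the keyword set are hypothetical, and the keyword match is only a placeholder for whatever risk classifier a real design would use.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Assumed placeholder list; a real system would likely use a learned classifier
# plus per-app rules rather than keywords.
HIGH_RISK_WORDS = frozenset({"send", "purchase", "buy", "pay", "delete"})


@dataclass(frozen=True)
class ProposedStep:
    description: str  # e.g. the label of the UI element about to be activated


def is_high_risk(step: ProposedStep) -> bool:
    """Flag a step if any whole word in its description is on the risk list."""
    words = set(re.findall(r"[a-z]+", step.description.lower()))
    return bool(words & HIGH_RISK_WORDS)


def gate(step: ProposedStep, confirm: Callable[[ProposedStep], bool]) -> bool:
    """Return True if the step may run. High-risk steps run only if confirmed."""
    if not is_high_risk(step):
        return True
    return confirm(step)
```

For example, `gate(ProposedStep("Click Send"), ask_user)` would call the hypothetical `ask_user` callback, while `gate(ProposedStep("Scroll down"), ask_user)` would pass without prompting. Keeping the gate outside the model means a policy error alone cannot bypass confirmation.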

Deliver a high-level architecture plus key modeling/training choices, data pipelines, and evaluation/metrics.
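
For the evaluation part of the deliverable, a candidate might start from a small set of per-episode metrics tied to the stated constraints. The sketch below is an assumption-laden example: the `Episode` fields, the metric names, and the default 2-second budget (taken from the upper end of the latency target) are choices made here, not given by the question.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Episode:
    success: bool                       # did the task complete as instructed?
    unconfirmed_high_risk_actions: int  # high-risk steps executed without confirmation
    step_latencies_s: List[float]       # decision latency of each step, in seconds


def summarize(episodes: List[Episode], latency_budget_s: float = 2.0) -> Dict[str, float]:
    """Aggregate success, safety, and latency metrics over evaluation episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    latencies = [t for ep in episodes for t in ep.step_latencies_s]
    n_eps = len(episodes)
    n_steps = len(latencies)
    within = sum(t <= latency_budget_s for t in latencies)
    return {
        "task_success_rate": sum(ep.success for ep in episodes) / n_eps,
        "unsafe_actions_per_episode":
            sum(ep.unconfirmed_high_risk_actions for ep in episodes) / n_eps,
        "mean_steps_per_episode": n_steps / n_eps,
        "steps_within_latency_budget": within / n_steps if n_steps else 0.0,
    }
```

Reporting the safety and latency numbers alongside success rate keeps a design from trading away the constraints to win on task completion.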

