PracHub

Design a computer-use agent end-to-end

Last updated: Mar 29, 2026

Quick Overview

This question evaluates competency in designing end-to-end multimodal interactive ML systems, including perception from pixels and accessibility trees, sequential decision-making and planning, action policy design, robustness to UI changes, and safety-aware behavior.

Company: Amazon

Role: Machine Learning Engineer

Category: ML System Design

Difficulty: medium

Interview Round: Onsite

Jan 22, 2026, 12:00 AM
Scenario

You are designing a computer-use agent that can complete user tasks on a standard desktop environment by observing the screen and issuing actions (mouse/keyboard). Examples: “Find my last invoice in Gmail and download it”, “Book a flight with these constraints”, “Open a spreadsheet, add a pivot table, and export a PDF”.
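The observe-then-act cycle described in the scenario can be sketched as a simple control loop. This is a minimal illustration, not a reference design: the `Observation` and `Action` types, the action labels, and the `observe`, `policy`, and `execute` callables are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Observation:
    instruction: str                 # the user's natural-language task
    screenshot: bytes                # raw screen pixels (encoding is an assumption)
    a11y_tree: Optional[str] = None  # accessibility tree / DOM, if available


@dataclass
class Action:
    kind: str                        # hypothetical labels: "click", "type", "scroll", "done"
    args: Dict[str, object] = field(default_factory=dict)


def run_episode(
    instruction: str,
    observe: Callable[[str], Observation],
    policy: Callable[[Observation, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Observe, decide, act; stop on "done" or when the step budget runs out."""
    history: List[Action] = []
    for _ in range(max_steps):
        obs = observe(instruction)       # fresh screen state before every decision
        action = policy(obs, history)    # the model sees the observation plus past actions
        history.append(action)
        if action.kind == "done":        # the policy signals task completion
            break
        execute(action)                  # send the mouse/keyboard event to the environment
    return history
```

Re-observing before every decision is what lets the agent notice that a previous action failed and recover, and the `max_steps` cap keeps a confused policy from looping forever.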

Requirements

  • Inputs (observations): screen pixels (and optionally accessibility tree / DOM if available), plus the user’s natural-language instruction.
  • Outputs (actions): mouse move/click/drag, scroll, key presses, and short text input.
  • Must support multi-step planning, error recovery, and working across many websites/apps.
  • Provide a design covering the full lifecycle:
    1. Pretraining (what data, objective, and model components)
    2. Post-training / supervised finetuning (what demonstrations, labeling strategy)
    3. RL stage (what reward, what algorithm family, how to stabilize training)
    4. Inference (latency, context/memory, safety, monitoring)
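For the RL stage, one possible starting point is a sparse success reward combined with a small per-step cost and a penalty for unsafe actions. The sketch below is an assumption for discussion only: the question does not prescribe a reward, and the `StepOutcome` type, the weights, and the discount factor are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StepOutcome:
    task_completed: bool   # did this step finish the user's task?
    unsafe_action: bool    # did this step take a high-risk action without confirmation?


def step_reward(
    outcome: StepOutcome,
    success_bonus: float = 1.0,   # hypothetical weight
    step_cost: float = 0.01,      # hypothetical weight; discourages long trajectories
    unsafe_penalty: float = 1.0,  # hypothetical weight
) -> float:
    """Sparse success bonus, minus a per-step cost, minus a penalty for unsafe actions."""
    reward = -step_cost
    if outcome.unsafe_action:
        reward -= unsafe_penalty
    if outcome.task_completed:
        reward += success_bonus
    return reward


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Return-to-go G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

These returns could feed a policy-gradient style update. Which algorithm family to use and how to stabilize training are left open, as in the question.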

Constraints (assume)

  • Latency target: ~1–2 seconds per action decision.
  • Must be robust to UI changes.
  • Must minimize unsafe actions (e.g., sending emails, purchasing) and require confirmation for high-risk steps.
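The latency and safety constraints above can be illustrated with two small helpers: a confirmation gate for high-risk steps and a timer that checks a decision against the per-action budget. This is a hedged sketch: it assumes some upstream component labels each step with an intent string, and the names `HIGH_RISK_INTENTS`, `allow_action`, and `timed_decision` are hypothetical.

```python
import time
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")

# Hypothetical intent labels, taken from the examples in the constraints.
HIGH_RISK_INTENTS = frozenset({"send_email", "purchase"})


def allow_action(intent: str, confirm: Callable[[str], bool]) -> bool:
    """Low-risk intents pass through; high-risk ones need explicit user confirmation."""
    if intent not in HIGH_RISK_INTENTS:
        return True
    return confirm(intent)


def timed_decision(decide: Callable[[], A], budget_s: float = 2.0) -> Tuple[A, float, bool]:
    """Run one decision and report whether it stayed within the latency budget."""
    start = time.perf_counter()
    action = decide()
    elapsed = time.perf_counter() - start
    return action, elapsed, elapsed <= budget_s
```

In practice the hard part is deciding that a given click amounts to "purchase" or "send_email". The gate only shows where a confirmation prompt would sit once that label exists.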

Deliver a high-level architecture plus key modeling/training choices, data pipelines, and evaluation/metrics.
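For the evaluation part of the deliverable, a few candidate offline metrics can be computed from episode logs. The sketch below assumes a hypothetical `EpisodeLog` record. The metric choices (task success rate, unsafe actions per step, 95th-percentile decision latency by the nearest-rank method) are illustrative suggestions rather than requirements stated in the question.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EpisodeLog:
    success: bool              # did the agent complete the task?
    steps: int                 # number of actions taken
    unsafe_actions: int        # high-risk actions taken without confirmation
    latencies_s: List[float]   # per-action decision latency, in seconds


def summarize(logs: List[EpisodeLog]) -> Dict[str, float]:
    """Aggregate success rate, unsafe-action rate, mean steps, and p95 latency."""
    if not logs:
        raise ValueError("need at least one episode")
    total_steps = sum(log.steps for log in logs)
    latencies = sorted(x for log in logs for x in log.latencies_s)
    # Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1] if latencies else float("nan")
    return {
        "success_rate": sum(log.success for log in logs) / len(logs),
        "unsafe_per_step": sum(log.unsafe_actions for log in logs) / total_steps if total_steps else 0.0,
        "mean_steps": total_steps / len(logs),
        "p95_latency_s": p95,
    }
```

Reporting a tail latency alongside the success rate ties the evaluation back to the ~1–2 second per-action target.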
