PracHub

Implement Streaming Clustering for Numbers

Last updated: May 3, 2026

Quick Overview

This question evaluates a data scientist's competency in online clustering for numeric streams, testing understanding of streaming algorithms, bounded-memory cluster summaries, online updates, and issues including initialization, convergence behavior, outlier handling, and concept drift.

Company: Uber

Role: Data Scientist

Category: Machine Learning

Interview Round: Onsite

Apr 10, 2026, 12:00 AM

You receive a continuous stream of numeric values. Choose an appropriate clustering algorithm and implement it so that each incoming number can be assigned to a cluster while using bounded memory.

Clarify and address the following:

  • The stream is potentially unbounded, so storing all historical values is not allowed.
  • You may assume the number of clusters `k` is given, or explain how you would choose it.
  • The algorithm should update cluster summaries online as new values arrive.
  • The implementation should expose at least two operations: `add(value)` to process a new number and `get_clusters()` to return the current cluster centers or summaries.
  • Discuss initialization, convergence behavior, outliers, concept drift, and how you would test correctness.
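
One way to approach the requirements above is sequential (online) k-means on scalars, sketched below. This is an illustrative sketch rather than a reference answer: it assumes `k` is given, and the class name `StreamingKMeans1D` and the `min_step` parameter are hypothetical names introduced here. Each cluster is summarized by a running mean and a count, so memory stays O(k) regardless of stream length.

```python
from typing import List, Tuple


class StreamingKMeans1D:
    """Sequential k-means over a stream of scalars (illustrative sketch).

    Keeps one running mean and one count per cluster, so memory is O(k).
    """

    def __init__(self, k: int, min_step: float = 0.0) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        if not 0.0 <= min_step <= 1.0:
            raise ValueError("min_step must be in [0, 1]")
        self.k = k
        # Assumption: a floor on the step size. 0.0 gives an exact running
        # mean; a positive value forgets old data, which helps under drift.
        self.min_step = min_step
        self.centers: List[float] = []
        self.counts: List[int] = []

    def add(self, value: float) -> int:
        """Assign value to a cluster, update its summary, return its index."""
        # Initialization: the first k distinct values seed the centers.
        if len(self.centers) < self.k and value not in self.centers:
            self.centers.append(float(value))
            self.counts.append(1)
            return len(self.centers) - 1

        # Assignment: pick the nearest existing center.
        idx = min(range(len(self.centers)),
                  key=lambda i: abs(value - self.centers[i]))

        # Update: move the center toward the value by 1/count, or by
        # min_step if that is larger.
        self.counts[idx] += 1
        step = max(1.0 / self.counts[idx], self.min_step)
        self.centers[idx] += step * (value - self.centers[idx])
        return idx

    def get_clusters(self) -> List[Tuple[float, int]]:
        """Return (center, count) pairs sorted by center."""
        return sorted(zip(self.centers, self.counts))


if __name__ == "__main__":
    model = StreamingKMeans1D(k=2)
    for x in [1.0, 10.0, 2.0, 11.0, 3.0, 9.0]:
        model.add(x)
    print(model.get_clusters())
```

A few design notes, all of which are choices made for this sketch. Seeding from the first `k` distinct values is order-sensitive, so a poor start can persist; buffering a small initial batch and seeding from it is a common alternative. With `min_step=0.0` each center is the exact mean of the values assigned to it, so the step size shrinks and centers settle on a stationary stream. A positive `min_step` trades that stability for the ability to follow concept drift. Outliers are not handled here: a single extreme value still pulls its nearest center, so a distance threshold or a separate outlier bucket would be a reasonable extension. For testing, a stream drawn from well-separated groups should yield centers close to the group means, and the memory footprint should not grow with the number of calls to `add`.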
