Implement bag-of-words similarity search from scratch

Last updated: Mar 29, 2026

Quick Overview

This question evaluates knowledge and implementation skills in text processing, vector-space models, and information retrieval, covering tokenization, term-frequency and TF–IDF weighting, cosine similarity, and ranking for similarity search within the Coding & Algorithms domain.
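For reference, the weighting and similarity measures named above can be written out as follows. The question does not fix a particular variant, so the smoothed IDF here is an assumption (one common choice), with N the number of documents, tf(t, d) the raw count of term t in document d, and df(t) the number of documents containing t. The Jaccard form is defined on the sets of distinct tokens A and B.

```latex
\begin{aligned}
\mathrm{idf}(t) &= \ln\frac{1 + N}{1 + \mathrm{df}(t)} + 1 \\
w_{t,d} &= \mathrm{tf}(t, d)\,\mathrm{idf}(t) \\
\cos(q, d) &= \frac{\sum_{t} w_{t,q}\, w_{t,d}}
                   {\sqrt{\sum_{t} w_{t,q}^{2}}\;\sqrt{\sum_{t} w_{t,d}^{2}}} \\
J(A, B) &= \frac{|A \cap B|}{|A \cup B|}
\end{aligned}
```

Cosine similarity is insensitive to document length because both vectors are normalized, whereas Jaccard ignores term counts and weights entirely.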

  • Medium
  • Apple
  • Coding & Algorithms
  • Machine Learning Engineer

Interview Round: Technical Screen

Implement a bag-of-words–based text similarity search engine from scratch. Write code that:

  1. tokenizes text (lowercasing, punctuation handling, Unicode support, and optional stopword removal/stemming; justify your choices),
  2. builds document vectors using term frequency and supports TF–IDF weighting,
  3. computes similarity scores (implement cosine similarity; optionally compare with Jaccard), and
  4. returns the top-k most similar document IDs for a given query along with their scores.

Clearly define each function’s purpose and inputs/outputs, and provide a short example demonstrating end-to-end usage. Analyze time and space complexity for indexing and querying, and briefly discuss how you would scale to large corpora (e.g., inverted index, pruning, or approximate search).
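One possible shape for an answer is sketched below. It is a minimal illustration, not a reference solution. The names `tokenize`, `BowIndex`, and `STOPWORDS` are hypothetical, the stopword list is deliberately tiny, stemming and the Jaccard comparison are omitted, and the smoothed IDF is an assumed variant. Documents are stored in an inverted index so that a query only touches documents sharing at least one term with it.

```python
import heapq
import math
import re
import unicodedata
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = frozenset({"a", "an", "the", "and", "or", "of", "to", "in", "is", "on"})
TOKEN_RE = re.compile(r"\w+")  # \w is Unicode-aware for str patterns in Python 3


def tokenize(text, remove_stopwords=True):
    """Input: raw string. Output: list of case-folded word tokens (no stemming)."""
    text = unicodedata.normalize("NFKC", text).casefold()
    tokens = TOKEN_RE.findall(text)  # punctuation is dropped by the pattern
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


class BowIndex:
    """In-memory TF-IDF index backed by an inverted index (term -> postings)."""

    def __init__(self, docs):
        """Input: dict mapping doc_id -> raw text."""
        n_docs = len(docs)
        term_counts = {d: Counter(tokenize(text)) for d, text in docs.items()}
        df = Counter()
        for counts in term_counts.values():
            df.update(counts.keys())  # each document counts once per term
        # Assumed smoothed IDF variant: ln((1 + N) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n_docs) / (1 + n)) + 1.0 for t, n in df.items()}
        self.postings = defaultdict(list)  # term -> [(doc_id, tf-idf weight)]
        self.norms = {}                    # doc_id -> L2 norm of its vector
        for doc_id, counts in term_counts.items():
            weights = {t: tf * self.idf[t] for t, tf in counts.items()}
            self.norms[doc_id] = math.sqrt(sum(w * w for w in weights.values()))
            for t, w in weights.items():
                self.postings[t].append((doc_id, w))

    def search(self, query, k=3):
        """Input: query string, k. Output: up to k (doc_id, cosine) pairs, best first."""
        q_counts = Counter(tokenize(query))
        # Terms never seen at indexing time carry no weight and are skipped.
        q_weights = {t: tf * self.idf[t] for t, tf in q_counts.items() if t in self.idf}
        q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
        if q_norm == 0.0:
            return []
        dots = defaultdict(float)  # doc_id -> partial dot product with the query
        for t, qw in q_weights.items():
            for doc_id, dw in self.postings[t]:
                dots[doc_id] += qw * dw
        scored = [(d, dot / (q_norm * self.norms[d])) for d, dot in dots.items()]
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# End-to-end usage on a toy corpus.
docs = {
    "d1": "The cat sat on the mat.",
    "d2": "Dogs and cats are friendly pets.",
    "d3": "Stock markets fell sharply today.",
}
index = BowIndex(docs)
print(index.search("cat on a mat", k=2))  # only d1 shares terms with the query
```

Under these assumptions, indexing takes time linear in the total number of tokens and space proportional to the number of distinct (term, document) pairs. A query costs the combined length of the postings lists for its terms, plus O(C log k) to select the top k among the C candidate documents. Without stemming, "cat" and "cats" stay distinct terms, which is why d2 is not returned in the example; that trade-off is the kind of choice the question asks you to justify.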

Aug 13, 2025, 12:00 AM