How do I approach Coding & Algorithms interview questions?

Coding & Algorithms questions require both an understanding of core concepts and practice. PracHub provides solutions with explanations to help you master Coding & Algorithms interviews.

What difficulty level is this interview question?

This is a Medium difficulty Coding & Algorithms question, commonly asked during Technical Screen rounds at Apple.

What role is this question designed for?

This question is commonly asked for Machine Learning Engineer candidates at Apple during technical interviews.

Implement bag-of-words similarity search from scratch

Last updated: Mar 29, 2026

Quick Overview

This question evaluates knowledge and implementation skills in text processing, vector-space models, and information retrieval, covering tokenization, term-frequency and TF–IDF weighting, cosine similarity, and ranking for similarity search within the Coding & Algorithms domain.



Apple

Aug 13, 2025, 12:00 AM

Medium | Machine Learning Engineer | Technical Screen | Coding & Algorithms

Implement a bag-of-words-based text similarity search engine from scratch. Write code that:

(1) tokenizes text (lowercasing, punctuation handling, Unicode support, and optional stopword removal/stemming; justify your choices),
(2) builds document vectors using term frequency and supports TF–IDF weighting,
(3) computes similarity scores (implement cosine similarity; optionally compare with Jaccard), and
(4) returns the top-k most similar document IDs for a given query, along with their scores.

Clearly define each function's purpose and inputs/outputs, and provide a short example demonstrating end-to-end usage. Analyze the time and space complexity of indexing and querying, and briefly discuss how you would scale to large corpora (e.g., inverted index, pruning, or approximate search).
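One common formulation of the weighting and similarity measures the question asks for. This assumes raw counts for term frequency and a smoothed IDF (the variant scikit-learn's `TfidfVectorizer` applies by default). N is the number of documents, df(t) is the number of documents containing t, and T_q, T_d are the sets of distinct terms in the query and the document. Other TF variants (binary, log-scaled, length-normalized) are equally valid choices.

```latex
\begin{aligned}
\mathrm{tf}(t,d) &= \text{number of occurrences of term } t \text{ in document } d \\
\mathrm{idf}(t) &= \ln\frac{1+N}{1+\mathrm{df}(t)} + 1 \\
w_{t,d} &= \mathrm{tf}(t,d)\cdot\mathrm{idf}(t) \\
\cos(q,d) &= \frac{\sum_{t} w_{t,q}\, w_{t,d}}{\sqrt{\sum_{t} w_{t,q}^{2}}\;\sqrt{\sum_{t} w_{t,d}^{2}}} \\
J(q,d) &= \frac{|T_q \cap T_d|}{|T_q \cup T_d|}
\end{aligned}
```

Setting idf(t) = 1 recovers plain term-frequency vectors. Cosine divides out vector length, so duplicating a document's text does not change its score. Jaccard ignores counts and weights entirely and only compares which terms occur.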
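A minimal tokenizer sketch, assuming Python 3.9+. The choices below are one defensible set, not the required ones: NFKC normalization plus `str.casefold()` for Unicode-aware lowercasing, the Unicode-aware `\w+` regex to treat punctuation as a separator, and an optional stopword filter. `STOPWORDS` is a hypothetical, deliberately tiny list. Stemming is left out because it is language-specific and can merge unrelated words; it could be added as one more pass over the token list.

```python
import re
import unicodedata

# Hypothetical, deliberately small stopword list; tune per corpus and language.
STOPWORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"})

# For str patterns, \w is Unicode-aware in Python 3: letters and digits from
# any script match, while punctuation and whitespace act as separators.
_TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str, remove_stopwords: bool = False) -> list[str]:
    """Split raw text into normalized word tokens, preserving their order."""
    # NFKC folds compatibility forms (e.g. full-width letters) and composes
    # "e" + combining accent into a single "é", so \w keeps the word whole.
    normalized = unicodedata.normalize("NFKC", text)
    # casefold() is a stronger, Unicode-aware lower(), e.g. "ß" -> "ss".
    tokens = _TOKEN_RE.findall(normalized.casefold())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


print(tokenize("The café is on the corner, isn't it?", remove_stopwords=True))
```

One trade-off of `\w+`: it splits contractions and hyphenated words (`"don't"` becomes `["don", "t"]`) and keeps underscores inside tokens.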
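An end-to-end sketch covering vector construction, cosine and Jaccard scoring, and top-k retrieval. It assumes the whole corpus fits in memory, uses a minimal stand-in tokenizer, and applies the smoothed IDF ln((1 + N) / (1 + df)) + 1. `BowIndex`, `cosine`, `jaccard`, and `vector_norm` are hypothetical names chosen for illustration. Vectors are sparse dicts, so only non-zero terms are stored.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Minimal stand-in tokenizer: Unicode-aware lowercasing + word characters.
    return re.findall(r"\w+", text.casefold())


def vector_norm(vec: dict[str, float]) -> float:
    return math.sqrt(sum(w * w for w in vec.values()))


def cosine(a: dict[str, float], b: dict[str, float], norm_a: float, norm_b: float) -> float:
    """Cosine similarity of two sparse vectors with precomputed norms."""
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    if len(a) > len(b):  # iterate over the smaller vector
        a, b = b, a
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    return dot / (norm_a * norm_b)


def jaccard(a: set[str], b: set[str]) -> float:
    """Intersection over union of two term sets; ignores counts and weights."""
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return inter / union if union else 0.0


class BowIndex:
    """Hypothetical in-memory index: doc_id -> sparse TF or TF-IDF vector."""

    def __init__(self, docs: dict[str, str], use_idf: bool = True):
        self.use_idf = use_idf
        self.tf = {d: Counter(tokenize(text)) for d, text in docs.items()}
        self.terms = {d: set(tf) for d, tf in self.tf.items()}
        df = Counter()
        for terms in self.terms.values():
            df.update(terms)  # document frequency: one count per document
        n = len(docs)
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.vecs = {d: self.weigh(tf) for d, tf in self.tf.items()}
        self.norms = {d: vector_norm(v) for d, v in self.vecs.items()}

    def weigh(self, tf: Counter) -> dict[str, float]:
        """Raw TF, optionally times IDF; terms unseen in the corpus are dropped."""
        return {
            t: c * (self.idf[t] if self.use_idf else 1.0)
            for t, c in tf.items()
            if t in self.idf
        }

    def search(self, query: str, k: int = 3, metric: str = "cosine") -> list[tuple[str, float]]:
        """Return up to k (doc_id, score) pairs, best first."""
        q_tf = Counter(tokenize(query))
        if metric == "jaccard":
            q_terms = set(q_tf)
            scores = ((d, jaccard(q_terms, terms)) for d, terms in self.terms.items())
        elif metric == "cosine":
            q_vec = self.weigh(q_tf)
            q_norm = vector_norm(q_vec)
            scores = ((d, cosine(q_vec, v, q_norm, self.norms[d])) for d, v in self.vecs.items())
        else:
            raise ValueError(f"unknown metric: {metric!r}")
        # nlargest keeps a heap of size k: O(N log k) instead of sorting all N scores.
        return heapq.nlargest(k, scores, key=lambda pair: pair[1])


docs = {
    "d1": "The quick brown fox jumps over the lazy dog.",
    "d2": "A quick brown dog outpaces a lazy fox!",
    "d3": "Bag-of-words models ignore word order.",
}
index = BowIndex(docs)
print(index.search("quick fox", k=2))
print(index.search("quick fox", k=2, metric="jaccard"))
```

Both d1 and d2 contain both query terms, but d2 ranks first under cosine because its TF–IDF vector has a smaller norm. For complexity, let T be the total number of tokens in the corpus, N the number of documents, and q the number of distinct query terms. Indexing is O(T) time and O(T) space, since every stored entry is a (document, term) pair. Each query in this brute-force version scores every document at O(q) per document, which gives O(N·q + N log k) overall.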
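Scoring every document costs O(N) per query. An inverted index, one of the scaling options the question names, restricts scoring to documents that share at least one term with the query. The sketch below assumes documents have already been converted to sparse weight vectors (dicts of term to weight, e.g. TF–IDF). `build_inverted_index` and `search_inverted` are hypothetical names, and the approach shown is plain term-at-a-time accumulation of cosine scores.

```python
import heapq
import math
from collections import defaultdict


def build_inverted_index(doc_vecs: dict[str, dict[str, float]]):
    """Map each term to its postings list of (doc_id, weight); precompute norms."""
    postings: dict[str, list[tuple[str, float]]] = defaultdict(list)
    norms: dict[str, float] = {}
    for doc_id, vec in doc_vecs.items():
        norms[doc_id] = math.sqrt(sum(w * w for w in vec.values()))
        for term, weight in vec.items():
            postings[term].append((doc_id, weight))
    return postings, norms


def search_inverted(
    postings: dict[str, list[tuple[str, float]]],
    norms: dict[str, float],
    q_vec: dict[str, float],
    k: int,
) -> list[tuple[str, float]]:
    """Term-at-a-time cosine scoring: only docs sharing a query term are touched."""
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    if q_norm == 0.0:
        return []
    acc: dict[str, float] = defaultdict(float)  # partial dot product per candidate
    for term, q_weight in q_vec.items():
        # .get() on a defaultdict does not insert missing terms.
        for doc_id, d_weight in postings.get(term, ()):
            acc[doc_id] += q_weight * d_weight
    scored = ((d, s / (q_norm * norms[d])) for d, s in acc.items() if norms[d] > 0.0)
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


doc_vecs = {
    "d1": {"apple": 1.0, "pie": 2.0},
    "d2": {"apple": 1.0},
    "d3": {"car": 3.0},
}
postings, norms = build_inverted_index(doc_vecs)
print(search_inverted(postings, norms, {"apple": 1.0}, k=2))  # d3 is never scored
```

Query cost now scales with the combined length of the query terms' postings lists rather than with N, which is where the savings come from when query terms are rare. Common next steps are pruning (skipping very high-df query terms, or early-termination schemes such as WAND that skip documents unable to reach the current top-k threshold) and approximate nearest-neighbor search (e.g. HNSW graphs) over dense or dimensionality-reduced vectors.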