Implement bag-of-words similarity search from scratch

This is a Coding & Algorithms interview question from Apple for Machine Learning Engineer roles.

Question

Implement a bag-of-words–based text similarity search engine from scratch. Write code that:

1. tokenizes text (lowercasing, punctuation handling, Unicode support, and optional stopword removal/stemming; justify your choices),
2. builds document vectors using term frequency and supports TF–IDF weighting,
3. computes similarity scores (implement cosine similarity; optionally compare with Jaccard), and
4. returns the top-k most similar document IDs for a given query along with their scores.

Clearly define each function's purpose and inputs/outputs, and provide a short example demonstrating end-to-end usage. Analyze time and space complexity for indexing and querying, and briefly discuss how you would scale to large corpora (e.g., inverted index, pruning, or approximate search).
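
One possible shape for an answer is sketched below in Python. It is a minimal illustration under stated assumptions, not a reference solution: the names `tokenize`, `BowIndex`, `jaccard`, and the tiny `STOPWORDS` set are hypothetical, stemming is left out, and the smoothed IDF formula is just one common variant. Document vectors are L2-normalized at indexing time, so the dot product accumulated over the inverted index equals cosine similarity.

```python
import heapq
import math
import re
import unicodedata
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list; a real one would be larger.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "on"})
TOKEN_RE = re.compile(r"\w+")  # for str patterns, \w also matches non-ASCII letters


def tokenize(text, remove_stopwords=True):
    """str -> list[str]: NFKC-normalize, casefold, split on non-word characters."""
    text = unicodedata.normalize("NFKC", text).casefold()
    tokens = TOKEN_RE.findall(text)
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


class BowIndex:
    """Inverted index over L2-normalized TF-IDF document vectors."""

    def __init__(self):
        self.idf = {}                      # term -> smoothed IDF
        self.postings = defaultdict(list)  # term -> [(doc_id, normalized weight)]

    def _weights(self, tokens):
        """list[str] -> dict[str, float]: unit-length TF-IDF vector; unseen terms dropped."""
        tf = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def fit(self, docs):
        """docs: dict[doc_id, str]. Builds IDF table and postings; returns self."""
        tokenized = {d: tokenize(text) for d, text in docs.items()}
        df = Counter(t for toks in tokenized.values() for t in set(toks))
        n = len(docs)
        # Assumed smoothing variant: log((1 + N) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.postings = defaultdict(list)
        for d, toks in tokenized.items():
            for t, w in self._weights(toks).items():
                self.postings[t].append((d, w))
        return self

    def search(self, query, k=3):
        """str -> list[(doc_id, cosine score)], best first, at most k results."""
        scores = defaultdict(float)
        for t, qw in self._weights(tokenize(query)).items():
            for d, dw in self.postings.get(t, ()):
                scores[d] += qw * dw  # dot product of unit vectors == cosine
        return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])


def jaccard(a, b):
    """Two strings -> size of token-set intersection divided by size of union."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


# End-to-end usage on a toy corpus.
docs = {
    "d1": "The cat sat on the mat.",
    "d2": "Dogs chase cats in the park.",
    "d3": "Stock markets fell sharply today.",
}
index = BowIndex().fit(docs)
results = index.search("cat on a mat", k=2)
print(results)
```

Because there is no stemming, "cats" in d2 does not match the query term "cat", so only d1 is returned; that is the kind of trade-off the "justify your choices" part of the question invites. In this sketch, indexing takes time linear in the total number of tokens and space proportional to the number of distinct (term, document) pairs. A query touches only the posting lists of its own terms, then selects the top k from the m candidate documents in roughly O(m log k). For large corpora, the same postings structure is the natural place to add pruning or to swap in an approximate nearest-neighbor method.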
