How do I approach Machine Learning interview questions?

Machine Learning questions require a solid grasp of core concepts plus hands-on practice. PracHub provides worked solutions with explanations to help you prepare for machine learning interviews.

What difficulty level is this interview question?

This is a medium-difficulty Machine Learning question, commonly asked during Technical Screen rounds at Bytedance.

What role is this question designed for?

This question is commonly asked of Machine Learning Engineer candidates at Bytedance during technical interviews.

Compute Sentence Similarity

Last updated: Apr 6, 2026

Quick Overview

This question evaluates your understanding of sentence-level and token-level embedding techniques, text preprocessing, similarity metrics, handling of edge cases such as empty input or unknown tokens, and the trade-offs between pretrained sentence encoders and averaged word embeddings.

Bytedance

Jan 7, 2026, 12:00 AM

Machine Learning Engineer

Technical Screen

Machine Learning

Given two text inputs, design and implement a method to compute their semantic similarity.

You may use either of the following approaches:

Encode each sentence into a single embedding using a pretrained sentence encoder, then compute cosine similarity.
Convert each token to a word embedding, average the token embeddings for each sentence, then compute cosine similarity between the two averaged vectors.
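A minimal sketch of the second approach (averaged word embeddings), assuming whitespace tokenization and a tiny hypothetical table `TOY_EMBEDDINGS` in place of real pretrained word vectors such as GloVe or word2vec. Cosine similarity is computed as cos(u, v) = (u · v) / (‖u‖ ‖v‖). The helper names are illustrative, not from any particular library.

```python
import math

# Hypothetical 3-dimensional word vectors; a real system would load
# pretrained embeddings with hundreds of dimensions.
TOY_EMBEDDINGS = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.9, 0.1, 0.0],
    "sat": [0.0, 1.0, 0.0],
    "ran": [0.0, 0.8, 0.2],
}


def mean_embedding(tokens, table):
    """Mean pooling: average the vectors of tokens present in the table."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        # No known tokens: there is nothing to average.
        raise ValueError("no known tokens to average")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """cos(u, v) = (u . v) / (||u|| * ||v||)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


a = mean_embedding("cat sat".split(), TOY_EMBEDDINGS)
b = mean_embedding("dog ran".split(), TOY_EMBEDDINGS)
print(round(cosine(a, b), 3))  # 0.988
```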

Your answer should describe:

How the text is preprocessed
How embeddings are produced
How cosine similarity is computed
How to handle empty text or unknown tokens
The trade-offs between sentence-level encoders and average word embeddings
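One possible way to cover preprocessing and the edge cases for the averaged-embedding approach, sketched under stated assumptions: lowercasing plus a simple ASCII regex tokenizer stands in for a real tokenizer, `VECTORS` is a hypothetical toy embedding table, out-of-vocabulary tokens are skipped, and the function returns `None` (rather than a misleading 0.0) when either side has no known tokens. The last line illustrates one trade-off: mean pooling ignores word order, so the same words in a different order score 1.0, whereas a pretrained sentence encoder typically produces different embeddings for such pairs at the cost of heavier inference.

```python
import math
import re

# Hypothetical toy vectors standing in for pretrained word embeddings.
VECTORS = {
    "dog": [1.0, 0.2, 0.0],
    "bites": [0.1, 1.0, 0.3],
    "man": [0.0, 0.3, 1.0],
}


def preprocess(text):
    """Lowercase and keep ASCII alphanumeric tokens (punctuation dropped)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def sentence_vector(text, table):
    """Mean of known token vectors; None if the text has no known tokens."""
    known = [table[t] for t in preprocess(text) if t in table]  # skip OOV tokens
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def similarity(text_a, text_b, table=VECTORS):
    """Cosine similarity, or None when either side is empty or all-OOV."""
    u, v = sentence_vector(text_a, table), sentence_vector(text_b, table)
    if u is None or v is None:
        return None
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:  # guard against all-zero vectors
        return None
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


print(similarity("", "Dog bites man."))  # None: empty input
print(similarity("xyzzy qwerty", "dog"))  # None: every token is OOV
print(round(similarity("Dog bites man!", "man bites dog"), 6))  # 1.0: order ignored
```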

If coding is requested, provide clear pseudocode or implementation-level steps.
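As implementation-level steps for the first approach, the sketch below keeps the sentence encoder pluggable: `encode` is assumed to be any callable mapping a list of strings to one vector per string. With the sentence-transformers library, for instance, the `encode` method of a loaded `SentenceTransformer` model fits this shape (the choice of model is left open). To keep the example self-contained, `char_count_encoder` is a hypothetical stand-in that counts letters; it is not a semantic encoder.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> Optional[float]:
    """Cosine similarity; None if either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return None
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def sentence_similarity(
    text_a: str,
    text_b: str,
    encode: Callable[[List[str]], Sequence[Vector]],
) -> Optional[float]:
    # Step 1: light preprocessing. Pretrained encoders ship their own
    # tokenizer, so only whitespace is normalized here.
    a, b = " ".join(text_a.split()), " ".join(text_b.split())
    # Step 2: reject empty input instead of encoding an empty string.
    if not a or not b:
        return None
    # Step 3: encode both sentences in one batch.
    vec_a, vec_b = encode([a, b])
    # Step 4: compare the two sentence embeddings.
    return cosine(vec_a, vec_b)


def char_count_encoder(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in encoder: counts of letters a-z (not semantic)."""
    return [
        [float(t.lower().count(chr(ord("a") + i))) for i in range(26)]
        for t in texts
    ]


print(sentence_similarity("  ", "hello", char_count_encoder))  # None
print(round(sentence_similarity("listen", "silent", char_count_encoder), 6))  # 1.0
```

The anagram pair scoring 1.0 shows why the stand-in must be swapped for a real pretrained encoder: the pipeline steps stay the same, only `encode` changes.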
