Task: Implement an n-gram Language Model with Training, Sampling, and Model Selection Guidance
Objective
Implement an n-gram language model class with the following methods and discuss model selection and complexity trade-offs:
- fit(file_path, n):
  - Read and tokenize a text file consistently.
  - Build n-gram and (n−1)-gram frequency counts.
  - Compute conditional probabilities with smoothing (e.g., add-k or Kneser–Ney).
- generate(start_tokens, max_len, sampling_strategy):
  - Sample next tokens according to the learned probabilities (e.g., multinomial, top-k, temperature) to produce text (a minimal class sketch follows this list).
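One possible class layout is sketched below, assuming whitespace tokenization, dictionary-based count tables, and add-k smoothing; the names NGramModel, _tokenize, BOS, and EOS are illustrative choices, not prescribed by the task.

```python
from collections import defaultdict
import random

BOS, EOS = "<s>", "</s>"  # illustrative sentence-boundary tokens

class NGramModel:
    def __init__(self):
        self.n = None
        self.ngram_counts = defaultdict(int)    # counts of full n-grams
        self.context_counts = defaultdict(int)  # counts of (n-1)-gram contexts
        self.vocab = set()

    def _tokenize(self, text):
        # Simple whitespace tokenization; the same function must be reused at generation time.
        return text.lower().split()

    def fit(self, file_path, n):
        self.n = n
        with open(file_path, encoding="utf-8") as f:
            for line in f:  # each line treated as one sentence
                tokens = [BOS] * (n - 1) + self._tokenize(line) + [EOS]
                self.vocab.update(tokens)
                for i in range(len(tokens) - n + 1):
                    context = tuple(tokens[i:i + n - 1])
                    word = tokens[i + n - 1]
                    self.ngram_counts[context + (word,)] += 1
                    self.context_counts[context] += 1

    def prob(self, context, word, k=1.0):
        # Add-k smoothed conditional probability P(word | context).
        V = len(self.vocab)
        num = self.ngram_counts[tuple(context) + (word,)] + k
        den = self.context_counts[tuple(context)] + k * V
        return num / den

    def generate(self, start_tokens=None, max_len=20):
        tokens = [BOS] * (self.n - 1) + list(start_tokens or [])
        for _ in range(max_len):
            context = tuple(tokens[-(self.n - 1):]) if self.n > 1 else ()
            words = [w for w in self.vocab if w != BOS]
            weights = [self.prob(context, w) for w in words]
            word = random.choices(words, weights=weights)[0]  # multinomial sampling
            if word == EOS:
                break
            tokens.append(word)
        return tokens[self.n - 1:]  # drop the BOS padding
```

A usage example would be `m = NGramModel(); m.fit("corpus.txt", 3); print(" ".join(m.generate(["the"], 15)))`, where corpus.txt is a placeholder file name.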
Requirements and Notes
- Tokenization must be consistent between training and generation. Include BOS/EOS handling if generating sentences.
- Smoothing options: implement at least add-k; explain, and if possible implement, interpolated Kneser–Ney (the add-k and interpolation formulas are sketched after this list).
- Sampling strategies: support multinomial sampling; add top-k filtering and temperature scaling (see the sampling sketch below).
- Model selection: discuss how to choose the optimal n given the data size and domain. Propose validation procedures (train/validation split), metrics (perplexity), and regularization such as backoff/interpolation (see the perplexity sketch below).
- Analyze the time/space complexity and memory footprint for different values of n (a small counting experiment follows this list).
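For reference, a hedged statement of the add-k estimate and of simple linear interpolation of n-gram orders (interpolated Kneser–Ney is more involved and not reproduced here); the symbols C for counts, V for the vocabulary, h for the history, and λ for interpolation weights are notation chosen for illustration:

```latex
P_{\text{add-}k}(w \mid h) = \frac{C(h, w) + k}{C(h) + k\,|V|}
\qquad
P_{\text{interp}}(w \mid w_{i-2}, w_{i-1}) =
  \lambda_3\, P(w \mid w_{i-2}, w_{i-1}) + \lambda_2\, P(w \mid w_{i-1}) + \lambda_1\, P(w),
\quad \sum_j \lambda_j = 1
```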
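A possible sketch of temperature scaling and top-k filtering applied to a next-token distribution; the helper name sample_next and the use of a plain {token: probability} dict are assumptions for illustration:

```python
import math
import random

def sample_next(probs, temperature=1.0, top_k=None):
    """Sample one token from a {token: probability} dict.

    temperature < 1 sharpens the distribution, > 1 flattens it;
    top_k restricts sampling to the k most probable tokens.
    """
    items = list(probs.items())
    if top_k is not None:
        items = sorted(items, key=lambda kv: kv[1], reverse=True)[:top_k]
    # Temperature scaling: raise each probability to the power 1/T, then renormalize.
    scaled = [(w, math.exp(math.log(p) / temperature)) for w, p in items if p > 0]
    total = sum(s for _, s in scaled)
    r, acc = random.random() * total, 0.0
    for w, s in scaled:
        acc += s
        if acc >= r:
            return w
    return scaled[-1][0]

# Example: a peaked distribution sampled with sharpening and top-k filtering.
dist = {"the": 0.5, "a": 0.3, "cat": 0.15, "dog": 0.05}
print(sample_next(dist, temperature=0.5, top_k=3))
```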
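One way to choose n is to fit models of increasing order on a training split and compare perplexity on a held-out split, picking the order with the lowest validation perplexity. The sketch below assumes the NGramModel class from the earlier sketch; the file names train.txt and valid.txt are placeholders.

```python
import math

def perplexity(model, file_path, k=1.0):
    """Perplexity of a held-out file under the add-k smoothed model."""
    log_prob, count = 0.0, 0
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            tokens = [BOS] * (model.n - 1) + model._tokenize(line) + [EOS]
            for i in range(model.n - 1, len(tokens)):
                context = tuple(tokens[i - model.n + 1:i])
                log_prob += math.log(model.prob(context, tokens[i], k=k))
                count += 1
    return math.exp(-log_prob / count)

# Pick the order with the lowest validation perplexity.
best_n, best_ppl = None, float("inf")
for n in range(1, 6):
    m = NGramModel()
    m.fit("train.txt", n)             # hypothetical training split
    ppl = perplexity(m, "valid.txt")  # hypothetical validation split
    if ppl < best_ppl:
        best_n, best_ppl = n, ppl
```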
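To make the memory discussion concrete, a small counting experiment shows how the number of distinct n-grams (and hence the size of the count tables) grows with n on a fixed corpus; corpus.txt is a placeholder file name:

```python
from collections import Counter

def distinct_ngrams(tokens, n):
    """Number of distinct n-grams in a token sequence."""
    return len(Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)))

with open("corpus.txt", encoding="utf-8") as f:  # placeholder corpus
    tokens = f.read().lower().split()

for n in range(1, 6):
    # Storage is bounded above by both |V|**n and the corpus length,
    # but in practice it approaches the corpus length quickly as n grows.
    print(n, distinct_ngrams(tokens, n))
```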
Deliverables
- Description of the class design and data structures.
- Clear pseudocode (or a concise code sketch) for fit and generate.
- Explanation of the smoothing methods, with formulas.
- Explanation of the sampling methods.
- Strategy for choosing n using validation and perplexity.
- Complexity and memory analysis.