PracHub

Implement n-gram model and select n

Last updated: Jun 2, 2026

Quick Overview

This question evaluates competency in probabilistic language modeling and practical engineering of n-gram systems, covering n-gram construction, smoothing methods, sampling strategies, model selection and complexity analysis within the Machine Learning domain.



Company: Runway

Role: Machine Learning Engineer

Category: Machine Learning

Difficulty: hard

Interview Round: Technical Screen

Implement an n-gram language model class with fit and generate methods. The fit(file_path, n) method should read a text file, tokenize consistently, build n-gram and (n−1)-gram frequency counts, and compute conditional probabilities with smoothing (e.g., add-k or Kneser–Ney). The generate(start_tokens, max_len, sampling_strategy) method should sample next tokens according to learned probabilities (e.g., multinomial, top-k, or temperature) to produce text. Discuss how to select the optimal n given data size and domain: propose validation procedures (e.g., train/validation split), metrics (perplexity), regularization/backoff or interpolation, and analyze the time/space complexity and memory footprint for different n values.


Aug 8, 2025, 12:00 AM

Task: Implement an n-gram Language Model with Training, Sampling, and Model Selection Guidance

Objective

Implement an n-gram language model class with the following methods and discuss model selection and complexity trade-offs:

  • fit(file_path, n):
    • Read and tokenize a text file consistently.
    • Build n-gram and (n−1)-gram frequency counts.
    • Compute conditional probabilities with smoothing (e.g., add-k or Kneser–Ney).
  • generate(start_tokens, max_len, sampling_strategy):
    • Sample next tokens according to learned probabilities (e.g., multinomial, top-k, temperature) to produce text.
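
One way the two methods above could fit together is sketched below in Python. This is a minimal illustration, not a reference solution. It assumes a lowercase whitespace tokenizer, one sentence per line, and add-k smoothing. The names NGramLM, tokenize, prob, the `<s>` and `</s>` markers, and the extra temperature and top_k arguments are hypothetical. max_len is read here as the maximum number of newly generated tokens, which is only one possible interpretation.

```python
import random
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def tokenize(line):
    """Lowercase whitespace tokenizer, shared by training and prompts."""
    return line.lower().split()


class NGramLM:
    def __init__(self, k=0.1, seed=None):
        self.k = k                       # add-k smoothing constant
        self.rng = random.Random(seed)   # private RNG for reproducible sampling

    def fit(self, file_path, n):
        self.n = n
        self.ngrams = {}             # context tuple -> Counter of next tokens
        self.contexts = Counter()    # context tuple -> total count
        vocab = {EOS}                # BOS is never predicted, so it is left out
        with open(file_path, encoding="utf-8") as f:
            for line in f:
                tokens = tokenize(line)
                if not tokens:
                    continue
                vocab.update(tokens)
                padded = [BOS] * (n - 1) + tokens + [EOS]
                for i in range(n - 1, len(padded)):
                    ctx = tuple(padded[i - n + 1:i])   # the (n-1)-gram context
                    self.ngrams.setdefault(ctx, Counter())[padded[i]] += 1
                    self.contexts[ctx] += 1
        self.vocab_list = sorted(vocab)
        return self

    def prob(self, token, ctx):
        """Add-k estimate of P(token | ctx)."""
        count = self.ngrams.get(ctx, {}).get(token, 0)
        total = self.contexts.get(ctx, 0)
        return (count + self.k) / (total + self.k * len(self.vocab_list))

    def generate(self, start_tokens, max_len, sampling_strategy="multinomial",
                 temperature=1.0, top_k=5):
        out = list(start_tokens)
        history = [BOS] * (self.n - 1) + out
        for _ in range(max_len):
            # Last n-1 tokens; len-based slicing also works when n == 1.
            ctx = tuple(history[len(history) - (self.n - 1):])
            weights = [self.prob(w, ctx) for w in self.vocab_list]
            if sampling_strategy == "temperature":
                # p ** (1/T) matches dividing log-probabilities by T.
                # Very small T can underflow; log space would be safer.
                weights = [p ** (1.0 / temperature) for p in weights]
            elif sampling_strategy == "top_k":
                order = sorted(range(len(weights)),
                               key=weights.__getitem__, reverse=True)
                keep = set(order[:top_k])
                weights = [p if i in keep else 0.0
                           for i, p in enumerate(weights)]
            # random.choices accepts unnormalized weights.
            token = self.rng.choices(self.vocab_list, weights=weights, k=1)[0]
            if token == EOS:
                break
            out.append(token)
            history.append(token)
        return out
```

A call such as `NGramLM(k=0.1, seed=0).fit("corpus.txt", 3).generate(tokenize("The cat"), 20, "top_k", top_k=10)` would tokenize the prompt with the same function used in training, which is what keeps the two stages consistent. The file name is a placeholder.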

Requirements and Notes

  • Tokenization must be consistent between training and generation. Include beginning-of-sentence (BOS) and end-of-sentence (EOS) handling if generating at the sentence level.
  • Smoothing options: implement at least add-k; explain and, if possible, implement interpolated Kneser–Ney.
  • Sampling strategies: support multinomial; add top-k and temperature scaling.
  • Model selection: discuss how to select the optimal n given data size and domain. Propose validation procedures (train/validation split), metrics (perplexity), regularization and backoff/interpolation.
  • Analyze time/space complexity and memory footprint for different n values.
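
The model-selection requirement above can be illustrated with a small Python sketch that scores several values of n by held-out perplexity and keeps the lowest. It is an illustration under stated assumptions, not a prescribed procedure. Sentences are assumed to be already tokenized lists, add-k smoothing is used for brevity (interpolation or Kneser–Ney usually behaves better at larger n), and all out-of-vocabulary tokens are treated as one unknown type. The function names and boundary markers are hypothetical.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def ngram_counts(sentences, n):
    """Return (n-gram counts, context counts, vocabulary)."""
    grams, contexts, vocab = Counter(), Counter(), {EOS}
    for tokens in sentences:
        vocab.update(tokens)
        padded = [BOS] * (n - 1) + list(tokens) + [EOS]
        for i in range(n - 1, len(padded)):
            ctx = tuple(padded[i - n + 1:i])
            grams[ctx + (padded[i],)] += 1
            contexts[ctx] += 1
    return grams, contexts, vocab


def perplexity(train, heldout, n, k=0.1):
    """exp of the average negative log-probability per held-out token."""
    grams, contexts, vocab = ngram_counts(train, n)
    v = len(vocab) + 1  # +1: a single slot for unseen tokens (an assumption)
    log_sum, count = 0.0, 0
    for tokens in heldout:
        padded = [BOS] * (n - 1) + list(tokens) + [EOS]
        for i in range(n - 1, len(padded)):
            ctx = tuple(padded[i - n + 1:i])
            # Counter returns 0 for unseen keys, so add-k keeps p > 0.
            p = (grams[ctx + (padded[i],)] + k) / (contexts[ctx] + k * v)
            log_sum += math.log(p)
            count += 1
    return math.exp(-log_sum / count)


def select_n(train, heldout, candidates=(1, 2, 3, 4), k=0.1):
    """Return the candidate n with the lowest held-out perplexity."""
    scores = {n: perplexity(train, heldout, n, k) for n in candidates}
    return min(scores, key=scores.get), scores
```

With little training data, higher n tends to score worse here because most long contexts in the validation set were never seen, so the smoothed estimate approaches uniform. The same loop can also be used to tune k.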

Deliverables

  • Description of class design and data structures.
  • Clear pseudocode (or concise code sketch) for fit and generate.
  • Explanation of smoothing methods with formulas.
  • Explanation of sampling methods.
  • Strategy for choosing n with validation and perplexity.
  • Complexity and memory analysis.
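
For the formula deliverables, the standard textbook forms are collected below as a reference sketch. The notation is an assumption, since the question does not fix one: h is the (n−1)-token context, c(·) is a training count, V is the vocabulary, d is the absolute discount (0 < d < 1), and N is the number of evaluated tokens. Kneser–Ney is shown in its bigram form for readability.

```latex
% Add-k smoothing
P_{\text{add-}k}(w \mid h) = \frac{c(h, w) + k}{c(h) + k\,|V|}

% Interpolated Kneser-Ney (bigram case)
P_{\mathrm{KN}}(w_i \mid w_{i-1}) =
  \frac{\max\bigl(c(w_{i-1}, w_i) - d,\ 0\bigr)}{c(w_{i-1})}
  + \lambda(w_{i-1})\, P_{\mathrm{cont}}(w_i)

\lambda(w_{i-1}) = \frac{d}{c(w_{i-1})}\,
  \bigl|\{\, w : c(w_{i-1}, w) > 0 \,\}\bigr|

P_{\mathrm{cont}}(w) =
  \frac{\bigl|\{\, w' : c(w', w) > 0 \,\}\bigr|}
       {\bigl|\{\, (w', w'') : c(w', w'') > 0 \,\}\bigr|}

% Perplexity over N held-out tokens
\mathrm{PP} = \exp\!\Bigl(-\frac{1}{N} \sum_{i=1}^{N} \log P(w_i \mid h_i)\Bigr)
```

For the complexity deliverable, a single counting pass over T training tokens costs roughly O(T·n) time, since each position builds one n-token key. The number of distinct n-grams stored is bounded by min(T, |V|^n), so memory grows quickly with n until it saturates near the corpus size.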

