Design a DNA-sequence optimization loop

This is an ML System Design interview question from Lila for Machine Learning Engineer roles.

Question

You are building an ML-driven platform to optimize DNA sequences (e.g., a promoter, an enhancer, or a codon-optimized gene) for a target lab-measured property (e.g., expression level, binding strength, stability).

You have:

A robotic wet lab that can synthesize and assay one batch of candidate sequences per day.
Historical data: (sequence, assay_result, metadata) records, where assay results are noisy and may vary by batch.
A sequence model (could be a Transformer/LLM-style model) that can generate or score sequences.
Hard constraints (examples): GC content range, forbidden motifs, max homopolymer length, sequence length bounds.
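
To make the hard constraints above concrete, here is a minimal sketch of a validity filter that could sit in front of any generator. The class name `Constraints`, the function `violations`, and every threshold and motif are illustrative assumptions, not values given in the question.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Constraints:
    # All thresholds are illustrative placeholders, not values from the question.
    min_len: int = 50
    max_len: int = 200
    gc_min: float = 0.40
    gc_max: float = 0.60
    max_homopolymer: int = 5
    forbidden_motifs: tuple = ("GAATTC", "GGATCC")  # EcoRI and BamHI sites, as examples


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty string)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def violations(seq: str, c: Constraints) -> list:
    """Return human-readable constraint violations; an empty list means valid."""
    seq = seq.upper()
    problems = []
    if set(seq) - set("ACGT"):
        problems.append("non-ACGT character")
    if not c.min_len <= len(seq) <= c.max_len:
        problems.append("length out of bounds")
    if not c.gc_min <= gc_content(seq) <= c.gc_max:
        problems.append("GC content out of range")
    if longest_homopolymer(seq) > c.max_homopolymer:
        problems.append("homopolymer too long")
    for motif in c.forbidden_motifs:
        if motif in seq:
            problems.append(f"forbidden motif {motif}")
    return problems
```

Returning the list of violations rather than a single boolean makes it possible to log which constraint rejects the most candidates, which is one plausible thing to monitor in production.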

Design an end-to-end system that repeatedly proposes sequences, runs experiments, and learns from results.
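
One way to picture the propose, measure, and learn cycle is the skeleton below. It is only a sketch: `propose`, `run_assay`, and `update_model` are hypothetical callables standing in for the generator, the robotic lab, and model retraining, and the patience-based stopping rule is an assumption that a real system would likely replace with something more robust to assay noise.

```python
def optimization_loop(propose, run_assay, update_model, history,
                      max_rounds=10, patience=3):
    """Run design rounds until the budget is spent or progress stalls.

    history is a mutable list of (sequence, assay_result) pairs.
    Stops early after `patience` consecutive rounds without a new best result.
    """
    best = max((y for _, y in history), default=float("-inf"))
    stale_rounds = 0
    for _ in range(max_rounds):
        batch = propose(history)          # choose the next sequences to test
        results = run_assay(batch)        # one measurement per sequence
        history.extend(zip(batch, results))
        update_model(history)             # refit the surrogate on all data so far
        round_best = max(results, default=float("-inf"))
        if round_best > best:
            best, stale_rounds = round_best, 0
        else:
            stale_rounds += 1
        if stale_rounds >= patience:
            break
    return best, history
```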

Address:

How you represent sequences and incorporate constraints.
How you generate candidate sequences (search / Bayesian optimization / evolutionary / RL / LLM prompting, etc.).
How you balance exploration vs. exploitation and handle noisy measurements.
How you choose a batch of sequences each round (not just one).
How you evaluate progress and decide when to stop.
Key failure modes (mode collapse, assay drift, data leakage, overfitting to simulator/predictor) and mitigations.
What you would log/monitor in production.
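
As an illustration of the exploration vs. exploitation and batch-selection points above, the sketch below ranks candidates by an upper confidence bound and then greedily enforces a minimum Hamming distance between picks, which is one simple guard against mode collapse. It assumes some upstream surrogate (for example, an ensemble) already supplies a predicted mean and standard deviation per sequence; `select_batch`, `beta`, and `min_dist` are hypothetical names chosen for this example.

```python
def hamming(a: str, b: str) -> int:
    """Mismatched positions; any length difference counts as extra mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def select_batch(candidates, batch_size, beta=1.0, min_dist=3):
    """Greedy upper-confidence-bound selection with a diversity filter.

    candidates: iterable of (sequence, predicted_mean, predicted_std) tuples.
    beta: weight on uncertainty; larger values explore more.
    min_dist: minimum Hamming distance required between selected sequences.
    """
    ranked = sorted(candidates, key=lambda c: c[1] + beta * c[2], reverse=True)
    batch = []
    for seq, _mean, _std in ranked:
        # Keep a candidate only if it is far enough from everything chosen so far.
        if all(hamming(seq, chosen) >= min_dist for chosen in batch):
            batch.append(seq)
        if len(batch) == batch_size:
            break
    return batch
```

Setting `beta=0` reduces this to pure exploitation of the predicted mean, and `min_dist=0` disables the diversity filter, so both knobs can be logged and tuned per round.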