Machine Learning discussion
Answer conceptually (no code). Assume you are training or adapting a large multimodal model (e.g., text + image, or text + audio).
- What are the biggest challenges when training multimodal foundation models? Pick 1–2 and go deep.
- Compare a “reasoning-focused LLM” vs a standard instruction/chat LLM:
  - What is different in objectives/training data?
  - What changes at inference time (e.g., tool use, planning, test-time compute)?
  - How do you evaluate reasoning quality and reliability? (A minimal evaluation sketch is given below.)
Be ready to discuss practical trade-offs: data, alignment, evaluation, cost/latency, and safety.
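On the evaluation question above: although the prompt asks for conceptual answers, a minimal sketch can anchor the discussion of reliability. The snippet below measures self-consistency-style reliability by sampling several answers per problem and comparing an "any sample correct" rate against a majority-vote rate. It is an illustrative sketch only: `sample_answers` is a hypothetical stand-in for a real model call, and exact-match grading is only suitable for short-form answers.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

def self_consistency_eval(
    problems: Sequence[Tuple[str, str]],               # (prompt, reference answer) pairs
    sample_answers: Callable[[str, int], List[str]],   # hypothetical model sampler
    k: int = 5,
) -> Tuple[float, float]:
    """Return (any-sample-correct rate, majority-vote accuracy) over k samples per problem."""
    any_correct = 0
    majority_correct = 0
    for prompt, reference in problems:
        answers = [a.strip().lower() for a in sample_answers(prompt, k)]
        ref = reference.strip().lower()
        if ref in answers:                              # at least one sample is right
            any_correct += 1
        majority, _ = Counter(answers).most_common(1)[0]
        if majority == ref:                             # the majority vote is right
            majority_correct += 1
    n = len(problems)
    return any_correct / n, majority_correct / n

if __name__ == "__main__":
    # Toy stand-in sampler; a real harness would query the model under test.
    def fake_sampler(prompt: str, k: int) -> List[str]:
        return ["4", "4", "5", "4", "4"][:k]

    print(self_consistency_eval([("What is 2 + 2?", "4")], fake_sampler, k=5))
```

A wide gap between the two numbers is one signal that the model can reach the answer but not reliably, which connects directly to the cost/latency trade-off of spending more test-time compute.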