This question evaluates understanding of how multimodal large models are trained and adapted, along with comparative reasoning about model objectives, data strategies, inference behavior, evaluation, alignment, cost, latency, and safety. It tests competence in model design and systems-level trade-offs.
Answer conceptually (no code). Assume you are training or adapting a multimodal large model (e.g., text + image, or text + audio).
Be ready to discuss practical trade-offs: data, alignment, evaluation, cost/latency, and safety.