This question evaluates a candidate's ability to design end-to-end image classification systems and manage noisy image datasets, testing competencies in data validation, labeling strategy, pipeline reproducibility, and diagnostic analysis.
You are asked to build an image classification model (single-label, multi-class) for a product team. The image dataset is known to be dirty (e.g., corrupted files, wrong labels, duplicates, irrelevant images, inconsistent formats). Compared with text inputs, images often require additional preprocessing and validation before training, such as checking that each file decodes and normalizing size and color mode.
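As a starting point for the file-level problems listed above, a first audit pass might look like the sketch below. It uses only the Python standard library and rests on several assumptions: the function names `sniff_format` and `audit_images` are hypothetical, the signature table covers only JPEG, PNG, and GIF, and "duplicate" means byte-identical files. A magic-byte check catches only grossly broken or mislabeled files, so a real pipeline would likely also fully decode each image with an image library and use perceptual hashing for near-duplicates. Wrong labels and irrelevant images are not addressed here and need a separate review step.

```python
import hashlib
from pathlib import Path

# Magic-byte prefixes for a few common formats (illustrative, not exhaustive).
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_format(header):
    """Return a format name if the header matches a known signature, else None."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def audit_images(root):
    """Walk `root` and sort files into ok / unreadable / duplicate buckets.

    - unreadable: the header matches no known image signature
    - duplicate:  byte-identical to a file seen earlier (SHA-256 match)
    - ok:         everything else, paired with the detected format
    """
    report = {"ok": [], "unreadable": [], "duplicate": []}
    seen = {}  # content digest -> first path seen with that content
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        data = path.read_bytes()
        fmt = sniff_format(data[:16])
        if fmt is None:
            report["unreadable"].append(path)
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            report["duplicate"].append((path, seen[digest]))
            continue
        seen[digest] = path
        report["ok"].append((path, fmt))
    return report
```

Calling `audit_images("data/raw")` (a hypothetical directory) returns a dictionary whose bucket sizes give a quick picture of how dirty the dataset is before any labeling or training work begins. Comparing the detected format against each file's extension would be a natural next check for inconsistent formats.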