This question evaluates understanding of the MapReduce programming model, distributed batch-processing concepts, and performance optimization for parallel CPU and network utilization within the system design domain.

You are designing a large-scale batch processing job (e.g., feature extraction, log aggregation, joins) over a distributed file system. The job must scale across many machines while keeping both CPU and network well utilized.
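As a starting point for discussion, here is a minimal single-process sketch of the map, combine, partition, and reduce stages for a log-aggregation job. It is an illustration under stated assumptions, not a production design: the log line format (`path status`), the function names, and the CRC32-based partitioner are all hypothetical choices made for this example. The combiner stands in for the main network optimization, since it pre-aggregates on the mapper so that fewer records cross the shuffle.

```python
from collections import defaultdict
from zlib import crc32


def map_phase(line):
    """Emit (status, 1) for one log line of the assumed form 'path status'."""
    _path, status = line.split()
    yield status, 1


def combine(pairs):
    """Pre-aggregate one mapper's output locally to reduce shuffle traffic."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return local.items()


def partition(key, num_reducers):
    """Deterministically route a key to a reducer (hypothetical CRC32 scheme)."""
    return crc32(key.encode()) % num_reducers


def run_job(splits, num_reducers=2):
    """Simulate the job in one process; each split stands in for one mapper."""
    # One bucket per reducer, holding the values shuffled to it for each key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        pairs = (pair for line in split for pair in map_phase(line))
        for key, value in combine(pairs):
            buckets[partition(key, num_reducers)][key].append(value)

    # Reduce: every key lives in exactly one bucket, so reducers are independent.
    result = {}
    for bucket in buckets:
        for key, values in bucket.items():
            result[key] = sum(values)
    return result


# Two input splits, as if read from two blocks of a distributed file system.
splits = [["/a 200", "/b 404", "/c 200"], ["/a 200", "/d 500"]]
counts = run_job(splits)
```

In a real cluster the mappers would run where the input blocks are stored and the buckets would be written to and fetched from remote machines. The sketch only shows how the stages fit together and where the combiner removes work from the network.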