Big data systems:
(a) Explain Hadoop's fault tolerance (HDFS replication, task re-execution) and why MapReduce includes a shuffle-and-sort phase; for a word-count job, specify the mapper and reducer key–value pairs precisely.
(b) Explain Spark's RDD immutability and lineage-based fault recovery; contrast it with Hadoop's approach.
(c) For computing the top-k word frequencies per day over a 10 TB dataset, design a two-stage MapReduce (or Spark) pipeline that minimizes shuffles; justify your partitioning scheme and your use of combiners.
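For part (a), the mapper/reducer key–value contract can be sketched in plain Python. This is a single-process simulation of the MapReduce phases, not actual Hadoop code; the function names (`mapper`, `shuffle`, `reducer`) are illustrative, and the shuffle step stands in for the framework's group-by-key-and-sort behavior:

```python
from collections import defaultdict

def mapper(line):
    # Mapper input: one line of text (key = byte offset, ignored here).
    # Mapper output: (word, 1) for every word occurrence.
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # The framework's shuffle-and-sort: group all mapper output by key
    # so each reducer sees (word, [1, 1, ...]), with keys in sorted order.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    # Reducer input: (word, [counts]); output: (word, total_count).
    yield (key, sum(values))

lines = ["the quick brown fox", "the lazy dog"]
mapped = [pair for line in lines for pair in mapper(line)]
counts = dict(pair for k, v in shuffle(mapped) for pair in reducer(k, v))
# counts["the"] == 2; every other word maps to 1
```

A combiner for part (c) would have the same signature as `reducer` and run on each mapper's local output before the shuffle, cutting the data moved across the network.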