Optimize k to Minimize Distributed Runtime
Context
You are splitting a dataset of size n across k identical machines.
- Per-machine compute (with even partitioning): O(n/k)
- Merge cost of the k partial results: O(k^2)
- Other overheads are negligible
Total runtime: T(k) = O(n/k + k^2)
Question
Which value of k approximately minimizes the total runtime? Pick ONE option:
- A) around 10
- B) around log₂ n
- C) around √n
- D) around n/2
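
One way to build intuition before answering is to plug each candidate into the cost model for a concrete dataset size. The sketch below does exactly that; the sample size n = 1,000,000 and the helper name `total_runtime` are assumptions made for illustration, not part of the problem statement.

```python
import math

def total_runtime(n: float, k: float) -> float:
    """Cost model from the problem: per-machine compute n/k plus merge cost k^2."""
    return n / k + k ** 2

# Assumed sample size for illustration; the problem itself leaves n symbolic.
n = 1_000_000

# The four candidate values of k from the options above.
candidates = {
    "A) 10": 10.0,
    "B) log2(n)": math.log2(n),
    "C) sqrt(n)": math.sqrt(n),
    "D) n/2": n / 2,
}

for label, k in candidates.items():
    print(f"{label:12} k = {k:>11.1f}   T(k) = {total_runtime(n, k):>15.1f}")
```

Comparing the printed totals for a given n shows how the n/k term and the k^2 term trade off as k grows, which is the reasoning the question is probing.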