Implement vector–vector multiplication for dense vectors (e.g., dot product). Then, describe how you would optimize the implementation for a parallel architecture (e.g., SIMD, multithreading, memory alignment, cache behavior). Finally, explain how you would perform the same operation when the vectors are sparse, including data structures you would use and how sparsity affects complexity and parallelization.
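As a possible starting point, here is a minimal sketch of the three pieces the question asks about, not a reference answer. All function names (`dense_dot`, `chunked_dot`, `sparse_dot`) are hypothetical, and the sparse representation is assumed to be a pair of parallel arrays holding strictly increasing indices and their nonzero values. `chunked_dot` only illustrates how a parallel reduction would split the work; it runs the chunks sequentially here.

```python
from typing import Sequence


def dense_dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Dense dot product: one multiply-add per element, O(n) time."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def chunked_dot(x: Sequence[float], y: Sequence[float], num_chunks: int = 4) -> float:
    """Split the vectors into contiguous chunks, reduce each chunk to a
    partial sum, then combine the partial sums.

    Assumption: num_chunks >= 1. In a multithreaded version each chunk
    would go to its own worker; here the chunks are processed one after
    another, purely to show the decomposition.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    n = len(x)
    size = max(1, -(-n // num_chunks))  # ceiling division, at least 1
    partials = [
        dense_dot(x[start:start + size], y[start:start + size])
        for start in range(0, n, size)
    ]
    return sum(partials, 0.0)


def sparse_dot(
    x_idx: Sequence[int], x_val: Sequence[float],
    y_idx: Sequence[int], y_val: Sequence[float],
) -> float:
    """Sparse dot product via a two-pointer merge over sorted index arrays.

    Only indices present in both vectors contribute, so the cost is
    O(nnz(x) + nnz(y)) rather than O(n).
    """
    i = j = 0
    total = 0.0
    while i < len(x_idx) and j < len(y_idx):
        if x_idx[i] == y_idx[j]:
            total += x_val[i] * y_val[j]
            i += 1
            j += 1
        elif x_idx[i] < y_idx[j]:
            i += 1
        else:
            j += 1
    return total
```

The chunked version may round slightly differently from the sequential one, because floating-point addition is not associative and the summation order changes. The merge in `sparse_dot` branches on the data, which is one reason sparse dot products tend to be harder to vectorize than dense ones.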