Implement vectorized NumPy code for: (a) computing pairwise cosine similarity between two real-valued matrices X (shape n×d) and Y (shape m×d) without explicit Python loops; (b) computing a numerically stable softmax for a 2D array along the last axis; (c) explaining how broadcasting works if X has shape (n, 1, d) and Y has shape (1, m, d). Analyze time and space complexity, and discuss pitfalls such as unintended broadcasting, dtype issues, and memory usage.