Implement attention and Transformer with backward pass | Tesla Interview Question
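The core of this task is scaled dot-product attention together with its hand-derived gradients. Below is a minimal sketch of one way to approach that part, not a reference solution to the interview question. It assumes a single head, no masking, no batching, and plain Python lists in place of a tensor library. All function names (`attention_forward`, `attention_backward`, `numerical_grad`, and the helpers) are hypothetical. The forward pass computes O = softmax(Q Kᵀ / √d) V. The backward pass applies dV = Pᵀ dO, dP = dO Vᵀ, the row-wise softmax Jacobian dS_ij = P_ij (dP_ij − Σ_k P_ik dP_ik), and then dQ = dS K / √d and dK = dSᵀ Q / √d. A central finite-difference check confirms the analytic gradients. The rest of a Transformer block (multi-head projections, layer norm, feed-forward layers) is not covered here.

```python
import math
import random

Matrix = list[list[float]]  # hypothetical alias: row-major nested lists

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]

def softmax_rows(s: Matrix) -> Matrix:
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    out = []
    for row in s:
        m = max(row)
        e = [math.exp(x - m) for x in row]
        z = sum(e)
        out.append([x / z for x in e])
    return out

def attention_forward(q: Matrix, k: Matrix, v: Matrix):
    """O = softmax(Q K^T / sqrt(d)) V. Returns O and a cache for the backward pass."""
    scale = 1.0 / math.sqrt(len(q[0]))
    s = [[x * scale for x in row] for row in matmul(q, transpose(k))]
    p = softmax_rows(s)
    return matmul(p, v), (q, k, v, p, scale)

def attention_backward(d_out: Matrix, cache):
    """Gradients of a scalar loss w.r.t. Q, K, V, given dL/dO."""
    q, k, v, p, scale = cache
    d_v = matmul(transpose(p), d_out)   # dV = P^T dO
    d_p = matmul(d_out, transpose(v))   # dP = dO V^T
    # Softmax backward, row by row: dS_ij = P_ij * (dP_ij - sum_k P_ik dP_ik)
    d_s = []
    for p_row, dp_row in zip(p, d_p):
        dot = sum(pi * dpi for pi, dpi in zip(p_row, dp_row))
        d_s.append([pi * (dpi - dot) for pi, dpi in zip(p_row, dp_row)])
    d_q = [[x * scale for x in row] for row in matmul(d_s, k)]             # dQ = dS K * scale
    d_k = [[x * scale for x in row] for row in matmul(transpose(d_s), q)]  # dK = dS^T Q * scale
    return d_q, d_k, d_v

def loss(q: Matrix, k: Matrix, v: Matrix, g: Matrix) -> float:
    """Test loss L = sum(O * G), chosen so that dL/dO = G."""
    o, _ = attention_forward(q, k, v)
    return sum(oi * gi for ro, rg in zip(o, g) for oi, gi in zip(ro, rg))

def numerical_grad(f, mat: Matrix, eps: float = 1e-6) -> Matrix:
    """Central finite differences; perturbs mat in place and restores each entry."""
    grad = [[0.0] * len(mat[0]) for _ in mat]
    for i in range(len(mat)):
        for j in range(len(mat[0])):
            old = mat[i][j]
            mat[i][j] = old + eps
            plus = f()
            mat[i][j] = old - eps
            minus = f()
            mat[i][j] = old
            grad[i][j] = (plus - minus) / (2 * eps)
    return grad

def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

if __name__ == "__main__":
    rng = random.Random(0)
    q, k, v = rand_matrix(3, 4, rng), rand_matrix(5, 4, rng), rand_matrix(5, 2, rng)
    g = rand_matrix(3, 2, rng)  # plays the role of the upstream gradient dL/dO
    _, cache = attention_forward(q, k, v)
    analytic = attention_backward(g, cache)
    for name, mat, ana in zip("QKV", (q, k, v), analytic):
        num = numerical_grad(lambda: loss(q, k, v, g), mat)
        err = max(abs(a - b) for ra, rb in zip(ana, num) for a, b in zip(ra, rb))
        print(f"d{name} max abs error: {err:.2e}")
```

Caching P from the forward pass, instead of recomputing it, is what makes the backward pass cheap. Only P, the inputs, and the scale factor are needed to form all three gradients.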