Optimize attention for long sequences | Amazon Interview Question