Explain DPO and construct its training data | ByteDance Interview Question