Explain DPO and construct its training data | ByteDance