We propose the HyperDPO method, a conditioned one-shot multi-objective fine-tuning framework that generalizes DPO to the multi-objective setting, profiles the Pareto front through one-shot training, and offers flexible post-training control over trade-offs.