[EMNLP'24 (Main)] DRPO(Dynamic Rewarding with Prompt Optimization) is a tuning-free approach for self-alignment. DRPO leverages a search-based optimization framework that allows LLMs to iteratively self-improve and design the best alignment instructions without the need for additional training.
[EMNLP'24 (Main)] DRPO(Dynamic Rewarding with Prompt Optimization) is a tuning-free approach for self-alignment. DRPO leverages a search-based optimization framework that allows LLMs to iteratively self-improve and design the best alignment instructions without the need for additional training.