Towards understanding sycophancy in language models

Published:

Recommended citation: Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez. Towards understanding sycophancy in language models. arXiv preprint arXiv:2310.13548 (2023). https://arxiv.org/abs/2310.13548

Summary:
