Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned August 2022
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models June 2022