About Me

I’m a philosopher working on finetuning and AI alignment at Anthropic. My team trains models to be more honest and to have good character traits, and works on developing new finetuning techniques so that our interventions can scale to more capable models. Before this I worked as a research scientist on the policy team at OpenAI, where I worked on AI safety via debate and human baselines for AI performance.

I have a PhD in philosophy from NYU with a thesis on infinite ethics and a BPhil in philosophy from the University of Oxford. I did my undergraduate degree in Philosophy at the University of Dundee, where I started out my academic life as a fine art and philosophy student at the Duncan of Jordanstone art school. My philosophy work has mostly revolved around ethics, decision theory, and formal epistemology.

I’m a member of Giving What We Can and I’ve pledged to donate at least 10% of my lifetime income to charity, but I hope to make that more than 50% if I can. I donate primarily to global poverty charities.

Amanda