Jacob T. wants to read Universal and Transferable Adversarial Attacks on Aligned Language Models by Andy Zou (2023, arXiv)
Aug. 25, 2023

Because "out-of-the-box" large language models are capable of generating a great deal of objectionable content, …

I skimmed the top-level summary when it came out, but it appears well worth a deeper read.