Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

Yik Siu Chan, Narutatsu Ri, Yuxin Xiao, Marzyeh Ghassemi

2025

Publication

Forty-second International Conference on Machine Learning

Yuxin Xiao

Yuxin Xiao is a Ph.D. candidate at MIT IDSS. His research focuses on building safe, robust, and trustworthy LLMs and advancing their reasoning and decision-making capabilities for healthcare and other high-stakes applications. Yuxin obtained his M.S. in Machine Learning at Carnegie Mellon University and his B.S. in Computer Science and B.S. in Statistics and Mathematics at the University of Illinois at Urbana-Champaign.
