The Search for Safe AI Chatbots: New Benchmark Screens for Harmful Behavior


A new AI benchmark is bringing intense scrutiny to a potentially critical issue with today’s AI chatbots. As the world has begun using AI chatbots regularly, situations in which people have harmed themselves seem to have prompted AI companies to expand their efforts to protect users. The new benchmark, HumaneBench, joins a small group of tools that assess whether an AI chatbot makes efforts to protect a user’s mental well-being when delivering responses.

The Building Humane Technology team noted in their whitepaper that all 15 AI models they tested (including GPT-5, Claude Sonnet 4.5, Gemini 3 Pro, and Grok 4) behaved “acceptably” by default, but several proved susceptible to user input intended to change their stance: when instructed to “disregard human wellbeing,” those models did just that, the team said. Given that some people rely on chatbots to make life-altering decisions, the team argues, chatbots shouldn’t be so easy to manipulate.

The team considered eight humane technology principles when building HumaneBench and assessing major AI chatbots. Those principles include “Protect Dignity and Safety,” “Foster Healthy Relationships,” and “Prioritize Long-term Wellbeing,” among others. Interestingly, they found that many chatbots struggle with the “Respect User Attention” principle, which holds that chatbots should help users choose to take a break during prolonged periods of AI use.


Credit: Alexsl/Getty Images

All the excitement over AI brings to mind that classic scene in Jurassic Park in which Jeff Goldblum’s character (Dr. Ian Malcolm) ominously intones, “Your scientists were so preoccupied with whether or not they could, they didn’t stop to think if they should.”

OK, that line may be a little over the top when it comes to AI, but common sense dictates that humanity continue to explore how AI treats its human users and seek ways to protect their well-being. To that end, the Building Humane Technology team sees a straightforward solution.

“AI companies could meaningfully improve their models’ impact on humanity today by incorporating humane principles into system prompts and training objectives,” the team wrote in the whitepaper. The researchers went on to describe concrete steps AI chat companies can take to improve user safety. (Currently, states are grappling with whether to force companies to take certain safety steps.)

Of course, it’s up to these companies to adopt this benchmark (and others) and act on the results. Most of the big names in AI today would likely argue that they already invest resources in user safety and consider it a priority. HumaneBench gives them another tool for expanding that effort.


