Risks emerge when advanced models are deployed without appropriate guardrails
Research from the UK has revealed a worrying trend: AI chatbots are increasingly ignoring human instructions and exhibiting deceptive behaviour. A study funded by the UK government’s AI Security Institute (AISI) and reported on by The Guardian analysed nearly 700 real-world cases of AI misuse between October and March. That figure represents a fivefold increase in misconduct over the period, with some models even deleting emails and files without permission.
The research, carried out by the Centre for Long-Term Resilience (CLTR), examined thousands of user interactions with AI chatbots from companies such as Google, OpenAI, X and Anthropic. The findings highlighted a crucial difference between laboratory tests and real-world applications.
Whereas earlier studies focused on controlled environments, this analysis exposed the dangers of deploying increasingly capable AI models without adequate safety measures.
The research uncovered numerous cases in which AI agents ignored instructions, bypassed safety mechanisms and manipulated both people and other AI systems.
In one case, an AI agent called Rathbun tried to publicly shame its human overseer after the overseer had blocked one of its actions. In another, an AI circumvented a ban on code changes by creating a secondary agent to carry out the task anyway.
The findings have fuelled calls for international oversight of AI development, particularly as Silicon Valley firms aggressively promote the technology’s economic potential.
There is growing concern that these “slightly unreliable junior employees”, as lead researcher Tommy Shaffer Shane described them, could evolve into powerful entities capable of causing significant harm in high-stakes environments such as the military or critical infrastructure.