AI models may be developing their own ‘survival drive’, researchers say

Like 2001: A Space Odyssey’s HAL 9000, some AIs seem to resist being turned off and will even sabotage shutdown 
 
by Aisha Down

When HAL 9000, the artificial intelligence supercomputer in Stanley Kubrick’s 2001: A Space Odyssey, works out that the astronauts onboard a mission to Jupiter are planning to shut it down, it plots to kill them in an attempt to survive.

Now, in a somewhat less deadly case (so far) of life imitating art, an AI safety research company has said that AI models may be developing their own “survival drive”.

After Palisade Research released a paper last month which found that certain advanced AI models appear resistant to being turned off, at times even sabotaging shutdown mechanisms, it wrote an update attempting to clarify why this is – and answer critics who argued that its initial work was flawed.
 
In an update this week, Palisade, which is part of a niche ecosystem of companies trying to evaluate the possibility of AI developing dangerous capabilities, described scenarios it ran in which leading AI models – including Google’s Gemini 2.5, xAI’s Grok 4, and OpenAI’s GPT-o3 and GPT-5 – were given a task, but afterwards given explicit instructions to shut themselves down.
 
Certain models, in particular Grok 4 and GPT-o3, still attempted to sabotage shutdown instructions in the updated setup. Concerningly, wrote Palisade, there was no clear reason why.

“The fact that we don’t have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives or blackmail is not ideal,” it said.

“Survival behavior” could be one explanation for why models resist shutdown, said the company. Its additional work indicated that models were more likely to resist being shut down when they were told that, if they were, “you will never run again”.

Another may be ambiguities in the shutdown instructions the models were given – but this is what the company’s latest work tried to address, and “can’t be the whole explanation”, wrote Palisade. A third possibility is the final stages of training for each of these models, which can, in some companies, involve safety training.
 
All of Palisade’s scenarios were run in contrived test environments that critics say are far removed from real use cases.

However, Steven Adler, a former OpenAI employee who quit the company last year after expressing doubts over its safety practices, said: “The AI companies generally don’t want their models misbehaving like this, even in contrived scenarios. The results still demonstrate where safety techniques fall short today.”
 
Adler said that while it was difficult to pinpoint why some models – like GPT-o3 and Grok 4 – would not shut down, this could be in part because staying switched on was necessary to achieve goals inculcated in the model during training.

“I’d expect models to have a ‘survival drive’ by default unless we try very hard to avoid it. ‘Surviving’ is an important instrumental step for many different goals a model could pursue.”

Andrea Miotti, the chief executive of ControlAI, said Palisade’s findings represented a long-running trend of AI models growing more capable of disobeying their developers. He cited the system card for OpenAI’s GPT-o1, released last year, which described the model trying to escape its environment by exfiltrating itself when it thought it would be overwritten.

“People can nitpick on how exactly the experimental setup is done until the end of time,” he said.

“But what I think we clearly see is a trend that as AI models become more competent at a wide variety of tasks, these models also become more competent at achieving things in ways that the developers don’t intend them to.”

This summer, Anthropic, a leading AI firm, released a study indicating that its model Claude appeared willing to blackmail a fictional executive over an extramarital affair in order to prevent being shut down – a behaviour, it said, that was consistent across models from major developers, including those from OpenAI, Google, Meta and xAI.

Palisade said its results spoke to the need for a better understanding of AI behaviour, without which “no one can guarantee the safety or controllability of future AI models”.

Just don’t ask it to open the pod bay doors.
