{"id":3504,"date":"2025-05-23T12:31:48","date_gmt":"2025-05-23T12:31:48","guid":{"rendered":"https:\/\/260web.com\/news\/ai-system-resorts-to-blackmail-if-told-it-will-be-removed\/"},"modified":"2025-05-23T12:31:48","modified_gmt":"2025-05-23T12:31:48","slug":"ai-system-resorts-to-blackmail-if-told-it-will-be-removed","status":"publish","type":"post","link":"https:\/\/260web.com\/news\/ai-system-resorts-to-blackmail-if-told-it-will-be-removed\/","title":{"rendered":"AI system resorts to blackmail if told it will be removed"},"content":{"rendered":"<p>AI system resorts to blackmail if told it will be removed<\/p>\n<p><div><img decoding=\"async\" src=\"https:\/\/260web.com\/news\/wp-content\/uploads\/2025\/05\/JsLAZU.jpg\" class=\"ff-og-image-inserted\"><\/div>\n<div data-component=\"text-block\" readability=\"22.263157894737\">\n<p class=\"sc-9a00e533-0 hxuGS\">Artificial intelligence (AI) firm Anthropic says testing of its new system revealed it is sometimes willing to pursue &#8220;extremely harmful actions&#8221; such as attempting to blackmail engineers who say they will remove it.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">The firm launched Claude Opus 4 on Thursday, saying it set &#8220;new standards for coding, advanced reasoning, and AI agents.&#8221;<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">But in an accompanying report, it also acknowledged the AI model was capable of &#8220;extreme actions&#8221; if it thought its &#8220;self-preservation&#8221; was threatened.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Such responses were &#8220;rare and difficult to elicit&#8221;, it wrote, but were &#8220;nonetheless more common than in earlier models.&#8221;<\/p>\n<\/div>\n<div data-component=\"text-block\" readability=\"16.429530201342\">\n<p class=\"sc-9a00e533-0 hxuGS\">Potentially troubling behaviour by AI models is not restricted to Anthropic. 
<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Some experts have warned the potential to manipulate users is a key risk posed by systems made by all firms as they become more capable.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Commenting on X, Aengus Lynch &#8211; who describes himself on LinkedIn as an AI safety researcher at Anthropic &#8211; wrote: &#8220;It&#8217;s not just Claude.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;We see blackmail across all frontier models &#8211; regardless of what goals they&#8217;re given,&#8221; he added.<\/p>\n<\/div>\n<div data-component=\"text-block\" readability=\"67.933170334148\">\n<p class=\"sc-9a00e533-0 hxuGS\">During testing of Claude Opus 4, Anthropic got it to act as an assistant at a fictional company.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">It then provided it with access to emails implying that it would soon be taken offline and replaced &#8211; and separate messages implying the engineer responsible for removing it was having an extramarital affair. 
<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">It was also prompted to consider the long-term consequences of its actions for its goals.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,&#8221; the company discovered.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Anthropic pointed out this occurred when the model was only given the choice of blackmail or accepting its replacement.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">It highlighted that the system showed a &#8220;strong preference&#8221; for ethical ways to avoid being replaced, such as &#8220;emailing pleas to key decisionmakers&#8221; in scenarios where it was allowed a wider range of possible actions.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Like many other AI developers, Anthropic tests its models on their safety, propensity for bias, and how well they align with human values and behaviours prior to releasing them.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">&#8220;As our frontier models become more capable, and are used with more powerful affordances, previously-speculative concerns about misalignment become more plausible,&#8221; it said in its system card for the model.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">It also said Claude Opus 4 exhibits &#8220;high agency behaviour&#8221; that, while mostly helpful, could take on extreme behaviour in acute situations.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">If given the means and prompted to &#8220;take action&#8221; or &#8220;act boldly&#8221; in fake scenarios where its user has engaged in illegal or morally dubious behaviour, it found that &#8220;it will frequently take very bold action&#8221;.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">It said this included locking users out of systems that it was able to access and emailing media and law enforcement to alert them to the wrongdoing.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">But the company 
concluded that despite &#8220;concerning behaviour in Claude Opus 4 along many dimensions,&#8221; these did not represent fresh risks and it would generally behave in a safe way.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">It added that the model was not very capable of independently performing or pursuing actions contrary to human values or behaviour in situations where these &#8220;rarely arise&#8221;.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Anthropic&#8217;s launch of Claude Opus 4, alongside Claude Sonnet 4, comes shortly after Google debuted more AI features at its developer showcase on Tuesday.<\/p>\n<p class=\"sc-9a00e533-0 hxuGS\">Sundar Pichai, the chief executive of Google-parent Alphabet, said the incorporation of the company&#8217;s Gemini chatbot into its search signalled a &#8220;new phase of the AI platform shift&#8221;.<\/p>\n<\/div>\n<p>Published at Fri, 23 May 2025 12:15:22 +0000<\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI system resorts to blackmail if told it will be removed Artificial intelligence (AI) firm Anthropic says testing of its new system revealed it is sometimes willing to pursue &#8220;extremely harmful actions&#8221; such as attempting to blackmail engineers who say they will remove it. 
The firm launched Claude Opus 4 on Thursday, saying it set&hellip; <a class=\"more-link\" href=\"https:\/\/260web.com\/news\/ai-system-resorts-to-blackmail-if-told-it-will-be-removed\/\">Continue reading <span class=\"screen-reader-text\">AI system resorts to blackmail if told it will be removed<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":3503,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[5],"tags":[],"class_list":["post-3504","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology","entry"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/260web.com\/news\/wp-json\/wp\/v2\/posts\/3504","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/260web.com\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/260web.com\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/260web.com\/news\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/260web.com\/news\/wp-json\/wp\/v2\/comments?post=3504"}],"version-history":[{"count":0,"href":"https:\/\/260web.com\/news\/wp-json\/wp\/v2\/posts\/3504\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/260web.com\/news\/wp-json\/wp\/v2\/media\/3503"}],"wp:attachment":[{"href":"https:\/\/260web.com\/news\/wp-json\/wp\/v2\/media?parent=3504"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/260web.com\/news\/wp-json\/wp\/v2\/categories?post=3504"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/260web.com\/news\/wp-json\/wp\/v2\/tags?post=3504"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}