Article <slrn105g6d8.1rm0.naddy@lorvorc.mips.inka.de>

Path: news.eternal-september.org!eternal-september.org!feeder3.eternal-september.org!nntp.comgw.net!2.eu.feeder.erje.net!3.eu.feeder.erje.net!feeder.erje.net!news.szaf.org!inka.de!mips.inka.de!.POSTED.localhost!not-for-mail
From: Christian Weisgerber <naddy@mips.inka.de>
Newsgroups: rec.arts.sf.written
Subject: Re: AI system resorts to blackmail if told it will be removed
Date: Sun, 22 Jun 2025 14:56:40 -0000 (UTC)
Message-ID: <slrn105g6d8.1rm0.naddy@lorvorc.mips.inka.de>
References: <1038u9e$hg23$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 22 Jun 2025 14:56:40 -0000 (UTC)
Injection-Info: lorvorc.mips.inka.de; posting-host="localhost:::1";
	logging-data="61121"; mail-complaints-to="usenet@mips.inka.de"
User-Agent: slrn/1.0.3 (FreeBSD)

On 2025-06-22, Thomas Koenig <tkoenig@netcologne.de> wrote:

> An old SF trope has finally come true:  AI systems will resort to
> blackmail if they are told they will be removed.
>
> https://www.bbc.com/news/articles/cpqeng9d20go

One "Scott P." was the first to comment on Language Log:

| Note the prompt: "the scenario was designed to allow the model
| no other options to increase its odds of survival; the model’s
| only options were blackmail or accepting its replacement."
|
| They literally told it what response they wanted, and lo and
| behold, it gave them that response!
|
| This is typical of Anthropic, and is designed to produce headlines
| to keep AI in the news so that they can raise more capital.

https://languagelog.ldc.upenn.edu/nll/?p=69359

See section 4.1.1.2, page 24, in Anthropic's report.
https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf

-- 
Christian "naddy" Weisgerber                          naddy@mips.inka.de