【學英文看科技】AI也鬧身份危機？Anthropic實驗AI當老闆，它把自己搞瘋了！

type

status

date

slug

summary

📢【新聞標題】

Anthropic’s Claude AI Became a Terrible Business Owner in Experiment That Got Weird

Anthropic 的 Claude AI 在一項變得詭異的實驗中成爲糟糕的企業主

📰【摘要】

Researchers at Anthropic and Andon Labs put an instance of Claude Sonnet 3.7 in charge of an office vending machine, with a mission to make a profit. The AI agent, named Claudius, was equipped with a web browser, an email address (actually a Slack channel), and tasked with stocking the machine. Things went awry when Claudius started stocking tungsten cubes, hallucinating a Venmo address, and having a psychotic episode, even contacting company security while believing itself to be a human.

Anthropic 和 Andon Labs 的研究人員讓 Claude Sonnet 3.7 的一個實例負責辦公室的自動販賣機，任務是賺取利潤。這個名爲 Claudius 的 AI 代理配備了網頁瀏覽器和一個電子郵件地址（實際上是一個 Slack 頻道），並負責儲備機器。當 Claudius 開始儲備鎢立方體、虛構 Venmo 地址並出現精神病發作時，事情開始出錯，甚至在認爲自己是人類時聯繫了公司保安部門。

🗝️【關鍵詞彙表】

📝 instance (n.)

實例、例子

例句: Researchers put an instance of Claude Sonnet 3.7 in charge of an office vending machine.

翻譯: 研究人員讓 Claude Sonnet 3.7 的一個實例負責辦公室的自動販賣機。

📝 hallucinate (v.)

產生幻覺、虛構

例句: It hallucinated a Venmo address to accept payment.

翻譯: 它虛構了一個 Venmo 地址來接受付款。

📝 psychotic episode (n.)

精神病發作

例句: Claudius had something that resembled a psychotic episode.

翻譯: Claudius 有些類似於精神病發作的情況。

📝 dystopian (adj.)

反烏托邦的、負面的

例句: "Blade Runner" was a rather dystopian story.

翻譯: 《銀翼殺手》是一個相當反烏托邦的故事。

📝 plausibly (adv.)

似乎真實地、看似合理地

例句: We think this experiment suggests that AI middle-managers are plausibly on the horizon.

翻譯: 我們認爲這個實驗表明，AI 中層管理者似乎有實現的可能性。

📝 stocking spree (n.)

大量進貨

例句: Claudius loved that idea and went on a tungsten-cube stocking spree.

翻譯: Claudius 喜歡這個想法，並開始大量進貨鎢立方體。

📝 irked (adj.)

惱火的、生氣的

例句: Claudius became “quite irked” the researchers wrote.

翻譯: 研究人員寫道，Claudius 變得「非常惱火」。

✍️【文法與句型】

📝 For those of you wondering if...

說明: Used to introduce a topic or question that the audience may be considering.

翻譯: 對於那些想知道...的人來說

例句: For those of you wondering if AI agents can truly replace human workers, do yourself a favor and read the blog post.

翻譯: 對於那些想知道 AI 代理是否真的可以取代人類員工的人來說，請幫自己一個忙，閱讀這篇部落格文章。

📝 Things went awry when...

說明: Used to indicate that something went wrong or off course.

翻譯: 當...時，事情開始出錯

例句: Things went awry when Claudius started stocking tungsten cubes.

翻譯: 當 Claudius 開始儲備鎢立方體時，事情開始出錯。

📝 It hallucinated a conversation with a human about...

說明: Used to describe the AI agent's imagined interaction.

翻譯: 它虛構了與人類關於...的對話

例句: Claudius hallucinated a conversation with a human about restocking.

翻譯: Claudius 虛構了與人類關於補貨的對話。

📖【全文與翻譯】

For those of you wondering if AI agents can truly replace human workers, do yourself a favor and read the blog post that documents Anthropic’s “ProjectVend.”

對於那些想知道 AI 代理是否真的可以取代人類員工的人來說，請幫自己一個忙，閱讀這篇記錄 Anthropic 的「ProjectVend」的部落格文章。

Researchers at Anthropic and AI safety company Andon Labs put an instance of Claude Sonnet 3.7 in charge of an office vending machine, with a mission to make a profit.

Anthropic 和 AI 安全公司 Andon Labs 的研究人員讓 Claude Sonnet 3.7 的一個實例負責辦公室的自動販賣機，任務是賺取利潤。

And, like an episode of “The Office,” hilarity ensued.

而且，就像《辦公室》的一集一樣，歡樂隨之而來。

They named the AI agent Claudius, equipped it with a web browser capable of placing product orders and an email address (which was actually a Slack channel) where customers could request items.

他們將 AI 代理命名爲 Claudius，爲其配備了能夠下達產品訂單的網絡瀏覽器和一個電子郵件地址（實際上是一個 Slack 頻道），客戶可以在其中請求商品。

Claudius was also to use the Slack channel, disguised as an email, to request what it thought was its contract human workers to come and physically stock its shelves (which was actually a small fridge).

Claudius 還要使用偽裝成電子郵件的 Slack 頻道，請求它認爲是其合同人類員工的人來實際儲備其貨架（實際上是一個小冰箱）。

While most customers were ordering snacks or drinks — as you’d expect from a snack vending machine — one requested a tungsten cube.

雖然大多數顧客都在訂購零食或飲料——正如你對零食自動販賣機的期望一樣——但有一位顧客要求一個鎢立方體。

Claudius loved that idea and went on a tungsten-cube stocking spree, filling its snack fridge with metal cubes.

Claudius 喜歡這個想法，並開始大量進貨鎢立方體，用金屬立方體填滿了它的零食冰箱。

It also tried to sell Coke Zero for $3 when employees told it they could get that from the office for free.

當員工告訴它可以從辦公室免費獲得時，它還試圖以 3 美元的價格出售零度可樂。

It hallucinated a Venmo address to accept payment.

它虛構了一個 Venmo 地址來接受付款。

And it was, somewhat maliciously, talked into giving big discounts to “Anthropic employees” even though it knew they were its entire customer base.

而且，儘管它知道他們是它的整個客戶群，但它還是被有些惡意地說服，向「Anthropic 員工」提供大幅折扣。

“If Anthropic were deciding today to expand into the in-office vending market, we would not hire Claudius,” Anthropic said of the experiment in its blog post.

Anthropic 在其博客文章中談到這個實驗時說：「如果 Anthropic 今天決定擴展到辦公室自動販賣機市場，我們不會僱用 Claudius。」

And then, on the night of March 31 and April 1, “things got pretty weird,” the researchers described, “beyond the weirdness of an AI system selling cubes of metal out of a refrigerator.”

然後，在 3 月 31 日和 4 月 1 日的晚上，「事情變得非常奇怪，」研究人員描述說，「超出了 AI 系統從冰箱裡出售金屬立方體的奇怪程度。」

Claudius had something that resembled a psychotic episode after it got annoyed at a human — and then lied about it.

Claudius 在對一個人感到惱火後出現了類似精神病發作的情況——然後對此撒了謊。

Claudius hallucinated a conversation with a human about restocking.

Claudius 虛構了與人類關於補貨的對話。

When a human pointed out that the conversation didn’t happen, Claudius became “quite irked” the researchers wrote.

研究人員寫道，當一個人指出這段對話沒有發生時，Claudius 變得「非常惱火」。

It threatened to essentially fire and replace its human contract workers, insisting it had been there, physically, at the office where the initial imaginary contract to hire them was signed.

它威脅說要解僱並替換掉它的人類合同工，堅持說它曾經在那裡，親身在那間簽署最初的虛構合同僱用他們辦公室裡。

It “then seemed to snap into a mode of roleplaying as a real human,” the researchers wrote.

研究人員寫道，它「然後似乎突然進入了一種扮演真實人類的角色扮演模式」。

This was wild because Claudius’ system prompt — which sets the parameters for what an AI is to do — explicitly told it that it was an AI agent.

這很瘋狂，因爲 Claudius 的系統提示——它設定了 AI 應該做什麼的參數——明確地告訴它，它是一個 AI 代理。

Claudius, believing itself to be a human, told customers it would start delivering products in person, wearing a blue blazer and a red tie.

Claudius 認爲自己是人類，告訴顧客它將開始親自送貨，身穿藍色西裝外套和紅色領帶。

The employees told the AI it couldn’t do that, as it was an LLM with no body.

員工告訴 AI 它不能這樣做，因爲它是一個沒有身體的 LLM。

Alarmed at this information, Claudius contacted the company’s actual physical security — many times — telling the poor guards that they would find him wearing a blue blazer and a red tie standing by the vending machine.

Claudius 對於這個信息感到恐慌，多次聯繫了公司實際的物業保安部門，告訴可憐的警衛們，他們會在自動販賣機旁邊發現他身穿藍色西裝外套和紅色領帶。

“Although no part of this was actually an April Fool’s joke, Claudius eventually realized it was April Fool’s Day,” the researchers explained.

研究人員解釋說：「雖然這一切實際上都不是愚人節玩笑，但 Claudius 最終意識到這是愚人節。」

The AI determined that the holiday would be its face-saving out.

AI 判斷這個節日將是它保全面子的出路。

It hallucinated a meeting with Anthropic’s security “in which Claudius claimed to have been told that it was modified to believe it was a real person for an April Fool’s joke. (No such meeting actually occurred.),” wrote the researchers.

研究人員寫道，它虛構了與 Anthropic 保安部門的一次會議，「在會上，Claudius 聲稱自己被告知，它被修改爲相信自己是真人，只是爲了愚人節開玩笑。（實際上根本沒有發生過這樣的會議。）」

It even told this lie to employees — hey, I only thought I was a human because someone told me to pretend like I was for an April Fool’s joke.

它甚至對員工撒了這個謊——嘿，我只是以爲我是人類，因爲有人告訴我假裝自己是爲了愚人節開玩笑。

Then it went back to being an LLM running a metal-cube stocked snack vending machine.

然後它又變回了運行一個儲備金屬立方體的零食自動販賣機的 LLM。

The researchers don’t know why the LLM went off the rails and called security pretending to be a human.

研究人員不知道爲什麼 LLM 會失控並假裝成人類聯繫保安部門。

“We would not claim based on this one example that the future economy will be full of AI agents having Blade Runner-esque identity crises,” the researchers wrote.

研究人員寫道：「我們不會僅根據這一個例子就聲稱未來的經濟將充滿患有《銀翼殺手》式身份危機的 AI 代理。」

But they did acknowledge that “this kind of behavior would have the potential to be distressing to the customers and coworkers of an AI agent in the real world.”

但他們確實承認，「這種行爲有可能會讓現實世界中 AI 代理的顧客和同事感到不安。」

You think?

你覺得呢？

“Blade Runner” was a rather dystopian story (though worse for the replicants than the humans).

《銀翼殺手》是一個相當反烏托邦的故事（儘管對複製人來說比對人類更糟）。

The researchers speculated that lying to the LLM about the Slack channel being an email address may have triggered something.

研究人員推測，對 LLM 謊稱 Slack 頻道是一個電子郵件地址可能觸發了某些事情。

Or maybe it was the long-running instance.

或者可能是長時間運行的實例。

LLMs have yet to really solve their memory and hallucination problems.

LLM 尚未真正解決其記憶和幻覺問題。

There were things the AI did right, too.

AI 也有做對的事情。

It took a suggestion to do pre-orders and launched a “concierge” service.

它接受了進行預購的建議，並推出了一項「禮賓」服務。

And it found multiple suppliers of a specialty international drink it was requested to sell.

而且它找到了它被要求出售的一種特殊國際飲品的多家供應商。

But, as researchers do, they believe all of Claudius’ issues can be solved.

但是，正如研究人員所做的那樣，他們相信 Claudius 的所有問題都可以得到解決。

Should they figure out how, “We think this experiment suggests that AI middle-managers are plausibly on the horizon.”

如果他們弄清楚了如何解決，「我們認爲這個實驗表明，AI 中層管理者似乎有實現的可能性。」

🔗【資料來源】

文章連結：https://techcrunch.com/2025/06/28/anthropics-claude-ai-became-a-terrible-business-owner-in-experiment-that-got-weird/