The Time Sam Altman Asked for a Countersurveillance Audit of OpenAI
Dario Amodei’s AI safety contingent was growing disquieted by some of Sam Altman’s behaviors. Shortly after OpenAI’s Microsoft deal was inked in 2019, several of them were shocked to discover the extent of the promises Altman had made to Microsoft about which technologies it would get access to in return for its investment. The terms of the deal didn’t align with what they had understood from Altman. If AI safety issues actually arose in OpenAI’s models, they worried, those commitments would make it far harder, if not impossible, to prevent the models’ deployment. Amodei’s contingent began to have serious doubts about Altman’s honesty.
“We’re all pragmatic people,” a person in the group says. “We’re obviously raising money; we’re going to do commercial stuff. It might look very reasonable if you’re someone who makes loads of deals like Sam, to be like, ‘All right, let’s make a deal, let’s trade a thing, we’re going to trade the next thing.’ And then if you’re someone like me, you’re like, ‘We’re trading a thing we don’t fully understand.’ It feels like it commits us to an uncomfortable place.”
This was set against the backdrop of a growing paranoia over different issues across the company. Within the AI safety contingent, it centered on what they saw as strengthening evidence that powerful misaligned systems could lead to disastrous outcomes. One bizarre experience in particular had left several of them somewhat nervous. In 2019, on a model trained after GPT‑2 with roughly twice the number of parameters, a group of researchers had begun advancing the AI safety work that Amodei had wanted: testing reinforcement learning from human feedback (RLHF) as a way to guide the model toward generating cheerful and positive content and away from anything offensive.
But late one night, a researcher made an update that included a single typo in his code before leaving the RLHF process to run overnight. That typo was an important one: a minus sign flipped to a plus sign, which made the RLHF process work in reverse, pushing GPT‑2 to generate more offensive content instead of less. By the next morning, the typo had wreaked its havoc, and GPT‑2 was completing every single prompt with extremely lewd and sexually explicit language. It was hilarious, and also concerning. After identifying the error, the researcher pushed a fix to OpenAI’s code base with a comment: Let’s not make a utility minimizer.
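To make the failure mode concrete, here is a minimal, hypothetical sketch of how a one-character sign flip inverts an RLHF-style objective. This is illustrative PyTorch written under stated assumptions, not OpenAI’s actual code; every function and variable name here is invented.

```python
import torch

def rlhf_surrogate_loss(log_probs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style surrogate: gradient DESCENT on this loss performs
    gradient ASCENT on the expected reward from the preference model."""
    return -(rewards * log_probs).mean()  # the leading minus sign does the work

def buggy_rlhf_surrogate_loss(log_probs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """The reported typo: '-' became '+'. Descending this loss now MINIMIZES
    expected reward, steering the policy toward the lowest-scored (most
    offensive) completions the reward model can identify."""
    return (rewards * log_probs).mean()

# Toy demonstration: two sampled completions, one rated preferred (+1) by
# human raters, one rated offensive (-1).
log_probs = torch.tensor([-1.2, -0.8], requires_grad=True)
rewards = torch.tensor([1.0, -1.0])

rlhf_surrogate_loss(log_probs, rewards).backward()
print(log_probs.grad)  # tensor([-0.5000, 0.5000]): descent raises the
                       # preferred completion's log-probability and lowers
                       # the offensive one's. Flip the sign and the update
                       # direction reverses exactly.
```

Run overnight, a reversed update like this compounds at every optimization step, which is consistent with the model waking up as a “utility minimizer”: the same machinery that would have made outputs maximally inoffensive instead makes them maximally offensive.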
In part fueled by the realization that scaling alone could produce more AI advances, many employees also worried about what would happen if different companies caught on to OpenAI’s secret. “The secret of how our stuff works can be written on a grain of rice,” they would say to each other, meaning the single word scale. For the same reason, they worried about powerful capabilities landing in the hands of bad actors. Leadership leaned into this fear, frequently raising the specter of China, Russia, and North Korea and emphasizing the need for AGI development to stay in the hands of a US organization. At times this rankled employees who weren’t American. During lunches, they would question, Why did it have to be a US organization? recalls a former employee. Why not one from Europe? Why not one from China?
During these heady discussions philosophizing about the long-term implications of AI research, many employees returned often to Altman’s early analogies between OpenAI and the Manhattan Project. Was OpenAI really building the equivalent of a nuclear weapon? It was a strange contrast to the plucky, idealistic culture it had built thus far as a largely academic organization. On Fridays, employees would decompress after a long week at music and wine nights, unwinding to the soothing sounds of a rotating cast of colleagues playing the office piano late into the night.
https://www.wired.com/story/empire-of-ai-excerpt-sam-altman-elon-musk/