Why is OpenAI’s superalignment team imploding?
Editor’s note, May 18, 2024, 7:30 pm ET: This story has been updated to reflect OpenAI CEO Sam Altman’s tweet on Saturday afternoon that the company was in the process of changing its offboarding documents.
On Monday, OpenAI announced exciting new product news: ChatGPT can now talk like a human.
It has a cheery, slightly ingratiating feminine voice that sounds impressively non-robotic, and a bit familiar if you’ve seen a certain 2013 Spike Jonze film. “Her,” tweeted OpenAI CEO Sam Altman, referencing the movie in which a man falls in love with an AI assistant voiced by Scarlett Johansson.
But the product release of GPT-4o was quickly overshadowed by much bigger news out of OpenAI: the resignation of the company’s co-founder and chief scientist, Ilya Sutskever, who also led its superalignment team, as well as that of his co-team leader Jan Leike (whom we put on the Future Perfect 50 list last year).
The resignations didn’t come as a total surprise. Sutskever had been involved in the boardroom revolt that led to Altman’s temporary firing last year, before the CEO quickly returned to his perch. Sutskever publicly regretted his actions and backed Altman’s return, but he’s been mostly absent from the company since, even as other members of OpenAI’s policy, alignment, and safety teams have departed.
But what really stirred speculation was the radio silence from former employees. Sutskever posted a pretty typical resignation message, saying “I’m confident that OpenAI will build AGI that is both safe and beneficial…I am excited for what comes next.”
Leike … didn’t. His resignation message was simply: “I resigned.” After several days of fervent speculation, he expanded on this on Friday morning, explaining that he was worried OpenAI had shifted away from a safety-focused culture.
Questions arose immediately: Were they forced out? Is this delayed fallout of Altman’s brief firing last fall? Are they resigning in protest of some secret and dangerous new OpenAI project? Speculation filled the void because no one who had once worked at OpenAI was talking.
It turns out there’s a very clear reason for that. I have seen the extremely restrictive off-boarding agreement that former OpenAI employees are subject to, which contains nondisclosure and non-disparagement provisions. It forbids them, for the rest of their lives, from criticizing their former employer. Even acknowledging that the NDA exists is a violation of it.
If a departing employee declines to sign the document, or if they violate it, they can lose all vested equity they earned during their time at the company, which is likely worth millions of dollars. One former employee, Daniel Kokotajlo, who posted that he quit OpenAI “due to losing confidence that it would behave responsibly around the time of AGI,” has confirmed publicly that he had to surrender what would have likely turned out to be a huge sum of money in order to quit without signing the document.
While nondisclosure agreements aren’t unusual in highly competitive Silicon Valley, putting an employee’s already-vested equity at risk for declining or violating one is. For workers at startups like OpenAI, equity is a vital form of compensation, one that can dwarf the salary they make. Threatening that potentially life-changing money is a very effective way to keep former employees quiet.
OpenAI did not respond to a request for comment in time for initial publication. After publication, an OpenAI spokesperson sent me this statement: “We have never canceled any current or former employee’s vested equity nor will we if people do not sign a release or nondisparagement agreement when they exit.”
Sources close to the company told me that this represented a change in policy as they understood it. When I asked the OpenAI spokesperson if that statement represented a change, they replied, “This statement reflects reality.”
On Saturday afternoon, a little more than a day after this article was published, Altman acknowledged in a tweet that there had been a provision in the company’s off-boarding documents about “potential equity cancellation” for departing employees, but said the company was in the process of changing that language.
All of this is highly ironic for a company that initially advertised itself as OpenAI — that is, as committed in its mission statements to building powerful systems in a transparent and accountable manner.
OpenAI long ago abandoned the idea of open-sourcing its models, citing safety concerns. But now it has shed the most senior and respected members of its safety team, which should inspire some skepticism about whether safety is really the reason why OpenAI has become so closed.
The tech company to end all tech companies
OpenAI has spent a long time occupying an unusual position in tech and policy circles. Its releases, from DALL-E to ChatGPT, are often very cool, but by themselves they would hardly attract the near-religious fervor with which the company is often discussed.
What sets OpenAI apart is the ambition of its mission: “to ensure that artificial general intelligence — AI systems that are generally smarter than humans — benefits all of humanity.” Many of its employees believe that this aim is within reach; that with perhaps one more decade (or even less) — and a few trillion dollars — the company will succeed at developing AI systems that make most human labor obsolete.
Which, as the company itself has long said, is as risky as it is exciting.
“Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems,” a recruitment page for Leike and Sutskever’s team at OpenAI states. “But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction. While superintelligence seems far off now, we believe it could arrive this decade.”
Naturally, if artificial superintelligence in our lifetimes is possible (and experts are divided), it would have enormous implications for humanity. OpenAI has historically positioned itself as a responsible actor trying to transcend mere commercial incentives and bring AGI about for the benefit of all. And they’ve said they are willing to do that even if that requires slowing down development, missing out on profit opportunities, or allowing external oversight.
“We don’t think that AGI should be just a Silicon Valley thing,” OpenAI co-founder Greg Brockman told me in 2019, in the much calmer pre-ChatGPT days. “We’re talking about world-altering technology. And so how do you get the right representation and governance in there? This is actually a really important focus for us and something we really want broad input on.”
OpenAI’s unique corporate structure — a capped-profit company ultimately controlled by a nonprofit — was supposed to increase accountability. “No one person should be trusted here. I don’t have super-voting shares. I don’t want them,” Altman assured Bloomberg’s Emily Chang in 2023. “The board can fire me. I think that’s important.” (As the board found out last November, it could fire Altman, but it couldn’t make the move stick. After his firing, Altman made a deal to effectively take the company to Microsoft, before being ultimately reinstated with most of the board resigning.)
But there was no stronger sign of OpenAI’s commitment to its mission than the prominent roles of people like Sutskever and Leike, technologists with a long history of commitment to safety and an apparently genuine willingness to ask OpenAI to change course if needed. When I said to Brockman in that 2019 interview, “You guys are saying, ‘We’re going to build a general artificial intelligence,’” Sutskever cut in. “We’re going to do everything that can be done in that direction while also making sure that we do it in a way that’s safe,” he told me.
Their departure doesn’t herald a change in OpenAI’s mission of building artificial general intelligence — that remains the goal. But it almost certainly heralds a change in OpenAI’s interest in safety work; the company hasn’t announced who, if anyone, will lead the superalignment team.
And it makes it clear that OpenAI’s concern with external oversight and transparency couldn’t have run all that deep. If you want external oversight and opportunities for the rest of the world to play a role in what you’re doing, making former employees sign extremely restrictive NDAs doesn’t exactly follow.
Changing the world behind closed doors
This contradiction is at the heart of what makes OpenAI profoundly frustrating for those of us who care deeply about ensuring that AI really does go well and benefits humanity. Is OpenAI a buzzy, if midsize, tech company that makes a chatty personal assistant, or a trillion-dollar effort to create an AI god?
The company’s leadership says they want to transform the world, that they want to be accountable when they do so, and that they welcome the world’s input into how to do it justly and wisely.
But when there’s real money at stake — and there are astounding sums of real money at stake in the race to dominate AI — it becomes clear that they probably never intended for the world to get all that much input. Their process ensures former employees — those who know the most about what’s happening inside OpenAI — can’t tell the rest of the world what’s going on.
OpenAI’s website may espouse high-minded ideals, but its termination agreements are full of hard-nosed legalese. It’s hard to exercise accountability over a company whose former employees are restricted to saying “I resigned.”
ChatGPT’s new cute voice may be charming, but I’m not feeling especially enamored.
Update, May 18, 7:30 pm ET: This story was published on May 17 and has been updated multiple times, most recently to include Sam Altman’s response on social media.
A version of this story originally appeared in the Future Perfect newsletter.