What should we make of large language models (LLMs)? This is quite literally a billion-dollar question.
It is the subject of an analysis released this week by Leopold Aschenbrenner, a former OpenAI employee, in which he makes the case that we may be just a few years away from large language model-based general intelligence: “drop-in remote workers” that can perform any task a human remote worker can do. (He thinks we should push ahead and build it so that China doesn’t get there first.)
His (very long but worth reading) analysis is a good encapsulation of one strand of thinking about large language models like ChatGPT: that they are a larval form of artificial general intelligence (AGI), and that as we train bigger models and learn more about how to fine-tune and prompt them, their notorious faults will largely go away.
This view is sometimes glossed as “scale is all you need,” meaning more training data and more computing power. GPT-2 wasn’t very good, but the bigger GPT-3 was much better, the even bigger GPT-4 is better still, and our default expectation should be that this trend continues. Have a complaint that large language models aren’t good at something? Just wait until we have a bigger one. (Disclosure: Vox Media is one of several publishers that have signed a partnership agreement with OpenAI. Our reporting is editorially independent.)
Among the most prominent skeptics of this view are two AI experts who otherwise rarely agree: Yann LeCun, head of AI research at Meta, and Gary Marcus, an NYU professor and vocal LLM skeptic. They argue that some of the shortcomings of LLMs — their difficulty with logical reasoning tasks, their tendency toward “hallucinations” — will not disappear with scale. They expect diminishing returns from scale in the future, and say we probably won’t get to fully general artificial intelligence just by doubling down on our current methods with billions more dollars.
Who is right? Honestly, I think both sides are far too confident.
Scale does make LLMs much better at a wide range of cognitive tasks, and declaring that this trend will suddenly stop seems premature and sometimes willfully ignorant. I’ve been reporting on AI for six years now, and I keep hearing skeptics proclaim that there is some straightforward task LLMs can’t do and never will, because it requires “true intelligence.” Like clockwork, a few years (or sometimes just months) later, someone figures out how to get an LLM to do exactly that task.
I have heard experts confidently state that programming is one thing deep learning could never be used for; it is now one of the strongest capabilities of LLMs. When I see someone confidently asserting that LLMs can’t do some complex reasoning task, I bookmark that claim. Reasonably often, it turns out almost immediately that GPT-4 or its top-tier competitors can do it after all.
I find the skeptics thoughtful and their criticisms reasonable, but their decidedly mixed track record makes me think they should be more skeptical of their own skepticism.
We don’t know how far scale can take us
As for those who think we’ll have artificial general intelligence within a few years, my instinct is that they, too, are overstating their case. Aschenbrenner’s argument is illustrated by the following graphic:
I don’t want to entirely dismiss the “straight line on a graph” approach to predicting the future; at the very least, “current trends continue” is always a possibility worth considering. But I do want to point out (and other critics have as well) that the right-hand axis here is … completely invented.
GPT-2 is in no meaningful sense equivalent to a human preschooler. GPT-3 is significantly better than elementary school students at most academic tasks and, of course, significantly worse than them at learning a new skill from a few exposures. LLMs are sometimes deceptively humanlike in their conversations and engagement with us, but they are fundamentally not very human; they have different strengths and different weaknesses, and it is very challenging to capture their capabilities through direct comparisons with humans.
Additionally, we really have no idea where “automated AI researchers/engineers” belong on this graph. Does it require as big an advance as going from GPT-3 to GPT-4? Twice as big? Does it require advances of a kind that didn’t happen at all in going from GPT-3 to GPT-4? And why place it six orders of magnitude above GPT-4 rather than five, or seven, or ten?
“AGI by 2027 is plausible … because we are too ignorant to rule it out … because we have no idea what the distance is to human-level research on this graph’s y-axis,” AI safety researcher and advocate Eliezer Yudkowsky responded to Aschenbrenner.
This is a position I have a lot of sympathy for. Because we have little understanding of which problems larger-scale LLMs will be able to solve, we cannot confidently declare firm limits on what they will be able to do before we see them. But that also means we cannot confidently declare which capabilities they will have.
Predictions are difficult – especially about the future
It is extraordinarily difficult to predict the capabilities of technologies that do not yet exist. Most people who have tried over the past few years have ended up with egg on their faces. For that reason, the researchers and thinkers I respect most tend to emphasize a wide range of possibilities.
Maybe the huge improvements in general reasoning we saw between GPT-3 and GPT-4 will hold up as we continue to scale models. Maybe they won’t, but we will still see huge improvements in the effective capabilities of AI models thanks to improvements in how we use them: figuring out systems for managing hallucinations, cross-checking model results, and better tuning models to give us useful answers.
Maybe we’ll build generally intelligent systems that have LLMs as a component. Or maybe OpenAI’s much-anticipated GPT-5 will be a huge disappointment, deflating the AI hype bubble and leaving researchers to figure out what commercially valuable systems can be built without vast improvements on the immediate horizon.
Importantly, you don’t have to believe that AGI is likely coming in 2027 to believe that the possibility, and the surrounding policy implications, should be taken seriously. I think the broad strokes of the scenario Aschenbrenner outlines are a real and frightening possibility: an AI company develops an AI system it can aggressively use to further automate internal AI research, leading to a world in which a small number of people with a large number of AI assistants and servants can pursue world-changing projects at a pace that doesn’t allow for much oversight. Many people are spending billions of dollars to bring that world about as quickly as possible, and many of them think it’s on the near horizon.
That is worth a substantive conversation and a substantive policy response, even if we think those leading the way on AI are too sure of themselves. Marcus writes of Aschenbrenner, and I agree, that “if you read his manuscript, please read it for his concern about our underpreparedness, not for his sensational timelines. The point is, we should be concerned, no matter how much time we have.”
But the conversation will be better, and the policy response more appropriately tailored to the situation, if we are honest about how little we know — and if we take that confusion as motivation to get better at measuring and predicting what we care about when it comes to AI.
A version of this story originally appeared in the Future Perfect newsletter. Sign up here!