Toward the end of 2024, I wrote about the debate over whether AI's "scaling laws" were hitting a real technical wall. I argued that the question matters less than many people think: there are existing AI systems powerful enough to profoundly change our world, and the next few years are going to be defined by AI's progress whether scaling laws hold or not.
Making predictions about AI is a risky business, because you can be proven wrong so fast. It's embarrassing enough as a writer when your predictions for the coming year don't pan out. When your prediction is proven wrong within the week? That's pretty bad.
But less than a week after I wrote that post, OpenAI's end-of-year series of releases included its latest large language model (LLM), o3. o3 doesn't exactly refute the claim that the scaling laws that used to define AI progress no longer work so well going forward, but it definitively refutes the claim that AI progress is hitting a wall.
o3 is really, really impressive. In fact, to appreciate just how impressive it is, we have to digress a little into the science of how we measure AI systems.
Standardized testing for robots
If you want to compare two language models, you measure the performance of each on a set of problems they haven't seen before. That's harder than it sounds: because these models are fed enormous amounts of text as part of their training, they've seen most tests before.
So machine learning researchers build benchmarks: tests for AI systems that let us compare them directly to each other and to human performance. Benchmarks cover math, programming, reading and interpreting texts, you name it. For a while, we tested AIs on the US Math Olympiad qualifier, a math championship, and on physics, biology, and chemistry problems.
The problem is that AIs are improving so fast that they keep making benchmarks obsolete. Once an AI performs well enough on a benchmark, we say the benchmark is "saturated," meaning it no longer usefully distinguishes how capable different AIs are, because all of them get near-perfect scores.
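To make the idea concrete, here is a minimal sketch of what comparing models on a benchmark and checking for saturation might look like. Everything in it is hypothetical (the model-answer functions, the threshold, the tiny question set); real evaluation harnesses handle answer matching, sampling, and training-data contamination far more carefully.

```python
# Hypothetical sketch: scoring models on a held-out benchmark and
# flagging saturation. Not any lab's actual evaluation code.

BENCHMARK = [
    {"question": "What is 17 * 24?", "answer": "408"},
    # ...in practice, hundreds of held-out problems the models
    # ideally never saw during training
]

SATURATION_THRESHOLD = 0.95  # near-perfect scores stop telling models apart


def score_model(model_answer_fn, benchmark):
    """Return the fraction of benchmark problems answered correctly."""
    correct = sum(
        1
        for item in benchmark
        if model_answer_fn(item["question"]).strip() == item["answer"]
    )
    return correct / len(benchmark)


def compare(models, benchmark):
    """Print each model's score and note whether the benchmark is saturated."""
    scores = {name: score_model(fn, benchmark) for name, fn in models.items()}
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.1%}")
    if all(score >= SATURATION_THRESHOLD for score in scores.values()):
        print("Saturated: this benchmark no longer differentiates these models.")
```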
2024 was the year in which benchmark after benchmark for AI capabilities became about as saturated as the Pacific Ocean. We used to test AIs against a physics, biology, and chemistry benchmark called GPQA, which was so difficult that even PhD students in the corresponding fields would generally score less than 70 percent. But AIs now perform better than humans with relevant PhDs, so it's no longer a good way to measure further progress.
Models now also perform among the top human competitors on Math Olympiad qualifiers. A benchmark called MMLU was meant to measure language understanding with questions across many different domains; the best models have saturated that one, too. A benchmark called ARC-AGI was designed to be really, really hard, in order to measure general humanlike intelligence, but o3 (when tuned for the task) achieved a bombshell 88 percent on it.
We can always create more benchmarks. (That's exactly what's happening: ARC-AGI-2 is set to be announced soon, and is supposed to be much harder.) But at the rate AIs are advancing, each new benchmark may only last a few years. And perhaps more importantly for those of us who aren't machine learning researchers, benchmarks will increasingly have to measure AI performance against the very limits of human ability in order to describe what these systems can and cannot do.
Yes, AIs still make stupid and annoying mistakes. But if it's been six months since you paid attention, or if you've mostly only played around with the free versions of language models available online, which are well behind the frontier, you are overestimating how many stupid and annoying mistakes they make, and underestimating how capable they are at hard, intellectually demanding tasks.
The invisible wall
Writing in Time this week, Garrison Lovely argued that AI progress hasn't "hit a wall" so much as become invisible, improving by leaps and bounds in ways most people don't notice. (I've never tried to get an AI to solve elite programming or biology or math or physics problems, and I wouldn't be able to tell if its answer was right anyway.)
Anyone can tell the difference between a 5-year-old learning arithmetic and a high school student learning calculus, so the progress between those points is visible and feels real. Most of us can't tell the difference between a first-year math graduate student and the world's most gifted mathematician, so AI's progress between those points hasn't felt like much.
But that progress is actually a big deal. The way AI is going to truly change our world is by automating a huge amount of intellectual work that was once done by humans, and three things will drive its ability to do so.
The first is sheer cheapness. o3 gets startlingly good results, but it can cost more than $1,000 for it to think about a hard question and come up with an answer. However, the end-of-year release of China's DeepSeek indicated that it may be possible to get high-quality performance very cheaply.
The second is improvements in how we interface with it. Everyone I talk to about AI products is confident there is a lot of innovation still to be had in how we interact with AIs, how they check their own work, and how we decide which AI to use for which task. You could imagine a system where, normally, a mid-tier chatbot does the work but can internally call on a more expensive model when your question needs it. This is all product work rather than sheer technical work, and it's what I warned in December would transform our world even if all AI progress halted.
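As a concrete illustration of that routing idea, here is a small sketch. The model names, the ask() stub, and the confidence score are all invented for the example; a real system would need a genuine confidence signal from the model or a separate verifier.

```python
# Hypothetical sketch of cost-aware model routing: a cheap model answers
# most queries and escalates hard ones to an expensive model.

CHEAP_MODEL = "mid-tier-chatbot"       # invented name
EXPENSIVE_MODEL = "frontier-reasoner"  # invented name


def ask(model, prompt):
    """Stand-in for a real model API call; returns (answer, confidence).

    Here confidence simply drops with prompt length so the example runs
    end to end. A real router would use an actual confidence signal.
    """
    confidence = 0.9 if len(prompt) < 50 else 0.5
    return f"[{model}] answer to: {prompt!r}", confidence


def answer(prompt, confidence_floor=0.8):
    reply, confidence = ask(CHEAP_MODEL, prompt)
    if confidence >= confidence_floor:
        return reply  # the cheap model was confident; most queries stop here
    # Escalate: pay for the expensive model only on the hard questions.
    reply, _ = ask(EXPENSIVE_MODEL, prompt)
    return reply


print(answer("What is the capital of France?"))
print(answer("Prove that there are infinitely many primes p with p % 4 == 3."))
```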
And the third is that AI systems are getting smarter, and for all the declarations about hitting a wall, it seems they are still doing so. New systems are better at reasoning, better at problem-solving, and generally closer to being experts in a wide range of fields. To some extent we don't even know how smart they are, because we're still scrambling to figure out how to measure it once we can no longer simply test them against human ability.
I think these are the three defining forces of the next few years; that's how important AI is. Like it or not (and I don't much like it, myself; I don't think this world-changing transition is being handled responsibly at all), none of the three is hitting a wall, and any one of the three would be sufficient to lastingly change the world we live in.
A version of this story originally appeared in the Future Perfect newsletter. Sign up here!