
    There is a fix for AI-generated articles. Why aren’t we using it?


A computer screen full of lines of code. The language program developed by the US company OpenAI uses artificial intelligence; here, the screen shows a random binary code of zeros and ones. | Frank Rumpenhorst/Picture Alliance via Getty Images

It’s the start of the school year, and with it comes a fresh round of debate about generative AI’s new role in schools. In the span of about three years, essays have gone from a mainstay of classroom teaching everywhere to a much less useful tool, for one reason: ChatGPT. Estimates of how many students use ChatGPT for essays vary, but it’s common enough to force teachers to adapt.

Although generative AI has many limitations, student essays fall into the category of tasks it is very good at: its training data contains many examples of essays on assigned topics, there is huge demand for such essays, and the bar for prose quality and basic research in student essays is not especially high.

    This story first appeared in the Future Perfect Newsletter.

Sign up here to explore the big, complicated problems the world faces and the most efficient ways to solve them. Sent twice a week.

At the moment, cheating on essays with AI tools is difficult to detect. Several tools advertise that they can check whether text is AI-generated, but they are not very reliable. Since falsely accusing a student of plagiarism is a big deal, these tools would have to be extremely accurate to work at all – and they just aren’t.

Fingerprinting AI text with technology

But there is a technical solution. In 2022, a team at OpenAI led by quantum computing researcher Scott Aaronson created a “watermarking” approach that makes AI-generated text virtually unmistakable, even if the end user changes a few words here and there or rearranges the text. The solution is a bit technically complicated, but bear with me, because it’s also very interesting.

At its core, AI text generation works by having the model “guess” a range of plausible next tokens given the text it has seen so far. To avoid being overly predictable and producing the same repetitive output every time, AI models don’t simply pick the single most likely token – instead, they include an element of randomization, favoring the most likely options but sometimes selecting less likely ones.
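To make that selection step concrete, here is a minimal Python sketch of randomized next-token sampling, using made-up candidate tokens and scores; it illustrates the general idea, not OpenAI’s actual sampling code.

```python
# Toy illustration of randomized next-token selection (not any vendor's real code).
# The model assigns a score (logit) to each candidate token; we convert the scores
# to probabilities with a softmax and then sample, so the most likely token wins
# most often but less likely tokens still get picked sometimes.
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float = 1.0) -> str:
    """Sample one token from a {token: logit} mapping using softmax sampling."""
    max_logit = max(logits.values())  # subtract the max for numerical stability
    weights = {t: math.exp((l - max_logit) / temperature) for t, l in logits.items()}
    total = sum(weights.values())
    r = random.uniform(0.0, total)
    cumulative = 0.0
    for token, weight in weights.items():
        cumulative += weight
        if r <= cumulative:
            return token
    return token  # floating-point edge-case fallback

# Hypothetical candidates for the next word after "The cat sat on the ...".
print(sample_next_token({"mat": 3.0, "sofa": 2.0, "roof": 1.0}))
```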

Watermarking works at this stage. Instead of selecting the next token purely at random, the AI uses a nonrandom process: it slightly prefers next tokens that score higher on an internal “scoring” function OpenAI invented. It might, for example, slightly prefer words with the letter V, so that text generated with this scoring rule would have 20 percent more Vs than normal human text (though the actual scoring functions are more complex than that). Readers usually wouldn’t notice; in fact, I’ve edited this newsletter to increase the number of Vs, and I doubt the difference stands out from my normal writing.
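Here is a rough sketch of that biasing step, using the toy letter-V rule from the paragraph above; the real scoring function is secret and far more sophisticated than this.

```python
# Toy watermark bias (illustrative only): before sampling, add a small bonus to the
# scores of tokens the hidden rule prefers - here, simply "contains the letter v".
def watermark_bias(logits: dict[str, float], boost: float = 0.5) -> dict[str, float]:
    """Return a copy of the logits with a bonus for tokens favored by the secret rule."""
    return {t: (l + boost if "v" in t.lower() else l) for t, l in logits.items()}

# Among near-equally likely candidates, V-words now get a slight edge, so over
# thousands of generated tokens they show up measurably more often than in human text.
print(watermark_bias({"move": 2.0, "go": 2.0, "travel": 1.9}))
```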

Likewise, watermarked text will, at a glance, be indistinguishable from normal AI output. But for OpenAI, which knows the hidden scoring rule, it is easy to check whether a given text scores much higher on that rule than human-generated text does. If, for example, the scoring rule were my example above about the letter V, you could run this newsletter through a verification program and see that it has about 90 Vs in 1,200 words, more than you would expect based on how often V is used in English. It’s a clever, technologically sophisticated solution to a difficult problem, and OpenAI has had a working prototype for two years.
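On the verification side, the check can be as simple as counting how often the favored feature appears and comparing it with its normal frequency in English. Below is a minimal sketch for the toy letter-V rule, assuming a baseline of roughly one V per hundred letters in ordinary English text.

```python
# Toy watermark detector for the letter-V rule (illustrative only).
import math

def v_watermark_zscore(text: str, baseline_rate: float = 0.01) -> float:
    """How many standard deviations the V count sits above what ordinary English predicts."""
    letters = [c for c in text.lower() if c.isalpha()]
    n = len(letters)
    if n == 0:
        return 0.0
    observed = sum(1 for c in letters if c == "v")
    expected = n * baseline_rate
    std_dev = math.sqrt(n * baseline_rate * (1.0 - baseline_rate))
    return (observed - expected) / std_dev

# A large positive z-score suggests the text was generated under the V-favoring rule;
# scores near zero look like ordinary human writing.
print(v_watermark_zscore("Vivid vervet monkeys view very velvety valleys. " * 20))
```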

So the problem of AI text masquerading as human-written text is, in principle, quite solvable. But OpenAI hasn’t released its watermarking system, and neither has anyone else in the industry. Why not?

    It’s all about competition

If OpenAI – and only OpenAI – released a watermarking system for ChatGPT, making it easy to tell when generative AI had produced a text, it would have little impact on student essay plagiarism. Word would get out quickly, and everyone would simply switch to one of the many AI options available today: Meta’s Llama, Anthropic’s Claude, Google’s Gemini. Plagiarism would continue unabated, and OpenAI would lose much of its user base. So it’s not shocking that they’ve kept their watermarking system under wraps.

In circumstances like these, it might seem appropriate for regulators to step in. If every generative AI system were required to include watermarking, it wouldn’t be a competitive disadvantage for anyone. That’s the reasoning behind a bill introduced this year in the California State Assembly, known as the California Digital Content Provenance Standard. It would require generative AI providers to make their AI-generated content identifiable, and it would also require providers to label generative AI content and remove deceptive content. OpenAI is in favor of the bill – not surprising, as they are the only generative AI provider with a system that does this. Their rivals are mostly opposed.

I strongly favor some sort of watermarking requirement for generative AI content. AI can be incredibly useful, but its productive uses don’t require it to pretend to be human-made. And while I don’t think it’s the government’s place to ban newspapers from replacing us reporters with AI, I certainly don’t want outlets to mislead readers about whether the content they’re reading was created by real people.

Although I’d like some sort of watermarking requirement, I’m not sure it’s possible to implement one. The best of the “open” AI models that have been released (like the recent Llama), models you can run yourself on your own computer, are of very high quality – certainly good enough for student essays. They’re already out there, and there’s no way to go back and add watermarking to them, because anyone can keep running the current versions regardless of what changes are made to future ones. (This is one reason my feelings about open models are complicated. They enable a great deal of creativity, research, and discovery – and they also make it impossible to enforce all kinds of anti-impersonation or anti-child-sexual-abuse-material measures that we might otherwise really want.)

So while watermarking is possible, I don’t think we can count on it, which means we as a society need to figure out how to handle the ubiquity of cheap, AI-generated content. Teachers are already switching to other approaches, cutting back on take-home essays and moving more work into the classroom to reduce cheating. We’re also likely to see a move away from college admissions essays – and, frankly, good riddance, as they were probably never a great way to select students.

But while I won’t mourn college admissions essays too much, and while I think teachers are more than capable of finding better ways to assess students, I see a troubling trend in this whole story. There was a way to let us enjoy the benefits of AI without obvious downsides like impersonation and plagiarism, yet AI development happened so quickly that society more or less let the opportunity pass us by. Individual labs could have done it, but they won’t, because it would put them at a competitive disadvantage – and there’s unlikely to be a good way to make everyone do it.

In the school plagiarism debate, the stakes are low. But the same dynamic on display in the AI watermarking debate – where commercial incentives keep companies from self-regulating, and the pace of change keeps external regulators from stepping in until it’s too late – seems likely to persist as the stakes get higher.

