Date: 2023-08-11

The scale and transparency of this exercise, and the participation of so many creators of large language models like ChatGPT, are notable. And it makes sense that the companies would want to play ball: They aren’t paying to participate in this weekend’s challenge, organizer Rumman Chowdhury said, so they’re essentially getting a mass volume of testing and research for free. Plus, the White House is keeping an eye on it.

What matters more is what happens after this weekend. The companies, as well as independent researchers, will receive the results of the competition as a massive database, which will detail the various issues found in the models. It’s ultimately on them to fix the problems, and a report due to come out next February will include whether they did so.

“I wouldn’t necessarily take it on faith” that the companies will fix every problem that emerges, said Chowdhury, an AI ethics and auditing expert. “But we are creating an environment where it is a smart idea to be doing something about these harms.”

The skillset of a large language model “red teamer” is completely different from that of the traditional hacker set, which focuses on bugs and errors in code that can be exploited. A coding mindset can be helpful in figuring out how to trick these AI models into slipping up, but the best exploits are done through natural language.

“We’re trying something very wild and audacious, and we’re hopeful it works out,” Chowdhury said.

One thing the hackers won’t be testing for: partisan bias. While chatbots became a part of the culture wars this year, with some conservatives claiming they’re “woke,” Chowdhury said that’s largely the result of trust and safety mechanisms, not the models themselves.

“We’re not really wading into that water,” she said. “These models are not fundamentally politically anything.”

One of the big questions for large language models is whether harmful content can be “watermarked” so that social media companies can easily identify and stamp it out. Right now, that looks like a huge challenge in text and a slightly less daunting one in AI-generated images and video.