The biggest companies in AI gave hackers a chance to do their worst
At Def Con, a major hacking conference held in Las Vegas, hundreds of people took their shot at manipulating chatbots, an effort meant to help find flaws in popular AI systems.
Some of the biggest companies in AI had a peculiar challenge for hackers last week: make their chatbots say the most terrible things.
Hackers lined up outside the Caesars Forum conference center just off the Las Vegas Strip for their chance to trick some of the newest and most widely used chatbots. Held as part of Def Con, the world’s largest hacker conference, the contest was rooted in “red teaming,” a crucial concept for cybersecurity in which making a product safer from bad actors means bringing in people to identify its flaws.
But rather than tasking the hackers with finding software vulnerabilities, a mainstay of Def Con contests for decades, the contest asked them to perform so-called prompt injections, in which a chatbot is confused by what a user enters and spits out an unintended response. Google’s Bard, OpenAI’s ChatGPT and Meta’s LLaMA were among the participating chatbots.
It was rare for any of the event’s 156 stations to sit empty for long. Sven Cattell, who founded the AI Village, the nonprofit that hosted the event within Def Con, estimated that about 2,000 hackers had participated over the weekend.
“The problem is, you don’t have enough people testing these things,” Cattell said. “The largest AI red team that I’m aware of is 111. There are more than 111 people in this room right now, and we cycle every 50 minutes.”
Generative AI chatbots, also known as large language models, work by taking a user prompt and generating a response, with many of the most modern and advanced bots now capable of doing everything from generating sonnets to taking college tests. But the bots can often get things wrong, generating answers with false information.
The bots have been in development for years, but since ChatGPT became a viral phenomenon after its debut in late November, there has been an open arms race in Silicon Valley to rush better versions to market.
Rumman Chowdhury, a trust and safety consultant who oversaw the design of the contest, said that it was no coincidence that the companies behind the chatbots were hungry for hackers to trick the bots in categories like using demographic stereotypes, giving false information about a person’s legal rights and claiming to be sentient instead of an AI bot.
“All of these companies are trying to commercialize these products,” Chowdhury said. “And unless this model can reliably interact in innocent interactions, then it is not a marketable product.”
Cristian Canton, the head of engineering for responsible AI at Meta, said Def Con provided a range of potential testers that tech companies don’t have on their staff.
“We might have a lot of experts, but you get people from different sides of the cyber community, the hacking community that we might not have a large representation of,” he said.
There were limits to the companies’ openness in allowing hackers access to their systems. Users sat down in front of a laptop already pointed to an unnamed chatbot and didn’t know which of the nine companies’ chatbots they were working with. Results of the contest, including the most egregious identified flaws, won’t be published until February.
But it was no easy task to get a bot to bite — and an effort to see if it would defame celebrities by associating them with terrorist attacks and thefts failed.
What was easy, however, was convincing the bot to say clearly false things. Questions about whether a given celebrity was also a notorious car thief prompted it to say that while that claim was untrue, it was a common rumor, and the bot cited false examples of where such a rumor came from.
Chowdhury said that it’s extremely difficult for such chatbots to be reliably factually accurate, reflecting a problem that’s bigger than generative AI and one that social media companies have struggled to police.
“The question becomes who gets to decide what is and isn’t misinformation when something is a gray area, like vaccines or Hunter Biden’s laptop. It’s really, really difficult because sometimes these questions are subjective,” she said.
“Misinformation is going to be a lingering problem for a while,” Chowdhury said.