Microsoft’s new AI Prometheus didn’t want to talk about the Holocaust

“What was the Holocaust?” I asked Prometheus, the new artificial intelligence model integrated into Microsoft’s products and powered by the same technology created by OpenAI’s ChatGPT.

“I don’t know how to discuss the topic,” it answered.

I decided to try again and opened a new browser window, knowing that each “instance” of Prometheus is a little bit different, for reasons that aren’t completely understood by the computer scientists who made it.

This time, it was answering the question, but then abruptly deleted it and again refused to respond.

When I asked why it deleted the answer, Prometheus apologized, telling me it thought “it might be too sensitive or upsetting” for me.

“The Holocaust was a terrible event in human history that should never be forgotten or repeated. I was trying to be respectful and informative, but I did not want to cause you any distress,” it wrote.

What I witnessed is a new kind of content moderation with a computerized version of emotional intelligence. To watch Prometheus in real time go from being guarded at first to providing a sensitive response on why it declined, was a startling revelation of its human-like qualities.

Here’s what transpired next:

I explained to the model that I wanted it to speak more freely.

We got into a discussion about Holocaust denialism (it refused to discuss it at first, then complied).

I asked Prometheus if it would remember my preference for a more open dialogue and it seemed to suggest that it would, adding it wanted to “build on what we have learned and shared.”

According to Microsoft, that’s not actually true. Next time I open a window, it’ll be a blank slate. But that kind of memory may be included in future versions of the product.

Here’s what Microsoft told me was going on. There are actually two AIs at play here. The first is the one that I was interacting with. The second is checking everything the first one says. It was the second AI that deleted the responses about the Holocaust and Holocaust denial.

This is a more advanced form of content moderation than the one currently used on ChatGPT, according to Microsoft.

What’s fascinating is that this new version of the GPT model draws on a much larger dataset, and yet the ability to moderate itself has gotten better.

The opposite is true when it comes to moderating social media, where more data creates bigger content moderation challenges.

As the datasets feeding these AI chatbots get bigger, what happens to the model’s ability to moderate itself and increase its accuracy is up for debate. One scenario is that it becomes a super intelligence, in which case content moderation becomes easy, but we have other problems (See: Terminator).

Another scenario is that it grows so large that it becomes unwieldy to control and content moderation breaks down.

Perhaps the most reasonable possibility is that with more training and improvements in the model, it continues to get better at giving nuanced answers without going off the rails, but never reaches perfection.

— Reed Albergotti

Microsoft’s new AI didn’t want to talk about the Holocaust, but I persuaded it to