Oct 27, 2023, 1:05pm EDT
tech

How advances in AI can make content moderation harder — and easier


The Scene

Discord, the online messaging platform that started as a haven for gamers, became better known earlier this year as the site where hundreds of classified U.S. defense documents were leaked.

The crisis put the spotlight on the work of John Redgrave, Discord’s head of trust and safety who often works with law enforcement. But the bulk of those interactions focus on other kinds of cases involving minors, like child sexual abuse. That job has gotten harder, and easier, because of advances in artificial intelligence.

Redgrave shared how Discord worked with law enforcement to thwart high school shootings in Brazil and revealed new software that can detect child sexual abuse material that hasn’t been identified by authorities.

Redgrave, a serial entrepreneur in machine learning, joined the company two years ago, when Discord acquired his startup Sentropy, which made software to fight online harassment. At Discord, he has expanded the trust and safety team while also leaning into machine learning to prevent exploitation on the platform.

In an edited conversation from earlier this week, Redgrave talks about the challenges around content moderation in the age of generative AI.

The View From John Redgrave

Q: What percentage of Discord’s headcount is now working in trust and safety?

A: Over 15% is focused on user safety, so we actually have a fairly large team. There were disparate pockets when I first started, but part of the evolution was getting engineers, product people, machine learning folks, data scientists, along with our policy and operational folks, all together so that we could actually scale the entire system.

Trust and safety is a combination of gathering intelligence, and then building better-scaled detection and enforcement mechanisms.

Q: You’re talking about people with machine learning, data science backgrounds. That is potentially very lucrative and with AI, there’s a lot of money being thrown around now. Trust and safety is hard work, it’s stressful, and kind of anonymous. What’s the pitch for getting people to come over in service of your mission?

A: Having built two machine-learning companies, I know it has always been pretty hard to recruit machine-learning talent. This has always been a sought-after skill set. The pitch is actually pretty straightforward: Do you want to make the internet safer? Do you want to protect kids online?

I have the benefit of being able to say those things and then have it actually be true. We are quite literally saving people every day, every week, from different forms of abuse online. One example I can talk about now publicly is how we stopped a series of high school shootings from happening in Brazil. We have gotten notes from law enforcement about this, and through our collaboration with them, countless lives were saved.

We’re not just identifying these challenging moments that might happen on Discord, we’re identifying them across the entire internet. That’s basically how I pitch people.

Q: How has Discord’s relationship with law enforcement changed and evolved since you’ve taken over and what’s your philosophy?

A: When I joined Discord, it was a generalist team. Lots of people looking at content, trying to discern patterns. We’ve fundamentally changed that to be much more functionally oriented. We have a dedicated team around minor safety and exploitation. We have a dedicated team around counter-extremism. We have dedicated teammates around cybercrime.

This level of focus lets us not only see the patterns, but also be a little bit more proactive. We built a lot more proactive tools with machine learning and pattern recognition systems. We use the industry standard hashing and matching systems, but we’ve also built upon those. We just open-sourced technology to detect unknown CSAM [child sexual abuse material]. This is one of the hardest problems in the industry, especially when you think about what’s happening in generative AI today. Novel forms of CSAM are a challenge and the scale of them is growing.

You have dedicated people with subject matter expertise, coupled with dedicated technologies for detection. We also have a team that is solely focused on law enforcement engagement and response around the world. When we see something that’s an imminent threat to life, we have these relationships with law enforcement that allow us to engage with them productively.

There’s a broader array of other challenges that companies in our ecosystem face and those are slightly grayer areas. But when it comes to saving a kid, that’s very black and white to me.

When I first started at the company, we were removing about 40% of harmful content proactively. We now have gotten to 85% proactive removal. So a pretty huge jump in about two years. And we remove 95% of minor safety issues proactively and 99% of CSAM proactively. So we’ve made this fundamental shift to being much, much more proactive on the things that we think have the highest harm to human life.

Q: On CSAM, does that metric include just the known CSAM or also unidentified CSAM that you can now detect? And how did you navigate that, because it’s very difficult to create your own database of that stuff or to find a data set that’s not already hashed.

A: This is relatively novel technology. It’s something that we’ve built in the last couple of months. And we made the decision to make it available broadly because we think this is a problem across the industry. We have the hashes, but hash-and-match systems are only as good as what’s currently in them. So what you can do is look at visual similarity to things that are already hashed and matched to get a baseline for what could plausibly be unknown CSAM.
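
For illustration only (this is not Discord’s open-sourced tool), the general pattern Redgrave describes can be sketched as a near-duplicate check: a new image’s perceptual hash is compared against a database of already-confirmed hashes, and anything within a small Hamming distance is flagged for human review. The library, hash values, and threshold below are assumptions.

```python
# Illustrative sketch of hash-and-match plus visual similarity --
# not Discord's implementation. Requires Pillow and the imagehash library.
from PIL import Image
import imagehash

# Perceptual hashes of previously confirmed material. Placeholder values;
# a real system would load these from a vetted, access-controlled database.
KNOWN_HASHES = [
    imagehash.hex_to_hash("d1c4f0b2a3e49687"),
    imagehash.hex_to_hash("ffe0c18324a5b6d7"),
]

# Maximum Hamming distance at which two images count as visually similar.
# An exact re-upload scores 0; crops, re-encodes, and small edits score low.
MAX_DISTANCE = 8

def flag_for_review(image_path: str) -> bool:
    """Return True if the image is visually similar to known-hashed material."""
    candidate = imagehash.phash(Image.open(image_path))
    return any(candidate - known <= MAX_DISTANCE for known in KNOWN_HASHES)
```

A system at Discord’s scale would more likely rely on embedding-based similarity and an approximate nearest-neighbor index rather than pairwise hash comparisons, but the core idea is the same: proximity to confirmed material is a signal that unconfirmed content deserves review.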

Q: Does it include video or just still images?

A: That is predominantly being used for still images today, but it can be extended to video, as I understand it from our ML team that built it. It will take some effort, but great engineers can do magical things.

Q: Discord is not end-to-end encrypted. What’s the thinking behind that?

A: We announced that we are exploring encryption on audio and video. We have not made any hard and fast decisions on that. But backing up, I think there’s this perceived tension between safety and privacy.

There are lots of other platforms out in the world that are end-to-end encrypted. They have their own sets of challenges. The choice we’re making is to focus on teen safety. We’re still going to provide the same privacy benefits to everybody who’s an adult when it comes to those private spaces. But if they’re interacting with a teenager, we feel there’s a responsibility to protect that teen from potentially nefarious interactions.

And this is, in part, why we launched Teen Safety Assist last week: protecting teens from sexually explicit imagery, and protecting them from strangers who could cause problematic interactions.

Q: How do you strike the balance between protecting people on the platform, but also giving people a sense of freedom?

A: This is a great tee-up to talk about our warning system that we just launched. We need more scalpels and fewer hammers as an industry. When you hear people complaining about any platform, especially when it comes to trust and safety, I think it comes down to a couple of things. One is we haven’t given them the right transparency or education about what’s happened to their account. That actually is a real challenge for the industry. How do you do that in a way that allows you to provide a restorative justice framework for the people you want to give a second chance to and educate, while also, frankly, making it really inhospitable for the real assholes on our platform?

That is a challenge, but we actually think we have built a solution to it through this warning system. That is a really hard thing to come by. But classically or historically, many companies have just said ‘we have a strike system, and it’s three strikes, and you’re out.’ Our system is based on the type of policy that you violated. What community guidelines have you actually violated?

If you have distributed child sex abuse material on our platform, you will be banned. We will work with the [National Center for Missing & Exploited Children] and all of our standard procedures still apply. If you posted a piece of content that’s not CSAM, but could be seen as objectionable, we think there’s a possibility for you to be educated. You’ll still get warnings, and we could stop you from posting content for a period of time. So the enforcement action actually becomes part of the education for the users.
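
A minimal sketch of that idea, using hypothetical categories and actions rather than Discord’s actual policy table: the enforcement decision keys off which guideline was violated, with zero-tolerance categories skipping the warning ladder entirely.

```python
# Illustrative sketch of severity-based enforcement -- the categories,
# thresholds, and actions here are hypothetical, not Discord's policy table.
from enum import Enum, auto

class Violation(Enum):
    CSAM = auto()
    VIOLENT_EXTREMISM = auto()
    OBJECTIONABLE_CONTENT = auto()
    SPAM = auto()

# Violations that bypass the warning ladder entirely.
ZERO_TOLERANCE = {Violation.CSAM, Violation.VIOLENT_EXTREMISM}

def enforcement_action(violation: Violation, prior_warnings: int) -> str:
    """Choose an action based on what was violated, not just a strike count."""
    if violation in ZERO_TOLERANCE:
        return "permanent ban and report to the relevant authority"
    if prior_warnings == 0:
        return "warning that explains the guideline violated"
    if prior_warnings == 1:
        return "temporary restriction on posting content"
    return "account suspension"
```

Under this toy model, a second spam violation would draw the temporary posting restriction, while any CSAM violation results in an immediate ban regardless of the account’s history.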

Q: How do you think the industry is doing right now? In some ways, the better job you do on Discord, the more you’re likely sending the bad actors to other places.

A: I don’t think your characterization is necessarily incorrect. I would provide a slight nuance to it, which is, it comes down to investment by the companies. We have continued to invest. It’s why more than 15% of our company is dedicated to safety. You’ve seen in other pockets of the industry, some level of divestment from trust and safety teams, or you see players who categorically ignore safety. I worry about the state of investment in trust and safety as a whole for the industry.

We don’t view it as a win, by the way, if someone moves off of our platform and goes somewhere else. It’s great for our users, their safety, their sense of being able to find their friends, and find belonging. But it’s not great for the ecosystem. And this is why we’ve chosen to collaborate. There are a bunch of different mechanisms, whether it’s the Global Internet Forum to Counter Terrorism or the Tech Coalition.

That’s why we have worked tirelessly on open-source technology. This is not proprietary. It’s not a competitive advantage. This is a rising-tide-lifts-all-boats situation.
