With the US government’s help, Anthropic built a tool designed to prevent its AI models from being used to make nuclear weapons.
The company announced Thursday that it had worked with the National Nuclear Security Administration over the past year to build a "classifier" that can block "concerning" conversations, such as those about building nuclear weapons, on its systems.
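Anthropic has not published how the classifier works under the hood. As a rough sketch of the general pattern such a guardrail follows, a model scores each conversation and blocks those above a risk threshold, the snippet below uses hypothetical labels, a made-up threshold, and a trivial keyword heuristic standing in for a trained model.

```python
# Generic illustration of a safety "classifier" gating a chat request.
# Not Anthropic's implementation; labels, threshold, and the keyword
# heuristic are placeholders so the example runs end to end.
from dataclasses import dataclass


@dataclass
class ScreeningResult:
    label: str    # hypothetical labels: "concerning" or "benign"
    score: float  # classifier confidence in [0, 1]


def score_conversation(messages: list[str]) -> ScreeningResult:
    """Stand-in for a trained classifier that scores nuclear-weapons risk."""
    text = " ".join(messages).lower()
    risky = any(term in text for term in ("enrichment cascade", "weapon design"))
    return ScreeningResult("concerning" if risky else "benign", 0.97 if risky else 0.03)


def handle_request(messages: list[str], threshold: float = 0.5) -> str:
    """Block the request if the classifier flags it above the threshold."""
    result = score_conversation(messages)
    if result.label == "concerning" and result.score >= threshold:
        return "Request blocked by safety classifier."
    return "Request passed to the model."


if __name__ == "__main__":
    print(handle_request(["How do nuclear power plants generate electricity?"]))
    print(handle_request(["Walk me through a weapon design enrichment cascade."]))
```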
“As AI models become more capable, we need to keep a close eye on whether they can provide users with dangerous technical knowledge in ways that could threaten national security,” Anthropic wrote of the development, which grew out of “red teaming” exercises with the Energy Department that began in 2024.
The company has already deployed the tool on its Claude models and plans to share its approach with the Frontier Model Forum, a broader AI industry body, to help other companies build similar tools.