Date: 2024-05-17

Preparing for a time when artificial intelligence is so powerful that it can pose a serious, immediate threat to people, Google DeepMind on Friday released a framework for peering inside AI models to determine if they’re approaching dangerous capabilities.

The paper released Friday describes a process in which DeepMind’s models will be reevaluated every time the compute power used to train them increases six-fold, or after every three months of fine-tuning. In the time between those evaluations, DeepMind says it will design early warning evaluations.
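
As a rough illustration of that trigger rule, here is a minimal Python sketch. Only the two thresholds (a six-fold compute increase and three months of fine-tuning) come from the paper as described above; the `Checkpoint` record, its field names, and the `needs_reevaluation` function are hypothetical and are not taken from DeepMind’s framework.

```python
from dataclasses import dataclass

# Thresholds as reported in the article.
COMPUTE_FACTOR = 6.0     # six-fold increase in training compute
FINE_TUNE_MONTHS = 3.0   # three months of fine-tuning


@dataclass
class Checkpoint:
    """Hypothetical snapshot of a model (not a DeepMind data structure)."""
    training_compute: float   # any consistent unit, e.g. FLOPs
    fine_tune_months: float   # cumulative months of fine-tuning so far


def needs_reevaluation(last_eval: Checkpoint, current: Checkpoint) -> bool:
    """Return True if either reported threshold has been crossed since the last evaluation."""
    compute_grew = (
        current.training_compute >= COMPUTE_FACTOR * last_eval.training_compute
    )
    tuned_long_enough = (
        current.fine_tune_months - last_eval.fine_tune_months >= FINE_TUNE_MONTHS
    )
    return compute_grew or tuned_long_enough


if __name__ == "__main__":
    last = Checkpoint(training_compute=100.0, fine_tune_months=0.0)
    now = Checkpoint(training_compute=250.0, fine_tune_months=3.5)
    print(needs_reevaluation(last, now))
```

In this sketch either condition alone is enough to trigger a new evaluation, which matches the "or" in the description; how DeepMind actually measures compute or fine-tuning progress is not specified in the article.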

DeepMind will work with other companies, academia and lawmakers to improve the framework, according to a statement shared exclusively with Semafor. It plans to start implementing its auditing tools by 2025.

Today, evaluating powerful frontier AI models is largely an ad hoc process that is constantly evolving as researchers develop new techniques. “Red teams” spend weeks or months testing the models by trying out different prompts that might bypass safeguards. Then companies implement various techniques, from reinforcement learning to special prompts, to corral the models into compliance.

That approach works for models today because they aren’t powerful enough to pose much of a threat, but researchers believe a more robust process is needed as models gain capabilities. As models grow more capable, critics worry that by the time people realize the technology has gone too far, it’ll be too late.

The Frontier Safety Framework released by DeepMind looks to address that issue. It’s one of several methods announced by major tech companies, including Meta, OpenAI, and Microsoft, to mitigate concerns about AI.

“Even though these risks are beyond the reach of present-day models, we hope that implementing and improving the framework will help us prepare to address them,” the company said.