2025-02-04

Google DeepMind released the first version of its framework in May of last year. Since then, the AI landscape has changed.

For instance, most safety research a year ago focused on the capabilities of AI models during their initial creation, known as the pre-training phase. AI regulations such as California’s SB 1047 aimed to place limits on models that were pre-trained at a certain size.

But over the past six months or so, AI researchers have learned how to increase the capabilities of AI models in the “inference” phase, when a model is actually being used. Running a model numerous times to hone an answer can make it far more effective.

The DeepSeek R1 model, for instance, is extremely powerful, yet it would have slipped under the radar of safety bills like SB 1047, which California Gov. Gavin Newsom vetoed. That is because so much of its capability comes from inference, rather than the size of its initial training.

“What you’re seeing with these new test time and inference models is a different type of capability that’s emerging,” Lue said. “That, plus the fact that we now are going to be seeing the emergence of agents, increasing tool use and ability to delegate more activities, means the suite of responsibility and safety evaluations and mitigations, of course, has to evolve.”

Helen King, DeepMind’s senior director of responsibility, said the evolving AI landscape offers some good news on the safety front.

New “reasoning” models like OpenAI’s o1 and o3 models and DeepSeek’s R1 model may provide more insight into how models are operating. “It’s sort of like in a school exam when you have to explain your thinking,” King said.