Microsoft’s AI chief on the greatest game of catchup ever played

The Scene

Not long ago, Microsoft leapt ahead of competitors by funding OpenAI when it was a fledgling research lab. It locked down exclusive rights to the lab’s AI models and used them to turbocharge its cloud business.

That was then. Now, Microsoft finds itself playing catchup. As it tries to declare independence from OpenAI, it’s also hurriedly building out its own frontier AI models, custom accelerator chips and high-performing “harnesses” that compete with those made by Cursor, Anthropic, OpenAI and Google.

“We got here in six months, which is itself a remarkable achievement,” Mustafa Suleyman, executive vice president and CEO of Microsoft AI, told Semafor in an interview. “We’re now neck and neck with essentially what was state of the art just a few months ago.”

Suleyman spoke with Semafor at his division’s Silicon Valley offices in the lead-up to this week’s Microsoft Build conference, where the company plans to release a family of updated AI models that, according to industry benchmarks, bring its homegrown offerings up to par with the leaders in the field.

“We are one of the largest tech companies in the world, and we have the resources to make sure that we do catch up,” Suleyman said, pushing back on critics who say Anthropic, Google and OpenAI are simply too far ahead to be caught at this point.

Suleyman, who helped start DeepMind in 2010 and joined Microsoft in 2024, said he’s seen companies rise and fall for years while progress in AI trudges on, regardless of which company happens to be on top in that moment.

“OpenAI didn’t even start until 2015. Didn’t really get traction until 2017.” Even Google, which eventually bought DeepMind, didn’t really start paying attention as an organization until around 2020, he said. “That’s a crazy thought.”

“I think the way to think about it is that there’s untapped markets that are totally unexposed at the moment,” Suleyman said. “You step outside to a different city and most people are just not using this in their everyday life.” He estimates the world is “less than 1% globally penetrated” on the use of coding models and generalist reasoning models.

Know More

For Microsoft, Suleyman says there are no shortcuts. He refuses to allow the company to “distill” from existing AI models. Distillation, the process of training smaller AI models on more sophisticated ones’ outputs, is how Chinese firms like DeepSeek have been able to create free, open-source versions of those made by the American frontier labs.

Microsoft is building its own custom computer chips to work efficiently with its own AI models. And, Suleyman says, it’s tuning its models to work well with its own GitHub Copilot harness, which it hopes to build into a competitor to Anthropic’s wildly popular Claude Code.

To do that, Suleyman says, it needs to see Copilot used in the wild. And that means getting Microsoft employees to use it.

Recently, Microsoft told employees in some divisions it would cut off access to Claude Code, pushing them to use Copilot instead. The decision was based partly on the high cost of tokens and partly on a desire to hone its own technology, according to people familiar with the matter. Although it’s hard to make a software product if you’re not using it in-house, the decision irked some Microsoft employees, who saw it as a cost-cutting measure, these people said.

Cost-cutting, though, is one of Microsoft’s selling points. By building custom models meant to work with custom chips, Microsoft can be the lowest-cost player on the frontier, Suleyman says.

To achieve that goal, Microsoft’s products have to all work together. “It’s very important that the model is tuned to the harness,” he said. “We want our models to be the best in the world at using VS Code [another Microsoft product] and GitHub. And we do that both by collecting trajectories of actual use and by making sure that it understands the tools that are available to it inside that harness at the most fundamental layer.”

Still, even if Microsoft leads every AI benchmark, it won’t matter unless customers actually use the products. “There’s a very direct correlation between real-world adoption and benchmark performance, at least for the top labs,” Suleyman said.

Already, AI has become a new translation layer, taking human language and converting it into code that can call APIs, use tools and increasingly perform tasks on a person’s behalf.

“So increasingly, the model is going to be your control layer and it’s just going to check in with you, and then it’s going to handle all of that software,” he said.

The way Suleyman sees it, Microsoft products like Windows and Office have always been a “translation layer” that is only necessary because humans and computers can’t communicate in the same language.

To speed progress at Microsoft, he said, the company’s AI models have to treat corporate work as a kind of game that can be won.

That idea goes back to his early career, when AlphaGo, the DeepMind system, beat Lee Sedol at Go in 2016 and showed the world what reinforcement learning could do in a contained environment with clear rewards. “AlphaGo is the design inspiration for everything we’re doing in RLEs here,” he said. “RLEs are just games. They’re game environments, simulated worlds.”

But Go, however complex, still had a very clear definition of “winning” and “losing,” which made it a very good testbed for early AI models.

The next phase, he said, is to turn business processes into something a model can climb. “You have to turn a complex, messy, real-world domain, which is inherently non-verifiable, into something that has a score,” Suleyman said. “So we do both verifiable and non-verifiable rewards.” The aim is to make models better at tasks that today’s AI models have trouble evaluating.

Reed’s view

Microsoft seems pretty open about how it lost the Copilot lead it should have held on to. The company definitely moved first, but the best AI products then came from everywhere else: ChatGPT for consumers, Claude for coding, Cursor for developers.

Still, that doesn’t mean Microsoft is out of the race. This is not a winner-take-all market and the more agentic AI becomes, the more opportunities exist.

Microsoft already owns so much of the surface area of work: Windows, Office, Teams, GitHub and Azure. Now it needs to figure out how to go much further than integrating a chatbot into Excel.

To truly be successful, Windows and Office need to fade into the background and language needs to seamlessly replace clicking and scrolling. Copilot needs to be at least as good as the world’s leading harnesses to attract developers, making the products ever better. The real test of Suleyman’s hill-climbing machine won’t be whether Microsoft surpasses industry benchmarks, but whether it can make the software people already use feel like the fastest way to get work done.

The push requires an all-hands-on-deck effort, Suleyman told me, referencing his daily chats with CEO Satya Nadella in the early mornings and evenings. “He’s deeply in the weeds of every detail,” he said.

Room for Disagreement

The change will need to come fast if Microsoft is going to come from behind in the biggest and costliest industrial slugfest in human history. The company’s shares are down 16% year to date as investors worry its businesses are being eaten up by competitors and that Copilot’s user base leaves a lot to be desired. “I don’t think they’ve reached a penetration rate to satisfy Wall Street,” UBS analyst Karl Keirstead told Fortune.