Usually, when Microsoft unveils its new laptops, the focus is on the hardware. Last week, it was all about the software, and more specifically, new AI models injected into Windows and Office.
What we’re seeing is the beginning of a transition at Microsoft. From a user standpoint, that involves moving from mouse clicks and keystrokes to, one day, almost purely language or voice interactions.
Because the interface, called Copilot, is based on a large language model (the same one ChatGPT uses), even Microsoft isn’t quite sure exactly what it’s capable of.
I spoke to Yusuf Mehdi, Microsoft’s consumer chief marketing officer, about these changes, where AI is headed and his expanding role at the company as he takes on some of the responsibilities of Panos Panay, who recently left Microsoft to take over Amazon’s device business.
The View From Yusuf Mehdi
Q: This is really different from how software’s usually created, where every single little feature is something that you’ve coded and shipped. Does that make you think about product development differently?
A: It’s a lot like search. You have this great search engine, but what do people search for? What are the answers it’s going to give me? They’re not always the same. So the way you develop it is you write much more for general use cases. But then over time, there’s more of an interest to get it personalized. And there’s a little bit of tension back and forth. Because what we found in search is that personalization doesn’t work. There’s a reason Google doesn’t have personalized search results: doing general searches for everyone and building on the relevancy is better than trying to tune it for individuals.
Yet for some things, like Microsoft 365, it knows your identity, it knows your boss, it knows your team. That way, when it summarizes a meeting it says, ‘your boss tasked you with an action item.’ So sometimes we will build for general things and sometimes we’ll build for an actual experience. We know this data, and we can give an answer.
Q: GPT-5 will be more multimodal with audio and video. What’s your vision for how multimodal will progress?
A: We’re doing a lot of it already today. I had this scenario with my wife the other day. She sent me a picture of a piece of art. She was like, ‘hey, maybe we’ll buy it for birthdays or something. You remember it, right?’ And I wasn’t sure. So I took the picture and loaded it into Bing Chat. It came back and told me, this is the artist, this is the title, here’s a bit about the story behind this piece. So then I was able to go back and say, ‘yeah, I remember that.’ That was a cool multimodal experience. It takes extra compute power, so it costs more money to serve those GPT queries. And then there’s a lot of work to ground it with the search engine, so that we look at both the image and the text. You’ll see more and more people using it to walk around the world and learn things.
Q: You have this one product as far as consumers are concerned: Copilot. But behind the scenes, is it a giant foundation model that’s getting fine tuned and better? Or are you finding more models — some open source, some not — that you’re putting together to create this product?
A: Right now, we’re using GPT-4 and GPT-4V. And then we’re adding on top of it our own secret sauce, if you will, that we call our Prometheus model. That’s our search algorithms and relevancy ranker on top of that LLM. And then we have an orchestration process. Before we feed the query into that LLM, we disambiguate it, we learn a little bit about it, we use the AI to make the query better, then we feed it into the model, we feed the search engine, and back come results. We do some more manipulations, and that’s how we’re getting better answers than what you have out there today.
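The orchestration flow Mehdi describes — rewrite the query, retrieve from the search index, then answer with the model grounded in that evidence — can be sketched in broad strokes. This is a hypothetical illustration only: every function name below is a stand-in for a proprietary component, not a real Microsoft or OpenAI API.

```python
# Hypothetical sketch of the query-orchestration pipeline described above.
# All names are illustrative stand-ins; the real components are proprietary.
# Requires Python 3.9+ for the built-in list[str] annotations.

def disambiguate(query: str) -> str:
    """Stand-in for 'use the AI to make the query better' (query rewriting)."""
    # A real system might expand abbreviations, resolve pronouns, add context.
    return query.strip().rstrip("?")

def search_index(query: str) -> list[str]:
    """Stand-in for the search engine / relevancy ranker."""
    return [f"result for '{query}'"]

def run_llm(prompt: str, evidence: list[str]) -> str:
    """Stand-in for the LLM call, grounded with retrieved evidence."""
    return f"answer to '{prompt}' grounded in {len(evidence)} source(s)"

def answer(user_query: str) -> str:
    refined = disambiguate(user_query)   # 1. learn about / rewrite the query
    evidence = search_index(refined)     # 2. feed the search engine
    draft = run_llm(refined, evidence)   # 3. feed the model, with grounding
    # 4. "some more manipulations": post-process / verify before returning
    return draft

print(answer("what is Prometheus?"))
```

The key design point in the quote is that the raw user query never goes straight to the model: it is rewritten first, and the model's answer is grounded against search results rather than trusted on its own.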
Q: What is the cost of all this?
A: As we scale up Copilot, and essentially, our engineering and AI capabilities, there’s absolutely cost with that. Our hope is that we put it into Windows and a lot of usage comes out of that. If it drives more Bing searches, then it pays for itself. About 70% of what people are doing is search equivalents, and then there’s 30% of things you can’t do in search today.
Q: Does that include the people who’ve been the beta testers of 365?
A: No, that’s the general use case. A lot of what we see with enterprise accounts is more Teams meeting-related summaries of what’s going on. And then with 365 Chat, individuals saying, ‘tell me what my hot items are for this week, give me a summary of what my boss has been asking me, get me ready for a meeting.’ Those are less costly because we don’t have to query the internet.
Q: What’s the competitive landscape? How do you stay ahead?
A: I was surprised not to hear more on AI from some of the other big tech companies. I’m sure they will. This is the next wave of computing. Everybody’s got to be all in on it. But today, I feel really good about our competitive position. We have the most advanced AI model with GPT-4 and GPT-4V. At some point, Google, in particular, will update their model. But it’s been a while. Then we have Copilot. It’s the user experience. What was the magic of ChatGPT? GPT-3, then GPT-4, and a chat interface. This Copilot is basically that, multiplied. It is a massive user interface advancement.
The third thing is context. With hallucinations and other challenges, context is what really sets your AI apart. We have the context of the web. There’s only one other company that understands the web like we do. We have the context of your work data. Nobody understands that like we do. And then you have the context of what you’re doing on your Windows PC. Nobody really has the scale that we have on that.
The last thing I’d say is we have real products in market. We have GitHub, we have Bing and Edge. We’ve been out now for months learning so much. This is really a game about customer feedback, iterate, customer feedback, iterate, ship, iterate. We haven’t always been there on new waves in the consumer space. It feels great to be there on this.
Copilot on Windows is going to make everyone a power user. People use only 10% of the features of Windows. So now you’ll be able to just ask for what you want. Put it in dark mode. Set up my printer.
Q: How do you forecast the capabilities of these models so you can plan your products down the line?
A: We have a tight relationship with OpenAI. And the model isn’t the only thing. There’s actually a lot of things on top. At our February event, we announced our Prometheus model, which is our proprietary way of working with an LLM. You get better results if you get better prompts. So when someone does a search, rather than just putting that in there, we run AI on it, we compare it against the big index, because we do a lot of query disambiguation to say, ‘what’s a smart way to put that into the LLM?’ The LLM runs it, it comes back and says, ‘I think this is the answer.’ We check that against the web. And we’re like, ‘no, this is probably not going to be a great answer.’
So we’re doing grounding and training at multiple levels of the technology stack, even before you do anything special in the model itself. We can spend quite a long time innovating, writing code, and shipping things to make it better even before we get to one of the next models.
Q: Is energy use going to be an issue?
A: There’s innovation happening at all ends of the spectrum on this. So there’s innovation on the model itself, more parameters, larger models. But then there’s also a lot of innovation on reducing the cost, reducing the heat factor. Both things are running in parallel. The next big breakthrough might not be the next version of GPT. It might be halving the cost.
Q: There is kind of an art to prompting. How does that change consumer behavior and the language that we use?
A: We’re having to retrain our brains because we were taught to dumb down our searches. Fewer keywords give you better search results. The average query today is 2.6 keywords, and that’s across 10 or 20 billion queries every single day. In AI, you’re so much better off saying, ‘Give me a picture of a sunset at the edge of the equator, in autumn, where the moonlight is reflecting off and the sun is halfway down. And I want to see lots of blues and oranges.’ You can’t put any of that in the search engine. But that makes your AI prompt better. So we’re having to retrain our brains to feel comfortable asking for more specific things.
Q: I wonder if the younger generation will be better at that?
A: My daughter is a budding artist. She had to write her artist statement for her class at school. She asked, ‘Dad, do you think I can use an AI?’ She’s of that generation that is very good with technology. But she’s an artist. She’s not a techie. She came back and said, ‘I got something great. But it took me a long time to get the prompts right.’ She couldn’t believe how much work she had to do, telling it what she’s focusing on, what her passion is, what she wanted it to do. It forced her to think before writing. What am I really about? Because she’s more of an artist, she’s not great at writing these long docs about herself. But she was great about putting her ideas down.
There’ll be a class of people, maybe it’ll be a younger generation, who will understand these prompts. What points do I want to make? That’s why it’s not surprising that the people who are really using them today are writers, artists, coders, because it’s people who know how to create.
Q: This idea that it’s like cheating in school. We’re sort of coming around to the idea that it’s not.
A: It’s like calculators in the classroom. Oh, this is cheating. Kids aren’t going to learn math. Calculators are banned from school. And then what happened? We got through it because people figured out that, first, you need to know the basics of math to use a calculator. And then there are times where you might say, ‘Put the calculator away. I’m giving you a test.’ This is just a tool. It’s not a replacement. And knowing how to use the tool will help you get to a better place.
Q: Are you taking on an expanded role?
A: [Former Chief Product Officer] Panos was an amazing colleague. We’re brothers in arms on building consumer products. I’m going to miss him a ton. I’m playing some of his role. He helped compose the vision of the consumer experience; I’m going to play more of that role now. There’s also the business of Windows and devices, working with OEM partners and retailers. Bringing that whole business together — I’m going to take on that role.
Q: What’s the personal stamp you want to put on it?
A: In this age of AI, we have an opportunity to reimagine all of our software, our services, and our devices, to better empower people to be smarter, more productive, more creative. I’ve worked in all parts of the company, so I’ve been around the block. I know what it’s like to launch the operating system that became the mainstream thing for the world as part of a big team. And now we’re kind of redoing that same thing. That goes back to what Satya said. We’re going to try to make AI mainstream and marry it with our most beloved products.