Microsoft’s research division has added a major new capability to one of its smaller large language models, a big step that shows less expensive AI technology can have some of the same features as OpenAI’s massive GPT-4.
In an exclusive interview, Microsoft researchers shared that the model, Phi 1.5, is now “multimodal,” meaning it can view and interpret images. The new skill added only a negligible amount to the model’s already diminutive size, they said, offering a roadmap that could help democratize access to AI technology and help ease shortages in graphics processors used to run software like ChatGPT.
GPT-4, which powers ChatGPT, also recently became multimodal, but requires vastly more energy and processing power. Phi 1.5 is open source, meaning anyone can run it for free.
“This is one of the big updates that OpenAI made to ChatGPT,” said Sebastien Bubeck, who leads the Machine Learning Foundations group at Microsoft Research. “When we saw that, there was the question: Is this a capability of only the most humongous models or could we do something like that with our tiny Phi 1.5? And, to our amazement, yes, we can do it.”
GPT-4 has about 1.7 trillion parameters, or software knobs and dials used to make predictions. More parameters mean more calculations that must be made for each token (or set of letters) produced by the model. For comparison, Phi 1.5 has 1.3 billion parameters. If parameters were expressed in distance, GPT-4 would be the height of the Empire State Building and Phi 1.5 would be a footlong sub sandwich.
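The sandwich comparison can be checked with quick arithmetic. The sketch below uses only the two parameter counts reported above and assumes a footlong sub is exactly one foot; the constant and function names are illustrative. The Empire State Building stands roughly 1,250 feet at the roof, so the scaled figure lands in the right range.

```python
# Parameter counts as reported in the article.
GPT4_PARAMS = 1.7e12   # about 1.7 trillion
PHI_PARAMS = 1.3e9     # 1.3 billion

# Assumption for the analogy: a footlong sub is exactly 1 foot.
SUB_LENGTH_FEET = 1.0


def size_ratio(large: float, small: float) -> float:
    """Return how many times larger one parameter count is than another."""
    return large / small


ratio = size_ratio(GPT4_PARAMS, PHI_PARAMS)

# Scale the sandwich by the same ratio to get GPT-4's "height".
gpt4_height_feet = ratio * SUB_LENGTH_FEET

print(f"GPT-4 has about {ratio:,.0f} times as many parameters as Phi 1.5")
print(f"Scaled from a footlong sub, that is about {gpt4_height_feet:,.0f} feet")
```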
The pursuit of small AI models endowed with the powers of much larger ones is more than just an academic exercise. While OpenAI’s GPT-4 and other massive foundation models are impressive, they are also expensive to run. “Sticker shock is definitely a possibility,” said Jed Dougherty, vice president of platform strategy for Dataiku, which serves companies using AI technology.
While individuals tend to ask ChatGPT to draft an email, companies often want it to ingest large amounts of corporate data in order to respond to a prompt. Those requests can be costly. The maximum price for a single GPT-4 prompt is $5 and other providers are in a similar range, Dougherty said. Typically, companies pay about $100 per 1,000 prompts. “When you apply LLMs to large datasets, or allow many people in parallel to run prompts … you’ll want to make sure you’re taking pricing into account,” he said.
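To see how those prices add up, here is a rough estimate in Python. The two prices come from Dougherty's figures above; the workload (500 employees, 20 prompts a day, 22 working days a month) is purely hypothetical and chosen only for illustration.

```python
# Prices quoted in the article.
TYPICAL_COST_PER_1000_PROMPTS = 100.0   # dollars
MAX_COST_PER_PROMPT = 5.0               # dollars

# Hypothetical workload, assumed for illustration only.
EMPLOYEES = 500
PROMPTS_PER_EMPLOYEE_PER_DAY = 20
WORKING_DAYS_PER_MONTH = 22


def monthly_prompts(employees: int, per_day: int, days: int) -> int:
    """Total prompts sent in a month by the whole company."""
    return employees * per_day * days


def total_cost(num_prompts: int, cost_per_prompt: float) -> float:
    """Cost in dollars for a given number of prompts at a flat per-prompt price."""
    return num_prompts * cost_per_prompt


prompts = monthly_prompts(EMPLOYEES, PROMPTS_PER_EMPLOYEE_PER_DAY, WORKING_DAYS_PER_MONTH)
typical_per_prompt = TYPICAL_COST_PER_1000_PROMPTS / 1000

typical_total = total_cost(prompts, typical_per_prompt)
worst_case_total = total_cost(prompts, MAX_COST_PER_PROMPT)

print(f"Prompts per month: {prompts:,}")
print(f"At the typical rate: ${typical_total:,.0f}")
print(f"If every prompt hit the maximum price: ${worst_case_total:,.0f}")
```

Under those assumed numbers, 220,000 prompts a month come to about $22,000 at the typical rate. The worst case is an upper bound, since few prompts would hit the maximum price.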
Smaller models require fewer calculations to operate, so they require less powerful processors and less time to complete responses. At the same time, smaller and slightly less capable models can handle many of the tasks companies and individuals throw at them.
Less energy-hungry models have the added benefit of producing fewer greenhouse gas emissions and, possibly, fewer hallucinations.
“We are thinking about how do we build these systems responsibly and so they work well in the real world,” said Ece Kamar, who oversees human-centered AI at Microsoft Research. “All of the work we are doing on small models is giving us interesting puzzle pieces to be able to build that ecosystem.”
Researchers say these small models, as capable as they are, will never replace larger foundation models like GPT-4, which will always be ahead. Rather, the two kinds of models are complementary. For some applications, large models will be necessary. For very specific tasks, smaller models may be more economical.
Ahmed Awadallah, senior principal researcher at Microsoft Research, says the future might be small models and large models used simultaneously to handle tasks. “You could also imagine the small model being deployed in a different regime. And then maybe, when it doesn’t have enough confidence in acting, it can go back to the big model,” he said.
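The hand-off Awadallah describes can be sketched in a few lines of Python. This is an illustration of the general idea, not Microsoft's implementation: `small_model` and `large_model` are hypothetical stand-ins that return a canned answer and a confidence score, and the 0.7 threshold is an arbitrary assumption.

```python
from typing import Callable, Tuple

# A "model" here is any function that returns (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def small_model(prompt: str) -> Tuple[str, float]:
    """Hypothetical stand-in: pretends to be confident only on short prompts."""
    confidence = 0.9 if len(prompt.split()) <= 8 else 0.4
    return f"[small] answer to: {prompt}", confidence


def large_model(prompt: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a larger, costlier model."""
    return f"[large] answer to: {prompt}", 0.99


def route(
    prompt: str,
    small: Model = small_model,
    large: Model = large_model,
    threshold: float = 0.7,  # assumed cutoff, not a published value
) -> str:
    """Try the small model first; fall back to the large one if it is unsure."""
    answer, confidence = small(prompt)
    if confidence >= threshold:
        return answer
    answer, _ = large(prompt)
    return answer


print(route("Summarize this memo"))
print(route("Compare the tax treatment of these three merger structures across five jurisdictions"))
```

In this sketch the short request stays with the small model and the long one is escalated. A real system would need a trustworthy confidence signal, which is the hard part.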
Microsoft researchers have also been experimenting with a way to use multiple small models together as “agents” that each handle a different aspect of a task. For instance, one model might write software while another model checks it for errors.
OpenAI’s GPT-4 was a major breakthrough in the field of AI, both in its scale and capability. And while there are certainly more milestones like that on the horizon, the most impactful advances in AI since then have been smaller, open-source models.
Meta’s Llama 2, for instance, is rapidly gaining in popularity. It doesn’t come close to beating GPT-4 in overall performance, but its small size and its ability to be customized make it a good option for a large swath of people in the AI field. That’s especially relevant in the enterprise space, where costs and margins are crucial.
The ingenuity required to create these models doesn’t get enough attention. It could be years before the infrastructure needed to run the largest AI models catches up with the demand. The intersection of efficiency and capability is really going to determine the pace at which AI changes and transforms every industry.
Small model capabilities also mean it will be hard to keep this technology contained inside the big tech companies. When techniques like the ones used by Microsoft Research start to spread throughout the AI research community, we’ll see a proliferation of small and highly capable models.
Room for Disagreement
In my interview with Future of Life Institute president Max Tegmark from September, the MIT professor turned AI critic warned that small AI models could be dangerous. “I think that makes everything even more scary. It’s like if people figure out how to make miniaturized nukes that fit in the suitcase, does that make you feel more safe?”
The View From Europe
Small AI models like Microsoft’s Phi 1.5 have another positive attribute when it comes to Europe: They tend to be open source. European countries, France in particular, view open source models as a way to gain ground on American and Chinese AI companies. That’s why the European Parliament exempted open source models from proposed AI regulations that many critics say stifle innovation.
Still, Europe has a long way to go. Those exemptions don’t count if the models are used for commercial purposes.
- President Biden’s executive order on AI, signed earlier this week, requires companies training massive foundation models like GPT-4 to hand information over to the government. Microsoft’s research on Phi 1.5 is another example of why regulating AI is so tricky. The EO ignores the opposite end of the spectrum, where capabilities of those gigantic models are added to small, open-source versions that can be transferred around the globe on a thumb drive.