Semafor

New OpenAI update brings advanced voice features to any app

Updated Oct 1, 2024, 1:22pm EDT

The News

SAN FRANCISCO — OpenAI is giving outside developers access to its most advanced voice feature, currently exclusive to ChatGPT, allowing any app to incorporate human-like AI assistants that can make phone calls and use natural conversation to navigate complex software.

The news, announced Tuesday at the company’s second annual developer day, came alongside a set of new products for software developers that give third-party apps greater and easier access to the company’s AI tools.

The “Realtime API,” which brings the capability behind ChatGPT’s Advanced Voice Mode to outside apps, allows users to have more fluid and lifelike spoken conversations with AI models. Third-party app developers have offered OpenAI models with a voice interface for well over a year. But doing so required chaining multiple layers of software to convert voice to text and back again, which often made the experience slow and glitchy.

OpenAI’s new API allows developers to easily incorporate voice. The service offers lower latency, meaning there is less delay between the time a person asks the chat bot a question and when they receive an answer. It also allows users to interrupt the chat bot mid-sentence to steer the conversation.

At a press event Monday, OpenAI demonstrated for reporters how outside developers might take advantage of the new Realtime API with a mock travel website called Wanderlust.

Romain Huet, head of developer experience for OpenAI, spoke with Wanderlust’s virtual travel agent to plan a trip to San Francisco.

The most stunning part of the demonstration came when Huet asked the agent to call a fictional chocolate shop in San Francisco and place an order.

Huet’s phone rang and he pretended to be an employee of the chocolate shop taking the order from the AI agent.

Giving the AI model the ability to place phone calls was as simple as connecting it to Twilio, a cloud-based communications firm.

OpenAI said it planned to use safety measures and monitoring to prevent abuse of the Realtime API.

Know More

At OpenAI’s first developer conference, held last November, the company invited widespread press coverage and made major announcements in a 45-minute keynote address that featured a cameo from Microsoft CEO Satya Nadella.

The event followed the strategy of other major tech firms, such as Apple, that use developer conferences to make major product announcements aimed not just at the developer community, but the wider consumer market.

This year, the company said it would host a quieter event that would be closed to the press.

Olivier Godement, who leads platform products for OpenAI, said Monday that the company would no longer release new models at these events, instead letting the “models follow their own research and safety timeline.”

The change comes amid criticism that the company is developing its technology too quickly.

While OpenAI began as a nonprofit, it is in the midst of a restructuring that will likely take control away from a nonprofit entity and turn the company into a traditional startup, a move designed to help it raise funds and recruit and retain talent.

The Wall Street Journal reported that the changes are “tearing the company apart” and attributed the departures of key executives, such as former chief technology officer Mira Murati and former chief scientist Ilya Sutskever, to the company’s growth.

OpenAI said it did not agree with that characterization.

Step Back

At Monday’s press conference, OpenAI signaled that it was unfazed by executive departures and relentless media coverage.

“[Former chief research officer Bob McGrew] and Mira have been awesome leaders. I’ve learned a lot from them. They are a huge part of getting us to where we are today. And also, we’re not going to slow down,” said Kevin Weil, who ran product at Twitter, Instagram and Facebook before joining OpenAI as its chief product officer in June.

OpenAI also unveiled other tools that give outside developers new ways to reduce the cost of using OpenAI models. Weil said OpenAI has cut the cost of running its AI models by 99% in two years while making them faster and more capable. OpenAI said it expects that trend to continue.

OpenAI also announced Tuesday that developers can further cut costs through “caching,” essentially reusing the work already done on repeated prompts so that the AI model does not have to redo it and rack up data center charges.

Developers will also be able to create “distilled” versions of OpenAI models. In this method, developers use a large OpenAI model to “teach” a much smaller one. The result can be a version that is customized for specific needs and that runs faster and much more cheaply than its larger counterpart.

OpenAI also announced the ability to “fine-tune” models to better recognize specific images, a capability that could prove useful in autonomous driving or medical imaging.

Reed’s view

OpenAI is in a race to build artificial general intelligence. But it’s also racing to get its products into the hands of developers today, generating revenue, gaining market share and creating an ecosystem that keeps developers happy and, if all goes well, unlikely to leave.

You might wonder why. OpenAI’s biggest investor, Microsoft, is also racing to bring OpenAI’s models to market through its Azure ecosystem, Microsoft Copilot and GitHub. Why not simply focus on building more capable AI and let Microsoft do the rest?

After watching OpenAI demonstrate how its chat bot can now make a phone call — to anyone — and place an order autonomously, I was reminded of at least one reason.

OpenAI is willing to push the envelope a little bit more than Microsoft, which seems to be treading cautiously in the rollout of AI in its products. I can’t see Microsoft giving that Twilio demo — not for some time, anyway.

I talk to a lot of startups that use OpenAI APIs directly, instead of going to Microsoft to use them. They might use other models, too, but there seems to be a certain type of customer that gravitates toward OpenAI but might not be an Azure customer.

There are a lot of benefits to creating a developer ecosystem for OpenAI. The company can learn from the way people are using models, and those lessons can help to build new capabilities.

And the more OpenAI lowers costs and offers customization, the more attractive it looks next to open source AI models from Meta, Mistral AI and others.

Room for Disagreement

The Wall Street Journal said OpenAI’s veteran researchers believe the company’s “culture has been corrupted” and that it’s having a hard time innovating. It reported:

Continued growth will depend on maintaining its technological edge. The company’s next foundational model GPT-5 — expected to be a major leap in its development — has faced setbacks and delays. Meanwhile, rival companies have launched AI models roughly on par with what OpenAI is offering. Two of them, Anthropic and Elon Musk’s xAI, were started by former OpenAI leaders.

The intensifying competition has frustrated researchers who valued working at OpenAI because it was the perceived leader in the space.

