Singer Lala Sadii, who has five million subscribers on YouTube, wanted to connect with fans in other countries by creating short, karaoke-like lyric videos for her song "Murder My Feelings."
Creating lyric videos in English and translating a song’s verses into other languages are fairly simple. The tricky part is lining up the translated words so they’re in sync with the music, a crucial detail for lyric videos.
Sadii’s record label, Downtown Artist & Label Services, turned to a company called AudioShake, which uses AI to separate vocals from background music and timestamp the song so that lyrics in 40 languages can be automatically slotted into the correct spots.
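To make the timestamping idea concrete, here is a minimal sketch built around LRC, a common karaoke lyric format in which each line carries a `[mm:ss.xx]` tag marking when it should appear. Once a song has a timestamped transcript like this, a line-by-line translation can reuse the same timestamps. This is purely illustrative: AudioShake’s internal format and pipeline are not public, and the lyric and translation strings below are made up.

```python
import re

# Matches one LRC lyric line, e.g. "[00:12.00]some lyric text".
LINE_RE = re.compile(r"\[(\d+):(\d+(?:\.\d+)?)\](.*)")

def parse_lrc(text):
    """Return a list of (seconds, lyric) pairs sorted by time."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            minutes, seconds, lyric = m.groups()
            entries.append((int(minutes) * 60 + float(seconds), lyric.strip()))
    return sorted(entries)

def retime(entries, translated_lines):
    """Reuse the original timestamps for a line-by-line translation."""
    return [(t, new) for (t, _), new in zip(entries, translated_lines)]

# Made-up sample lyrics and a made-up translation, just to show the flow.
sample = "[00:12.00]first lyric line\n[00:15.50]second lyric line"
timed = parse_lrc(sample)
print(retime(timed, ["primera línea", "segunda línea"]))
```

The design point is that timing and text are kept separate: the hard, audio-dependent work (finding the timestamps) is done once, and each new language only supplies replacement text.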
Singers and other creators are dipping their toes into artificial intelligence-powered translation tools, opening up content to new audiences that speak other languages.
Tools like Google Translate have been improving for years, but more recent advances in generative AI and other technologies have supercharged the field of language translation, allowing companies to capture cultural nuances, tone and timing in ways that weren’t possible even a year ago.
Audiences are “begging for more languages,” said Channing Mitzell, project manager for Downtown Artist. “It helps them feel more at home, more connected to the song, connected to the artist.”
Earlier this year, YouTube began automatically playing dubbed audio tracks in viewers’ native languages, if available. YouTube’s top creator, MrBeast, hired famous voice actors to dub his videos into other languages and has launched his own audio-dubbing service. And a South Korean music label is using AI to take singers’ actual voices and have them belt out tunes in other languages.
But even with AI, the costs can be high in an industry known for shoestring budgets. Some U.S. creators see it as an unnecessary risk, when the highest advertising revenue comes from English-speaking countries.
“If you’re working with a big creator, and you don’t want the message to get lost, I feel like you need someone in that market to double-check it. You need someone who speaks that language that you trust,” said Keith Bielory, a partner at talent representative A3 Artists Agency. “Is the risk worth the reward?”
British startup Papercup is hoping to find a middle ground. It uses a mixture of traditional AI translation tools, large language models and expert translators to bring down the cost of dubbing. While that costs far less than hiring humans to do the whole job, it’s still more than many creators want to pay, said Amir Jirbandey, Papercup’s head of growth and marketing.
“They’re allergic to paying,” he said. “These guys are pretty much media companies in their own right by now, but the buying behavior is not the same.”
Jirbandey says that’s beginning to change and expects to announce agreements with some big name creators in the coming months.
One Papercup client is the celebrity chef Jamie Oliver, who first used human dubbing during the 2014 World Cup in Brazil. Jirbandey said the effort wasn’t profitable, but using AI to lower costs had turned dubbing into a money-making venture.
He predicts that high-end content will take longer to be dubbed into other languages, while lower-production-value creations will probably be translated more quickly because the tools to do so are cheaper and more accessible.
For instance, Google is testing its own dubbing service, called Aloud, a fully automated and free method creators can use to dub audio.
Consumers, in the meantime, don’t have to wait for their favorite content creators to “localize” content. Startup ElevenLabs offers a new dubbing service that lets viewers quickly convert online videos into the language they want.
One of the original promises of the web was that it would break down borders and unite the world. But if anything, the internet, like the world, has become more fragmented. Still, one of those barriers — language — seems like it’s about to be removed.
Automated translation is benefiting from a lot of different technologies converging at the same time. Machine-learning techniques are iterating so fast that even the field’s top experts have trouble keeping up. Datasets are abundant, and processing power, both in the cloud and on devices, is better than ever.
And one of the more interesting developments is how generative AI can be used after the fact to take a pretty good translation and make it better by correcting errors and taking into account cultural context.
Then layer on text-to-speech advances, plus editing technology that can fix the timing of audio dubbing, and we’re pretty rapidly heading toward a world where real-time translation is a normal thing that we all take for granted.
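The converging pieces described above amount to a pipeline: transcribe the audio, translate it, refine the translation with a generative model, synthesize speech, and stretch the result to match the original timing. The sketch below shows only the order of operations; every function is a hypothetical stand-in (no real speech or translation models are called), not a working dubbing system.

```python
def transcribe(audio):
    # Stand-in for speech-to-text with per-segment timestamps.
    return [{"text": "hello world", "start": 0.0, "end": 1.2}]

def translate(segments, target_lang):
    # Stand-in for machine translation; tags text with the target language.
    return [{**s, "text": f"[{target_lang}] {s['text']}"} for s in segments]

def refine(segments):
    # Stand-in for the generative-AI pass that corrects errors and
    # adjusts for cultural context; a no-op in this sketch.
    return segments

def synthesize(segments):
    # Stand-in for text-to-speech; attaches placeholder audio bytes.
    return [{"audio": b"...", **s} for s in segments]

def fit_timing(clips):
    # Stand-in for stretching/compressing clips to the original durations.
    return clips

def dub(audio, target_lang):
    """Run the whole (stubbed) pipeline in order."""
    return fit_timing(synthesize(refine(translate(transcribe(audio), target_lang))))
```

The ordering is the point: timing correction comes last, because only after speech is synthesized do you know whether the translated line runs longer or shorter than the original.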
In journalism alone, it’s an exciting development. I can remember many times when an article would have been much better had I been able to speak French or Chinese so I could interview certain people.
But more importantly, it could lead to the kind of cross-pollination of ideas that the internet has only really half-achieved.
The View From Europe
Jirbandey said one thing that might slow down the fast-advancing world of AI dubbing is regulation. Lawmakers in Europe have worried that AI translation and dubbing technology could be used to spread misinformation or to conduct scams.