20.05.2023


Can AI Clone Your Favorite Podcast Host’s Voice?

ONE DAY THIS YEAR, you'll be listening to a podcast and notice something isn't quite right. The host's voice, which you are used to hearing, will sound different. Sentences may be clunky, and some words may have an unusual tone. So you might wonder if this is the host speaking or their AI voice clone.

Just as artificial intelligence has proven adept at generating lifelike photos, compelling films, and intelligible writing, similar technologies can convincingly replicate the voices of podcast presenters, content creators, and other media professionals. A new set of tools from a growing number of firms is likely to accelerate AI's takeover of our audio streams.

Computer-generated speech is already familiar to our ears. Artificial voices are DJing and taking your phone calls. The technology has been used to recreate the voices of people who are unable to speak due to illness, as well as to copy the voices of celebrities both living and dead. AI-powered speech technologies will soon be able to recreate the voices of our deceased relatives.

Machines have proven to be capable of assisting in the editing room when it comes to podcast production. Machine learning features in editing services such as Descript remove annoying pauses and filler words such as "uh" and "like" from an audio recording of human speech.

Recently, new options have emerged to handle the most difficult aspect of creating a podcast: the talking. Overdub is a Descript feature that provides a virtual voice that can be used in production editing. If a host mispronounces someone's name or misstates a date, a producer can have the robot say it correctly and then paste in the correction.

Newer tools go even further. In January, Podcastle, a firm that provides podcasting software, introduced Revoice, an AI-powered voice cloning tool that can produce a digital simulacrum of a human host. The company positions Revoice as a tool for producers to generate every component of an audio production, from ad reads to voiceovers to audiobooks, simply by typing.

It takes some effort to create a digital replica of your voice. While AI systems can mimic voices by studying audio samples of people speaking, Podcastle asks users to read off a script of about 70 sentences chosen to capture a wide range of mouth motions and phonemes. The process takes 30 to 45 minutes, depending on how precise you are with your intonations.

According to the company, the aim was to get the clone as close as possible to the original voice, not in terms of beauty but in how words are pronounced.

AI voice businesses are attempting to make their clones more human-like. ElevenLabs CEO Mati Staniszewski says that its models are trained to understand the context of the words you want pronounced. The model may then alter the tone and tempo of the resulting audio to mimic a more human inflection, depending on how the sentence is written. This can make the voice sound more genuine, but it can also make it much more chaotic.

You Talk Like Me

Currently, audio AI appears to be just slightly more realistic than AI video, but the results from the existing set of tools are good enough to make security specialists anxious. For security and privacy reasons, you should guard your voice: it can be used to validate your identity, and machines can determine identifying features such as your age, ethnicity, gender, and economic standing simply by listening to you speak.

According to Balasubramaniyan, voice AI services must provide security comparable to that of other organizations that handle personal data such as financial or medical information.

"You must ask the company, 'How will my AI voice be stored? Are you really keeping my recordings? Are you storing it in an encrypted format? Does anyone have access to it?'" Balasubramaniyan says. "It's a part of me. I need to safeguard it as well."

Podcastle claims that the voice models are end-to-end encrypted and that no recordings are kept after the model is created. The voice clips can only be accessed by the account holder who recorded them. Podcastle does not allow other audio to be uploaded or processed in Revoice. In fact, the individual making a voice copy must speak prewritten text directly into the company’s software. They cannot simply upload prerecorded audio.

"You are the one granting permission and creating the material," Yeritsyan of Podcastle says. If it is a real person’s voice, and especially if that person put it out there, Yeritsyan sees no issue.

Podcastle hopes that being able to produce audio in only a consenting person's cloned voice will deter anyone from making themselves say anything too heinous. Currently, there is no content filtering or limitation on certain terms or phrases on the service. According to Yeritsyan, it is up to the service or outlet that distributes the audio, such as Spotify, Apple Podcasts, or YouTube, to control the content that is pushed onto their platforms.

"Any social platform or streaming platform has massive moderation teams," Yeritsyan explains. In his view, those teams ought to watch for anyone who uses a fake voice to create something foolish or unethical and then tries to hide it.

Even if the highly contentious problem of vocal deepfakes and nonconsensual AI clones is resolved, it remains unknown whether listeners will accept a computerized clone as a suitable substitute for a human.

But what if the technology improves to the point where you can't tell the difference? Does it matter that it's not really your favorite podcaster in your ear? Cloned AI speech has a long way to go before it is indistinguishable from human speech, but it is swiftly catching up. Only a year ago, AI-generated images were cartoonish; now, they're realistic enough to trick millions into thinking the Pope was sporting some snazzy new outerwear. It's easy to picture AI-generated sounds following a similar path.

Drew Carey used ElevenLabs' tool at the end of March to release a whole episode of a radio show read by his voice clone. The majority of people disliked it. Podcasting is a personal medium, and when machines take over the microphone, the distinct human connection you get from listening to people talk or tell stories is easily lost.

Another very human attribute that is generating interest in these AI-powered technologies is laziness. If AI voice technology improves to the point where it can successfully mimic human voices, it will be possible to make quick edits or retakes without having to bring the host back into the studio.

"Ultimately, the creative economy will triumph," says Balasubramaniyan. "Even though ethics are important, AI will stay on top because it’s about making life simpler."

Yasmin Anderson

AI Catalog's chief editor
