bdaudey

Professional AI Voice-over and Dubbing

Today, the use of AI dubbing is growing hand in hand with the integration of AI into audiovisual communication. What was deemed “unthinkable” and out of reach just a few years back is very quickly becoming standard practice. Who would have thought that a top-shelf professional dubbing and voice-over service would be a few hundred euros away? However, one must tread carefully, for without proper quality control, the results of AI dubbing can be catastrophic. Discover in this article how to create cost-effective, high-quality content.

The Golden Age of Video Content

In 2022, video content accounted for 82% of worldwide consumer internet traffic, having grown 15-fold over a 5-year period, and this trend shows no sign of slowing down. Advertising and web marketing specialists and agencies have massively integrated video content into their communication and marketing strategies, and the results are satisfactory, to say the least.

Be it on platforms, websites, via emails or paid advertisements, or during events, video communication stands as the primary tool for capturing the consumer’s attention. In 2016, TikTok was born: a fast-growing video-dedicated platform with almost 2 billion active users today. Instagram (Meta) has recently shifted its focus from photo to video content, with its algorithm currently favoring Reels. The social media giants Facebook (Meta), X (formerly Twitter) and LinkedIn display ever-growing consumer interaction rates with video content.

Over the last 5 years, in order to reach an international audience, content creators resorted to subtitling, creating content in English, or even uploading as many videos as there were languages spoken by their target audiences across the world. The latter approach was used by YouTube superstar MrBeast (247 million subscribers) until YouTube introduced its own multi-language audio feature, later followed by automatic dubbing. Atenao is recorded as the first French language service provider to use YouTube’s multi-language audio feature, in a flagship project with Amixem.

What is Voice Cloning?

Voice cloning is the artificial creation of human speech via AI-based algorithms and software. These algorithms analyze and recreate acoustic and linguistic models of prerecorded voice samples in order to generate new speech sequences that replicate, to a tee, a particular person’s voice.

This procedure relies on providing machine learning algorithms with rich and diversified audio data, allowing the AI to remodel and reconstruct the subject’s voice accurately and to generate any required speech sequence. The more audio data you provide, the better the voice cloning performs.

As it stands, voice cloning has numerous applications, from audio feedback in digital assistants and onboard navigation systems to audiobooks, video games and the entertainment industry, where we can now recreate the voices of fictional characters or even bring back the legendary voices of deceased celebrities.

AI Dubbing and Voice Cloning: a controlled approach for optimal results

Today, the vast majority of machine subtitling platforms have adopted voice cloning. The competition rages on as they lock horns, for the biggest piece of the pie, with AI-dedicated websites and businesses such as Checksub, Lipitt (which caused quite the stir with its LinkedIn post featuring Zendaya) and Vexub, based in France, or Lovo and Rask, based in the United States, as well as numerous Chinese competitors.

To guarantee successful AI dubbing, it is crucial to follow a precise step-by-step procedure. We have often found ourselves reworking and correcting AI-dubbed videos due to the extremely poor quality of the translated text. Here are the 7 steps one must follow for optimal results.

Step 1: Transcription

For this task, we selected Checksub. The platform, born in the sunny south of France, is, from our standpoint, superior to Rask or Lovo. Time-coded machine transcription is, by default, a speech-to-text process allowing us to obtain an SRT file broken down into subtitling-compatible segments. Different functions allow us to rework the initial machine transcription to create accurate, timeline-compatible text segmentation for the AI dubbing or voice-over.
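To give a concrete idea of what these time-coded segments look like, here is a minimal sketch of how a standard SRT file can be parsed into segments. This is an illustrative example only, not Checksub’s internal process; the `parse_srt` helper and the sample subtitles are our own.

```python
import re

def parse_srt(srt_text):
    """Parse SRT subtitle text into a list of time-coded segments.

    Each segment is a dict with 'start' and 'end' (SRT timecodes)
    and 'text' (the subtitle line(s) joined into one string).
    """
    segments = []
    # SRT blocks are separated by blank lines; each block contains an
    # index line, a timecode line, then one or more text lines.
    timecode = re.compile(
        r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})"
    )
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        match = timecode.match(lines[1])
        if not match:
            continue
        segments.append({
            "start": match.group(1),
            "end": match.group(2),
            "text": " ".join(lines[2:]).strip(),
        })
    return segments

sample = """1
00:00:01,000 --> 00:00:03,500
Bonjour et bienvenue.

2
00:00:03,600 --> 00:00:06,200
Merci de nous suivre.
"""

for seg in parse_srt(sample):
    print(seg["start"], "->", seg["end"], ":", seg["text"])
```

Keeping this segment structure intact throughout the workflow is what makes the later steps (translation, proofreading, dubbing) timeline-compatible.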

Step 2: Translation

If you’re opting for a neural machine translation process, here’s why we recommend using a DeepL Pro account. First and foremost, your data will only be used for the duration of the project and will be discarded after project completion. Moreover, when you use a DeepL Pro account, your data will not be harvested for software optimization purposes. The terms of use differ greatly from those of DeepL’s free version or of Google Translate.
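Whichever engine you choose, the key point is to translate segment by segment while leaving the timecodes untouched. The sketch below illustrates this, with a toy dictionary lookup standing in for the real machine translation call (in an actual pipeline, a DeepL Pro API request would go in its place); the function name and sample data are our own.

```python
def translate_segments(segments, machine_translate):
    """Translate subtitle segments while preserving their timecodes.

    `segments` is a list of dicts with 'start', 'end' and 'text';
    `machine_translate` is any callable mapping source text to target
    text (a DeepL Pro API call in a real pipeline).
    """
    translated = []
    for seg in segments:
        translated.append({
            "start": seg["start"],   # timecodes pass through unchanged
            "end": seg["end"],
            "text": machine_translate(seg["text"]),
        })
    return translated

# Toy glossary standing in for a real MT engine, for illustration only.
toy_mt = {"Bonjour et bienvenue.": "Hello and welcome."}.get

segments = [{"start": "00:00:01,000", "end": "00:00:03,500",
             "text": "Bonjour et bienvenue."}]
print(translate_segments(segments, toy_mt))
```

Because each translated segment keeps its original timecodes, the output can be written straight back to SRT or fed into the dubbing step.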

Step 3: Proofreading and Correction

Proofreading the translation is standard procedure for correcting the spelling, grammar and translation errors produced. The target texts are adjusted and compared with the source texts while listening to the speech in question, in order to track potential pronunciation issues or audio errors.

Ideally, one should opt for a post-editing service performed by a professional translator.

If you have already translated your texts using machine translation, we will require them in a timestamped dual-entry table (source text / target text), with each row pairing a timecode with its source and target segments, for example:

00:00:01,000 --> 00:00:03,500 | Bonjour et bienvenue. | Hello and welcome.

It is crucial that you submit the documents for proofreading and correction before proceeding to AI voice generation or cloning, especially if you intend to perform the audio recording in-house.
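The dual-entry table described above can be assembled automatically from aligned source and target segment lists. A minimal sketch, assuming each segment is a dict with 'start', 'end' and 'text' keys and that both lists are aligned segment-for-segment (the helper name and sample data are our own):

```python
def dual_entry_table(source_segments, target_segments):
    """Build a timestamped dual-entry table (source text / target text).

    Both lists must be aligned segment-for-segment; each row pairs a
    timecode span with its source and target text.
    """
    if len(source_segments) != len(target_segments):
        raise ValueError("source and target segment counts differ")
    rows = []
    for src, tgt in zip(source_segments, target_segments):
        rows.append(
            f"{src['start']} --> {src['end']} | {src['text']} | {tgt['text']}"
        )
    return "\n".join(rows)

source = [{"start": "00:00:01,000", "end": "00:00:03,500",
           "text": "Bonjour et bienvenue."}]
target = [{"start": "00:00:01,000", "end": "00:00:03,500",
           "text": "Hello and welcome."}]
print(dual_entry_table(source, target))
```

The length check matters: a mismatch between source and target segment counts is usually the first sign that the machine translation has merged or dropped segments, which would break the timeline.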

Step 4: Recording

AI voice generation or cloning is performed using the corrected target text.

Step 5: Vocal Quality Control

It is essential to perform a quality control procedure on the AI cloning or voice generation to guarantee the quality of pronunciation, intonation, syntax and phrasing.

Step 6: Finalization

The finishing touches include audio quality control, calibration, time-code accuracy and lip-sync checks, as well as a final overall viewing and listening by the auditing translator.

Step 7: Delivery

After the finishing touches, project completion is confirmed with the client, and the files are ready for final delivery.

The vital editorial quality of the source text

Machine translation often produces quite literal translations that do the source text little justice and, more often than not, diminish its quality. Hence, the equation is clear: poor source text = even poorer translation. If you hope for “decent”-quality machine translation, ensure your source text is of impeccable editorial quality.

Last but not least, a post-editing service will surely eliminate the terminology and syntax errors produced by machine translation; however, it will not magically turn your source text into a literary marvel.

The limits of AI Dubbing

Voice cloning is undoubtedly surrounded by a cloud of ethical questions, ranging from the potential falsification of statements and alteration of speeches all the way to privacy policies regarding the storage and use of vocal data collected from specific individuals. These concerns are to be addressed through legislation guaranteeing the correct and ethical use of this technology.

On the other hand, the technical limitations of AI dubbing are by no means negligible. Take, for example, Lipitt’s marketing stunt with the Zendaya clip published on LinkedIn. The clip and its concept may be impressive; however, they do not reveal the arduous work behind the final result. Long hours of image correction and lip-sync modification are crucial to obtaining clean results with AI. To alleviate this potential issue from the start, it is crucial to make terminological choices compatible with the speaker’s lip movements. This synchronization task is performed by our translators.

Last but not least, what is the purpose of voice cloning? Why do we watch videos? Because our senses awaken when we closely watch someone’s genuine personality and attitude as they speak. Hence a video remains superior in its authenticity to a simple text or audio recording. We might even go as far as saying that a video’s authenticity could be ruined by completely replacing the speaker’s original voice, statement, and lip movement, even if the difference is barely noticeable. At that point, why watch a video portraying someone who is no longer truly themselves? We might as well replace the individual with a completely AI-generated character. Soon enough, the movie industry will be rushing to automate feature-film dubbing, and an AI layer paired with special effects should not raise many ethical questions with Hollywood producers. But then, where do we draw the line between animation and reality?

When HugoDécrypte chooses voice-over instead of dubbing for Timothée Chalamet and Zendaya’s original interview, he voluntarily keeps both actors’ voices in the background. Is this not a testimony of respect? Is it not more enjoyable for us viewers to hear the actors’ very own words and voices, and to see their genuine facial expressions? We believe that authenticity will always have greater impact and value than a gimmick.