July 19, 2024

The Art of Voice Conversion: From Elvis to James Earl Jones


Have you ever wondered how technology can transform one singing voice into another? That is the job of voice conversion. In this article, we explore how AI-powered vocal models are created and how they can be used to change the timbre of a singer’s voice.

What is VITS?

VITS is short for Variational Inference with adversarial learning for end-to-end Text-to-Speech. Originally a text-to-speech architecture, it has been adapted for voice conversion: it takes the knowledge of a trained vocal model and generates converted audio in that model’s voice. This lets us alter the sound of a singer’s voice, giving it a different tone and texture.

Understanding SVC

SVC stands for Singing Voice Conversion. Unlike general voice conversion, SVC focuses specifically on transforming one singing voice into another. We can take a singer’s performance and make it sound like someone else’s while preserving the musicality and expression of the original rendition.

The Power of AI in Music

One recent example of AI’s influence in music is the “There I Ruined It” songs. These songs use AI to change the timbre of the vocalist’s voice, creating a unique and unexpected listening experience. The AI model keeps the original vocal performance but modifies it to sound like someone else. It’s similar to how Respeecher’s voice-to-voice technology can transform an actor’s portrayal of Darth Vader into James Earl Jones’ iconic voice.

However, the process is not as simple as pressing a button. Creating an AI mash-up like this requires a series of intricate steps.

The Voice-Cloning Process

To gain more insight into the voice-cloning process, we spoke with Michael van Voorst, the creator of the Elvis voice AI model used in the famous “Baby Got Back” video. He walked us through the necessary steps to create a successful AI mash-up.

First and foremost, van Voorst emphasized the importance of having a high-quality dataset of clean vocal audio samples from the person you want to clone. In the case of Elvis, he used vocal tracks from the singer’s iconic Aloha From Hawaii concert in 1973 as the foundation for training the voice model.

After careful manual screening, van Voorst extracted 36 minutes of pristine audio, which he then divided into 10-second chunks for processing. He meticulously removed any interference, such as band or audience noise, to ensure the best possible results. Additionally, he made sure to include a wide variety of vocal expressions to enhance the quality of the model.

The result? A remarkable AI-powered vocal model that can replicate the essence of Elvis’ voice while allowing for creative modifications.
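The chunking step described above is easy to picture in code. What follows is only a minimal sketch, not van Voorst's actual tooling, which the interview does not describe. The function names (split_into_chunks, expected_chunk_count) and the choice to drop a short final chunk are our own assumptions for illustration, and the audio is assumed to be already loaded as a flat sequence of mono samples.

```python
from typing import List, Sequence


def split_into_chunks(samples: Sequence[float], sample_rate: int,
                      chunk_seconds: float = 10.0,
                      keep_remainder: bool = False) -> List[Sequence[float]]:
    """Split a mono sample sequence into fixed-length chunks.

    Hypothetical helper for illustration only. `samples` is assumed to be
    already-cleaned audio; noise removal is a separate, manual step.
    """
    if sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sample_rate and chunk_seconds must be positive")

    # Number of samples in one chunk, e.g. 44100 Hz * 10 s = 441000 samples.
    chunk_len = int(round(sample_rate * chunk_seconds))

    chunks = [samples[i:i + chunk_len]
              for i in range(0, len(samples), chunk_len)]

    # Assumption: a trailing chunk shorter than chunk_seconds is dropped
    # unless the caller explicitly asks to keep it.
    if chunks and not keep_remainder and len(chunks[-1]) < chunk_len:
        chunks.pop()
    return chunks


def expected_chunk_count(total_seconds: float,
                         chunk_seconds: float = 10.0) -> int:
    """Number of full chunks that fit into a recording of the given length."""
    return int(total_seconds // chunk_seconds)


if __name__ == "__main__":
    # 36 minutes of audio cut into 10-second pieces.
    print(expected_chunk_count(36 * 60))
```

If every chunk were exactly 10 seconds long, 36 minutes of audio would yield 36 × 60 / 10 = 216 chunks. The real number depends on where the clean passages start and stop, which is why the manual screening comes first.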


Voice conversion technology, powered by AI, opens up a world of possibilities in the realm of music. From transforming one singing voice into another to creating unique mash-ups, the potential for artistic expression is endless. As we continue to push the boundaries of technology, we can expect even more exciting developments in the field of voice conversion.
