![]() |
P.O. Box 1904 Woodinville WA 98077
Magic Vocals is a new (still under development) integrated vocal analyzer, music editor and real-time sequencer that enables music producers, arrangers and enthusiasts to manipulate their recorded vocals in extreme but highly controlled ways. Before Magic Vocals, recording engineers could only make small pitch or time corrections to their vocal tracks (Auto Tune, Melodyne, etc.) before the sound became audibly distorted. Rather than treating the vocals as a simple series of audio samples, Magic Vocals models the recorded vocals as human speech and musical sequences. This allows Magic Vocals to resynthesize the vocals instead of simply processing the audio, enabling users to edit their vocal tracks in very useful ways not possible before.

So how does Magic Vocals do this? Shown below is a generalized flow diagram of the internal processing. The vocals are imported and analyzed, creating two independent models: Pitch and Acoustic.
Decoupling the vocal pitch from the vocal acoustics allows each to be edited and resynthesized independently of the other. In other words, you can dramatically change the pitch, timing and tempo of the vocals without altering any of the acoustic characteristics. Recorded vocals can now be manipulated like any other MIDI note!

Magic Vocals can be used along with any musical sequencer or workstation (Sonar, Cubase, Logic, GarageBand, etc.) to provide amazing editing flexibility to audio vocal tracks after they are recorded. Some examples:
To illustrate this, here are some examples of what Magic Vocals can do (all demos below were captured directly from Magic Vocals with no additional editing or processing, except for MP3 compression).

This is a short snippet of a singer who unfortunately can only sing one note:

However, Jackie imported her recording into Magic Vocals and was quickly and easily able to create this:

Note the top voice is almost an octave above the original! All the notes have new pitches, timings, vibrato and articulations.

Since the vocals are re-synthesized, Jackie can alter them at will in extreme ways. Here the tempo is slowed way down to 25 BPM:

Jackie must have impressive lungs to sing this! Notice that the drastic tempo modification above was not a simple blind linear expansion of the entire recording. With human speech and singing, various phonemic units have varying degrees of elasticity. Since the vocal synthesizer knows how to slow down or speed up human speech articulation, the result is complex but natural phonemic unit modulation - the same as what a human would do at that tempo.

There are more demos below, but first a bit about the actual application.
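The elasticity idea described above can be illustrated with a small sketch. This is not how Magic Vocals is implemented (the page does not say); it only shows one plausible way to spread a tempo change unevenly across phonemic units. The `Unit` class, the `elasticity` scale from 0 to 1, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    label: str         # phonemic unit label (illustrative only)
    duration: float    # seconds at the original tempo
    elasticity: float  # assumed scale: 0.0 = fixed length, 1.0 = stretches freely


def stretch_units(units, tempo_ratio):
    """Return new unit durations for a tempo change.

    tempo_ratio is old BPM / new BPM, so 4.0 means four times slower.
    The total duration is scaled by tempo_ratio, and the time added
    (or removed) is shared among units in proportion to
    duration * elasticity instead of being applied uniformly.
    This sketch does not enforce minimum durations when speeding up.
    """
    total = sum(u.duration for u in units)
    extra = total * (tempo_ratio - 1.0)
    weights = [u.duration * u.elasticity for u in units]
    weight_sum = sum(weights)
    if weight_sum == 0:
        # Nothing is elastic: fall back to plain linear scaling.
        return [u.duration * tempo_ratio for u in units]
    return [u.duration + extra * w / weight_sum
            for u, w in zip(units, weights)]


if __name__ == "__main__":
    # Hypothetical units for the word "sing": a short consonant,
    # a long vowel, and a nasal.
    word = [Unit("s", 0.08, 0.1), Unit("i", 0.30, 1.0), Unit("ng", 0.12, 0.5)]
    for unit, new in zip(word, stretch_units(word, 4.0)):
        print(f"{unit.label}: {unit.duration:.3f}s -> {new:.3f}s")
```

In this toy example the vowel absorbs most of the added time while the consonant barely changes, which is the qualitative behavior the paragraph above describes.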
The Song View Editor shows an overview of all the tracks in the song file.
The following screen shots show how a user typically imports vocal audio into Magic Vocals. The purpose of the import is to analyze the vocals (sung or spoken) and create the pitch and acoustic models. Here's a source recording of a professional singer:

The user picks a target track and initiates an import by bringing up the Import Wizard. To reduce the process to something more manageable, Magic Vocals guides the user through a series of interactive sub-tasks. The first task is to pick a sub-phrase from the source recording to analyze. Below, Magic Vocals suggests dividing the audio into two smaller phrases ("Old man sunshine listen you" and "never tell me dreams come true"):
After the user picks the first phrase, the Import Wizard asks for the phrase lyric text:
Finally, after a quick analysis of the phrase audio signal, the user is asked to preview the resulting word alignment (left panel) and make any necessary corrections (right panel), since speech recognition technology is not perfect and will occasionally make errors.
For more advanced users, the spectrum view actually shows unit boundaries more clearly and accurately (below):
Once the vocal audio is imported, Magic Vocals creates the pitch and acoustic models. Here's the 2-phrase model re-synthesizing the same notes, timing and tempo:

Compare this with the original audio above (seems pretty darn close!).

Although Gershwin wrote this tune, it's actually pretty boring. After recording this in the studio, suppose Gershwin came back and said he has a more interesting tune than what you recorded. It would be very costly and time consuming to call the singer back into the studio, recreate the recording parameters and record the phrases again to a new melody. With Magic Vocals, the engineer only needs to edit the notes to get an entirely new tune:

Sounds like it was originally recorded with this new tune - you'd never know it was drastically changed and then re-synthesized!

Now suppose Gershwin came back again and wanted the parts harmonized? No problem - a simple copy-and-paste and a one-minute note edit gets this:

Shown below is the editing window for modifying the pitch model parameters. The model is composed of a Note View (below):
And a Contour View for modifying pitch articulations. Note how the actual contour is abstracted into a series of simple straight-line envelope segments. This abstraction is easier and more intuitive to edit than the actual pitch envelope (as in Auto Tune, Melodyne, etc.).
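To make the straight-line abstraction concrete, here is a minimal sketch of a pitch contour stored as a handful of breakpoints and evaluated by linear interpolation. The representation (time in seconds, offset in semitones relative to the note) and the function name `contour_value` are assumptions for illustration, not Magic Vocals' actual data format.

```python
from bisect import bisect_right


def contour_value(breakpoints, t):
    """Evaluate a piecewise-linear contour at time t.

    breakpoints is a list of (time_sec, offset_semitones) pairs sorted
    by time. Values between breakpoints are linearly interpolated, and
    values outside the range are clamped to the first or last point.
    """
    times = [point[0] for point in breakpoints]
    if t <= times[0]:
        return breakpoints[0][1]
    if t >= times[-1]:
        return breakpoints[-1][1]
    i = bisect_right(times, t)  # first breakpoint strictly after t
    (t0, v0), (t1, v1) = breakpoints[i - 1], breakpoints[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


if __name__ == "__main__":
    # Hypothetical articulation: scoop up into the note, hold, then fall off.
    scoop = [(0.0, -2.0), (0.1, 0.0), (0.5, 0.0), (0.6, -1.0)]
    for t in (0.0, 0.05, 0.3, 0.55):
        print(t, contour_value(scoop, t))
```

Editing the articulation then means moving four points rather than redrawing a sampled curve, which is the convenience the Contour View description points to.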
Finally, there's an interactive editor for making changes to each Acoustic Model:
Below are some manipulations of Arnold Schwarzenegger’s voice. First, the audio is synthesized by Magic Vocals as originally spoken:

Schwarzenegger model re-synthesized

His interview can be sped up or slowed down in a natural way:

Next, ever wonder what Arnold would sound like if he stopped taking steroids?

Or what Arnold will sound like when he gets older:

Finally, we all know what he really is:

During development milestones, it’s often necessary to try extreme user scenarios to see how far the product can be pushed before it breaks. One recent stress test was inputting a recording of President Kennedy’s famous “Man on the Moon” speech that accelerated the US space program. This audio is stressful for three reasons:
Here's the original 1961 recording:

Here's the model re-synthesized. Some problems (and a synthesis bug), but still much better than expected!

For fun, here's a group of Kennedys presenting the speech:

And finally, to be even more persuasive, Kennedy tries singing his famous speech to Congress:

Another innovative application of Magic Vocals technology is creating and playing back sung vocal phrases as controlled loops. To demonstrate how the Magic Vocals engine goes beyond the current state of the art for looping human voices, the source material is taken from the Apple "Jam Pack Voices" Apple Loops. These are high-quality loops that are used with the Apple Logic, GarageBand and Soundtrack editors on the Mac.

This first demo shows how sung tempo and pitch (for the first time!) can be modified in natural and convincing ways. Also note that Magic Vocals is not limited to English lyrics. Here is the original unmodified Apple Loop:

Next is GarageBand playing this Apple Loop a tritone lower. Sounds like the original singer got transformed into a chorus of Darth Vaders!
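For readers who want the arithmetic behind a transposition such as "a tritone lower", here is a small sketch. It assumes 12-tone equal temperament, where each semitone multiplies the frequency by the twelfth root of two; the function names are illustrative and say nothing about how GarageBand or Magic Vocals perform the shift internally.

```python
def transpose_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of the given number of semitones
    (negative = down), assuming 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def transpose_hz(freq_hz: float, semitones: float) -> float:
    """Fundamental frequency after transposing by the given semitones."""
    return freq_hz * transpose_ratio(semitones)


if __name__ == "__main__":
    # A tritone is 6 semitones; a perfect fourth is 5 semitones.
    print(round(transpose_ratio(-6), 4))      # 0.7071
    print(round(transpose_hz(440.0, -6), 2))  # 311.13
    print(round(transpose_ratio(5), 4))       # 1.3348
```

The ratio only says where the fundamental pitch should land. Whether the result still sounds like the same singer depends on what happens to the rest of the voice's acoustic characteristics during the shift.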
Now compare that to the natural-sounding transposition using Magic Vocals. Sounds like the same singer - only lower:

Mozart lowered using Vocal Loops™

Let's try slowing the tempo down as far as it can go in GarageBand:
To show off, Magic Vocals slows the tempo even further by an additional 30%:

Mozart slow tempo using Vocal Loops™

And unlike everyone else, with Magic Vocals you're not limited to what you recorded. Change it any way you want:

Mozart new tune using Vocal Loops™

The next example shows a breathy singer transposed up. Here's the original:

GarageBand not only raises the pitch up a fourth but also, whether you want it or not, applies its "Chipmunk Under Water" effect:
Finally the same phrase raised, but maintaining the singer's distinct voice characteristics: |