Here's what you need:
Getting your file(s) ready
Once you’ve compiled your vocals, the next step is to prepare your files for training:
- Remove any extra silence (we recommend doing this automatically with Audacity)
- Export as true mono (rather than stereo with equal L + R channels)
- Export as 16-bit .wav (no audio length requirements, can be one 15-minute file or fifteen 1-minute files)
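If you want to double-check an export before uploading, the mono and 16-bit requirements can be verified with a short script. This is a minimal sketch using Python's standard `wave` module; the function name `check_training_wav` is made up for illustration and is not part of any upload tool. Note that `wave` only reads PCM files, so a 32-bit float WAV will raise an error instead of returning a report.

```python
import wave


def check_training_wav(path):
    """Return a list of problems found in a WAV file (empty list means it looks OK)."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample; 2 bytes = 16-bit
        if channels != 1:
            # A stereo file with identical L and R still reports 2 channels here.
            problems.append(f"expected 1 channel (true mono), found {channels}")
        if sample_width != 2:
            problems.append(f"expected 16-bit samples, found {8 * sample_width}-bit")
    return problems
```

Calling `check_training_wav("take1.wav")` (a placeholder filename) returns an empty list when the file is true mono and 16-bit.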
How to convert to mono and remove silence with Audacity
Advanced Pre-Processing Tips
Your audio can be:
- cleaned up to remove background noises, plosives, etc. Ultimate Vocal Remover GUI (https://github.com/Anjok07/ultimatevocalremovergui) is a powerful tool for removing instrumentals, reverb, etc.
- clean EQd (subtractive) to reduce muddy or harsh frequencies in the recording
- subtly pitch corrected (slow attack, moderate strength) unless it's a key part of the vocal style
- de-essed to reduce any harsh sibilance
- compressed lightly to even out dynamic range/reduce peaks (~4-5 dB of gain reduction at most)
- boosted (additive EQd) to fit the style of the vocal
- limited to a peak of -6 dB with overall levels between -12 and -6 dB.
- high/low passed to remove frequencies below 40 Hz–100 Hz and above 20 kHz
- phase re-balanced
Recording vocals for your model? Here are some settings to get you started:
- Use a quality mic with a wide frequency range (40 Hz–20 kHz)
- Set your recording sample rate to 48 kHz and file type to lossless (.wav, .aiff, .flac)
- Limit breath sounds and try to capture a clean tone (avoid plosives; place the mic off-axis and/or use a pop filter if singing in a breathy style)
- Avoid room reflections (record in a room with soft surfaces like carpet and furniture to absorb sound, place microphones away from walls, move closer to the mic and reduce your input gain)
- Monitor your recording volume and avoid exceeding -6 dBFS. Try to keep your levels between -12 and -6 dBFS.
- Export your audio as true mono (rather than stereo with equal L + R channels)
- Avoid any hard cuts on audio (add a short fade out to avoid pops that come from cutting audio anywhere other than a zero crossing)
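To see whether a finished recording stays under the -6 dBFS ceiling, you can measure its peak level. The sketch below assumes a 16-bit PCM WAV file and uses only Python's standard library; `peak_dbfs` is an illustrative name, and 0 dBFS is taken to mean the largest 16-bit sample value (32768).

```python
import array
import math
import sys
import wave


def peak_dbfs(path):
    """Return the peak level of a 16-bit PCM WAV file in dBFS (0 dBFS = full scale)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20 * math.log10(peak / 32768)
```

A result above -6 (for example -3.2) suggests the take was recorded hotter than recommended. A peak at half of full scale comes out at roughly -6 dBFS.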
The more variety, the better. It's best to have examples covering your entire range: chest, mix, and falsetto; large and small intervals; gritty and clean notes; etc. You can sing the same lyrics in different keys, a couple of songs from your repertoire, originals, etc. The audio can be in multiple files or in one single take, as long as the singing time adds up to 10–15 minutes.
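If your takes are spread over several files, adding up their lengths shows how close you are to the 10–15 minute target. This is a rough sketch with the standard `wave` module; `total_minutes` is an illustrative name. It measures file length rather than singing time, so it is most meaningful once silence has been removed.

```python
import wave


def total_minutes(paths):
    """Return the combined length, in minutes, of a list of WAV files."""
    total_seconds = 0.0
    for path in paths:
        with wave.open(path, "rb") as wav:
            # Number of frames divided by frames per second gives seconds.
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 60
```

For example, `total_minutes(["verse.wav", "chorus.wav"])` (placeholder filenames) returns the combined duration of both files.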
How to convert to True Mono
Use the free Audacity program to convert stereo files to true mono.
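If you would rather script the conversion, the idea behind "true mono" can be sketched in Python: the left and right samples are averaged into a single channel, and the file is written with one channel instead of two. This is only an illustration under the assumption of 16-bit PCM stereo input, not a replacement for the Audacity workflow; `stereo_to_mono` is a made-up name.

```python
import array
import sys
import wave


def stereo_to_mono(src_path, dst_path):
    """Average L and R of a 16-bit stereo WAV and write a 1-channel WAV."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("this sketch expects 16-bit stereo PCM")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Samples are interleaved L, R, L, R, ...; average each pair.
    mono = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)),
    )
    if sys.byteorder == "big":
        mono.byteswap()  # back to little-endian before writing
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)  # a single channel is what makes the file true mono
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())
```

The output keeps the original sample rate and has half as many samples, one per stereo frame.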
How to remove silence
Use the free Audacity program to quickly remove silence from an acapella. (Copy the settings in this video but feel free to experiment. Choose a threshold of between -20 dB and -40 dB depending on the noise level of your acapella.)
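To get a feel for what the threshold means, it helps to convert it from decibels to a raw sample value. The sketch below assumes the threshold is measured relative to full scale in a 16-bit file. Both function names are illustrative, and this is not how Audacity implements silence removal internally.

```python
def threshold_to_amplitude(threshold_db, full_scale=32768):
    """Convert a threshold in dB (relative to full scale) to a linear sample value."""
    return full_scale * 10 ** (threshold_db / 20)


def is_silent(samples, threshold_db):
    """True if every sample in the window is quieter than the threshold."""
    limit = threshold_to_amplitude(threshold_db)
    return all(abs(s) < limit for s in samples)
```

At -40 dB the limit is about 328 out of 32768, so only very quiet passages count as silence. At -20 dB it is about 3277, which treats noticeably louder audio (room noise, soft breaths) as silence too. That is why a noisier acapella usually needs a threshold closer to -20 dB.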
Select your model
Epochs: The number of epochs you choose is the number of times your model trains over your dataset. More epochs are not always better, because you do not want to overtrain your model. For small datasets (a few minutes), we recommend ~100 epochs. For larger datasets, use 200+ epochs.
Training method: The training method is the algorithm used to create your model. We almost always recommend Crepe, which is optimal for creating new voices. However, you can also select PM (lowest quality, fastest) or Harvest (medium quality, but a little slower than PM).
Q: How long does model training take?
A: Depending on the size of your data, model training could take anywhere from 30 minutes to multiple hours! Don’t worry though: as long as you see “Training” on your create voices dashboard, your model will finish soon.
Q: My model is taking forever to upload! What’s going on?
A: If you’re uploading a large file, it can take a long time to upload the data to our backend. Just press “Upload” and be patient; it’ll process eventually. Be sure not to refresh the page during your upload.
Q: What do I do if I see an error?
A: If you see an error during upload, report it to us through our bug form!