03/19/2024

Kits and Descript: AI Tools for Audio Creators

Learn more about AI audio platforms Kits AI and Descript and find the best tool for your audio creation workflow.

Descript and Kits comparison graphic

Over the past few years of the artificial intelligence revolution, much attention has been focused on what AI can do for visual artists. Millions of people have experimented with tools like DALL-E, Midjourney, and Photoshop’s Generative Fill to create images with AI.

But did you know there are similar tools for audio projects? Musicians, producers, podcasters, streamers, video editors, and more can use AI to enhance every step of their workflow.

In this article, we’ll look at two of the most popular AI audio tools: Kits, an AI vocal platform for music, and Descript, an AI-powered audio editor for podcasts.

Kits AI Tools for Vocals

Kits is a powerful music production tool which uses AI to create high-quality audio. With Kits, you can convert one singer into another and clone a singer’s voice. The creative opportunities are endless. 

Voice Conversion

Kits is built around Convert, which changes a singer’s voice into a completely different one. While other AI tools do this for speech, Kits is the first to offer it for singing. The results are so good that they can pass for professional singers recorded in a high-end studio, making it a hugely versatile tool for producers.

Just upload a file, paste a YouTube video link, or record directly into the web app. In a few seconds, your tune will have a brand new singer!

You can fine-tune the conversion with advanced controls:

  • Remove instrumentals, reverb and delay, and/or backing vocals from your recording for better results.

  • Pitch Shift: Raise or lower the pitch by up to 24 semitones.

  • Conversion Strength: Adds more accent and articulation to the generation, but can cause unexpected results at high levels. 

  • Volume Blend: Control the balance between the input volume and the model. Lower values reveal more of the original dynamics.

  • Pre-Processing Effects: Cut noise, rumble, and harshness, smooth volume, and/or autotune before generation.

  • Post-Processing Effects: Apply compressor, chorus, reverb, and/or delay to the result. 
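To put the Pitch Shift range in perspective, here is a minimal Python sketch of the arithmetic behind semitone shifts. It assumes standard 12-tone equal temperament, where each semitone multiplies the frequency by the twelfth root of 2. The function names are hypothetical and this is not Kits's actual implementation, just the underlying math.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift, assuming 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def shifted_frequency(freq_hz: float, semitones: float) -> float:
    """Frequency of a note after shifting it by the given number of semitones."""
    return freq_hz * semitones_to_ratio(semitones)


if __name__ == "__main__":
    # Print the ratio for a few shifts, including the 24-semitone maximum.
    for shift in (-24, -12, 0, 12, 24):
        print(f"{shift:+d} semitones -> ratio {semitones_to_ratio(shift):.2f}")
```

A 12-semitone shift is one octave (double the frequency), so the 24-semitone maximum is two octaves in either direction.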

Voice Training Tutorial

Kits's most futuristic feature is Voice Training. Just upload an audio file or paste a YouTube video link, and Kits trains an AI model to create a perfect clone of the singer’s voice. This new Voice can be used instead of a stock or Blended voice for any conversion (more on those below). 

Kits offers the best Voice Training tool available for singers. Other AI tools do offer it for speech, including Descript, which we’ll cover in detail below. However, Descript uses this function mostly for correcting mistakes or simple text-to-speech generations. Kits allows you to effortlessly use the trained voice model for conversions, which is a major advantage.

Kits voice cloning page with files uploaded

To train the voice, Kits accepts recorded audio in any format. It recommends 10 minutes of audio for best results, but accepts up to an hour. (For comparison, Descript requires you to read a specific script to use as the voice template.) From there, just add a name and photo, then train your new voice! It will be saved in your Voice Library for future use.

Voice Library

Kits offers 50+ Artist Voices in its Voice Library. Each is named for its gender and genre, such as Male Afro Beat or Female Bedroom Pop. You can sort the Library by pitch range, gender, and genre, and there are even voices for other languages and world music styles. They are all completely royalty-free, so you can use them however you like. 

Open tab of the voice library page with no model selected

To further customize your sound, you can combine two Voices with the Voice Blender. The Blend Ratio slider controls how much of each voice to use in training the new model.

Kits AI voice blender tool with 2 models selected

In addition, Kits offers instruments, including guitar, bass, saxophone, and cello. This allows you to effortlessly create instrumentals: just quickly record yourself singing or humming a part, then convert it into an instrument voice.

Text-To-Speech

Kits also offers a text-to-speech function in 14 languages, for narration, voiceovers, and other spoken content. Since Kits’s Voice Library is calibrated for singing, the results tend to sound more natural than those of other AI tools. Enter your script, select a pitch range, and generate the speech. The entire Voice Library can be used, plus Blended and Trained voices.

Voice conversion page with the male synth pop model selected in the text to speech tab

AI Audio Enhancers

Vocal Remover 

Another AI-driven music tool in Kits is the Vocal Remover. Upload a song or YouTube link and the Vocal Remover separates the vocals from the instrumentals and background noise. Advanced settings allow you to remove backing vocals and toggle reverb, echo, and noise reduction. With AI built in, Kits’s Vocal Remover tends to do a better job than traditional software at precisely extracting vocals, even when similar sounds overlap.

Kits AI vocal remover page

AI Mastering 

Mastering is the final phase of the music production workflow. Compression, limiting, EQ, and more are applied to perfect the final sound and make sure the individual tracks work well together. This has historically been one of the most difficult and expensive elements of production, but Kits AI allows even new producers to master tracks in seconds.

Kits offers six premade mastering presets:

  • Light & Bright

  • Bass Heavy

  • Punch & Air

  • Lush

  • Tape Glue

  • Analog Warmth

Since the user-friendly process takes just seconds, you can experiment to see which one works best. You can also upload a reference track, whose sound Kits will use as a model.

Kits AI Mastering page with a track input

Kits is not just the most powerful AI singing tool on the market, but an essential tool for modern music producers. It uses AI to enhance every stage of vocal production, allowing you to produce better vocals in less time, for less money, and with more creativity.

Descript: AI Podcast Editor

Descript is one of the most powerful tools available today for podcasters, with a rich suite of AI audio functions built around a text-based podcast editor. (Descript also offers some video content tools, but we won’t get into those here.) 

Wait, text-based audio editor? Yes, Descript automatically transcribes your audio so you can edit it like a document, with your changes reflected in the audio. Long recordings are transcribed within seconds and stored securely in the cloud, and each speaker is automatically labeled. Plus, it works in 22 languages. On top of this unique user experience is a wide range of other AI audio tools:

AI Voices

Like Kits, Descript includes stock voices which can be used for text-to-speech. There are 21 in total with tags to describe their voice: Masculine or Feminine, Younger, Adult, or Older, plus accents and styles. 

Descript AI voice selection page

Descript also has a voice cloning feature similar to Voice Training on Kits. Interestingly, Descript only allows you to clone your own voice. To verify this, you must record yourself reading a special script as the template. Your voice can be saved to use for text-to-speech, as well as future Overdubs of your own speech. 

Script generated by Descript's voice cloning feature

Regenerate Any Transcription

Regenerate essentially creates a mini voice clone (without the longer process described above), then regenerates a selected piece of text in the recording transcript. This allows for audio edits that would be impossible without AI, and it might be Descript’s most powerful feature.

For example, say you’re recording at home and the doorbell rings. Normally, cutting out this moment would be time-consuming, and doing it cleanly enough that listeners don’t notice might be impossible. But with Descript, just locate the moment in the transcription, highlight it, and click Replace With → Regenerate. AI-generated speech will be seamlessly spliced in over that section of the original recording.

And what if you call for your roommate to answer the door? You can easily delete the off-topic words from the transcript, but it will leave an obvious disconnect which listeners can hear. Just Regenerate the phrase around the splice and the AI voice will match the tone and intonation to hide it perfectly.

Overdub

Underneath Regenerate in the Replace With menu is Overdub. Instead of using the AI voice to smooth edits, Overdub uses it to insert new words into the podcast. If you mispronounce a word, flub a line, or simply don’t articulate yourself as well as you should, you can instantly cut out the undesired part and replace it with an AI overdub. 

Since Descript identifies different speakers automatically, the overdub will automatically match the right speaker. Plus, the new audio will match the mic quality, background noise, and intonation of the surrounding recording. 

Descript's Overdub feature

Studio Sound

With one click, Studio Sound’s algorithms make any recording sound professional. Just toggle the switch under Audio Effects, and Studio Sound separates voices from background noise and enhances the voice, so even a quick iPhone recording sounds like it was made with a high-quality microphone. The Intensity slider controls how strongly the effect is applied. Remove background noise, hiss, and room echo in a few simple, intuitive steps.

Filler Word Removal

Every podcaster has experienced this: you record an episode and think you crushed it. But when you listen back, your speech is riddled with “like,” “um,” dead air, and other filler. These small things can unfortunately have a massive impact on how you come across.

Filler Word Removal is built into Descript, and like the rest of its features, it’s incredibly simple to use. When your audio is transcribed, filler words will be underlined automatically. Click the star icon, then use the editing tool to “Remove filler words” and “Shorten word gaps” to clean up your speech. 

Sample filler word remover function
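To see how editing a transcript can translate into audio edits, here is a minimal Python sketch. It assumes each transcribed word carries start and end timestamps, drops filler words, and returns the spans of audio worth keeping, cutting any silence longer than a chosen gap. The `Word` class, the `FILLERS` list, and the `max_gap` parameter are all hypothetical, and this is not how Descript is actually implemented.

```python
from dataclasses import dataclass

# Hypothetical filler list; "like" is often a legitimate word, so a real
# tool would need context to decide when to cut it.
FILLERS = {"um", "uh", "like"}


@dataclass
class Word:
    text: str
    start: float  # seconds into the recording
    end: float


def keep_segments(words, fillers=FILLERS, max_gap=0.5):
    """Return (start, end) audio spans to keep after dropping filler words.

    Consecutive kept words are merged into one span when the silence
    between them is at most `max_gap` seconds; longer gaps are cut out.
    """
    segments = []
    previous_kept = False
    for word in words:
        if word.text.lower().strip(".,!?") in fillers:
            previous_kept = False  # force a cut around the filler word
            continue
        if previous_kept and word.start - segments[-1][1] <= max_gap:
            segments[-1] = (segments[-1][0], word.end)  # extend current span
        else:
            segments.append((word.start, word.end))
        previous_kept = True
    return segments


if __name__ == "__main__":
    transcript = [
        Word("So", 0.0, 0.3),
        Word("um", 0.4, 0.8),
        Word("welcome", 0.9, 1.4),
        Word("back", 1.5, 1.9),
        Word("everyone", 3.0, 3.6),
    ]
    print(keep_segments(transcript))
```

Joining the kept spans back to back would give the cleaned-up recording. A real editor would also need to crossfade at each cut so the splices are not audible.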

Finding the Best AI Tool For You

Kits and Descript are at the forefront of AI-enabled audio production. Their tools work simply and elegantly to enhance your existing workflow. Powerful features like Kits’s Voice Conversion and Voice Training and Descript’s text-based editor open up creative possibilities that have never existed before. Plus, features like Vocal Remover and AI Mastering in Kits and Regenerate and Filler Word Removal in Descript eliminate the most time-consuming and tedious aspects of audio production. How will AI audio tools make you a better creator?


Get started, free. No credit card required.

Our free plan lets you see how Kits can help streamline your vocal and audio workflow. When you are ready to take the next step, paid plans start at $9.99 / month.
