Edit Podcasts and Videos by Editing Text with AI

What You'll Learn

How to import a raw audio or video recording into Descript, use its AI transcription to edit by deleting and rearranging text, remove filler words automatically, and fix spoken mistakes by typing corrections rather than re-recording.

Why This Matters

Traditional audio and video editing means working with waveforms and timelines, a skill that takes time to learn and patience to use. Many people who record podcasts, tutorials, or presentations give up on editing because it feels too technical, leaving their content rougher than it needs to be.

Descript lowers that barrier. Because it links each word in the transcript to the moment it was spoken, editing the recording works much like editing a document. Anyone who can use a word processor can clean up a recording in Descript, with no audio engineering background needed.

Step-by-Step Guide

Step 1: Import your recording

Go to descript.com, create an account, and click New Project. Drag and drop your audio or video file into the project. Descript will automatically transcribe it. This usually takes one to two minutes for a typical recording, and longer for long files.

Step 2: Remove filler words automatically

Once the transcription is complete, click Actions in the top menu and select Remove filler words. Descript highlights the filler words it detects in the transcript, such as "um", "uh", "you know", and "like". Review the highlighted words, then click Remove all to delete them from the recording in one step.

Tip: Review the list before removing. Occasionally "like" is used
intentionally (e.g. "I like this approach") and you'll want to
uncheck those instances before confirming.
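
To see why the review pass matters, here is a minimal Python sketch of how simple filler detection could work on a transcript. It is an illustration only, not Descript's actual method: the FILLERS list and the find_fillers function are hypothetical names, and the rule is a plain phrase match, so it also flags the intentional "like" in "I like this approach".

```python
import re

# Hypothetical filler list for illustration; not Descript's actual list.
FILLERS = ("you know", "um", "uh", "like")

def find_fillers(transcript: str) -> list[tuple[int, str]]:
    """Return (character offset, matched filler) pairs in order of appearance."""
    alternatives = "|".join(re.escape(f) for f in FILLERS)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return [(m.start(), m.group(0).lower()) for m in pattern.finditer(transcript)]

text = "Um, I like this approach, you know, because it is, like, simple."
for offset, word in find_fillers(text):
    # Both uses of "like" are reported; only the second one is a filler.
    print(offset, word)
```

A plain match like this cannot tell the two uses of "like" apart, which is the reason to check the highlighted list by eye before confirming.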

Step 3: Edit the transcript to cut and rearrange content

Read through the transcript as you would a document. To remove a section — a rambling intro, a repeated point, an off-topic tangent — simply select the text and press Delete. That section disappears from the recording.

To move a section, cut the text and paste it elsewhere in the transcript. The audio or video will rearrange to match.

Example: You recorded a 20-minute podcast and the best anecdote
is buried 15 minutes in. Select the paragraph in the transcript,
cut it, and paste it near the top. Done — no timeline scrubbing needed.
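
The idea behind text-based editing can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Descript's implementation: each word carries hypothetical start and end timestamps in seconds, word timings are assumed to be contiguous, and the Word class and build_cut_list function are names invented for the illustration. Deleting or moving words changes which time ranges of the original recording are played, and in what order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds into the original recording (assumed field)
    end: float

def build_cut_list(words: list[Word], tolerance: float = 0.01) -> list[tuple[float, float]]:
    """Merge an edited word sequence into (start, end) ranges of the original audio.

    Consecutive words that were also adjacent in the original recording are
    merged into one range; a deletion or a move starts a new range.
    """
    ranges: list[tuple[float, float]] = []
    for w in words:
        if ranges and abs(ranges[-1][1] - w.start) <= tolerance:
            ranges[-1] = (ranges[-1][0], w.end)
        else:
            ranges.append((w.start, w.end))
    return ranges

original = [
    Word("So", 0.0, 0.5), Word("anyway", 0.5, 1.0),
    Word("welcome", 1.0, 1.5), Word("back", 1.5, 2.0),
    Word("great", 2.0, 2.5), Word("story", 2.5, 3.0),
]

# Delete the first two words, as if selecting them and pressing Delete.
after_delete = original[2:]
# Move the last two words to the front, as if cutting and pasting them.
after_move = after_delete[2:] + after_delete[:2]

print(build_cut_list(after_delete))  # one continuous range
print(build_cut_list(after_move))    # two ranges, played in the new order
```

In this model the recording itself is never modified. The transcript order simply decides which ranges are played, which is why an edit made in the text can be undone as easily as it was made.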

Step 4: Fix spoken mistakes with Overdub

If you stumbled over a sentence or misspoke a name, you don't need to re-record. Use Descript's Overdub feature to fix it by typing.

First, set up Overdub by going to Settings → Overdub and following the prompts to record a voice sample (about 10 minutes of clean speech). Once the voice is trained, fix a mistake in three steps:

Select the incorrect text in the transcript
Type the corrected version
Click Regenerate with Overdub — Descript replaces the audio with a new version in your voice

Example correction:
Original spoken: "The report was published in twenty twenty-two"
You meant to say: "The report was published in twenty twenty-three"
→ Select "twenty twenty-two" in the transcript, type "twenty twenty-three",
  click Regenerate — the correction appears seamlessly in the recording.

Step 5: Export your finished recording

When you're happy with the edit, click Publish or Export. For podcasts, export as MP3. For video, export as MP4. Descript also lets you export captions as an SRT file at the same time — saving you a separate captioning step.
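
For reference, an SRT file is plain text: each caption has a running number, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, and the caption text, followed by a blank line. The Python sketch below writes that layout by hand so you can recognise what an exported caption file should look like. The to_srt function and the sample captions are hypothetical and are not part of Descript.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(captions: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) captions as SRT text, numbered from 1."""
    blocks = []
    for number, (start, end, text) in enumerate(captions, start=1):
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{number}\n{time_range}\n{text}\n")
    # Joining with a newline leaves one blank line between captions.
    return "\n".join(blocks)

# Made-up sample captions for illustration.
captions = [
    (0.0, 2.5, "Welcome back to the show."),
    (2.5, 5.0, "Today we talk about editing."),
]
print(to_srt(captions))
```

Opening an exported SRT file in a text editor and checking it against this layout is a quick way to confirm the captions line up with your final edit.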

Tips for Better Results

Do the filler word pass first. Always run automatic filler word removal before doing any manual editing. It handles the most tedious part instantly and gives you a cleaner transcript to read through.
Edit in whole sentences. Cutting mid-sentence often creates an audible click or awkward rhythm. Select complete sentences or natural pause points for the cleanest cuts.
Record Overdub in a quiet room. The quality of the voice clone depends heavily on the quality of the training sample. Use the same microphone and room you normally record in, with background noise kept to a minimum, for the most natural match.
Use ElevenLabs for narration-only projects. If you need to generate entirely new narration from scratch rather than fix an existing recording, ElevenLabs gives you more voice options and control over style and pacing.
Check transitions after moving sections. After rearranging paragraphs, always listen to the join between the moved section and what comes before and after it — occasionally you'll need a short crossfade to smooth the edit.
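
The crossfade mentioned in the last tip can be illustrated with a short Python sketch. It shows a plain linear crossfade on lists of sample values, written under simplifying assumptions: the crossfade function is a hypothetical helper, real audio tools work on much longer sample arrays, and this is not a description of how Descript smooths edits internally.

```python
def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join clip a to clip b, blending the last `overlap` samples of a
    with the first `overlap` samples of b using linear weights."""
    if overlap <= 0:
        return a + b  # hard cut, no blending
    if overlap > min(len(a), len(b)):
        raise ValueError("overlap is longer than one of the clips")
    blended = []
    for i in range(overlap):
        # Weight of clip b rises steadily while the weight of clip a falls.
        fade_in = (i + 1) / (overlap + 1)
        sample_a = a[len(a) - overlap + i]
        blended.append(sample_a * (1 - fade_in) + b[i] * fade_in)
    return a[:len(a) - overlap] + blended + b[overlap:]

loud = [1.0, 1.0, 1.0, 1.0]
quiet = [0.0, 0.0, 0.0, 0.0]
print(crossfade(loud, quiet, overlap=3))
```

Instead of jumping straight from one clip to the next, the joined signal steps down gradually across the overlap, which is what removes the audible click at a hard cut.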

Tools That Work Best for This

Descript — The primary tool for transcript-based editing, filler word removal, and Overdub voice correction on existing recordings.
ElevenLabs — Best when you need to generate entirely new spoken narration from written text, or when you want a wider range of voice styles than Overdub provides.