PROJECT: PODCAST CLEANER

The Idea

I listen to a lot of podcasts, and some of them put “replay” segments at the start of each episode, which are just re-runs of previous episodes. Thankfully, they also list the start time of the actual episode content in the podcast description, so I got used to skipping ahead in my podcast app (Overcast), or clicking the timestamp in the description to jump to the start of the new episode. I found this a bit frustrating, and hoped to automate it by removing the replay segments from the podcast files.

I knew I could download the RSS XML file for the podcast, parse the description to get the actual start time, and then use ffmpeg to trim the audio file to start at that time. I’d end up with a trimmed audio file with no more replay segments!

Do it manually first

To prove out my idea, I decided to manually download the RSS XML file and a few episode audio files, then figure out the right blend of ffmpeg commands I’d need to trim the audio files.

Here’s an example of an episode description with timestamps:

00:00:00 - Replay of "blah blah" from April 13
00:13:14 - Replay of "foo bar" from April 20
00:20:20 - Some new segment name
00:32:45 - Replay of "baz qux" from March 27
00:45:27 - Another new segment name

As you can see, there are a handful of “replay” segments littered throughout the episode. This is just an example, but it’s not too far off: there are maybe 3-5 replay segments in each episode, generally near the start, with one sprinkled in later. The timestamps are always in the format HH:MM:SS.

To remove individual segments with ffmpeg, I need each segment’s start time, end time, and duration. A segment starts at its own timestamp and ends at the next segment’s timestamp; subtracting the two gives the duration. The very last segment doesn’t need an “end time”, since it runs from its start time until the end of the file. With that in mind, here are the manually calculated segment durations:

Segment 1: 00:00:00 - 00:13:14, duration: 00:13:14
Segment 2: 00:13:14 - 00:20:20, duration: 00:07:06
Segment 3: 00:20:20 - 00:32:45, duration: 00:12:25
Segment 4: 00:32:45 - 00:45:27, duration: 00:12:42
Segment 5: 00:45:27 - end of file, duration: ...

I went about this step a few ways with ffmpeg, but will cover using the “atrim” filter in this post.

Manually using the ffmpeg “atrim” filter

ffmpeg’s “atrim” filter

For each segment I want to extract, I need to give the “atrim” filter the start time (in seconds), the end time (in seconds), and a label for that segment. Then I can concatenate the segments I want to keep into a new output file. Here’s the conversion of the above timestamps to seconds:

Segment 1: 0 seconds to 794 seconds, duration: 794 seconds
Segment 2: 794 seconds to 1220 seconds, duration: 426 seconds
Segment 3: 1220 seconds to 1965 seconds, duration: 745 seconds
Segment 4: 1965 seconds to 2727 seconds, duration: 762 seconds
Segment 5: 2727 seconds to end of file, duration: ...
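
The conversion can be double-checked with a few lines of Ruby. This is only a sketch: the helper names `to_seconds` and `segment_rows` are made up for illustration, and the timestamps are the ones from the example description.

```ruby
# Hypothetical helper: convert "HH:MM:SS" into a total number of seconds.
def to_seconds(timestamp)
  hours, minutes, seconds = timestamp.split(':').map(&:to_i)
  (hours * 3600) + (minutes * 60) + seconds
end

# Hypothetical helper: pair each start with the next segment's start.
# The last segment has no end time, so its end and duration stay nil.
def segment_rows(timestamps)
  starts = timestamps.map { |timestamp| to_seconds(timestamp) }
  starts.each_with_index.map do |start, index|
    finish = starts[index + 1]
    { start: start, end: finish, duration: finish && (finish - start) }
  end
end

# Timestamps copied from the example episode description.
example = %w[00:00:00 00:13:14 00:20:20 00:32:45 00:45:27]

segment_rows(example).each_with_index do |row, index|
  finish = row[:end] || 'end of file'
  duration = row[:duration] || '...'
  puts "Segment #{index + 1}: #{row[:start]} to #{finish}, duration: #{duration}"
end
```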

I can now tell ffmpeg how to extract each segment with a start/end time and then label the segments. The labeling is required so that I can pass those labels to the next ffmpeg filter, which will concatenate the segments together. Here’s the ffmpeg command to extract the segments I want to keep, and then concatenate them together:

# use ffmpeg atrim to keep segments 3 [a1] and 5 [a2]
ffmpeg -i input.mp3 -filter_complex \
  "\
  [0:a]atrim=1220:1965[a1];\
  [0:a]atrim=2727[a2];\
  [a1][a2]concat=n=2:v=0:a=1[out]\
  " \
  -map "[out]" \
  -y output.mp3

Here’s a breakdown of the ffmpeg atrim commands:

# Extract the first segment that I want to keep:
[0:a]atrim=1220:1965[a1];
# `[0:a]` - Select the audio stream from the input file
# `atrim=1220:1965` - Trim the audio stream from 1220 seconds to 1965 seconds
# `[a1]` - Label the output of this filter as `a1`

# Extract the second segment I want to keep:
[0:a]atrim=2727[a2];
# `[0:a]` - Select the audio stream from the input file
# `atrim=2727` - Trim the audio stream from 2727 seconds (00:45:27) to the end of the file
# `[a2]` - Label the output of this filter as `a2`

Now that I have the two segments defined, and labeled, I can pass those details to the concat filter, which will concatenate the two segments together into a new output file. Here’s the breakdown of the concat filter:

# Concatenate (combine) the two segments together, and label it as `out`:
[a1][a2]concat=n=2:v=0:a=1[out]
# `[a1][a2]` - Specify the two labels/streams I want to concatenate
# `concat=n=2:v=0:a=1` - Use the 'concat' filter with 2 segments (n=2), each providing 1 audio stream (a=1) and no video streams (v=0)
# `[out]` - Label the output of the `concat` filter as `out`

Now that I have the segments I want to keep defined, and concatenated together, I can tell ffmpeg to use the labeled [out] stream as the output stream, and ignore the rest of the streams in the input file. Here’s the breakdown of the remaining ffmpeg arguments, -map and -y:

-map "[out]"
# `-map "[out]"` - Tell ffmpeg we want to use the "[out]" stream as our output stream, and ignore the rest.

-y output.mp3
# `-y` - Overwrite the output file if it already exists
# `output.mp3` - The name of the output file

Once that is run, I have a new file, output.mp3, that only includes the segments I wanted to keep. That works, but it would be cumbersome to repeat by hand for each new podcast episode. Let’s now look at ways to automate this process.

Method 2: Using Ruby to automate

You could pick any scripting language you want, but I usually reach for bash or Ruby first. I’m going to use Ruby in this case because I’ll need an easy way to parse the RSS XML, and extract timestamps from the episode descriptions. I’ll also need to calculate the segment start/end times, and then generate the ffmpeg command to trim the audio file.

Here’s a rough outline of the steps I’ll need to take:

Parse the RSS XML file
For each episode, extract the description
Parse the description to get the raw timestamps (HH:MM:SS) of each segment
Calculate the start time, and end time (in seconds) of each segment
Generate the ffmpeg command to trim the audio file

I’m going to use the rss gem to make parsing the RSS XML easier:

gem install rss
# or
bundle add rss

Now let’s lay out the basic structure of the script:
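
A minimal sketch of what such a skeleton could look like follows. Only `extract_segments` is a name used in this post; `find_episode`, `build_ffmpeg_command`, `parse_keep_indexes`, and `main` are hypothetical placeholders. It also assumes that the third command-line argument is a comma-separated list of zero-based segment indexes to keep, and that the episode audio has already been downloaded as `input.mp3`.

```ruby
#!/usr/bin/env ruby
# cleaner.rb - skeleton only; the placeholder bodies are filled in later.

# Hypothetical: find the episode whose title matches `episode_name`.
def find_episode(rss_path, episode_name)
  require 'rss' # provided by the rss gem installed above
  feed = RSS::Parser.parse(File.read(rss_path), false) # false: skip validation
  feed.items.find { |item| item.title == episode_name }
end

# Named in this post: turn a description into an array of segment Hashes.
def extract_segments(description)
  raise NotImplementedError, 'filled in later'
end

# Hypothetical: build the ffmpeg arguments that keep only `keep_indexes`.
def build_ffmpeg_command(segments, keep_indexes, input_path, output_path)
  raise NotImplementedError, 'filled in later'
end

# Assumption: "2,4" means "keep the segments at zero-based indexes 2 and 4".
def parse_keep_indexes(raw)
  raw.split(',').map { |index| Integer(index.strip) }
end

def main(argv)
  rss_path, episode_name, raw_indexes = argv
  episode = find_episode(rss_path, episode_name)
  abort "No episode named #{episode_name.inspect}" unless episode

  segments = extract_segments(episode.description)
  keep = parse_keep_indexes(raw_indexes)
  # Assumption: the audio file was already downloaded as input.mp3.
  command = build_ffmpeg_command(segments, keep, 'input.mp3', 'output.mp3')
  system(*command)
end

# Only run when called as a script with all three arguments.
main(ARGV) if __FILE__ == $PROGRAM_NAME && ARGV.length == 3
```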

Given the above script, once completely filled in, we could run it with the following Ruby command:

ruby cleaner.rb "path/to/rss.xml" "Episode Name" "2,4"

We’ll go through the remaining methods one by one, starting with the extract_segments method.

The method starts out by defining a regular expression (“regex”) to parse each line of the description.

line_regex = /^\s*(?<timestamp>\d{1,2}(?<_i>:\d{2})+)\s*-?\s*(?<title>.*?)$/

Remember that in our example, the description followed this format:

HH:MM:SS - Segment Name
HH:MM:SS - Some Other Segment Name
HH:MM:SS - Yet Another Segment Name
...
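
To see what the regex captures, here is a small sketch that runs it against single lines. The `parse_line` helper and the `LINE_REGEX` constant name are hypothetical and used only for illustration; the pattern itself is the one shown above.

```ruby
LINE_REGEX = /^\s*(?<timestamp>\d{1,2}(?<_i>:\d{2})+)\s*-?\s*(?<title>.*?)$/

# Hypothetical helper: return the named captures for one description line,
# or nil when the line does not start with a timestamp.
def parse_line(line)
  match = LINE_REGEX.match(line)
  return nil unless match

  { timestamp: match[:timestamp], title: match[:title] }
end

# A line in the "HH:MM:SS - Segment Name" format yields both captures.
p parse_line('00:13:14 - Replay of "foo bar" from April 20')

# A line without a leading timestamp yields nil.
p parse_line('Thanks for listening!')
```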

The regex captures the timestamp and the segment title in named regex “capture groups” for easier extraction. Once extracted, we split the timestamp on the colons into hours, minutes, and seconds, then fold the parts into a running total, multiplying the total by 60 at each step, to get the total seconds.

start = timestamp_string
          .split(':')
          .map(&:to_i)
          .inject(0) do |total_seconds, ts_part|
            # inject yields the running total first, then the next part
            (total_seconds * 60) + ts_part
          end

# Example:
# "00:13:14" => [0, 13, 14] => (0 * 60 * 60) + (13 * 60) + (14) = 794

We then move on to building the current episode’s segment Hash. I’m using SecureRandom to generate a unique label for each segment that will be used for the ffmpeg atrim and concat filters later on. I also default the “end” time of the segment to nil, as we calculate each segment’s “end” time when we process the next segment in the episode. We use the “start” time of the segment as the “end” time of the previous segment.

segment = { start: start, end: nil, title: title, label: "[#{SecureRandom.hex(4)}]" }

# update the previous segment's end time with the current segment's start time
segments[-1][:end] = start if segments.any?
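
Putting the regex, the seconds conversion, and the end-time bookkeeping together, a sketch of `extract_segments` might look like the following. The exact shape of the loop is an assumption, as is the `LINE_REGEX` constant name; the individual pieces are the ones described above.

```ruby
require 'securerandom'

LINE_REGEX = /^\s*(?<timestamp>\d{1,2}(?<_i>:\d{2})+)\s*-?\s*(?<title>.*?)$/

# Sketch: build one Hash per timestamped line of the description.
def extract_segments(description)
  segments = []

  description.each_line do |line|
    match = LINE_REGEX.match(line)
    next unless match # skip lines that do not start with a timestamp

    # "00:13:14" => [0, 13, 14] => 794
    start = match[:timestamp]
            .split(':')
            .map(&:to_i)
            .inject(0) { |total_seconds, part| (total_seconds * 60) + part }

    # The previous segment ends where this one starts.
    segments[-1][:end] = start if segments.any?

    segments << {
      start: start,
      end: nil, # stays nil for the last segment (runs to end of file)
      title: match[:title].strip,
      label: "[#{SecureRandom.hex(4)}]"
    }
  end

  segments
end

description = <<~TEXT
  00:00:00 - Replay of "blah blah" from April 13
  00:13:14 - Replay of "foo bar" from April 20
  00:20:20 - Some new segment name
  00:32:45 - Replay of "baz qux" from March 27
  00:45:27 - Another new segment name
TEXT

extract_segments(description).each do |segment|
  puts format('%-6s %-6s %s', segment[:start], segment[:end].inspect, segment[:title])
end
```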

The atrim and concat commands are fairly easy now that we have details about each segment stored in the segments array.
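
As a sketch of that step, the following builds the `-filter_complex` string and the full ffmpeg argument list from a `segments` array. The method names `build_filter_graph` and `build_ffmpeg_command`, and the convention that the segments to keep are passed as zero-based indexes, are illustrative assumptions rather than the original script. Fixed labels are used instead of SecureRandom so the result is easy to read.

```ruby
# Sketch: one atrim per kept segment, followed by a single concat.
def build_filter_graph(segments, keep_indexes)
  kept = keep_indexes.map { |index| segments.fetch(index) }

  trims = kept.map do |segment|
    # Omit the end time for the last segment so atrim runs to end of file.
    range = [segment[:start], segment[:end]].compact.join(':')
    "[0:a]atrim=#{range}#{segment[:label]}"
  end

  labels = kept.map { |segment| segment[:label] }.join
  concat = "#{labels}concat=n=#{kept.length}:v=0:a=1[out]"

  (trims + [concat]).join(';')
end

# Returns an argument array suitable for Kernel#system(*command),
# which avoids any shell quoting of the filter graph.
def build_ffmpeg_command(segments, keep_indexes, input_path, output_path)
  [
    'ffmpeg', '-i', input_path,
    '-filter_complex', build_filter_graph(segments, keep_indexes),
    '-map', '[out]',
    '-y', output_path
  ]
end

# Segments from the example description, with fixed labels for readability.
example_segments = [
  { start: 0,    end: 794,  title: 'Replay', label: '[a0]' },
  { start: 794,  end: 1220, title: 'Replay', label: '[a1]' },
  { start: 1220, end: 1965, title: 'New',    label: '[a2]' },
  { start: 1965, end: 2727, title: 'Replay', label: '[a3]' },
  { start: 2727, end: nil,  title: 'New',    label: '[a4]' }
]

puts build_filter_graph(example_segments, [2, 4])
```

Passing the command as an array to `system` keeps the brackets and semicolons in the filter graph away from the shell, so no escaping is needed.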

And that’s it. The full code is at the bottom of this post, and you can run it with the following command, as mentioned before:

ruby cleaner.rb "path/to/rss.xml" "Episode Name" "2,4"

Next Steps

The script is working well, but there are a few things to improve:

Issue 1: The script currently only works with a specific description format. It would be nice to make it more flexible.

Issue 2: The script still requires you to run it for each new episode, so it would be a huge improvement to automate the process to download the RSS, and process each episode description based on some rule set to pick the segments you want to keep.

I’ve already done this for my own purposes for one podcast, but it is not generalized.
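
One possible rule, sketched below, is to keep every segment whose title does not look like a replay. The `/\breplay\b/i` pattern and the `keep_indexes_for` name are assumptions based on the example description in this post, not a general solution.

```ruby
# Assumption: replay segments always contain the word "Replay" in their title.
REPLAY_PATTERN = /\breplay\b/i

# Hypothetical rule: return the zero-based indexes of the segments to keep.
def keep_indexes_for(segments, skip_pattern = REPLAY_PATTERN)
  segments.each_index.reject do |index|
    segments[index][:title].match?(skip_pattern)
  end
end

example_segments = [
  { title: 'Replay of "blah blah" from April 13' },
  { title: 'Replay of "foo bar" from April 20' },
  { title: 'Some new segment name' },
  { title: 'Replay of "baz qux" from March 27' },
  { title: 'Another new segment name' }
]

# Joined with commas so it could be passed as the script's third argument.
puts keep_indexes_for(example_segments).join(',')
```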

Issue 3: The script does not create a new RSS XML file with the updated episode description and a link to the new audio file. Again, I’ve done this for my own purposes, but have not generalized the solution. I rewrite the RSS XML file, and upload it to S3 along with the new audio file. Then I point my podcast app to that new RSS feed on S3, and it works great.

If you end up finding this useful, would like to see more, or have any suggestions, let me know! You can find my contact details in the footer of this site.

Carl Furrow