Working with YouTube and Extracting Audio

In my last few articles, I've been exploring the capabilities of ImageMagick, showing that just because you're working on a command line doesn't mean you're stuck processing only text. As I explained, ImageMagick makes it easy to work with images, adding watermarks and analyzing content far more accurately than with the standard Linux file command, and much, much more.

Continuing in a similar vein, I want to look at audio and video in this article. Well, maybe "listen" to audio and "look" at video, but again, I'm still focusing on the command line, so in both instances, player/viewer apps are required.

YouTube to MP3 Audio

As someone who watches a lot of lectures online, I'm also intrigued by the online services that can extract just the audio portion of a YouTube or Vimeo video and save it as an MP3. Listening to a lecture while driving is far safer than trying not to watch a video on the move, for example.

Since there are so many live concert performances online, many people also like to use a video-to-MP3 service to add those songs to their music libraries.

Note: be leery of copyright issues with any download and conversion of content. Just because something is on Vimeo, YouTube or another online service doesn't mean you have permission to extract the audio or even download it and save it on your computer.

Let's start with the most basic functionality: downloading a video from YouTube so you can watch it on your Linux system. There are a lot of browser plugins and even websites devoted to this task, but who wants to risk malware or be plagued by porn site ads? Yech.

Fortunately, there's a terrific public domain program called youtube-dl on GitHub that covers all your needs. At its most basic, it lets you download video content from YouTube and a variety of other online video repositories, but as you'll learn, it can do quite a bit more.

You can grab a copy for your system from the project's GitHub page.

Let's start by downloading a copy of one of my own YouTube videos. It's a review of the splendid 1More quad-driver headphones, and its URL is https://www.youtube.com/watch?v=BFL1E77hTHQ.

As an aside: I have a YouTube channel where I review consumer electronics and gadgets. You should subscribe! Find all my videos at http://youtube.com/askdavetaylor.

YouTube has a bunch of ways it can assemble a URL, however, including using its URL-shortener youtu.be, but fortunately, youtube-dl can handle the variations.

Downloading a copy of the video to the current working directory is now as simple as:


youtube-dl 'https://www.youtube.com/watch?v=BFL1E77hTHQ'

The full output of the command is a bit, um, hairy, however:


$  youtube-dl 'https://www.youtube.com/watch?v=BFL1E77hTHQ'
[youtube] BFL1E77hTHQ: Downloading webpage
[youtube] BFL1E77hTHQ: Downloading video info webpage
[youtube] BFL1E77hTHQ: Extracting video information
[youtube] BFL1E77hTHQ: Downloading MPD manifest
WARNING: Requested formats are incompatible for merge and
will be merged into mkv.
[download] Destination: 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.f137.mp4
[download] 100% of 118.74MiB in 02:49
[download] Destination: 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.f251.webm
[download] 100% of 4.81MiB in 00:03
[ffmpeg] Merging formats into "1More Quad Driver In-Ear
Headphones Reviewed-BFL1E77hTHQ.mkv"
Deleting original file 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.f137.mp4 (pass -k to keep)
Deleting original file 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.f251.webm (pass -k to keep)
$

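You can wade through the output messages, but it's the message from the companion open-source program ffmpeg that's most important: merging formats into ... mkv.

In other words, the downloaded video ends up as an MKV file. youtube-dl fetched the video stream (MP4) and the audio stream (WebM) separately and, as the warning notes, those two formats are incompatible for merging, so ffmpeg combines them into an MKV container instead. MKV is the file extension of the increasingly popular Matroska Multimedia Container format, and it works with a lot of video players (including VLC from the VideoLAN project, my favorite cross-platform video player).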

A quick ls reveals the result and that the default filename is taken from the title of the video, something that might not be particularly desirable:


$ ls -lh *mkv
-rw-r--r--  1 taylor  staff   124M Jan 31 16:56 1More Quad
Driver In-Ear Headphones Reviewed-BFL1E77hTHQ.mkv

Do you prefer to specify the output name and have the output file in MP4 (MPEG4) format instead? That's doable:


$ youtube-dl -o 1more-review.mp4 -f mp4 \
    'https://www.youtube.com/watch?v=BFL1E77hTHQ'
[youtube] BFL1E77hTHQ: Downloading webpage
[youtube] BFL1E77hTHQ: Downloading video info webpage
[youtube] BFL1E77hTHQ: Extracting video information
[youtube] BFL1E77hTHQ: Downloading MPD manifest
[download] Destination: 1more-review.mp4
[download] 100% of 57.63MiB in 00:27

As a bonus, you get less ominous informational messages from the program too, so it's cleaner. And the output, sure enough, is in MP4 format:


$ ls -lh *mp4
-rw-r--r--@ 1 taylor  staff  58M Jan 31 16:57 1more-review.mp4

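As a second bonus, the MP4 version of the downloaded video is only 58M as opposed to the 124M of the MKV-merged version. Don't read that as more efficient encoding, though: the first download merged a separate 118.74MiB video stream with the audio track, while -f mp4 fetched a single, smaller MP4 file, most likely a lower-quality version of the same video.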

So how do you watch it? Most likely, do a double-click and it'll be up and running, as shown in Figure 1.

Figure 1. Downloaded YouTube Video Playing in Ubuntu Player

That's easy enough, but the original goal was to be able to extract just the audio component of a YouTube video, so let's look at that task.

Downloading Just the Audio Track

Since I've already started to delve into the command-line options for the youtube-dl program, it's not a leap to find out that there's yet another command-line option that lets you save just the audio portion of a video:


$ youtube-dl -x --audio-format mp3 \
    'https://www.youtube.com/watch?v=BFL1E77hTHQ'
[youtube] BFL1E77hTHQ: Downloading webpage
[youtube] BFL1E77hTHQ: Downloading video info webpage
[youtube] BFL1E77hTHQ: Extracting video information
[youtube] BFL1E77hTHQ: Downloading MPD manifest
[download] Destination: 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.webm
[download] 100% of 4.81MiB in 00:07
[ffmpeg] Destination: 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.mp3
Deleting original file 1More Quad Driver In-Ear Headphones
Reviewed-BFL1E77hTHQ.webm (pass -k to keep)
$ ls -lh *mp3
-rw-r--r--  1 taylor  staff   4.0M Jan 31 18:22 1More Quad
Driver In-Ear Headphones Reviewed-BFL1E77hTHQ.mp3

That's easy enough, and the output is delightfully small: 4MB total. The problem is, there's the same awkward naming issue, so the addition of -o output-filename definitely will be a win. But, really, youtube-dl makes these tasks trivially easy, as long as you're willing to figure out all of its command-line options.
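
One way to tame the naming issue without typing a filename every time is the output template syntax that youtube-dl's -o flag accepts, where fields such as %(id)s and %(ext)s are filled in by the program. The following is a minimal sketch rather than a recommendation: the grab_audio function name and the overridable youtubedl variable are my own inventions for illustration.

```shell
#!/bin/sh
# Assumption: youtube-dl is on the PATH; set youtubedl to point elsewhere.
youtubedl=${youtubedl:-youtube-dl}

# grab_audio (hypothetical helper): save only the audio track as MP3,
# named after the video ID rather than the full video title.
grab_audio() {
  # %(id)s and %(ext)s are expanded by youtube-dl, not by the shell,
  # so the template is single-quoted.
  "$youtubedl" -x --audio-format mp3 -o '%(id)s.%(ext)s' "$1"
}

# Example (needs network access):
#   grab_audio 'https://www.youtube.com/watch?v=BFL1E77hTHQ'
```

With the video used above, this should leave a file named after the ID, with the extension filled in by youtube-dl once the MP3 conversion is done.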

Writing a Wrapper Script

Instead of worrying about the obscure command-line flag notation, let's just write a script that does the heavy lifting for you. I'm going to call it ytdl for "youtube download", and by default, it'll accept just a URL and output an MP4 format video file that has the same name as the YouTube shortcut (for example, the above video would become BFL1E77hTHQ.mp4).

Add a second parameter, and that becomes the output filename. Specify the -a flag, and it saves audio output only, in MP3 format instead.

Let's start with a usage block if the user forgets to specify anything or just needs a simple reminder:


if [ $# -eq 0 ] ; then
  echo "Usage: $(basename "$0") {-a} YouTubeURL {outputfile}"
  echo "   where -a extracts the audio portion in MP3 format"
  exit 1
fi

That's easy enough. The script is also going to use some predefined combinations of flags to make it easier to write:


youtubedl="/usr/local/bin/youtube-dl"
audioflags="-x --audio-format mp3"
videoflags="-f mp4"
flags=$videoflags       # default set of command flags
audioonly=0             # default is audio + video
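
The hard-coded path above works if youtube-dl lives in /usr/local/bin, which may not be true on your system. As a hedged alternative, here's a small sketch that searches the PATH first and falls back to a fixed location. The find_tool function is hypothetical, not part of the script in this column.

```shell
#!/bin/sh
# find_tool NAME FALLBACK (hypothetical helper): print the location of
# NAME if it is on the PATH, otherwise FALLBACK if that file is executable.
find_tool() {
  name=$1
  fallback=$2
  if command -v "$name" >/dev/null 2>&1 ; then
    command -v "$name"
  elif [ -x "$fallback" ] ; then
    echo "$fallback"
  else
    echo "$name not found in PATH or at $fallback" >&2
    return 1
  fi
}

# Possible use inside the script:
#   youtubedl=$(find_tool youtube-dl /usr/local/bin/youtube-dl) || exit 1
```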

If the user specifies the -a flag, audioonly will be set to true (that is, 1), and the default flags will switch from video to audio:


if [ "$1" = "-a" ] ; then
  audioonly=1
  flags=$audioflags
  shift
fi

You'll recall that the shift command discards $1 and moves the remaining parameters down by one, so $2 becomes $1 and so on. It's an easy way to process and discard parameters in a script.
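
If shift is new to you, this tiny stand-alone sketch shows the effect. The parse_args function is made up for the demonstration and simply reports what it sees after the optional -a flag has been consumed.

```shell
#!/bin/sh
# parse_args (hypothetical): mimic how the script handles a leading -a.
parse_args() {
  mode="video"
  if [ "$1" = "-a" ] ; then
    mode="audio"
    shift          # drop -a; the URL is now $1
  fi
  echo "mode=$mode url=$1 remaining=$#"
}

parse_args -a 'https://www.youtube.com/watch?v=BFL1E77hTHQ' out.mp3
parse_args 'https://www.youtube.com/watch?v=BFL1E77hTHQ'
```

The first call reports audio mode with two parameters remaining, and the second reports video mode with one. In both cases the URL ends up in $1, which is exactly what the rest of the script relies on.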

The biggest block of code creates a default output filename from the YouTube URL:


if [ $# -eq 1 ] ; then
  # no output filename specified
  outfile=$(echo "$1" | cut -d= -f2)
  if [ $audioonly -eq 1 ] ; then
    outfile="$outfile.mp3"
  else
    outfile="$outfile.mp4"
  fi
else
  outfile="$2"
fi

This isn't the most robust code, because it assumes that the URL specified is in a format like the examples used here, youtube-yadda-yadda?value=shortcode. It extracts the shortcode (everything between the first and second equals signs) and simply appends an appropriate filename suffix. There are better ways to do this, but that's okay; this'll work for now. Just realize that your output filename might be a bit weird if you have a very different type of YouTube URL or a URL from another site.
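
To make "better ways" a bit more concrete, here's one possible sketch using shell pattern matching instead of cut. The video_id function is hypothetical, it covers only the URL shapes listed in its comments, and the last-path-segment fallback for other sites is a guess on my part rather than anything youtube-dl itself does.

```shell
#!/bin/sh
# video_id URL (hypothetical helper): print a short name for the video.
video_id() {
  url=$1
  case "$url" in
    *"?v="*)      id=${url#*"?v="} ;;       # .../watch?v=ID
    *"&v="*)      id=${url#*"&v="} ;;       # .../watch?list=...&v=ID
    *youtu.be/*)  id=${url#*youtu.be/} ;;   # https://youtu.be/ID
    *)            id=${url##*/} ;;          # fallback: last path segment
  esac
  # Trim anything after the ID: extra parameters, fragments or paths.
  id=${id%%[?&#/]*}
  printf '%s\n' "$id"
}

video_id 'https://www.youtube.com/watch?v=BFL1E77hTHQ&t=30s'
video_id 'https://youtu.be/BFL1E77hTHQ?t=30'
```

Both calls print BFL1E77hTHQ. For comparison, the cut version turns the first URL into BFL1E77hTHQ&t, because it keeps everything up to the second equals sign.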

And, finally, the actual invocation of the youtube-dl command:


$youtubedl $flags -o "$outfile" "$1"

That's it! Now you can download a video as simply as:


$ ytdl 'https://www.youtube.com/watch?v=5yXDzg_QDGw' wiper.mp4

And an audio portion with:


$ ytdl -a 'https://www.youtube.com/watch?v=5yXDzg_QDGw'

Nice, eh?
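
For reference, here's a sketch of how the fragments above might fit together in one file. I've made three changes that are my own assumptions rather than part of the script as presented: the logic is wrapped in a function so it can be sourced and tried out, the youtubedl variable can be overridden from the environment, and there's a second usage check so that -a with no URL doesn't slip through.

```shell
#!/bin/sh
# ytdl: download a video as MP4, or with -a just its audio as MP3.
youtubedl=${youtubedl:-/usr/local/bin/youtube-dl}
audioflags="-x --audio-format mp3"
videoflags="-f mp4"

ytdl() {
  usage="Usage: $(basename "$0") {-a} YouTubeURL {outputfile}"
  if [ $# -eq 0 ] ; then
    echo "$usage" >&2
    echo "   where -a extracts the audio portion in MP3 format" >&2
    return 1
  fi

  flags=$videoflags       # default set of command flags
  audioonly=0             # default is audio + video
  if [ "$1" = "-a" ] ; then
    audioonly=1
    flags=$audioflags
    shift
  fi

  # Added check (not in the fragments above): -a given without a URL.
  if [ $# -eq 0 ] ; then
    echo "$usage" >&2
    return 1
  fi

  if [ $# -eq 1 ] ; then
    # no output filename specified: derive one from the URL
    outfile=$(echo "$1" | cut -d= -f2)
    if [ $audioonly -eq 1 ] ; then
      outfile="$outfile.mp3"
    else
      outfile="$outfile.mp4"
    fi
  else
    outfile="$2"
  fi

  # $flags is left unquoted on purpose so it splits into separate words.
  $youtubedl $flags -o "$outfile" "$1"
}
```

Add a final line of ytdl "$@" to turn this into a stand-alone script. To see what would be run without downloading anything, set youtubedl=echo before calling the function.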

I've way overrun my space for this column, but this is such a fun and simple script atop a terrific, powerful program, that it's worth it, right? And now you know how to make YouTube work for you, rather than vice versa!

Dave Taylor has been hacking shell scripts on UNIX and Linux systems for a really long time. He's the author of Learning Unix for Mac OS X and Wicked Cool Shell Scripts. You can find him on Twitter as @DaveTaylor, and you can reach him through his tech Q&A site: Ask Dave Taylor.
