This article is part of a series. Follow the link here for an overview of all articles.

Introduction

Many people have asked me how to add more than one audio stream. This article answers that question.

What’s stored in a file of a stream

Until now we have produced two output stream variants (stream_0 and stream_1). Each segment file within a variant contains one video and one audio stream. The master.m3u8 file lists the stream variants, which differ in video resolution, so that the player can decide which one to play.

Now imagine that we have 10 audio languages. We would still have 2 stream variants, but each file would contain one video stream and 10 audio streams. This is bad for both the users and our servers: every viewer downloads all 10 audio streams although only one of them is played, which wastes bandwidth.
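To put rough numbers on this, here is a small shell sketch. It assumes 128 kbit/s per audio stream and 2000 kbit/s for the video (the bitrates of the lower variant in the commands of this article); the 10 languages are the hypothetical case from above.

```shell
#!/bin/sh
# Rough bandwidth estimate for muxing all audio languages into every segment.
# Assumed values: 128 kbit/s per audio stream, 2000 kbit/s video, 10 languages.
audio_kbps=128
video_kbps=2000
languages=10

muxed_kbps=$((video_kbps + languages * audio_kbps))   # everything in one file
needed_kbps=$((video_kbps + audio_kbps))              # what the viewer actually plays
wasted_kbps=$((muxed_kbps - needed_kbps))             # audio that is never played

echo "muxed:  ${muxed_kbps} kbit/s"
echo "needed: ${needed_kbps} kbit/s"
echo "wasted: ${wasted_kbps} kbit/s"
```

With these numbers about 35% of every download (1152 of 3280 kbit/s) is audio that is never played.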

How to make it better

The solution is very simple: audio and video are separated into different stream variants. For our current setup with a single audio stream, this means one stream variant for audio and two stream variants for the different video resolutions.
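To make the idea more concrete, the following sketch prints a hand-written master playlist in which two video variants reference one shared audio group. It only illustrates the HLS tags involved (EXT-X-MEDIA and the AUDIO attribute of EXT-X-STREAM-INF). The file names, the group name, the bandwidth values and the resolution of the higher variant are made up; the playlist that ffmpeg writes will differ in detail.

```shell
#!/bin/sh
# Hand-written illustration of a master playlist with separated audio.
# All names and numbers below are assumptions for this example only.
master=$(printf '%s\n' \
    '#EXTM3U' \
    '#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="audio",NAME="Audio",DEFAULT=YES,URI="audio.m3u8"' \
    '#EXT-X-STREAM-INF:BANDWIDTH=2300000,RESOLUTION=960x540,AUDIO="audio"' \
    'video_low.m3u8' \
    '#EXT-X-STREAM-INF:BANDWIDTH=6700000,RESOLUTION=1920x1080,AUDIO="audio"' \
    'video_high.m3u8')

echo "$master"

# Count the video variants that point at the shared audio group.
video_variants=$(echo "$master" | grep -c 'AUDIO="audio"')
echo "video variants using the audio group: ${video_variants}"
```

The player picks a video variant based on bandwidth and then loads only the audio playlist of the group that variant refers to.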

This makes it much easier to add more audio streams later. The master.m3u8 file still contains the metadata for the player. The adjusted command looks like this:

./ffmpeg -listen 1 -i rtmp://martin-riedl.de/stream01 \
-filter_complex "[v:0]split=2[vtemp001][vout002];[vtemp001]scale=w=960:h=540[vout001]" \
-preset veryfast -g 25 -sc_threshold 0 \
-map "[vout001]" -c:v:0 libx264 -b:v:0 2000k -maxrate:v:0 2200k -bufsize:v:0 3000k \
-map "[vout002]" -c:v:1 libx264 -b:v:1 6000k -maxrate:v:1 6600k -bufsize:v:1 8000k \
-map a:0 -c:a aac -b:a 128k -ac 2 \
-f hls -hls_time 4 -hls_playlist_type event -hls_flags independent_segments \
-master_pl_name master.m3u8 \
-hls_segment_filename stream_%v/data%06d.ts \
-use_localtime_mkdir 1 \
-var_stream_map "a:0,agroup:audio128 v:0,agroup:audio128 v:1,agroup:audio128" stream_%v.m3u8

Our command now needs only one audio stream, so we remove the second “-map a:0”. The other change is how the stream variants are created. The stream-map now has 3 blocks (separated by spaces): first the audio, second the lower video resolution and third the higher video resolution. The “agroup” option groups audio and video streams together. You can use any name you want (I chose audio128 for 128 kbit/s audio).
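Because “%v” is replaced by the zero-based position of each block in the stream-map, the order of the blocks decides which playlist a stream ends up in. The following sketch doesn't call ffmpeg; it only splits the stream-map string from the command above and prints the expected playlist name for each block. The function name list_variants is made up for this example.

```shell
#!/bin/sh
# Print which playlist each block of a stream-map is written to.
# %v in "stream_%v.m3u8" is replaced by the zero-based position of the block.
list_variants() {
    index=0
    for block in $1; do
        echo "${block} -> stream_${index}.m3u8"
        index=$((index + 1))
    done
}

list_variants "a:0,agroup:audio128 v:0,agroup:audio128 v:1,agroup:audio128"
```

The audio therefore ends up in stream_0, and the two video resolutions move to stream_1 and stream_2.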

Adding a second audio language

Now it’s easy to add a second audio language, or even more.

./ffmpeg -listen 1 -i rtmp://martin-riedl.de/stream01 \
-filter_complex "[v:0]split=2[vtemp001][vout002];[vtemp001]scale=w=960:h=540[vout001]" \
-preset veryfast -g 25 -sc_threshold 0 \
-map "[vout001]" -c:v:0 libx264 -b:v:0 2000k -maxrate:v:0 2200k -bufsize:v:0 3000k \
-map "[vout002]" -c:v:1 libx264 -b:v:1 6000k -maxrate:v:1 6600k -bufsize:v:1 8000k \
-map a:0 -map a:1 -c:a aac -b:a 128k -ac 2 \
-f hls -hls_time 4 -hls_playlist_type event -hls_flags independent_segments \
-master_pl_name master.m3u8 \
-hls_segment_filename stream_%v/data%06d.ts \
-use_localtime_mkdir 1 \
-var_stream_map "a:0,agroup:audio128,language:GER a:1,agroup:audio128,language:ENG v:0,agroup:audio128 v:1,agroup:audio128" stream_%v.m3u8

We add a new “-map a:1” to also pick up the second audio stream of the input. It is just as important to add the second audio stream to the stream-map. In the example above the “language” metadata is added as well, so that the player can offer a choice between the two audio variants.
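With more languages the “-map” options and the stream-map quickly become long, so it can help to generate them. This is a minimal sketch that builds both strings from a list of languages. It assumes that the order of the list matches the order of the audio streams in the input (first audio stream German, second English), and it only prints the options instead of running ffmpeg.

```shell
#!/bin/sh
# Build the audio "-map" options and the stream-map for a list of languages.
# Assumption: the list order matches the order of the input's audio streams.
languages="GER ENG"
agroup="audio128"

audio_maps=""
var_stream_map=""
index=0
for lang in $languages; do
    audio_maps="${audio_maps}-map a:${index} "
    var_stream_map="${var_stream_map}a:${index},agroup:${agroup},language:${lang} "
    index=$((index + 1))
done

# Append the two video variants, which reference the same audio group.
var_stream_map="${var_stream_map}v:0,agroup:${agroup} v:1,agroup:${agroup}"

echo "$audio_maps"
echo "-var_stream_map \"${var_stream_map}\""
```

For the two languages above this prints the same “-map” options and stream-map as in the command. Adding a third language is then only a matter of extending the list.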