Ryan Schwabe

Music Streaming & Loudness Normalization


Streaming services determine an average loudness value for singles, EPs & LPs. This loudness value is used to normalize playback volume to a target level set by the streaming service. Normalization turns the recording's playback level up or down until it matches the target level. The Audio Engineering Society suggests a streaming target level of -16LUFS, while prominent streaming services use target levels between -13 and -16LUFS. These target levels are much lower than the master levels preferred by many modern artists, producers and engineers. As a result, the louder an engineer masters a project, the more a streaming service will turn the recording down to match its target level. For example, if you master an album to -8LUFS (loud) and submit the files to Spotify, Spotify will turn the songs' playback volume down by 6dB to match its target level of -14LUFS.
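The Spotify example is a single subtraction: the playback gain is the target level minus the track's measured loudness. Here is a minimal sketch. It assumes the service applies one static gain per track and that the integrated loudness has already been measured. The function names are hypothetical and are not any service's API.

```python
def normalization_gain_db(track_lufs: float, target_lufs: float) -> float:
    """Gain (dB) a normalizing player applies to reach the target level.

    Negative values mean the track is turned down, positive values turned up.
    """
    return target_lufs - track_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


# The -8LUFS master from the Spotify example (-14LUFS target).
gain = normalization_gain_db(-8.0, -14.0)
print(f"{gain:+.1f} dB -> x{db_to_linear(gain):.3f}")  # -6.0 dB -> x0.501
```

A 6dB cut roughly halves the signal's amplitude. The recording itself is not altered; the player simply scales it on playback.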

Streaming Service Target Volumes:

  • Apple Music (Sound Check on):
    •  -16LUFS
  • Spotify:  
    • -14LUFS
  • Tidal:
    • -14LUFS
  • YouTube:
    • -13LUFS
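Applying the same subtraction to every target in the list shows how far each service would move one master. This sketch assumes the targets listed above and assumes gain is applied in both directions without any limit. `TARGETS_LUFS` and `gains_for_master` are hypothetical names.

```python
# Target levels from the list above, in LUFS (services may change these).
TARGETS_LUFS = {
    "Apple Music": -16.0,  # with Sound Check on
    "Spotify": -14.0,
    "Tidal": -14.0,
    "YouTube": -13.0,
}


def gains_for_master(master_lufs: float) -> dict[str, float]:
    """Playback gain (dB) each service would apply to a master at master_lufs."""
    return {service: target - master_lufs for service, target in TARGETS_LUFS.items()}


for service, gain in gains_for_master(-8.0).items():
    print(f"{service:12s} {gain:+.1f} dB")
```

For a -8LUFS master, this gives -8dB on Apple Music, -6dB on Spotify and Tidal, and -5dB on YouTube.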

Mastering Levels and Streaming Service Target Volumes:

Below are five different masters of a single song at different loudness levels (-8LUFS, -10LUFS, -12LUFS, -14LUFS and -16LUFS). The waveforms in the black boxes represent the five masters. The target playback level in the example below is -14LUFS (Spotify & Tidal). Loud masters are attenuated (turned down) to the streaming service's target volume. Quieter masters are attenuated less and retain a greater peak to loudness ratio. In other words, the louder you master your album, the lower its peak to loudness ratio will be.

The -8LUFS master is turned down 6dB, the -10LUFS master 4dB and the -12LUFS master 2dB; the -14LUFS master is unaffected and the -16LUFS master is amplified by 2dB.


Test Files Submitted to Streaming Services:

The loudness test files below were submitted to all of the streaming services. Each file consists of an identical sequence of pink noise calibrated to a specific loudness level. The five songs were submitted as "singles" so that each track's volume is assessed individually, rather than as an average for an entire EP or LP. Some streaming services have an "album mode." This mode normalizes the entire album's average volume to the service's target volume and maintains the level differences between tracks set by the mastering engineer. Had these files been submitted as an EP, the differences in level between tracks would have been maintained during playback. You can download the 16 bit, 44.1kHz test files below and the AAC files here.
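Here is a simplified sketch of the difference between track ("single") normalization and album normalization. It approximates album loudness as a duration-weighted energy average of the per-track loudness values. This ignores the gating that a full ITU-R BS.1770 measurement applies over the whole album. The function names and the equal 3-minute track durations are hypothetical.

```python
import math


def track_mode_gains(track_lufs, target_lufs):
    """Single/track mode: every track gets its own gain to hit the target."""
    return [target_lufs - lufs for lufs in track_lufs]


def approx_album_lufs(track_lufs, durations_s):
    """Duration-weighted energy average of per-track loudness (no gating)."""
    total = sum(durations_s)
    energy = sum(d * 10 ** (lufs / 10) for lufs, d in zip(track_lufs, durations_s))
    return 10 * math.log10(energy / total)


def album_mode_gains(track_lufs, durations_s, target_lufs):
    """Album mode: one shared gain, so level differences between tracks survive."""
    gain = target_lufs - approx_album_lufs(track_lufs, durations_s)
    return [gain] * len(track_lufs)


tracks = [-8.0, -10.0, -12.0, -14.0, -16.0]  # the five test files
durations = [180.0] * len(tracks)            # hypothetical equal lengths

print(track_mode_gains(tracks, -14.0))
print(album_mode_gains(tracks, durations, -14.0))
```

In track mode, every file ends up at -14LUFS. In album mode, all five tracks share one gain, so the 2dB steps between them are preserved.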

Track Info:

  • "8 Times": -8LUFS, -1.9dBTP
  • "10 Shoes": -10LUFS, -4dBTP
  • "12 Dozen": -12LUFS, -5.9dBTP
  • "14 Team": -14LUFS, -7.9dBTP
  • "16 Ounces": -16LUFS, -9.9dBTP
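The track list lets us check the peak to loudness ratio directly: PLR is the true peak minus the integrated loudness. This sketch uses the values listed above and the -14LUFS Spotify/Tidal target. The names are hypothetical.

```python
# (title, integrated loudness in LUFS, true peak in dBTP) from the list above.
TEST_FILES = [
    ("8 Times", -8.0, -1.9),
    ("10 Shoes", -10.0, -4.0),
    ("12 Dozen", -12.0, -5.9),
    ("14 Team", -14.0, -7.9),
    ("16 Ounces", -16.0, -9.9),
]

TARGET_LUFS = -14.0  # Spotify/Tidal target


def plr_db(loudness_lufs: float, true_peak_dbtp: float) -> float:
    """Peak-to-loudness ratio: true peak minus integrated loudness."""
    return true_peak_dbtp - loudness_lufs


for title, lufs, peak in TEST_FILES:
    gain = TARGET_LUFS - lufs
    print(f"{title:10s} PLR {plr_db(lufs, peak):4.1f} dB  "
          f"gain {gain:+5.1f} dB  peak after normalization {peak + gain:+5.1f} dBTP")
```

Every file is the same pink noise at a different level, so each has a PLR of about 6dB. After normalization, all of their peaks land between -7.9 and -8dBTP. Real masters behave differently: pushing a mix louder with a limiter lowers its PLR, which is the trade-off described above.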

Streaming Services & Normalized Playback Volume:


The track "8 Times" is mastered 8dB louder than "16 Ounces," but both tracks play back at a very similar perceived volume on streaming services. Mastering at levels closer to the normalized playback volume lets your music use more of the playback medium's dynamic range at a similar perceived playback volume. The difference between peak level and average loudness is measured as the peak to loudness ratio (PLR). When RMS level is used instead of loudness, the same difference is called the crest factor.
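PLR is based on ITU-R BS.1770 loudness and true peak. Crest factor is the older, simpler measure: sample peak over RMS level. Here is a minimal pure-Python sketch of crest factor, with no loudness weighting, gating or true-peak oversampling. The names are hypothetical.

```python
import math


def crest_factor_db(samples):
    """Crest factor: sample peak relative to RMS level, in dB."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)


# One second of a 1kHz sine at 48kHz (a whole number of cycles).
fs = 48_000
sine = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
print(f"{crest_factor_db(sine):.2f} dB")  # a pure sine is about 3.01 dB
```

Heavy limiting pushes the peaks toward the RMS level, so the crest factor (like the PLR) shrinks as a master gets louder.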

Conclusion:

The use of playback normalization algorithms is eliminating the need for projects to be mastered at extremely high levels. Songs mastered at vastly different volume levels are streamed at almost identical playback levels. Each streaming service takes a different approach to loudness normalization, but they all use a target level far below the master volume preferred by many modern artists, producers & engineers. By mastering records closer to a streaming service's target playback level, you will achieve a similar perceived playback volume and gain additional transient detail. That said, I believe some styles of music can benefit from well-tuned compression and limiting, but there is a clear point of diminishing returns. As you can see in the picture below, a song mastered closer to a streaming service's playback level will provide additional transient detail over a loud master. Yet it will stream at an identical or extremely close playback level. You should work with your mastering engineer to create a final master that takes into consideration both the loudness normalization of streaming services and the artistic and sonic vision you have for your project.

Both files will play back with the same perceived volume on streaming services.  


Unfortunately, not all services have adopted playback normalization, but Soundcloud is expected to adopt loudness normalization in the near future.


Streaming Services That Do Not Normalize Volume:

Soundcloud:

Bandcamp:

Mixing and Mastering at 96kHz

Ryan Schwabe

We all use audio plugins to massage and mangle our recordings. Compression, EQ, distortion, time-based effects, modulation, hardware emulations: we push plugins to the limit to help us create new and unique sounds. However, plugins must work within the harmonic limitations of the sampling rate set by the digital audio workstation. We may not be able to hear above 20kHz, but analog electronics and modern plugins create harmonics above our hearing range that affect the sounds we hear. A plugin may generate harmonics above the Nyquist-Shannon frequency (half the sampling rate) of the digital audio system. When that happens, those harmonics are partially folded back into the audible spectrum as aliasing artifacts.
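The fold-back position is easy to compute. After sampling at rate fs, a component at frequency f is indistinguishable from one at |f - k·fs| for any integer k. It therefore reappears mirrored around the Nyquist-Shannon frequency. This small sketch assumes a single sinusoidal component and ignores any filtering. The function name is hypothetical.

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a component at f_hz appears after sampling at fs_hz.

    Components above fs/2 (the Nyquist-Shannon frequency) fold back
    into the 0..fs/2 range.
    """
    f = f_hz % fs_hz  # sampling cannot distinguish f from f - k*fs
    return fs_hz - f if f > fs_hz / 2 else f


# 3rd harmonic of a 10kHz tone (30kHz) in a 44.1kHz vs a 96kHz session.
print(alias_frequency(30_000, 44_100))  # folds back to 14100
print(alias_frequency(30_000, 96_000))  # stays at 30000 (below 48kHz)
```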

Take a look at the drawing below, which shows the fundamental tone f/0 in green. When the tone is distorted, 2nd and 3rd harmonics are created. As you can see, the anti-aliasing filter is not ideal. At a 44.1kHz sampling rate, the 3rd harmonic falls below the anti-aliasing filter's cutoff but above the Nyquist-Shannon frequency. Because the imperfect anti-aliasing filter does not remove the 3rd harmonic, it is folded back into the audible spectrum of the signal. This is known as aliasing fold back.

Distortion content that exceeds the Nyquist-Shannon frequency is folded back into the audible spectrum. The amount of aliasing fold back depends on the type of distortion applied. The harmonic distortion and aliasing above are simplified for clarity; in reality, the interactions are much more complex.

Aliasing fold back (or distortion) at low sample rates is more prevalent with plugins that generate a lot of harmonics, such as compressors, distortion or colorful EQs. Many plugin manufacturers use oversampling to better manage the harmonic content created by their algorithms. This oversampling process runs in the background in four steps. It up-samples the signal by 2x, 4x, 8x or 16x and performs the processing. It then filters out the harmonics above the host's Nyquist-Shannon frequency and down-samples to the host sampling rate. Oversampling moves the Nyquist-Shannon frequency far beyond the human hearing range, reducing the chance of fold back aliasing. Plugin designers use this process because it adds clarity to their algorithms. However, it takes a toll on the CPU and adds plugin delay (latency).
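The oversampling workflow can be sketched with NumPy. This is an idealized illustration, not how any particular plugin implements it. It uses a test tone with a whole number of cycles, so FFT-based resampling acts as a perfect brick-wall filter. A toy x³ nonlinearity stands in for a real compressor or saturator. All names are hypothetical.

```python
import numpy as np

FS = 44_100
N = FS                      # one second -> 1Hz FFT bins
F0 = 10_000                 # test tone, as in the plugin tests below
x = np.sin(2 * np.pi * F0 * np.arange(N) / FS)


def cubic_distortion(sig):
    """Toy nonlinearity: x**3 adds a 3rd harmonic (30kHz for a 10kHz tone)."""
    return sig ** 3


def fft_resample(sig, new_len):
    """Ideal (brick-wall) resampling of a periodic signal via the FFT."""
    spec = np.fft.rfft(sig)
    out = np.zeros(new_len // 2 + 1, dtype=complex)
    k = min(len(spec), len(out))
    out[:k] = spec[:k]  # zero-pad (upsample) or discard high bins (downsample)
    return np.fft.irfft(out, n=new_len) * new_len / len(sig)


def amplitude_at(sig, freq_hz):
    """Amplitude of the component at freq_hz in a signal sampled at FS."""
    k = int(round(freq_hz * len(sig) / FS))
    return 2 * np.abs(np.fft.rfft(sig)[k]) / len(sig)


# 1) Process directly at 44.1kHz: the 30kHz harmonic aliases to 14.1kHz.
direct = cubic_distortion(x)

# 2) Oversample 4x, process, then filter and return to 44.1kHz.
OS = 4
oversampled = fft_resample(cubic_distortion(fft_resample(x, OS * N)), N)

print(f"14.1kHz alias, direct:         {amplitude_at(direct, 14_100):.4f}")
print(f"14.1kHz alias, 4x oversampled: {amplitude_at(oversampled, 14_100):.2e}")
```

Processed directly at 44.1kHz, the 30kHz harmonic reappears at 14.1kHz with an amplitude of 0.25. With 4x oversampling, it is removed before the signal returns to 44.1kHz. Real plugins use finite filters, so their suppression is strong but not perfect.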

Below are a few examples of plugins producing different results in 96kHz and 44.1kHz sessions with exactly the same plugin settings and gain staging.

Compression: Waves CLA-2A

10kHz sine wave generator -> CLA-2A -> Nugen Visualizer

Equalization: Universal Audio 88RS

10kHz sine wave generator -> UA 88RS -> Nugen Visualizer

Distortion: Soundtoys Decapitator

10kHz sine wave generator -> Decapitator -> Nugen Visualizer

Saturation: Plugin Alliance bx_saturator

10kHz sine wave generator -> bx_saturator -> Nugen Visualizer

As you can see, the plugins in the 96kHz session create more harmonics above the source signal. The plugins in the 44.1kHz session create some aliasing fold back and more distortion below the source signal. Admittedly, all of these plugins are character-style processors that add harmonics to the signal. Cleaner plugins will not create nearly as many harmonics, nor as much (if any) fold back aliasing.
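The plugin tests use a 10kHz tone, so we can predict where each harmonic should land in each session. This sketch assumes ideal harmonics at integer multiples of 10kHz and no filtering. It shows only positions, not the levels a real plugin would produce. The names are hypothetical.

```python
def alias_frequency(f_hz, fs_hz):
    """Where a component at f_hz lands after sampling at fs_hz (folded into 0..fs/2)."""
    f = f_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2 else f


def harmonic_map(f0_hz, fs_hz, max_order=6):
    """(order, true frequency, frequency after sampling) for harmonics 2..max_order."""
    return [(k, k * f0_hz, alias_frequency(k * f0_hz, fs_hz))
            for k in range(2, max_order + 1)]


for fs in (44_100, 96_000):
    print(f"fs = {fs} Hz")
    for order, true_f, heard_f in harmonic_map(10_000, fs):
        note = "" if true_f == heard_f else "  <- aliased"
        print(f"  H{order}: {true_f:>6} Hz -> {heard_f:>6} Hz{note}")
```

At 44.1kHz, the 4th and 5th harmonics fold to 4.1kHz and 5.9kHz, below the 10kHz source. At 96kHz, the 5th and 6th harmonics still fold, but to 46kHz and 36kHz, well outside the audible range.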

If individual plugins can create harmonics, multiple plugins across an entire session will create a complex mix of harmonics with the potential for fold back aliasing. Let's look at a real-world example: two identical "in the box" mixes of the same song at 96kHz and 44.1kHz. The block diagram below shows how these identical offline-bounce, "in the box" comparison mixes were created.

Harmonic content generated by plugins in the 44.1kHz and 96kHz sessions:

The mixes from the 96kHz session show that the plugin processing created harmonics between 22kHz and 28kHz. In the 44.1kHz session, however, those harmonics were filtered away and partially folded back into the audible spectrum of the recording.
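The same fold-back rule predicts where that band would go. In a 44.1kHz session, harmonics from 22.05 to 28kHz would mirror into roughly 16.1 to 22.05kHz, the top of the audible range. This sketch handles only the first fold (content between fs/2 and fs). `fold_band` is a hypothetical helper.

```python
def fold_band(lo_hz, hi_hz, fs_hz):
    """Image of a band between fs/2 and fs after it folds around fs/2."""
    if not (fs_hz / 2 <= lo_hz <= hi_hz <= fs_hz):
        raise ValueError("this sketch only handles content between fs/2 and fs")
    return fs_hz - hi_hz, fs_hz - lo_hz


print(fold_band(22_050, 28_000, 44_100))  # (16100, 22050)
```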

Below is a stream of the phase inverted difference between the 96kHz session bounce and the 44.1kHz session bounce.  

When we phase invert the 96kHz bounce against the 44.1kHz bounce (upsampled to 96kHz), we are listening to the differences between the files. What we hear may exist in either the 96kHz or the 44.1kHz bounce, since the remaining audio is not specific to one session or the other.
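A null test like this boils down to inverting one file's polarity, summing it with the other (equivalently, subtracting the two), and measuring what remains. The sketch below uses NumPy. It assumes the 44.1kHz bounce has already been upsampled to 96kHz and that both files are loaded as sample-aligned arrays; loading and resampling are not shown. The function names and the 1% level difference are hypothetical.

```python
import numpy as np


def _rms(signal):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(np.square(signal)))


def null_residual_db(a, b):
    """Level of the null-test residual (a - b) relative to a, in dB.

    Summing a with a polarity-inverted b is the same as a - b. Assumes both
    signals share a sample rate, length and alignment.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    residual_rms = _rms(a - b)
    if residual_rms == 0:
        return float("-inf")  # perfect null: the files are identical
    return float(20 * np.log10(residual_rms / _rms(a)))


fs = 96_000
t = np.arange(fs) / fs
mix_a = np.sin(2 * np.pi * 1000 * t)
mix_b = 0.99 * mix_a  # hypothetical: the same mix, 1% lower in level
print(f"{null_residual_db(mix_a, mix_b):.1f} dB")  # about -40 dB
```

The lower the residual, the more alike the files. Even a small level or timing offset leaves audible residue, so alignment matters as much as the processing being compared.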

These tests identify a few possible benefits of working at 96kHz. First, the 96kHz session moves the Nyquist-Shannon frequency far above the hearing spectrum, reducing fold back aliasing and allowing clean harmonic content to be created. In the phase inversion test above, you can clearly hear the aliasing on the open hi-hat. Second, the 96kHz bounce contains a distinct amount of high frequency detail that is not captured in the same way in the 44.1kHz bounce. This detail can be heard in what remains of the vocal in the phase inversion test. Third, higher sample rates allow you to control transient detail with more precision and less distortion than lower sample rates. This is why oversampling features are built into many popular digital mastering limiters.
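One reason limiters oversample is inter-sample peaks: the reconstructed waveform can peak between samples, where a sample-based meter cannot see it. The sketch below is an idealized illustration using FFT interpolation. It is not the ITU-R BS.1770 true-peak filter and not any limiter's actual algorithm. The names are hypothetical.

```python
import numpy as np


def fft_upsample(sig, factor):
    """Ideal band-limited interpolation of a periodic signal via the FFT."""
    n = len(sig)
    spec = np.fft.rfft(sig)
    padded = np.zeros(n * factor // 2 + 1, dtype=complex)
    padded[: len(spec)] = spec
    return np.fft.irfft(padded, n=n * factor) * factor


# A tone at fs/4 with a 45-degree phase offset: every sample lands at +/-0.707,
# but the continuous waveform between samples reaches +/-1.0.
N = 64
x = np.sin(2 * np.pi * np.arange(N) / 4 + np.pi / 4)

sample_peak = np.max(np.abs(x))
oversampled_peak = np.max(np.abs(fft_upsample(x, 8)))
print(f"sample peak:    {20 * np.log10(sample_peak):+.2f} dBFS")
print(f"8x oversampled: {20 * np.log10(oversampled_peak):+.2f} dBFS")
```

The sample peak reads about 3dB lower than the peak of the reconstructed waveform. A limiter working only on samples would let that overshoot through; an oversampling limiter can catch it.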

If you get a chance, play around with higher sample rates and let me know the differences that you hear.