Ryan Schwabe

Music Streaming & Loudness Normalization


Streaming services use volume normalization to create a balanced listening experience across playlists and albums. Services like Spotify, Tidal and Apple Music determine an average loudness value for singles, EPs and LPs using a loudness measurement called LUFS. The song, EP or LP's loudness value is used to normalize playback volume to a target level set by the streaming service. The Audio Engineering Society suggests a streaming target level of -16LUFS; however, most streaming services use target levels between -13 and -16LUFS, generally louder than the AES suggestion. These target levels are much lower in volume than the master levels preferred by modern artists, producers and engineers. Because of this difference, the louder a master recording is made, the more a streaming service will turn it down to match the target level. For example, if you master an album to -8LUFS (loud) and submit the files to Spotify, Spotify will turn down each song's playback volume by 6dB to match its target volume of -14LUFS.

Streaming Service Target Volumes:

  • Apple Music (soundcheck on):
    •  -16LUFS
  • Spotify:  
    • -14LUFS
  • Tidal:
    • -14LUFS
  • YouTube:
    • -13LUFS
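Normalization itself is simple arithmetic: the gain a service applies is its target level minus the track's measured integrated loudness. A minimal sketch (the function name is mine; real measured values would come from a LUFS meter):

```python
def normalization_gain_db(measured_lufs: float, target_lufs: float) -> float:
    """Gain in dB a streaming service applies to hit its target level."""
    return target_lufs - measured_lufs

# A -8LUFS master on Spotify (-14LUFS target) is turned down 6dB:
print(normalization_gain_db(-8.0, -14.0))   # -6.0
# A -16LUFS master would be turned up 2dB:
print(normalization_gain_db(-16.0, -14.0))  # 2.0
```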

Mastering Levels and Streaming Service Target Volumes:

Below are five different masters of a single song at different loudness levels (-8LUFS, -10LUFS, -12LUFS, -14LUFS and -16LUFS). The target playback level in the example below is -14LUFS (Spotify & Tidal). Loud master recordings (pink, orange, yellow) are turned down to the streaming service's target volume. Lower-level masters are not turned down as much and retain a greater peak to loudness ratio than masters made at loud volumes. In effect, the louder you master your album, the lower your peak to loudness ratio.

The -8LUFS master (pink) is turned down 6dB, the -10LUFS master is turned down 4dB, the -12LUFS master is turned down 2dB, the -14LUFS file is unaffected and the -16LUFS file is amplified by 2dB, potentially approaching the service's playback limiter.

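Peak to loudness ratio is simply true peak minus integrated loudness, and because normalization shifts both by the same amount, the ratio is fixed at mastering time. A sketch with hypothetical masters of the same song (the dBTP/LUFS pairs are illustrative, not measurements):

```python
def plr(true_peak_dbtp: float, loudness_lufs: float) -> float:
    """Peak to loudness ratio: true peak minus integrated loudness (dB)."""
    return true_peak_dbtp - loudness_lufs

# Hypothetical loud vs. conservative masters of the same song:
loud_plr  = plr(-0.3, -8.0)    # 7.7dB of transient headroom
quiet_plr = plr(-1.0, -14.0)   # 13.0dB of transient headroom
# After both are normalized to -14LUFS they play equally loud,
# but the quieter master keeps its larger peak to loudness ratio.
```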

Test Files Submitted to Streaming Services:

To illustrate the playback volume manipulation performed by streaming services, I submitted test files mastered at different levels. Each file consists of an identical sequence of pink noise calibrated to a specific loudness level. The five songs were submitted as "singles" so that each track's volume is assessed individually, not as an average for an entire EP or LP. Some streaming services have an "album mode" which normalizes the entire album's average volume to the service's target while maintaining the level differences between tracks set by the mastering engineer. The test files below were submitted as singles to avoid album mode's loudness averaging; this simulates what happens when a song is added to a playlist. You can download the 16 bit, 44.1kHz test files below and the AAC files here.

Test File Info:

  • "8 Times"      -8LUFS,  -1.9dBTP 
  • "10 Shoes"   -10LUFS,  -4dBTP 
  • "12 Dozen"   -12LUFS,  -5.9dBTP 
  • "14 Team"     -14LUFS,  -7.9dBTP 
  • "16 Ounces" -16LUFS,  -9.9dBTP 

Streaming Services & Normalized Playback Volume:

Click the below links to open in-app playlists of the above test files.

The track "8 Times" is mastered 8dB louder than "16 Ounces", but both tracks play back at a very similar perceived volume on Spotify, Tidal and Apple Music.

Conclusion:

As you can hear in the above playlists, louder masters do not create a louder playback experience for the listener. Playback normalization eliminates the need for projects to be mastered at the extremely high levels common in the early aughts. Songs mastered at different volume levels are streamed at almost identical playback levels. Even though each streaming service has a different approach to loudness normalization, they all use a target level far below the master volume preferred by many modern artists, producers & engineers. By mastering closer to a streaming service's target playback level, you will achieve a similar perceived playback volume but gain the benefit of additional transient detail in the lower-level master. As you can see in the example below, a song mastered closer to a streaming service's playback level retains more transient detail (a higher peak to loudness ratio) than a loud master, yet streams at a similar playback level. Obviously, music is not made by measurement: some forms of music simply sound better with more compression and limiting in the master recording, while other styles benefit from a gentler approach. Work with your mastering engineer to determine an appropriate target level that suits your particular project and genre.

 Both files will play back with the same perceived volume on streaming services, but the lower level master will take advantage of a higher peak to loudness ratio



Addendum:

Unfortunately, not all platforms have adopted playback normalization into their listening experience. SoundCloud and Bandcamp do not perform volume normalization. SoundCloud is said to have plans to adopt loudness normalization; Bandcamp does not.


Mixing and Mastering at 96kHz


We all use audio plugins to massage and mangle our recordings.  Compression, EQ, distortion, time-based effects, modulation, hardware emulations; we push plugins to the limit to help us create new and unique sounds.  However, plugins must work within the harmonic limitations of the sampling rate set by the digital audio workstation.  We may not be able to hear above 20kHz, but analog electronics and modern plugins create harmonics above our hearing range that affect the sounds we hear.  If a plugin generates harmonics higher than the Nyquist-Shannon frequency limit of the digital audio system, aliasing artifacts are partially folded back into the audible spectrum. 

Take a look at the drawing below showing the fundamental tone f0 in green. When the tone is distorted, 2nd and 3rd harmonics are created. As you can see, the anti-aliasing filter is not ideal: at a 44.1kHz sampling rate the 3rd harmonic is below the anti-aliasing filter's cutoff but above the Nyquist-Shannon frequency. Because the imperfect anti-aliasing filter does not remove the 3rd harmonic, it is folded back into the audible spectrum of the signal. This is known as aliasing fold back.

Distortion content exceeds the Nyquist-Shannon frequency and is folded back into the audible spectrum. Aliasing fold back is dependent on the type of distortion applied. The above harmonic distortions and aliasing are simplified for clarity. In reality the interactions are much more complex.
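The folded frequency can be computed directly: any component above half the sample rate reflects around the Nyquist frequency. A small sketch of that arithmetic (first-order folding only, matching the simplified drawing):

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a tone lands after sampling at fs_hz
    (content above fs/2 folds back around the Nyquist frequency)."""
    f = f_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2 else f

# 3rd harmonic of a 10kHz tone in a 44.1kHz session:
print(alias_frequency(30_000, 44_100))   # 14100 -> audible fold back
```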

Aliasing fold back (or distortion) at low sample rates is more prevalent with plugins that generate a lot of harmonics, such as compressors, distortions or colorful EQs. Many plugin manufacturers use oversampling in order to better manage the harmonic content created by the algorithm. The background oversampling process up-samples the signal by 2x, 4x, 8x, or 16x, performs the processing, filters out the harmonics and then down-samples to the host sampling rate. Oversampling moves the Nyquist-Shannon frequency far beyond the human hearing range, reducing the chance of fold back aliasing. Plugin designers use this process because it adds clarity to their algorithms, but it takes a toll on the CPU and adds plugin delay.

Below I will show a few examples of plugins creating different results in 96kHz and 44.1kHz sessions with the exact same plugin settings and gain staging.

Compression: Waves CLA-2A

10kHz sine wave generator -> CLA-2A -> Nugen Visualizer

Equalization: Universal Audio 88RS

10kHz sine wave generator -> UA 88RS -> Nugen Visualizer

Distortion: Soundtoys Decapitator

10kHz sine wave generator -> Decapitator -> Nugen Visualizer

Saturation: Plugin Alliance bx_saturator

10kHz sine wave generator -> bx_saturator -> Nugen Visualizer

As you can see, the plugins in the 96kHz session create more harmonics above the source signal, while the plugins in the 44.1kHz session create some aliasing fold back and more distortion below the source signal. Admittedly, all of these plugins are character-style processors that add harmonics to the signal. Cleaner plugins will not create nearly as many harmonics, nor as much (or any) fold back aliasing.
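The same fold back can be measured without an analyzer plugin. A numpy sketch using a bare tanh saturator as a stand-in for the processors above: at 44.1kHz the 3rd harmonic of a 10kHz sine (30kHz) cannot exist, so it lands at 44.1kHz minus 30kHz, i.e. 14.1kHz:

```python
import numpy as np

fs, f0, n = 44_100, 10_000, 44_100        # 1 second; integer cycles, no leakage
x = np.sin(2 * np.pi * f0 * np.arange(n) / fs)
y = np.tanh(3.0 * x)                       # crude saturation: odd harmonics
mag = np.abs(np.fft.rfft(y)) / n           # bin k corresponds to k Hz here

# The aliased 3rd harmonic shows up as a clear line at 14.1kHz,
# towering over the empty bin next to it:
print(mag[14_100] > 1000 * mag[14_050])    # True
```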

If individual plugins are capable of creating harmonics, multiple plugins across an entire session will create a complex mix of harmonics with the potential for fold back aliasing. Let's look at a real-world example of two identical "in the box" mixes of the same song at 96kHz and 44.1kHz. Look over the block diagram below to understand how these identical offline-bounce, "in the box" comparison mixes were created.

Harmonic content generated by plugins in the 44.1kHz and 96kHz sessions:

The 96kHz session mixes show that the plugin processing created harmonics between 22kHz and 28kHz. The 44.1kHz session could not represent those harmonics; they were filtered away and partially folded back into the audible spectrum of the recording.

Below is a stream of the phase inverted difference between the 96kHz session bounce and the 44.1kHz session bounce.  

When phase inverting the 96kHz bounce against the 44.1kHz bounce (up-sampled to 96kHz), we are listening to the differences between the files. What we hear may exist in either the 96kHz or the 44.1kHz bounce, since the remaining audio is not specific to one session or the other.
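For anyone who wants to reproduce this, the null test is straightforward to script. A sketch assuming the two bounces are loaded as numpy arrays at their native rates (96000/44100 reduces to the rational factor 320/147):

```python
import numpy as np
from scipy.signal import resample_poly

def null_test(bounce_96k: np.ndarray, bounce_44k: np.ndarray) -> np.ndarray:
    """Up-sample the 44.1kHz bounce to 96kHz and subtract it from the
    96kHz bounce; whatever remains differs between the two mixes."""
    up = resample_poly(bounce_44k, 320, 147)   # 44.1kHz -> 96kHz
    m = min(len(up), len(bounce_96k))
    return bounce_96k[:m] - up[:m]

# Identical content nulls to silence; any residue is a real difference.
```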

These tests identify a few possible benefits of working at 96kHz. First, the 96kHz session moves the Nyquist-Shannon frequency far above the hearing spectrum, reducing fold back aliasing and allowing for the creation of clean harmonic content. In the above phase inversion test you can clearly hear the aliasing on the open hi-hat. Second, there is a distinct amount of high frequency detail in the 96kHz bounce that is not captured in the same way in the 44.1kHz bounce. This high frequency detail can be heard in what remains of the vocal in the phase inversion test above. Third, higher sample rates allow you to control transient detail with more precision and less distortion than lower sample rates. This is why we see oversampling features built into many popular digital mastering limiters.

If you get a chance, play around with higher sample rates and let me know the differences that you hear.

96kHz & The Music Industry's Next Digital Supply Chain


 

June 25th, 2016

Most modern songs are created in digital audio workstations that default to the 24-bit wav file format and a 44.1kHz sampling rate. The 44.1kHz sampling rate has been the de facto standard for music distributors since the first commercial CD was released in August 1982 by the Dutch technology company Philips. In 2016, 24-bit wav at a 96kHz sampling rate is becoming the high resolution audio standard for the music industry's new digital supply chain.

The 44.1kHz sample rate was originally chosen for the CD because it satisfies the Nyquist-Shannon theorem with a small margin. The Nyquist-Shannon theorem states that in order to faithfully digitize a sound, the sample rate must be at least twice the highest recorded frequency. The human ear can hear frequencies up to about 20kHz, so the minimum sampling rate must be 40kHz in order to properly reconstruct the signal. The incorrect reproduction of frequencies that violate this limit is known as aliasing.

 The red source signal requires 4 samples within the 2 wave cycles in order to properly capture the sound.  The blue line represents the aliasing created by the DAC when the sample rate is not twice that of the source.                                                        

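The drawing's point is easy to verify numerically: when a tone is sampled below twice its frequency, its samples are literally identical to those of a lower tone, so the converter cannot tell them apart. A small numpy sketch (values chosen to mirror a 44.1kHz system):

```python
import numpy as np

fs = 44_100
n = np.arange(441)                                  # a few hundred samples
high  = np.cos(2 * np.pi * 30_000 * n / fs)         # 30kHz tone, undersampled
alias = np.cos(2 * np.pi * (fs - 30_000) * n / fs)  # its 14.1kHz alias
assert np.allclose(high, alias)                     # sample-for-sample identical
```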

Since 1982, the music industry has delivered music to consumers using the 44.1kHz sampling rate.  However, the new streaming based digital supply chain is slowly adopting the 24-bit, 96kHz file format.  

 Digital distribution chain


 Mastered for iTunes logo


In February of 2012, the Recording Academy and Apple iTunes worked together to create the "Mastered for iTunes" digital delivery standard. The standard is largely misunderstood, but it gives the mastering engineer a method to compare what he or she hears in the studio with what the consumer will hear. The MfiT standard also protects against peak distortion that can be created during the format conversion process. A common approach is to leave -1.5 to -0.5dBFS of unused headroom at the top of the master digital audio file. If your limiter is set to a maximum output level of -0.1dBFS, or even -0.3dBFS, peak distortion can be created when your wav file is converted to a consumer file format. By leaving at least 0.5dB of headroom, the encoding process will stay within full scale (0.0dBFS), reducing the chance of peak distortion. The MfiT applet allows you to perform the conversion yourself and hear the AAC file before it hits retail.

The picture below shows a wav file with a limiter's output ceiling set to -0.5dBFS. When the master file is encoded to an MP3 or AAC by the retailer, the codec can create peaks (overs) above your limiter's ceiling. If your limiter is set with some headroom, the encoded peaks will not result in distortion; they simply take advantage of the headroom you left in the master.
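The headroom arithmetic is easy to check. A sketch converting dBFS to linear amplitude; the 0.4dB codec overshoot below is an illustrative figure, since the actual overshoot depends on the program material and codec:

```python
def dbfs_to_linear(db: float) -> float:
    """Convert dBFS to linear full-scale amplitude (1.0 = 0.0dBFS)."""
    return 10 ** (db / 20)

def stays_within_full_scale(ceiling_dbfs: float, overshoot_db: float) -> bool:
    """Does a codec overshoot of the given size stay at or under 0dBFS?"""
    return ceiling_dbfs + overshoot_db <= 0.0

print(round(dbfs_to_linear(-0.5), 3))      # 0.944
print(stays_within_full_scale(-0.5, 0.4))  # True  -> headroom absorbs it
print(stays_within_full_scale(-0.1, 0.4))  # False -> encoded overs may clip
```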

 

 Peak distortion created during the format conversion process performed by digital music retailers. The above photo shows amplitude (up, down) and time (left, right).


The MfiT protocol prefers 24-bit, 96kHz wav files for AAC encoding. Technically, you can deliver a 24-bit, 44.1kHz wav file to your distributor and it will still be considered "Mastered for iTunes", but 24-bit, 96kHz files are preferred. In my opinion, the MfiT guidelines work extremely well across the entire digital supply chain, not just the iTunes marketplace.

 High Resolution Audio Logo created by the Consumer Technology Association


 Master Quality Authenticated 


In February of 2016, the Consumer Technology Association created a classification for "High Resolution Audio" as "better than CD quality". In addition, streaming services are slowly moving to high resolution audio with the incorporation of "Master Quality Authenticated" (MQA) encoding and decoding technology developed by Bob Stuart of Meridian Audio.

The MQA process allows streaming services to encode and decode 96kHz, 24-bit files at a fraction of the file size. Tidal has adopted the technology and other streaming services are showing interest in Meridian's breakthroughs. MQA audio streaming requires a hardware decoder to play back the full-bandwidth 96kHz, 24-bit stream; however, normal playback devices such as an iPhone or laptop will support "CD quality" MQA streams without an MQA decoder.

As you can see, the largest supplier of music (iTunes) has incorporated high resolution audio as an archival standard with its "Mastered for iTunes" program. Apple is currently amassing the largest database of 24-bit, 96kHz music in the world. The Consumer Technology Association has designated a minimum standard and logo for High Resolution Audio, and it plans to license the logo to appear dynamically within streaming services.

As streaming services continue to innovate, we will hear higher quality audio and see greater integration of metadata delivered to consumers. The Digital Data Exchange (DDEX) worked with the Recording Academy to set standards for formatting the metadata that travels down the digital supply chain to digital distributors. Once metadata is integrated into the digital supply chain, it will change the way we discover new music and learn about the people who make it. It will not be long before there is a high resolution audio streaming service with a fully integrated digital credits list, allowing consumers to discover music in a whole new way.