Mixing and Mastering at 96kHz

Ryan Schwabe

We all use audio plugins to massage and mangle our recordings.  Compression, EQ, distortion, time-based effects, modulation, hardware emulations: we push plugins to the limit to help us create new and unique sounds.  However, plugins must work within the harmonic limitations of the sampling rate set by the digital audio workstation.  We may not be able to hear above 20kHz, but analog electronics and modern plugins create harmonics above our hearing range that affect the sounds we hear.  If a plugin generates harmonics above the Nyquist-Shannon frequency limit of the digital audio system, those harmonics are folded back into the audible spectrum as aliasing artifacts.

Take a look at the drawing below, showing the fundamental tone f0 in green.  When the tone is distorted, 2nd and 3rd harmonics are created.  As you can see, the anti-aliasing filter is not ideal: at a 44.1kHz sampling rate the 3rd harmonic is below the anti-aliasing filter's cutoff, but above the Nyquist-Shannon frequency.  Because the imperfect anti-aliasing filter does not remove the 3rd harmonic, it is folded back into the audible spectrum of the signal.  This is known as aliasing fold back.

Distortion content exceeds the Nyquist-Shannon frequency and is folded back into the audible spectrum.  Aliasing fold back is dependent on the type of distortion applied.  The above harmonic distortions and aliasing are simplified for clarity.  In reality the interactions are much more complex.
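To make the fold back arithmetic concrete, here is a minimal sketch (Python is my choice here, not part of the original post) that maps a harmonic onto its aliased frequency, assuming idealized sampling with no anti-aliasing filter:

```python
# Minimal sketch: where a frequency lands after folding around the Nyquist limit.
# Assumes idealized sampling with no anti-aliasing filter, for illustration only.

def folded_frequency(f_hz: float, sample_rate_hz: float) -> float:
    """Map any frequency into the 0..Nyquist band the way aliasing would."""
    nyquist = sample_rate_hz / 2.0
    f = f_hz % sample_rate_hz              # aliases repeat at every multiple of the sample rate
    return sample_rate_hz - f if f > nyquist else f

# Example: an 8kHz fundamental whose distortion creates a 3rd harmonic at 24kHz.
print(folded_frequency(24_000.0, 44_100.0))  # 20100.0 -> folded back into the audible band
print(folded_frequency(24_000.0, 96_000.0))  # 24000.0 -> untouched, since Nyquist is 48kHz
```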

Aliasing fold back (or distortion) at low sample rates is more prevalent with plugins that generate a lot of harmonics, such as compression, distortion or colorful EQs.  Many plugin manufacturers use oversampling in order to better manage the harmonic content created by the algorithm.  The background oversampling process up-samples the signal by 2x, 4x, 8x, or 16x, performs the processing, filters out the harmonics above the host's Nyquist frequency and then down-samples to the host sample rate.  Oversampling moves the Nyquist-Shannon frequency and its anti-aliasing filter far beyond the human hearing range, reducing the chance of fold back aliasing.  Plugin designers use this process because it adds clarity to their algorithms, but it takes a toll on the CPU and causes additional plugin delay.
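As a rough illustration of that background process (not any particular manufacturer's implementation), here is a sketch that uses a tanh curve as a stand-in for a plugin's nonlinear algorithm and scipy's polyphase resampler for the up- and down-sampling steps:

```python
import numpy as np
from scipy.signal import resample_poly

def saturate_oversampled(x: np.ndarray, factor: int = 4) -> np.ndarray:
    """Run a nonlinear stage at an elevated internal rate, then return to the host rate.

    resample_poly applies its own low pass filters, so harmonics created above the
    host Nyquist frequency are filtered out before down-sampling instead of being
    folded back into the audible band.
    """
    upsampled = resample_poly(x, factor, 1)    # 1) up-sample to factor x the host rate
    driven = np.tanh(3.0 * upsampled)          # 2) nonlinear processing (stand-in for the plugin algorithm)
    return resample_poly(driven, 1, factor)    # 3) filter and down-sample back to the host rate
```

Processing the same signal directly at 44.1kHz, without the wrapper, leaves the harmonics above Nyquist nowhere to go but back down into the audible band.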

Below I will show a few examples of plugins creating different results in 96kHz and 44.1kHz sessions with the exact same plugin settings and gain staging.

Compression: Waves CLA-2A

10kHz sine wave generator -> CLA-2A -> Nugen Visualizer

Equalization: Universal Audio 88RS

10kHz sine wave generator -> UA 88RS -> Nugen Visualizer

Distortion: Soundtoys Decapitator

10kHz sine wave generator -> Decapitator -> Nugen Visualizer

Saturation: Plugin Alliance bx_saturator

10kHz sine wave generator -> bx_saturator -> Nugen Visualizer

As you can see, the 96kHz session plugins create more harmonics above the source signal, while the 44.1kHz session plugins create some aliasing fold back and more distortion below the source signal.  Admittedly, all of the plugins are character-style processors that add harmonics to the signal.  Cleaner plugins will not create nearly as many harmonics, nor will they create as much fold back aliasing.
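For readers who want to reproduce the idea without the commercial plugins above, here is a hedged stand-in: a hard clipper applied to a 10kHz sine at both sample rates, with an FFT in place of the Nugen Visualizer. The clipper is my substitution, so the exact harmonic pattern will differ from the plugins shown.

```python
import numpy as np

def clipped_sine_spectrum(sample_rate: float, freq: float = 10_000.0, seconds: float = 1.0):
    """Distort a sine wave and return (frequencies, magnitude spectrum in dB)."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    tone = np.sin(2 * np.pi * freq * t)
    distorted = np.clip(2.0 * tone, -1.0, 1.0)           # crude stand-in for a character plugin
    windowed = distorted * np.hanning(len(distorted))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(windowed), 1.0 / sample_rate)
    return freqs, 20 * np.log10(magnitude / magnitude.max() + 1e-12)

freqs_44, db_44 = clipped_sine_spectrum(44_100.0)   # 3rd harmonic (30kHz) folds to ~14.1kHz,
freqs_96, db_96 = clipped_sine_spectrum(96_000.0)   # 5th (50kHz) to ~5.9kHz; at 96kHz both land above hearing
```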

If individual plugins are capable of creating harmonics, multiple plugins across an entire session will create a complex mix of harmonics with the potential for fold back aliasing.  Let's look at a real-world example of two identical "in the box" mixes of the same song at 96kHz and 44.1kHz.  Look over the block diagram below to understand how these identical offline-bounce, "in the box" comparison mixes were created.

Harmonic content generated by plugins in the 44.1kHz and 96kHz sessions:

The mixes from the 96kHz session show that the plugin processing created harmonics between 22kHz and 28kHz.  In the 44.1kHz session, that content was partly filtered away and partly folded back into the audible spectrum of the recording.

Below is a stream of the phase inverted difference between the 96kHz session bounce and the 44.1kHz session bounce.  

When phase inverting the 96kHz bounce against the 44.1kHz bounce (up-sampled to 96kHz), we are listening to the differences between the files.  What we hear may exist in either the 96kHz or the 44.1kHz bounce, since the remaining audio is not specific to one session or the other.
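For anyone who wants to run the same null test themselves, here is a sketch of the comparison with hypothetical file names; soundfile and scipy are my tooling choices, not part of the original workflow.

```python
import numpy as np
import soundfile as sf                        # any wav reader would do
from scipy.signal import resample_poly

mix_96, sr_96 = sf.read("mix_96k.wav")        # hypothetical file names
mix_44, sr_44 = sf.read("mix_44k.wav")

# Up-sample the 44.1kHz bounce to 96kHz (96000/44100 reduces to 320/147).
mix_44_up = resample_poly(mix_44, 320, 147, axis=0)

# Polarity-invert and sum = subtract; trim to a common length first.
# (Any latency or trimming mismatch between the bounces will dominate the result,
# so the files must be sample-aligned before subtracting.)
n = min(len(mix_96), len(mix_44_up))
difference = mix_96[:n] - mix_44_up[:n]

print(f"Residual peak: {20 * np.log10(np.max(np.abs(difference)) + 1e-12):.1f} dBFS")
sf.write("difference_96k.wav", difference, sr_96)
```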

These tests identify a few possible benefits of working at 96kHz.  First, the 96kHz session moves the Nyquist-Shannon frequency far above the hearing spectrum, reducing fold back aliasing and allowing for the creation of additional harmonic content.  In the above phase inversion test you can clearly hear the aliasing on the open hi-hat.  Second, there is a distinct amount of high frequency detail present in the 96kHz bounce that is not captured in the same way in the 44.1kHz bounce.  This high frequency detail can be heard in what remains of the vocal in the phase inversion test above.  Third, higher sample rates allow you to control transient detail with more precision and less distortion than at lower sample rates.  This is why we see oversampling features built into many popular digital mastering limiters.

If you get a chance, play around with higher sample rates and let me know the differences that you hear.

96kHz & The Music Industry's Next Digital Supply Chain

Ryan Schwabe

 

June 25th, 2016

Most modern songs are created in digital audio workstations that default to the 24-bit wav file format and a 44.1kHz sampling rate.  The 44.1kHz sampling rate has been the de facto standard for music distributors since the first commercial CD was released in August 1982 by the Dutch technology company Philips.  In 2016, 24-bit, 96kHz wav is becoming the high resolution audio standard for the music industry's new digital supply chain.

The 44.1kHz sample rate was originally chosen for the CD because it sits just above the minimum sampling rate necessary to satisfy the Nyquist-Shannon theorem, leaving room for the anti-aliasing filter's transition band.  The Nyquist-Shannon theorem states that in order to faithfully digitize a sound, the sample rate must be at least twice the highest recorded frequency.  The human ear can hear frequencies up to roughly 20kHz, so the minimum sampling rate must be 40kHz in order to properly reconstruct the signal.  In addition, a steep low pass filter must be placed at 20kHz in order to remove frequencies that the converter would otherwise incorrectly reproduce.  The incorrect reproduction of frequencies beyond the Nyquist-Shannon limit is known as aliasing.  Anti-aliasing low pass filters for 44.1kHz converters have to be applied very close to the top end of the human hearing range (20-22kHz).  Keep in mind that for 96kHz files, anti-aliasing filters are set at 48kHz, far above the limits of human hearing.  For reference, aliasing sounds like gritty resonance or ringing in the high frequencies.

The red source signal requires 4 samples within the 2 wave cycles in order to properly capture the sound.  The blue line represents the aliasing created by the DAC when the sample rate is not twice that of the source.                                                        

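A small numeric sketch of what the figure shows: sampled at 44.1kHz, a 25kHz tone produces exactly the same sample values as a 19.1kHz tone, which is why the converter reconstructs the wrong waveform.

```python
import numpy as np

sample_rate = 44_100.0
t = np.arange(64) / sample_rate                     # 64 sample instants

tone_25k = np.cos(2 * np.pi * 25_000.0 * t)         # above the 22.05kHz Nyquist limit
alias_19k = np.cos(2 * np.pi * 19_100.0 * t)        # 44100 - 25000 = 19100 Hz

print(np.allclose(tone_25k, alias_19k))             # True: indistinguishable at these sample times
```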

Since 1982, the music industry has delivered music to consumers using the 44.1kHz sampling rate.  However, the new streaming-based digital supply chain is slowly adopting the 24-bit, 96kHz file format.

Digital distribution chain


Mastered for iTunes logo


In February of 2012, the Recording Academy and Apple iTunes (the largest digital retailer of music at the time) worked together to create the “Mastered for iTunes” digital delivery standard.  This standard is largely misunderstood, but it creates a method for the mastering engineer to compare what he or she hears in the studio with what the consumer will hear.  The MfiT standard also protects against peak distortion that can be created during the format conversion process of streaming services and digital retailers like iTunes or Amazon MP3 (WAV -> AAC or MP3).

A common approach to protecting against peak distortion during the conversion process is to leave 0.5dB of unused headroom in the master digital audio file.  Meaning, the final limiter's peak output level must be set to -0.5dBFS or lower (dependent on program material), leaving a small amount of headroom even at the loudest portion of the master.  That headroom protects against peak distortion that can be encoded into the consumer files created by digital retailers.  If your limiter is set to a maximum output level of -0.1dBFS, or even -0.3dBFS, peak distortion can be created in the consumer file.  By leaving at least 0.5dB of headroom in the final master, most digital retailers' encoding software will stay within the maximum level (0.0dBFS) of the encoded file, reducing the chance of peak distortion.  The MfiT applet allows you to perform the conversion process and hear the AAC file before it hits retail.  It also includes a terminal command that lets you determine the number of clipped peaks in the consumer file.

The picture below shows a wav file with a limiter's output set to a maximum level of -0.5dBFS.  When the master file is encoded to an MP3 or AAC by the retailer, the codec will create new peaks above the -0.5dBFS ceiling.  However, the newly created peaks do not result in distortion because the codec is printing into the remaining headroom of the master file.

Peak distortion created during the format conversion process performed by digital music retailers.  The above photo shows amplitude (vertical) and time (horizontal).

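One way to see how close a master sits to 0dBFS between the samples themselves is to oversample the file and re-measure the peak, which approximates a true-peak reading. This is a hedged sketch with a hypothetical file name, not the MfiT tooling itself:

```python
import numpy as np
import soundfile as sf                      # any wav reader would do
from scipy.signal import resample_poly

audio, sample_rate = sf.read("final_master.wav")    # hypothetical file name

sample_peak = np.max(np.abs(audio))
true_peak_estimate = np.max(np.abs(resample_poly(audio, 4, 1, axis=0)))  # 4x oversampled peak

print(f"Sample peak:         {20 * np.log10(sample_peak):6.2f} dBFS")
print(f"True-peak estimate:  {20 * np.log10(true_peak_estimate):6.2f} dBFS")
```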

The MfiT protocol prefers 24-bit, 96kHz wav files for AAC encoding.  Technically, you can deliver a 24-bit, 44.1kHz file to your distributor and it will still be considered "Mastered for iTunes", but 24-bit, 96kHz files allow Apple's encoders to create more accurate encodes.  In my opinion, the MfiT guidelines work extremely well across the entire digital supply chain, not just the iTunes marketplace.

The MfiT program does not recommend simply up-sampling a 44.1kHz or 48kHz final master.  However, it does not mention whether one should mix and master at a higher resolution than the source files.  I have found advantages to the way plugin algorithms perform at 96kHz and will go further into this topic in my next post. Tom Volpicelli of The Mastering House touches on the subject here.     

High Resolution Audio Logo created by the Consumer Technology Association


In February of 2016, the Consumer Technology Association defined “High Resolution Audio” as “better than CD quality”.  Therefore, a minimum of 20-bit resolution at a 48kHz sampling rate is now considered “High Resolution Audio”.  You may ask, why 20-bit and not 24-bit resolution?  20-bit was chosen because of the vast archives of classic songs recorded to 20-bit, 48kHz digital tape in the early 80s.  If 24-bit had been chosen as the minimum standard, then a decade's worth of early digital recordings would not qualify as "better than CD quality", even though those files are technically higher fidelity than "CD quality."

The High Resolution Audio logo will be available for streaming services in late 2016.  It is expected that streaming services will display the logo if a song was submitted to the digital supply chain at "better than CD quality" and hide it if the file was submitted at 16-bit, 44.1kHz (CD quality).

Master Quality Authenticated 


In addition to the High Resolution Audio standard, streaming services are slowly moving toward high resolution delivery with the incorporation of the “Master Quality Authenticated” (MQA) encoding and decoding process developed by Bob Stuart of Meridian Audio.

The MQA process allows streaming services to encode and decode 24-bit, 96kHz files at a fraction of the file size.  Tidal is expected to adopt the technology by the end of 2016, and other streaming services are showing interest in Meridian's breakthroughs.  MQA audio streaming will require a hardware decoder to play back the full-bandwidth 24-bit, 96kHz stream.  However, normal playback devices such as an iPhone or laptop will inherently support "CD quality" MQA streams without an MQA decoder.  According to the limited information that has been released, MQA will be capable of delivering CD quality audio using roughly the same bandwidth currently used by Apple and Spotify.  For an explanation of how it works, read this article or AES Journal #9178.  Rumor has it that Meridian has also built a software decoder that will decode an MQA stream into a true 24-bit, 96kHz stream.  At the time this article was written, Apple was considering purchasing Tidal.

As you can see, the largest supplier of music (iTunes) has incorporated high resolution audio as an archival standard with its “Mastered for iTunes” program.  Apple is currently amassing the largest database of 24-bit, 96kHz music in the world.  The Consumer Technology Association has designated a minimum standard and logo for High Resolution Audio, and it is in the process of licensing the logo to appear dynamically within streaming services.  And finally, it appears that Master Quality Authenticated will be incorporated into a modern streaming service in the near future, delivering 24-bit, 96kHz digital audio to the world.

While many modern productions are still being created at 44.1kHz or 48kHz, digital distribution services are moving to 24-bit, 96kHz as the High Resolution Audio standard.

This led me to two questions:

1) Does the up-sampling process negatively affect the quality of the files?

2) What, if any, are the benefits of up-sampling 44.1kHz files to 96kHz for mixing and mastering? (This question will be answered in my next blog post)

Below I will do a quick test to look into the effects of modern sample rate conversion.  My hypothesis is that up-sampling does not negatively affect the sound quality of your 44.1kHz or 48kHz audio; it only moves the aliasing filter out of our hearing range and raises the precision of the data used by your digital audio plugins.

Does sample rate conversion alter the sound of digital audio files?

To test the effects of sample rate conversion I will take a 24-bit, 44.1kHz wav file called “Source” and put it in a 24-bit, 44.1kHz Pro Tools 12.5 session.  I will bounce that single stereo 24-bit, 44.1kHz wav file down to a 24-bit, 44.1kHz wav file and save it as the “CD SR Quality” file.  This bounce will not include any plugin processing or volume manipulation.  Then, I will take the same “Source” file and import (up-sample) it into a 24-bit, 96kHz Pro Tools 12.5 session.  I will bounce it to disc as a 24-bit, 96kHz file and label it “HD SR Quality”.  This bounce will not include any plugin processing or volume manipulation, either.

Essentially, I have two identical files bounced from the same source audio file, but at different sample rates.  The “CD SR Quality” file was imported and bounced to disc at the source sample rate (44.1kHz), and “HD SR Quality” was up-sampled and bounced to disc at 96kHz.  I then imported the “CD SR Quality” and “HD SR Quality” files into a 44.1kHz session.  The “CD SR Quality” file did not go through any sample rate conversion, but the “HD SR Quality” file went through a second sample rate conversion from 96kHz back down to 44.1kHz.  We are testing the artifacts that are created when up-sampling a 44.1kHz wav file to a 96kHz session and then reducing the sample rate back down to 44.1kHz.

Now that both the “CD SR Quality” and the “HD SR Quality” files are in the same 44.1kHz session, we can phase invert one of the files, sum them and see what remains.  If any audio remains, it will be the result of the sample rate conversion process.
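The same comparison can be approximated offline with a resampling library instead of two Pro Tools sessions; this sketch assumes a hypothetical source file name and uses scipy's polyphase resampler rather than the Pro Tools converter, so the exact residual will differ.

```python
import numpy as np
import soundfile as sf                      # any wav reader would do
from scipy.signal import resample_poly

source, sr = sf.read("Source.wav")          # hypothetical 24-bit, 44.1kHz source file

# Round trip: 44.1kHz -> 96kHz -> 44.1kHz (96000/44100 reduces to 320/147).
hd = resample_poly(source, 320, 147, axis=0)
round_trip = resample_poly(hd, 147, 320, axis=0)

# Phase invert and sum = subtract, after trimming to a common length.
n = min(len(source), len(round_trip))
residual = source[:n] - round_trip[:n]

print(f"Residual peak: {20 * np.log10(np.max(np.abs(residual)) + 1e-12):.1f} dBFS")
```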

High frequency noise at 22kHz, -107dBFS


As you can see, the remaining audio is an extremely low amplitude, inaudible sound (-107dBFS) around 22kHz.  Through phase inversion amplitude testing I was able to determine that the 22kHz high frequency sound was in the “CD SR Quality” file, not the “HD SR Quality” file.  The 22kHz sound is NOT in the “HD SR Quality” file because it went through a second conversion process from 96kHz to 44.1kHz and passed through the anti-aliasing filter, which slightly reduced the high frequencies.  The photo below shows an "ideal" filter at 22kHz (white line), an actual filter (blue to red line), the filtered audio within the human hearing range (red highlight) and the frequencies that will contribute to aliasing (orange).

White Line = "ideal" filter at 22kHz, Blue to Red Line = Actual filter, Red Highlight = the filtered audio within the human hearing range, Orange Highlight = frequencies that will contribute to aliasing 


It appears that the up-sampling process has near-zero effect on the amplitude and frequency distribution of the sound file.  The up-sampling process simply increases the number of data points in the audio file, neither adding nor removing precision, and moves the effects of the anti-aliasing filter further above our hearing range.  The down-sampling anti-aliasing filter slightly reduces content around 22kHz (shown above in red).

So, why should we up-sample 44.1kHz or 48kHz source files for mixing and mastering if the up-sampled 96kHz "HD SR Quality" file is nearly identical to the source?

More on that later.