Ryan Schwabe

Mixing and Mastering at 96kHz


We all use audio plugins to massage and mangle our recordings. Compression, EQ, distortion, time-based effects, modulation, hardware emulations: we push plugins to the limit to help us create new and unique sounds. However, plugins must work within the bandwidth allowed by the sampling rate set in the digital audio workstation. We may not be able to hear above 20kHz, but analog electronics and modern plugins create harmonics above our hearing range that affect the sounds we hear. If a plugin generates harmonics above the Nyquist frequency of the digital audio system (half the sampling rate), those harmonics are partially folded back into the audible spectrum as aliasing artifacts.

Take a look at the drawing below, which shows the fundamental tone f0 in green. When the tone is distorted, 2nd and 3rd harmonics are created. As you can see, the anti-aliasing filter is not ideal. At a 44.1kHz sampling rate, the 3rd harmonic falls below the cutoff of the anti-aliasing filter but above the Nyquist frequency (22.05kHz). Because the imperfect anti-aliasing filter does not remove the 3rd harmonic, it is folded back into the audible spectrum of the signal. This is known as aliasing fold back.

Distortion content that exceeds the Nyquist frequency is folded back into the audible spectrum. Aliasing fold back depends on the type of distortion applied. The harmonic distortion and aliasing shown above are simplified for clarity. In reality the interactions are much more complex.
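To make the fold back concrete, here is a minimal Python sketch that computes where each harmonic of a tone lands once it is sampled. The function name alias_frequency and the 10kHz example tone are my own choices for illustration, and the sketch assumes no anti-aliasing filter at all, so every harmonic above the Nyquist frequency folds back at full strength.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a tone at freq_hz shows up after sampling at sample_rate_hz."""
    nyquist = sample_rate_hz / 2.0
    folded = freq_hz % sample_rate_hz      # the sampled spectrum repeats every sample_rate_hz
    if folded > nyquist:
        folded = sample_rate_hz - folded   # mirror content above Nyquist back below it
    return folded


fundamental_hz = 10_000.0  # hypothetical test tone
for sample_rate_hz in (44_100.0, 96_000.0):
    print(f"Session rate: {sample_rate_hz / 1000:.1f}kHz")
    for harmonic in range(1, 6):
        true_hz = harmonic * fundamental_hz
        heard_hz = alias_frequency(true_hz, sample_rate_hz)
        note = "" if heard_hz == true_hz else "  <- folded back"
        print(f"  harmonic {harmonic}: {true_hz:8.0f} Hz appears at {heard_hz:8.0f} Hz{note}")
```

At 44.1kHz the 3rd harmonic of the 10kHz tone (30kHz) lands at 14.1kHz, well inside the audible range and below the fundamental. At 96kHz the same harmonic stays at 30kHz, above our hearing.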

Aliasing fold back (or distortion) at low sample rates is more prevalent with plugins that generate a lot of harmonics, such as compressors, distortion units or colorful EQs. Many plugin manufacturers use oversampling to better manage the harmonic content created by their algorithms. The background oversampling process up-samples the signal by 2x, 4x, 8x, or 16x, performs the processing, filters out the harmonics above the host rate's Nyquist frequency, and then down-samples to the host sampling rate. During processing, oversampling moves the Nyquist frequency far beyond the range of human hearing, reducing the chance of fold back aliasing. Plugin designers use this process because it adds clarity to their algorithms, but it takes a toll on the CPU and adds plugin latency.
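The up-sample, process, filter, down-sample chain can be sketched in a few lines of Python with NumPy. This is an offline illustration under stated assumptions, not how any particular plugin is implemented: the tanh waveshaper, the 4x factor, and the FFT-based resampler (which only works cleanly on a block that loops perfectly, like the 10ms sine used here) are all stand-ins I chose for clarity. Real-time plugins typically use dedicated resampling filters instead.

```python
import numpy as np

HOST_RATE = 44_100   # host session rate in Hz
BLOCK = 441          # 10 ms block, so a 10 kHz sine fits exactly 100 cycles
FACTOR = 4           # oversampling factor (assumed for this sketch)


def saturate(x):
    """Hypothetical waveshaper standing in for a character-style plugin."""
    return np.tanh(3.0 * x)


def fft_resample(x, new_len):
    """Band-limited resampling of a periodic block by padding or truncating its spectrum."""
    spectrum = np.fft.rfft(x)
    resized = np.zeros(new_len // 2 + 1, dtype=complex)
    keep = min(len(spectrum), len(resized))
    resized[:keep] = spectrum[:keep]
    return np.fft.irfft(resized, n=new_len) * (new_len / len(x))


def process_oversampled(x, factor):
    up = fft_resample(x, len(x) * factor)   # up-sample
    shaped = saturate(up)                   # process at the higher rate
    # Truncating the spectrum removes everything above the host Nyquist
    # frequency, then the block is returned at the host rate.
    return fft_resample(shaped, len(x))


def level_at(x, freq_hz):
    """Amplitude of the component at freq_hz (must fall exactly on an FFT bin)."""
    bin_index = round(freq_hz * len(x) / HOST_RATE)
    return 2.0 * abs(np.fft.rfft(x)[bin_index]) / len(x)


t = np.arange(BLOCK) / HOST_RATE
sine = np.sin(2 * np.pi * 10_000 * t)

naive = saturate(sine)                           # processed directly at 44.1kHz
oversampled = process_oversampled(sine, FACTOR)  # processed at 176.4kHz

print("alias level at 14.1kHz, no oversampling:", level_at(naive, 14_100))
print("alias level at 14.1kHz, 4x oversampling:", level_at(oversampled, 14_100))
```

Without oversampling, the 30kHz harmonic created by the waveshaper shows up as a clear component at 14.1kHz. With oversampling, that harmonic exists as a real 30kHz component at the higher rate and is removed before the signal returns to 44.1kHz, so the 14.1kHz bin is essentially empty. The extra FFT work also hints at why oversampling costs CPU.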

Below I will show a few examples of plugins creating different results in 96kHz and 44.1kHz sessions with the exact same plugin settings and gain staging.

Compression: Waves CLA-2A

10kHz sine wave generator -> CLA-2A -> Nugen Visualizer

Equalization: Universal Audio 88RS

10kHz sine wave generator -> UA 88RS -> Nugen Visualizer

Distortion: Soundtoys Decapitator

10kHz sine wave generator -> Decapitator -> Nugen Visualizer

Saturation: Plugin Alliance bx_saturator

10kHz sine wave generator -> bx_saturator -> Nugen Visualizer

As you can see, the plugins in the 96kHz session create more harmonics above the source signal, while the same plugins in the 44.1kHz session create some aliasing fold back and more distortion below the source signal. Admittedly, all of these plugins are character-style processors that add harmonics to the signal. Cleaner plugins will not create nearly as many harmonics, nor will they create as much fold back aliasing, if any.

If individual plugins are capable of creating harmonics, multiple plugins across an entire session will create a complex mix of harmonics with the potential for fold back aliasing. Let's look at a real-world example of two identical "in the box" mixes of the same song at 96kHz and 44.1kHz. Look over the block diagram below to understand how these identical offline-bounce, "in the box" comparison mixes were created.

Harmonic content generated by plugins in the 44.1kHz and 96kHz sessions:

The mixes from the 96kHz sessions show that the plugin processing created harmonics between 22kHz and 28kHz. In the 44.1kHz mixes, however, those harmonics were either filtered away or partially folded back into the audible spectrum of the recording.

Below is a stream of the phase-inverted difference between the 96kHz session bounce and the 44.1kHz session bounce.

When we invert the phase of one bounce and sum it with the other (with the 44.1kHz bounce up-sampled to 96kHz), we are listening to the differences between the files. What we hear may exist in either the 96kHz or the 44.1kHz bounce, since the remaining audio is not specific to one session or the other.
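For anyone who wants to try this kind of phase inversion test on their own bounces, here is a minimal Python sketch of the idea. The function names and the two synthetic "bounces" are hypothetical stand-ins, not the files from my test. The sketch assumes both signals are already at the same sample rate, the same length, and sample-aligned.

```python
import math

SAMPLE_RATE = 96_000  # both bounces must share one rate before they can be summed


def null_residual(bounce_a, bounce_b):
    """Invert the polarity of bounce_b and sum it with bounce_a, sample by sample."""
    if len(bounce_a) != len(bounce_b):
        raise ValueError("bounces must be the same length and sample-aligned")
    return [a + (-b) for a, b in zip(bounce_a, bounce_b)]


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0); -inf for digital silence."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return float("-inf") if mean_square == 0 else 10 * math.log10(mean_square)


def tone(freq_hz, amplitude, n=9_600):
    """Sine wave helper used to build the stand-in signals below."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Stand-in bounces: the same 1kHz tone, with a quiet 14.1kHz component
# added to the second one to play the role of an aliasing artifact.
bounce_a = tone(1_000, 0.5)
bounce_b = [s + e for s, e in zip(bounce_a, tone(14_100, 0.01))]

print("identical files:", rms_dbfs(null_residual(bounce_a, bounce_a)))
print("different files:", round(rms_dbfs(null_residual(bounce_a, bounce_b)), 1), "dBFS")
```

Identical files cancel to silence, and anything left over is the difference between the two. Note that the residual alone does not tell you which bounce the leftover audio came from.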

These tests identify a few possible benefits of working at 96kHz. First, the 96kHz session moves the Nyquist frequency far above the range of hearing, reducing fold back aliasing and allowing for the creation of clean harmonic content. In the phase inversion test above you can clearly hear the aliasing on the open hi-hat. Second, there is a distinct amount of high frequency detail present in the 96kHz bounce that is not captured in the same way in the 44.1kHz bounce. This high frequency detail can be heard in what remains of the vocal in the phase inversion test above. Third, higher sample rates allow you to control transient detail with more precision and less distortion than lower sample rates. This is why we see oversampling features built into many popular digital mastering limiters.

If you get a chance, play around with higher sample rates and let me know the differences that you hear.