Ryan Schwabe

Stem Mastering: When & How to Go Beyond Conventional Stereo Mastering


This article was originally published on Sonicscoop.com.

As an artist, producing or engineering your own music can feel very liberating.  It allows you to craft a unique sonic signature for your record without relying on anyone else’s vision.

With a little bit of practice, creating unique tones and productions will come easy to many young producers. But crafting the perfect overall balance, the depth, dynamics and dimension that you hear in a well-engineered song, can be more difficult.

Some productions are arranged so well that the engineering details fall into place without much fuss. However, other songs are more difficult, and it can be frustrating to find the perfect balance. The issue can be in your monitoring environment, your speakers, the arrangement, your engineering skillset, or simply how you hear.

Conventional audio mastering from a 2-track stereo file can fix a lot of overall balance issues within a mix, but it does not afford a mastering engineer the freedom and flexibility that “stem mastering” can provide.

When warranted, stem mastering allows the mastering engineer to adjust how the kick and bass interact, how the guitars or keyboards blend into the bass, how the snare drum bounces around the vocal or how bright or dull the vocal is in relation to other elements in the mix.

Of course, those elements can be subtly rebalanced in conventional stereo mastering by using multiband compressors, dynamic EQs or mid-side processing, but you can do it with much more precision—and with much less compromise—through stem mastering, a process where the mastering engineer is given access to submixes, or even essential individual instruments.

Not all songs need stem mastering, but if you are in a situation where you are not completely convinced by the sonics of your record, stem mastering can be a simple and cost-effective solution.

In recent years, I have noticed an increasing number of modern artists crafting their productions and mixes on their own, but using stem mastering to put the finishing touches on their projects, rather than starting from scratch with a complete re-mix.

This happy compromise still affords self-producing artists the freedom to craft the song exactly how they want, without the complexities and cost of hiring a more experienced mixer, while giving the mastering engineer the ability to help craft the spectral balance of the song by processing individual elements.

If you find yourself in that situation where you are determined to mix your own music, but not 100% sold on the sound of your record, stem mastering may be your answer.  If you decide to go this route, below are a handful of quick tips and best practices for printing stems that sound exactly like your final mix.



1) Let’s talk about file formats.

It’s a good idea to print stems at the sample rate of your original session using 32-bit floating point WAV files. Printing at 32-bit float will protect you from any overs that may take place in the session from improper gain staging.

32-bit float files allow you to go over 0.0 dBFS without any hard clipping being printed into the file. The mastering engineer can then attenuate all of the stem files and bring the peak level back below full scale (0.0 dBFS).
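To see why a 32-bit float print is so forgiving, here is a minimal numpy sketch (the +3 dB over and the 3.5 dB trim are made-up values for illustration). A signal that peaks above full scale survives a float print intact and can simply be attenuated later, while a fixed-point print clips it permanently:

```python
import numpy as np

# A 1 kHz sine that peaks about 3 dB over full scale (linear ~1.41),
# simulating an "over" caused by improper gain staging.
sr = 44100
t = np.arange(sr) / sr
over = 10 ** (3 / 20) * np.sin(2 * np.pi * 1000 * t)

# Fixed-point formats hard-clip anything above full scale:
clipped = np.clip(over, -1.0, 1.0)    # what a fixed-point print would keep
as_float = over.astype(np.float32)    # 32-bit float keeps the overs intact

print(clipped.max())    # 1.0 -> the peak information is destroyed
print(as_float.max())   # ~1.41 -> the peak survives above 0 dBFS

# The mastering engineer can simply pull the float file back under full scale:
restored = as_float * 10 ** (-3.5 / 20)   # attenuate by 3.5 dB
print(restored.max() < 1.0)               # True, and nothing was ever clipped
```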

2) Let’s talk about the best way to break your tracks up.  

There are a variety of ways to submit stems, from the very simplest option of sending an instrumental mix and an a cappella mix for help with vocal balance to sending separate stereo prints for almost every instrument grouping.

If you are interested in having the mastering engineer get under the hood of your song, I suggest the below track groupings because they provide for the most amount of flexibility while still keeping things simple and manageable:

Stem 1: Kick
Stem 2: Snare
Stem 3: All remaining drums (hats, crashes, toms, etc.)
Stem 4: Bass
Stem 5: “Music A” (legato or rhythm-type instruments)
Stem 6: “Music B” (staccato or melodic-type instruments)
Stem 7: Lead vocals
Stem 8: Background vocals
Stem 9: Complete stereo reference mix

Each stem should be printed complete with all effects processing, including any compression, EQ, time-based effects such as reverb or delay, automation, and any bus processing.

(If time-based effects tracks are provided separately, it’s difficult to make an instrumental version without the vocal reverb included.)

If you are printing multi-mic acoustic recordings, make sure that all those tracks are properly time and phase aligned.

3) Let’s talk about all those plugs you put on your master fader.  

Stereo processing is used (and often abused) in modern music production, and removing it can significantly alter your mix.

If you think it sounds good, then it is good and should remain.  Personally, I love the weird processing people come up with to mangle their music.  However, I generally suggest removing brick wall limiters from the stereo mix prior to mastering.

EQs, compressors or any other effects processors can and should remain.  If the processors are not at all important to your mix and were just added as an afterthought for reference, you can simply remove them and provide a reference mix for your mastering engineer.

Keep in mind that printing individual stems through your master fader will yield different results because your master fader compressor’s threshold is set for the entire song, not individual elements in the arrangement. So this is one place where the individual elements can sound different than the sum of their parts.

If you want to be really certain that the sound of the compressor on the stereo mix stays consistent when you start printing stems, consider adding in the following to your stem-printing process:

  1. Bypass the stereo mix compressor and every plugin after it on the stereo mix
  2. Bounce to disc and name the file “Mix Key”
  3. Import the “Mix Key” back into your session, make sure it is time aligned with your song, and then route the output of the key mix to an unused bus
  4. Reactivate your stereo mix compressor and send the “Mix Key” bus to the key input of your stereo mix compressor.

This simple extra step will allow you to print stems while maintaining the same gain reduction from when the compressor was being fed the entire mix.

For even more details on the process, please refer to page 20 of the GRAMMY Producers and Engineers Wing’s document on “Delivery Recommendations for Recorded Music Projects”.

4) OK, now let’s actually make the stems.

First, you should organize your session into a logical order.  Drums, then bass, then guitars and keyboards and then vocals.

Mute all of the source tracks (but not the effects returns or busses) and then un-mute each of the track groupings one at a time and bounce them to disk.

Start with un-muting the kick, then bounce from 0 to the end of the song.  This should include all track processing, bus processing, time-based effects and stereo master fader processing.  Then, mute this track and move on to your next stem.

One word of caution: If you are using pre-fader sends for any effects sends in your session, then be sure to bypass the effects send as well, as it will still come through even if that track is muted.

As an alternate approach, you can start by making all your source tracks inactive, and then only reactivate those that you want in the current stem. Either way, make sure that all of these stems have the exact same start point. If you are bouncing offline and not able to monitor the stems as they bounce down, you may make some mistakes along the way. That is what brings us to the final step.

5) Last thing: Test your stems

This is by far the most overlooked and important part of preparing stems.

Create a blank session in your DAW of choice.  Import the stems onto their own tracks with the faders at unity.  Does the mix sound exactly the same as you remember?  If so, congrats, you did it!  If not, go back into your mix session and reprint any stems that turned out wrong.
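The unity-fader check can also be automated as a null test: sum the stems, subtract the reference mix, and measure what remains. Here is a minimal numpy sketch using synthetic arrays in place of your printed WAV files (the function name and the -60 dB tolerance are my own choices, not a standard):

```python
import numpy as np

def stems_null_test(stems, reference, tol_db=-60.0):
    """Sum the stems, subtract the reference mix, and report the residual
    peak in dBFS. Near silence means the stems reproduce the mix exactly;
    anything loud means a stem is missing, duplicated, or misaligned."""
    residual = np.sum(stems, axis=0) - reference
    peak = np.max(np.abs(residual))
    peak_db = 20 * np.log10(peak) if peak > 0 else float("-inf")
    return peak_db, peak_db <= tol_db

# Toy example: three "stems" that sum exactly to the reference mix.
rng = np.random.default_rng(0)
stems = [rng.standard_normal(1000) * 0.1 for _ in range(3)]
reference = np.sum(stems, axis=0)

peak_db, ok = stems_null_test(np.array(stems), reference)
print(ok)  # True -- the stems null against the mix
```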

The above process may seem like a lot of work, but it is important work. Regardless of whether you plan on working with a stem mastering engineer, a stereo mastering engineer or going DIY, I strongly recommend that you incorporate this process at the end of every project that you complete and release.

Creating stems may not be the sexiest part of being a producer, but it is extremely important for future-proofing your intellectual property.

Operating systems go out of date, plugin formats change and in as little as three years from now you may not be able to play the DAW sessions that you create today. By stemming your songs, you are preserving your work for future use.

Five years from now, someone may want an edited arrangement of your song for a commercial or film. A label may want to release your project in a new VR format. We simply have no idea how we will consume music ten, fifteen or twenty years from today.

Stems are an integral part of music preservation, regardless of your interest in using stems in the mastering process. The songs you conjure up in your DAW deserve to be preserved beyond the delivery formats we are accustomed to today. Stem your songs out, save ’em and keep your catalog ready for whatever’s next.

Music Streaming & Loudness Normalization


Streaming services use volume normalization to create a balanced listening experience across playlists and albums. Services like Spotify, Tidal and Apple Music determine an average loudness value for singles, EPs and LPs using a loudness measurement called LUFS. The song, EP or LP’s loudness value is used to normalize playback volume to a target level set by the streaming service.

The Audio Engineering Society suggests a streaming target level of -16 LUFS; most streaming services use a target level between -13 and -16 LUFS. These target levels are much lower in volume than the master levels preferred by modern artists, producers and engineers. Because of this difference, the louder a master recording is made, the more a streaming service will turn the recording down to match its target level. For example, if you master an album to -8 LUFS (loud) and submit the files to Spotify, it will turn the songs’ playback volume down by 6 dB to match its target volume of -14 LUFS.
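The arithmetic behind normalization is simple: the gain a service applies is just its target level minus the master’s measured loudness. A small sketch of that relationship (the function is illustrative, not any service’s actual code):

```python
def normalization_gain_db(master_lufs, target_lufs=-14.0):
    """Gain (in dB) a streaming service applies to hit its target level."""
    return target_lufs - master_lufs

# The example above: a -8 LUFS master on Spotify (-14 LUFS target)
print(normalization_gain_db(-8.0))   # -6.0 -> turned down 6 dB
# A quieter -16 LUFS master is turned up instead:
print(normalization_gain_db(-16.0))  # 2.0 -> turned up 2 dB
```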

Streaming Service Target Volumes:

  • Apple Music (Sound Check on): -16 LUFS
  • Spotify: -14 LUFS
  • Tidal: -14 LUFS
  • YouTube: -13 LUFS

Mastering Levels and Streaming Service Target Volumes:

Below are five different masters of a single song at different loudness levels (-8, -10, -12, -14 and -16 LUFS).  The target playback level in the below example is -14 LUFS (Spotify & Tidal).  Loud masters (pink, orange, yellow) are turned down to the streaming service's target volume.  Lower-level masters are not turned down as much and retain a greater peak-to-loudness ratio than albums that are mastered at loud volumes.  In effect, the louder you master your album, the lower your peak-to-loudness ratio.

The -8 LUFS master (pink) is turned down 6 dB, the -10 LUFS master 4 dB and the -12 LUFS master 2 dB; the -14 LUFS file is unaffected and the -16 LUFS file is amplified by 2 dB, potentially approaching the service's playback limiter.


Test Files Submitted to Streaming Services:

To illustrate the playback volume manipulation performed by streaming services, I have submitted test files at a range of master levels.  Each file consists of an identical sequence of pink noise calibrated to a specific loudness level.  The five songs were submitted as "singles" so that each track's volume is assessed individually, and not as an average for the entire EP or LP.  Some streaming services have an "album mode," which normalizes the entire album's average volume to the streaming service's target volume while maintaining the individual level differences between tracks set by the mastering engineer.  The below test files were submitted as singles to avoid that album-mode loudness averaging; this simulates what happens when a song is added to a playlist.  You can download the 16-bit, 44.1kHz test files below and the AAC files here.

Test File Info:

  • "8 Times": -8 LUFS, -1.9 dBTP
  • "10 Shoes": -10 LUFS, -4 dBTP
  • "12 Dozen": -12 LUFS, -5.9 dBTP
  • "14 Team": -14 LUFS, -7.9 dBTP
  • "16 Ounces": -16 LUFS, -9.9 dBTP

Streaming Services & Normalized Playback Volume:

Click the below links to open in-app playlists of the above test files.

The track "8 Times" is mastered 8 dB louder than "16 Ounces", but both tracks play back at a very similar perceived volume on Spotify, Tidal and Apple Music.


As you can hear in the above playlists, louder masters do not create a louder playback experience for the listener.  The use of playback normalization eliminates the need for projects to be mastered at extremely high levels, as they were in the early aughts.  Songs mastered at different volume levels are streamed at almost identical playback levels.

Even though each streaming service has a different approach to loudness normalization, they all use a target level far below the master volume preferred by many modern artists, producers and engineers.  By mastering records closer to a streaming service's target playback level, you will achieve a similar perceived playback volume while gaining the benefit of additional transient detail in the lower-level master.  As you can see in the below example, a song mastered closer to the playback level of a streaming service retains more transient detail (a higher peak-to-loudness ratio) than a loud master, but streams at a similar playback level.

Obviously, music is not made by measurement, and some forms of music simply sound better with more compression and limiting in the master recording, while other styles benefit from a more gentle approach.  You should work with your mastering engineer to determine an appropriate target level that suits your particular project and genre.

Both files will play back at the same perceived volume on streaming services, but the lower-level master retains a higher peak-to-loudness ratio.

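Peak-to-loudness ratio is simply the distance between a file's true peak and its integrated loudness. A quick illustration with hypothetical master levels (the specific numbers are invented for the example):

```python
def peak_to_loudness_ratio(true_peak_dbtp, integrated_lufs):
    """PLR: the distance between the peaks and the average loudness."""
    return true_peak_dbtp - integrated_lufs

# Hypothetical loud master, limited hard against the ceiling:
loud = peak_to_loudness_ratio(-0.3, -8.0)      # 7.7 dB of transient room
# Hypothetical dynamic master near the streaming target:
dynamic = peak_to_loudness_ratio(-1.0, -14.0)  # 13.0 dB of transient room
print(loud, dynamic)

# After normalization to -14 LUFS both play at the same perceived volume,
# but the dynamic master keeps about 5 dB more transient detail.
```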


Unfortunately, not all platforms have adopted playback normalization into their listening experience.  Soundcloud and Bandcamp do not perform volume normalization.  Soundcloud is said to have plans to adopt loudness normalization, but Bandcamp is not.



Mixing and Mastering at 96kHz


We all use audio plugins to massage and mangle our recordings.  Compression, EQ, distortion, time-based effects, modulation, hardware emulations; we push plugins to the limit to help us create new and unique sounds.  However, plugins must work within the harmonic limitations of the sampling rate set by the digital audio workstation.  We may not be able to hear above 20kHz, but analog electronics and modern plugins create harmonics above our hearing range that affect the sounds we hear.  If a plugin generates harmonics higher than the Nyquist-Shannon frequency limit of the digital audio system, aliasing artifacts are partially folded back into the audible spectrum. 

Take a look at the below drawing showing the fundamental tone f0 in green.  When the tone is distorted, 2nd and 3rd harmonics are created.  As you can see, the anti-aliasing filter is not ideal: at a 44.1kHz sampling rate the 3rd harmonic sits below the anti-aliasing filter's cutoff but above the Nyquist-Shannon frequency.  Because the imperfect anti-aliasing filter does not remove the 3rd harmonic, it is folded back into the audible spectrum of the signal.  This is known as aliasing fold back.

Distortion content exceeds the Nyquist-Shannon frequency and is folded back into the audible spectrum.  Aliasing fold back is dependent on the type of distortion applied.  The above harmonic distortions and aliasing are simplified for clarity; in reality the interactions are much more complex.
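The fold-back frequency itself is easy to compute: a component above Nyquist reflects back around the sampling rate. A small sketch using the 10kHz tone and 3rd harmonic from the example above:

```python
def alias_frequency(f_hz, sample_rate):
    """Frequency where a component lands after sampling (fold back)."""
    f = f_hz % sample_rate
    return min(f, sample_rate - f)

# The 3rd harmonic of a 10 kHz tone is 30 kHz; at 44.1 kHz it folds back:
print(alias_frequency(30_000, 44_100))  # 14100 -> lands in the audible band
# At 96 kHz the same harmonic sits below Nyquist (48 kHz) and does not fold:
print(alias_frequency(30_000, 96_000))  # 30000 -> no fold back
```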

Aliasing fold back at low sample rates is more prevalent with plugins that generate a lot of harmonics, such as compressors, distortion or colorful EQs.  Many plugin manufacturers use oversampling in order to better manage the harmonic content created by the algorithm.  The background oversampling process up-samples the signal by 2x, 4x, 8x or 16x, performs the processing, filters out the harmonics and then down-samples to the host sample rate.  The oversampling process moves the Nyquist-Shannon limit far beyond the human hearing range, reducing the chance of fold back aliasing.  Plugin designers use this process because it adds clarity to their algorithms, but it takes a toll on the CPU and causes additional plugin delay.
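To make the oversampling idea concrete, here is a simplified numpy model of the process: a cubic waveshaper (a stand-in for a character plugin) is run once naively at 44.1kHz and once at 4x with brickwall interpolation and anti-aliasing filters. This is a sketch of the principle, not how any particular plugin implements it:

```python
import numpy as np

def distort(x):
    # Simple cubic waveshaper: generates a 3rd harmonic.
    return x - 0.3 * x ** 3

def brickwall_lowpass(x, sr, cutoff):
    # Idealized brickwall filter: zero all FFT bins above the cutoff.
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    X[freqs > cutoff] = 0
    return np.fft.irfft(X, len(x))

sr, n, f0 = 44_100, 44_100, 10_000
t = np.arange(n) / sr
x = np.sin(2 * np.pi * f0 * t)

# Naive: distort at 44.1 kHz. The 30 kHz harmonic aliases to 14.1 kHz.
naive = distort(x)

# Oversampled: zero-stuff to 4x, interpolation-filter, distort at
# 176.4 kHz, filter out everything above the original Nyquist, decimate.
up = np.zeros(4 * n)
up[::4] = x * 4
up = brickwall_lowpass(up, 4 * sr, sr / 2)   # interpolation filter
up = distort(up)                             # harmonics stay below 88.2k Nyquist
up = brickwall_lowpass(up, 4 * sr, sr / 2)   # anti-aliasing filter
oversampled = up[::4]

def level_at(x, sr, f_hz):
    # Magnitude of the FFT bin closest to f_hz.
    X = np.abs(np.fft.rfft(x))
    return X[int(round(f_hz * len(x) / sr))]

# The alias at 14.1 kHz is strong in the naive version and vanishes
# in the oversampled version.
print(level_at(naive, sr, 14_100) > 100 * level_at(oversampled, sr, 14_100))
```

In the naive pass the 30kHz harmonic folds back to 14.1kHz; in the oversampled pass it is created cleanly and filtered out before returning to 44.1kHz, which is exactly the clarity benefit described above.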

Below I will show a few examples of plugins creating different results in 96kHz and 44.1kHz sessions with the exact same plugin settings and gain staging.

Compression: Waves CLA-2A

10kHz sine wave generator -> CLA-2A -> Nugen Visualizer

Equalization: Universal Audio 88RS

10kHz sine wave generator -> UA 88RS -> Nugen Visualizer

Distortion: Soundtoys Decapitator

10kHz sine wave generator -> Decapitator -> Nugen Visualizer

Saturation: Plugin Alliance bx_saturator

10kHz sine wave generator -> bx_saturator -> Nugen Visualizer

As you can see, the 96kHz session plugins create more harmonics above the source signal, while the 44.1kHz session plugins create some aliasing fold back and more distortion below the source signal.  Admittedly, all of the plugins tested are character-style processors that add harmonics to the signal.  Cleaner plugins will not create nearly as many harmonics, nor as much (or any) fold back aliasing.

If individual plugins are capable of creating harmonics, multiple plugins across an entire session will create a complex mix of harmonics with the potential for fold back aliasing.  Let's look at a real-world example of two identical "in the box" mixes of the same song at 96kHz and 44.1kHz.  Look over the below block diagram to understand how these identical offline-bounce, "in the box" comparison mixes were created.

Harmonic content generated by plugins in the 44.1kHz and 96kHz sessions:

The mixes for the 96kHz sessions show that the plugin processing created harmonics between 22kHz and 28kHz.  However, the 44.1kHz examples filtered away the harmonics and partially folded them back into the audible spectrum of the recording.  

Below is a stream of the phase inverted difference between the 96kHz session bounce and the 44.1kHz session bounce.  

When phase inverting the 96kHz bounce against the 44.1kHz bounce (upsampled to 96kHz), we are listening to the differences between the files.  What we hear may exist in either the 96kHz or the 44.1kHz bounce, since the remaining audio is not specific to one session or the other.

These tests identify a few possible benefits of working at 96kHz.  First, the 96kHz session moves the Nyquist-Shannon frequency far above the hearing spectrum, reducing fold back aliasing and allowing for the creation of clean harmonic content.  In the above phase inversion test you can clearly hear the aliasing on the open hi-hat.  Second, there is a distinct amount of high frequency detail in the 96kHz bounce that is not captured in the same way in the 44.1kHz bounce; it can be heard in what remains of the vocal in the phase inversion test above.  Third, higher sample rates allow you to control transient detail with more precision and less distortion than lower sample rates.  This is why we see oversampling features built into many popular digital mastering limiters.

If you get a chance, play around with higher sample rates and let me know the differences that you hear.