Audio Normalization 


 

I consider my understanding of audio technology limited, but sometimes, on internet forums, even I recognize incorrect answers about topics I consider very basic. Before reading on, you might want to read one of my earlier posts, From (True) Peak via RMS to LUFS.

Recently, there was a discussion about audio normalization, following a question about what exactly it is and whether or not you should normalize your audio fragments (e.g., as part of gain staging). A couple of answers indicated that you should never normalize your audio since it would affect its dynamics. This is not true. Dynamics are changed by, for example, compressors and limiters but, in principle, normalization should not affect them.

When I say (below) that dynamics is not affected, I should be more precise. In music production, dynamic range means the difference between the loudest and quietest sounds. It is measured in decibels (dB). In a single audio track, dynamic range means the dB difference between the loudest and quietest moment in the audio file. If we normalize an audio file then, of course, the absolute levels of the loudest and quietest moments change. However, when I say ‘dynamics is not changing’ I am talking about the relative dynamics, i.e., the ratios between the audio levels at different time points (and hence their differences in dB) do not change.

However, there is much more to say about audio normalization than appears at first sight. A further explanation follows below.

 

dBFS

Let us first define dB Full Scale (dBFS). 0 dBFS is the highest signal level achievable in a digital audio WAV file. Higher levels are possible inside a DAW such as Cubase, but in the files that are recorded on disk, 0 dBFS is the highest level. All other levels can be defined with respect to 0 dBFS. So, for example, a signal that is 10 decibels lower than the maximum possible level is at -10 dBFS.

Now we can define the gain as G(dBFS) = 20 log10(A/A0), where A is the amplitude of the audio wave (e.g., its peak-to-peak level) and A0 is the reference level, i.e., the amplitude at 0 dBFS. Thus, a gain of 0 dBFS corresponds to A = A0 (since log10(1) = 0).
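As a small illustration, the formula can be written as a Python function. This is only a sketch: the function name amplitude_to_dbfs and the convention that full scale (A0) equals 1.0, as is common for floating-point audio, are assumptions made for this example.

```python
import math

def amplitude_to_dbfs(amplitude, full_scale=1.0):
    """Level in dBFS of a linear amplitude A relative to full scale A0."""
    if amplitude <= 0:
        # log10 is undefined for zero; treat digital silence as -infinity dB.
        return float("-inf")
    return 20.0 * math.log10(amplitude / full_scale)

print(amplitude_to_dbfs(1.0))             # A = A0, so 0.0 dBFS
print(round(amplitude_to_dbfs(0.5), 2))   # half of full scale: -6.02
```

Note that an amplitude of exactly half of full scale is about -6.02 dBFS, which is why "-6 dB" and "a factor of 2" are used interchangeably in practice.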

If we reduce the audio level (amplitude A) to -6 dBFS then we get -6 = 20 log10(A/A0), which gives A ≈ 0.5*A0. Thus, reducing the audio level (volume) by a factor of 2 is the same as applying a gain of about -6 dB. Similarly, reducing the volume to 25% corresponds to a gain of about -12 dB.

We can also look at this a little differently. If we change the audio level by a factor f we get G(dBFS) = 20 log10(f*A/A0). This gives G(dBFS) = 20 log10(f) + 20 log10(A/A0). In the examples above we reduce the volume, so f <= 1.0. Thus we see that multiplying the audio signal by a factor f is identical to adding 20 log10(f) dB; for a reduction of volume (f < 1.0) this term is negative, so we are in effect subtracting dBs.
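The relation between a multiplication factor and a gain in dB can be checked with a few lines of Python. This is a minimal sketch; the helper names factor_to_db and db_to_factor and the starting amplitude are made up for this example.

```python
import math

def factor_to_db(factor):
    """Gain in dB that corresponds to multiplying the signal by `factor`."""
    return 20.0 * math.log10(factor)

def db_to_factor(gain_db):
    """Multiplication factor that corresponds to a gain in dB."""
    return 10.0 ** (gain_db / 20.0)

for f in (1.0, 0.5, 0.25):
    print(f, round(factor_to_db(f), 2))

# Scaling the amplitude by f shifts the level by 20*log10(f) dB.
A, A0, f = 0.8, 1.0, 0.5   # hypothetical amplitude, full scale, factor
level_before = 20.0 * math.log10(A / A0)
level_after = 20.0 * math.log10(f * A / A0)
print(round(level_after - level_before, 2), round(factor_to_db(f), 2))
```

The two numbers on the last line are the same, which is the additive property used in the text: multiplying by f in the linear domain is adding 20 log10(f) in the dB domain.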

The maximum peak level is reached at the largest value that the bit depth can represent (loosely speaking, all 1’s in, typically, a 16-bit or 24-bit system). All 0’s, then, would represent no digital signal.

 

Audio normalization

Audio normalization is nothing more than applying a constant amount of gain such that a pre-specified target level (e.g., 0 dBFS, the highest level in a digital system) is reached. This affects neither the signal-to-noise ratio nor the dynamics.

But there is more to say. We can distinguish between peak normalization and loudness normalization. In peak normalization the audio level is increased until its highest level (peak) reaches its target value. In loudness normalization the level is adjusted based on perceived loudness. However, both only affect the audio level (volume) and nothing else.

In peak normalization we thus multiply the audio signal by a constant factor such that the target level (e.g., 0 dBFS) is reached. The procedure for loudness normalization is more complicated but in essence it is doing the same thing (see this MSc thesis).
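Peak normalization as described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular DAW implements it: samples are floats where 1.0 corresponds to 0 dBFS, the function name peak_normalize is hypothetical, and the sketch looks at sample peaks only (not true peaks).

```python
def peak_normalize(samples, target_dbfs=0.0):
    """Scale all samples by one constant factor so the highest
    absolute sample value lands on the target level (in dBFS)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)          # silence: nothing to scale
    target_amplitude = 10.0 ** (target_dbfs / 20.0)
    gain = target_amplitude / peak    # the same factor for every sample
    return [s * gain for s in samples]

samples = [0.1, -0.4, 0.2, -0.05]     # made-up audio samples
normalized = peak_normalize(samples, target_dbfs=-1.0)

print(max(abs(s) for s in normalized))                      # new peak
print(samples[0] / samples[1], normalized[0] / normalized[1])  # ratios
```

Because every sample is multiplied by the same factor, the ratio between any two samples is the same before and after, which is what is meant by the relative dynamics being unchanged.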

Since peak normalization is based on the highest level, it does not by itself account for the apparent loudness of the audio. Our perception of loudness is largely unrelated to the peaks in a track, and much more dependent on the average level throughout the track. We naturally perceive a track with a higher average level and lower peaks as “louder” than a track with a lower average level and higher peaks. Peak normalization to 0 dBFS can still clip the audio signal due to inter-sample peaks (true peaks) or due to further processing of the signal. Therefore, if possible, one should preferably base peak normalization on true peaks, and in general it is a good idea to leave some headroom.
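The difference between peak level and average level can be made concrete with two synthetic signals that have the same peak. This is a rough sketch: it uses the RMS level as a simple stand-in for the average level (it is not a LUFS measurement), and the two signals and the helper names are invented for this example.

```python
import math

def peak_dbfs(samples):
    """Highest absolute sample value, in dBFS (1.0 = full scale)."""
    return 20.0 * math.log10(max(abs(s) for s in samples))

def rms_dbfs(samples):
    """RMS (average) level in dBFS, a crude proxy for loudness."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)

n = 1000
# Sustained tone: 5 full cycles of a sine wave reaching full scale.
sine = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
# Sparse clicks: one full-scale sample every 100 samples, silence otherwise.
clicks = [1.0 if i % 100 == 0 else 0.0 for i in range(n)]

for name, sig in (("sine", sine), ("clicks", clicks)):
    print(name, round(peak_dbfs(sig), 2), round(rms_dbfs(sig), 2))
```

Both signals are already peak normalized to 0 dBFS, yet their RMS levels differ by roughly 17 dB, so the sustained tone would be perceived as much louder than the clicks.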

With loudness normalization, since it normalizes the average level, the peaks of the audio may start to clip, resulting in distorted audio. Therefore, one should be careful: while loudness normalization should not affect dynamics, it may do so in practice if one attempts to increase the average level too much (in which case compression or limiting might be applied by the algorithm).
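Whether a loudness target will push the peaks over 0 dBFS can be estimated before applying any gain. The sketch below is a simplification under loud assumptions: it uses the plain RMS level instead of a real LUFS measurement, the function name plan_loudness_gain is hypothetical, and the test signal is synthetic.

```python
import math

def peak_dbfs(samples):
    return 20.0 * math.log10(max(abs(s) for s in samples))

def rms_dbfs(samples):
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)

def plan_loudness_gain(samples, target_dbfs):
    """Return (gain in dB, predicted peak in dBFS, would_clip) for
    bringing the RMS level (NOT true LUFS) to the target level."""
    gain_db = target_dbfs - rms_dbfs(samples)
    peak_after_db = peak_dbfs(samples) + gain_db
    return gain_db, peak_after_db, peak_after_db > 0.0

# Hypothetical sparse, drum-like signal: short hits at half of full scale.
hits = [0.5 if i % 100 == 0 else 0.0 for i in range(1000)]

for target in (-24.0, -16.0):
    gain_db, peak_after_db, would_clip = plan_loudness_gain(hits, target)
    print(target, round(gain_db, 2), round(peak_after_db, 2), would_clip)
```

For the lower target the predicted peak stays below 0 dBFS, so a constant gain is enough. For the higher target the predicted peak exceeds 0 dBFS, which is the situation where an algorithm has to either clip or apply limiting, and thereby change the dynamics.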

In the past I peak normalized all my audio clips in Cubase as a sort of ‘gain staging’. I abandoned that approach for other, more appropriate, gain staging approaches.

 

Some examples

To demonstrate both types of normalization I have taken a short audio fragment of a drum track (32 bit, 44.1 kHz) and normalized it in Steinberg WaveLab Pro 10.0.

 

Not normalized (original wave file)

The screens below (analyzed by WaveLab) show the maximum levels (digital peaks and true peaks) and the loudness level. The true peak level is around -4 dB and the loudness is around -20 LUFS (Loudness Units relative to Full Scale). Strangely, the maximum levels of the waveform do not seem to correspond completely to the digital peak levels as analyzed by WaveLab. For this analysis this is not too important. You can listen to this audio clip on SoundCloud:

 

Peak normalized at 0dBFS
I normalized to 0 dBFS (true peaks). Clearly, the volume of this clip has changed but the dynamics are not affected. The audio signal is just multiplied by a specific factor, so the signal-to-noise ratio also remains unaffected. The loudness of this clip is about -17 LUFS.
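That a constant factor leaves the signal-to-noise ratio untouched can be verified numerically. The following is a toy sketch: the signal and noise values are invented, the +4 dB gain is only an example value, and in a real file signal and noise are mixed together (the gain still scales both equally).

```python
import math

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB, based on RMS levels."""
    return 20.0 * math.log10(rms(signal) / rms(noise))

signal = [0.3, -0.5, 0.4, -0.2]             # made-up wanted signal
noise = [0.001, -0.002, 0.0015, -0.0005]    # made-up noise floor

gain = 10.0 ** (4.0 / 20.0)  # example normalization gain of +4 dB

before = snr_db(signal, noise)
after = snr_db([s * gain for s in signal], [x * gain for x in noise])
print(round(before, 2), round(after, 2))
```

Both printed values are identical: the gain raises the signal and the noise by the same number of dB, so their ratio does not move.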


Loudness normalization to high average level
In the next example I performed a loudness normalization to about -12 LUFS. This level would cause the peaks to go above 0 dBFS and, therefore, about 10 dB of limiting has been applied as part of the normalization algorithm. This is visible in the waveform, which is now clipping at 0 dBFS. Consequently, the dynamics of the audio changed (the lower-level audio parts are now more upfront) and therefore the audio now not only has a higher volume, but also sounds different.
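Why limiting changes the dynamics while plain gain does not can be shown with a toy example. This sketch uses hard clipping as a crude stand-in for a limiter (a real limiter, such as the one in WaveLab, works more gently), and the sample values, the +8 dB gain and the helper names are all invented for illustration.

```python
import math

def apply_gain_db(samples, gain_db):
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]

def hard_clip(samples, ceiling=1.0):
    """Crude limiter stand-in: cut off everything beyond +/- ceiling."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]

def loud_to_quiet_db(samples):
    """dB difference between the loudest and quietest non-zero sample."""
    levels = [abs(s) for s in samples if s != 0.0]
    return 20.0 * math.log10(max(levels) / min(levels))

samples = [0.05, 0.6, -0.1, 0.3]         # quiet and loud moments
boosted = apply_gain_db(samples, 8.0)    # loud sample now exceeds 1.0
clipped = hard_clip(boosted)             # peak forced back to 0 dBFS

for name, sig in (("original", samples), ("gain only", boosted),
                  ("gain + clip", clipped)):
    print(name, round(loud_to_quiet_db(sig), 2))
```

With gain only, the difference between the loudest and quietest sample is the same as in the original. After clipping, the loudest sample is held at the ceiling while the quiet ones keep their boost, so the difference shrinks and the quiet parts move more upfront.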

 


Loudness normalized file with subsequent peak normalization to -5.5 dBFS

Next, I applied peak normalization to the previous loudness normalized clip. This brings the peaks back to about -5.5 dBFS but does not, of course, restore the dynamics. In comparison to the unnormalized clip, this clip still has a higher volume (about -18 LUFS compared to -20 LUFS for the unnormalized clip) but it also sounds different because the dynamics were changed. Thus, one should be careful with loudness normalization.

 


Loudness normalization to -16 LUFS

Next I applied loudness normalization to about -16 LUFS (instead of -12 LUFS in the previous example). This does not cause any peaks to go beyond 0 dBFS (no peak clipping) and hence the dynamics are not changed and this clip sounds the same as the peak normalized clip.

 


Loudness normalization to -16 LUFS and subsequent peak normalization to -5.5 dBFS
Finally, I peak normalized the previous loudness normalized clip to about -5.5 dBFS, resulting in a loudness of about -21 LUFS. This is comparable to the unnormalized clip and, therefore, this clip has about the same volume as the unnormalized clip and still sounds the same, since neither the dynamics nor anything else has been changed; only the volume.

Further references