In recent years, platforms such as YouTube and Amazon have started managing volume levels with loudness normalization. Anyone who works with sound now needs to understand it. There is plenty of information about loudness normalization on the internet, but most of the discussion seems to focus on what value integrated loudness should be set to. Here, I would like to explain loudness normalization from a technical perspective. Once you understand how it works and what its characteristics are, you may find that the way you use it changes.
The Birth of Loudness Normalization
Loudness normalization grew out of the competition to increase sound pressure (the loudness war) that ran from the 1990s to the 2010s as we entered the digital age. Sound pressure kept rising in media that handle sound, such as TV. I think many people have been startled by a suddenly loud commercial. TV is supposed to offer a comfortable viewing experience, but because standing out through sound was prioritized, this unpleasant phenomenon spread. It was a problem all over the world, and the ITU (International Telecommunication Union) has been working on countermeasures since the 1990s. These were issued as a recommendation in 2006, adopted by broadcasters around the world from 2011, and adopted by online platforms from 2015. However, the problem has not been solved, and it seems that it is only just beginning to be recognized. Ideally, viewers would never have to turn the volume up or down, but that still seems a long way off. In practice, loudness normalization alone is not enough, and other guidelines are needed as well. Creator awareness is also an issue, so I expect the problem to improve slowly rather than be solved suddenly.
ITU Created a Measuring Stick
You can obtain and check the ITU documents online. They contain not only an overview, but also technical details, so you can perform various types of verification.
ITU (International Telecommunication Union)
- 2006 Rec. ITU-R BS.1770 “Algorithms to measure audio programme loudness and true-peak audio level”
- 2006 Rec. ITU-R BS.1771 “Requirements for loudness and true-peak indicating meters”
- 2010 Rec. ITU-R BS.1864 “Operational practices for loudness in the international exchange of digital television programmes”
BS.1770 and BS.1771 have since been revised. The ITU only defines a method for measuring loudness. How it is used is left up to each platform.
Each Platform Sets a Standard Value
The volume is measured using the loudness measurement method defined by the ITU, and each platform sets its own reference values and operates accordingly. In most cases, the values monitored are LUFS-I (integrated loudness) and the peak value. I will explain the details next time, so for now, please think of LUFS-I as the average volume and the peak value (true peak) as the maximum volume.
Each platform sets its own reference value. Audio that exceeds this value is lowered to the reference value, while audio that is below it is often left unchanged. This suppresses loud content and reduces the overall variation in volume. Creators need to understand this behavior so that the volume of their work does not change in unintended ways. The LUFS-I values for each platform are listed below.
Platform | Recommended LUFS-I | Peak (true peak) |
---|---|---|
YouTube | -14 LUFS-I | -1.0dBTP |
Spotify | -14 LUFS-I | -1.0dBTP |
Apple Music | -16 LUFS-I | -1.0dBTP |
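The rule described above can be sketched in a few lines of Python. This is only an illustration under the assumption that a platform lowers loud audio to its reference value and never raises quiet audio; the function name `normalization_gain_db` and the dictionary `PLATFORM_TARGET_LUFS` are hypothetical names made up for this example, and real platforms may behave differently.

```python
# Reference values copied from the table above (LUFS-I).
PLATFORM_TARGET_LUFS = {
    "YouTube": -14.0,
    "Spotify": -14.0,
    "Apple Music": -16.0,
}


def normalization_gain_db(measured_lufs, target_lufs):
    """Gain in dB under the assumed 'lower only, never raise' rule."""
    # A negative difference means the audio is louder than the reference.
    return min(0.0, target_lufs - measured_lufs)


for platform, target in PLATFORM_TARGET_LUFS.items():
    for measured in (-8.0, -20.0):
        gain = normalization_gain_db(measured, target)
        print(f"{platform}: {measured} LUFS-I -> gain {gain:+.1f} dB")
```

A loud master is turned down by the full difference, while a quiet one is left as it is, which is why the variation between programs shrinks.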
Volume Standards in Music
Music as a product is usually delivered in a finished form such as records, CDs, or more recently, audio files. In digital media, the maximum level is defined as 0 dB (0 dBFS). In the diagram below, the first row shows a 32-bit float waveform that exceeds 0 dB. Many formats, such as CDs, cannot record anything above 0 dB, so as the second row shows, anything above 0 dB is flattened out. When played back, the clipped waveform produces a crackling sound.

For this reason, levels are adjusted so that they never exceed 0 dB. In the era of the sound pressure competition, it was considered better to pack in as much sound as possible, and every effort was made to raise the sound pressure. In pop and rock music in particular, the sound pressure was increased to an excessive degree.
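The flattening at 0 dB described in this section can be reproduced with a small sketch. It assumes a float waveform where full scale is ±1.0 and a simple conversion to 16-bit integers like those stored on a CD; `float_to_int16` is a hypothetical helper written for this illustration, not the conversion used by any particular tool.

```python
def float_to_int16(sample):
    """Convert a float sample (full scale = +/-1.0) to a 16-bit integer.

    Values outside the 16-bit range cannot be stored, so they are clamped.
    That clamping is the flattening seen in the second row of the diagram.
    """
    scaled = round(sample * 32768)
    return max(-32768, min(32767, scaled))


# A float waveform that goes above 0 dB (|sample| > 1.0) at its crests.
wave = [0.0, 0.5, 1.0, 1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5]
pcm = [float_to_int16(s) for s in wave]
print(pcm)
```

The samples at 1.5 and -1.5 end up at the same integer values as the samples at 1.0 and -1.0, so the top of the waveform is lost.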
True peak
Since loudness normalization deals with the true peak, I will explain it a little. The following is a screenshot to explain the true peak.
The first row is the original, with a sampling frequency of 48,000 Hz and a peak of 0 dB. Since it does not exceed 0 dB, it looks as though the sound will not be distorted, but depending on conditions such as interpolation in the playback device, resampling, or format conversion, the actual peak can end up higher.
The second row is the first row resampled at twice the frequency, 96,000 Hz. You can see that the red sample is clipping at over 0 dB. Resampling in this way often produces a larger peak than before, because new samples are interpolated between the original ones.
The third row is an approximation of the ideal waveform. The peak in this state is called the true peak, and it should not exceed 0 dB.
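The effect can be checked numerically. The sketch below is a simplified illustration, not the measurement filter from BS.1770: it assumes 4x oversampling with a generic Hann-windowed sinc interpolator, and the names `peak_db` and `oversample` as well as the test tone are hypothetical choices for this example. The tone sits at a quarter of the sampling frequency with a 45 degree phase offset, so every sample lands beside the crest of the wave rather than on it.

```python
import math


def peak_db(samples):
    """Largest absolute value in dB relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(s) for s in samples))


def oversample(samples, factor=4, half_width=32):
    """Interpolate `factor` points per input sample (Hann-windowed sinc)."""
    out = []
    for i in range(len(samples) * factor):
        t = i / factor                 # position in input-sample units
        center = math.floor(t)
        acc = 0.0
        for k in range(center - half_width + 1, center + half_width + 1):
            if not 0 <= k < len(samples):
                continue               # treat audio outside the clip as silence
            x = t - k
            if x == 0:
                acc += samples[k]      # original samples pass through unchanged
                continue
            sinc = math.sin(math.pi * x) / (math.pi * x)
            hann = 0.5 * (1 + math.cos(math.pi * x / half_width))
            acc += samples[k] * sinc * hann
        out.append(acc)
    return out


# Tone at fs/4 with a 45 degree phase offset, scaled so the sample peak is 0 dB.
raw = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(256)]
largest = max(abs(s) for s in raw)
signal = [s / largest for s in raw]

sample_peak = peak_db(signal)
true_peak = peak_db(oversample(signal))
print(f"sample peak:         {sample_peak:.2f} dBFS")
print(f"estimated true peak: {true_peak:.2f} dBTP")
```

For this tone the estimated true peak comes out roughly 3 dB above the sample peak, even though no stored sample exceeds 0 dB. Real music rarely shows a gap this large, but the mechanism is the same.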

Excessive Sound Pressure
The waveform below is from a CD released in 2002 whose sound pressure was raised to an excessive level. You can see that the waveform is pinned at around -1 dB, its maximum level. When I measured it, it was -8 LUFS-I. If you force the sound pressure up, even sounds that are not meant to be loud are raised, the difference between loud and quiet sounds becomes smaller, and because the signal is always pushed toward the maximum amplitude, the result is an unnatural, very noisy sound. The volume does reach the maximum, however, so the track stands out by being loud.

If you upload such a sound source directly to a platform on the internet, the volume will be reduced considerably. In particular, you need to be careful with sound sources that have been excessively compressed to create a powerful sound, as they may not achieve the desired effect.
If the above is uploaded to YouTube, it is processed into the following waveform. The volume is lowered by about 6 dB, and the peak is around -6 dB. Overall, it measures -14 LUFS-I, as specified by YouTube. In terms of amplitude, it is reduced to about half of the original.
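The "about half" figure follows from the usual conversion between decibels and amplitude. A minimal sketch, using the values measured in this article and a hypothetical helper name `db_to_amplitude_ratio`:

```python
def db_to_amplitude_ratio(gain_db):
    """Convert a gain in dB to a linear amplitude ratio."""
    return 10 ** (gain_db / 20)


measured_lufs = -8.0    # the 2002 CD, as measured above
target_lufs = -14.0     # YouTube reference value
gain_db = target_lufs - measured_lufs
ratio = db_to_amplitude_ratio(gain_db)

print(f"gain: {gain_db:+.1f} dB, amplitude ratio: {ratio:.3f}")
```

A change of -6 dB corresponds to an amplitude ratio of about 0.5, which is why the waveform looks roughly half as tall after normalization.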

Natural Sound Pressure
The waveform below is from the same artist as above, but from a CD released in 1989. It is a relatively natural-sounding source and is not too far from -14 LUFS-I. The peak is around -2 dB, so it still has plenty of impact. Above all, when it is uploaded to the internet, the volume is hardly lowered at all, so the intended impression is not destroyed. After normalization, it actually sounds louder than the heavily compressed source above, which was aiming for impact. I also think this is probably the more desirable sound quality.

Since -14 LUFS-I is so widely used online, it might seem like a good idea to aim for -14 when producing. However, it is basically a value chosen so that a wide variety of programs can be handled in the same way, as on TV, so I personally don't think you need to be too conscious of it when it comes to music. It is more important to be aware of other things and to know how LUFS-I is calculated. I will explain this next time.
The “sound & person” column is made up of contributions from you.