Speech signal processing tutorial (2) sound pressure level and loudness

  In this section, let’s take a look at how to use Matlab and Python to calculate the sound pressure level and loudness of a sound.

Sound pressure level

1. Sound pressure level definition

  Let's start with the sound pressure level, which is what we usually mean when we say a sound is so many decibels. Sound pressure is defined as the root-mean-square (RMS) value of the instantaneous pressure produced by a sound wave at a given point. Because sound pressure is readily perceived by the human ear and easy to measure, it is the physical quantity usually used to describe the strength of a sound wave.

  The sound pressure level is denoted SPL (sound pressure level). It is defined as 20 times the common logarithm of the ratio of the measured effective sound pressure p(e) to the reference sound pressure p(ref), namely:

$$\mathrm{SPL} = 20\,\log_{10}\left(\frac{p_e}{p_{\mathrm{ref}}}\right)\ \text{(dB)}$$

The reference sound pressure p(ref) in air is generally taken as 2e-5 Pa (20 µPa). This is approximately the sound pressure at which a normal human ear can just perceive a 1000 Hz tone, that is, the threshold of hearing at 1000 Hz. Below this value, the ear generally can no longer detect the sound. By definition, a sound at the reference pressure has a sound pressure level of 0 dB.
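As a quick check of the definition, here is a minimal Python sketch that converts an RMS pressure in pascals to dB SPL. The function name `spl_from_pressure` is made up for this illustration; only the formula and the 2e-5 Pa reference come from the text.

```python
import math

P_REF = 2e-5  # reference sound pressure in air, in Pa (value from the text)

def spl_from_pressure(p_rms, p_ref=P_REF):
    """Return the sound pressure level in dB for an RMS pressure in Pa."""
    return 20.0 * math.log10(p_rms / p_ref)

spl_threshold = spl_from_pressure(2e-5)   # the reference pressure itself
spl_one_pascal = spl_from_pressure(1.0)   # an RMS pressure of 1 Pa
print(spl_threshold, round(spl_one_pascal, 2))  # prints: 0.0 93.98
```

The reference pressure maps to 0 dB, and 1 Pa maps to about 94 dB, which is why 94 dB is a common calibration level.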

Here p(e) is the effective value of the sound pressure, that is, the root mean square (RMS) of the sound signal. Suppose the signal has duration T and consists of N discrete samples x(n) spaced Δt = T/N apart. The effective sound pressure is then:

$$p_e = \sqrt{\frac{1}{T}\sum_{n=1}^{N} x^2(n)\,\Delta t} = \sqrt{\frac{1}{N}\sum_{n=1}^{N} x^2(n)}$$
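To make the RMS definition concrete, the sketch below computes the effective value of a sampled sine wave with NumPy. The sampling rate and tone frequency are arbitrary values chosen for the example. A sine of amplitude A should give an RMS of A/√2.

```python
import numpy as np

def effective_value(x):
    """Root mean square of a sampled signal."""
    x = np.asarray(x, dtype=np.float64)
    return np.sqrt(np.mean(x ** 2))

fs = 8000                                 # hypothetical sampling rate in Hz
n = np.arange(fs)                         # one second of sample indices
tone = np.sin(2 * np.pi * 100 * n / fs)   # 100 Hz sine with amplitude 1
rms_tone = effective_value(tone)
print(rms_tone)                           # close to 1/sqrt(2), about 0.7071
```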

  Common sound decibel values:

  Sound type                          Sound pressure level (dB)
  High-explosive detonation
  Rocket and missile launch sites
  Airplane engine
  Generator in operation
  Normal conversation
  Soft whisper
  Quiet rural night

Matlab code

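  The SPL code follows directly from the definition above. Note that the input to the program is a digital signal whose sample values are only proportional to the actual analog sound pressure. Unless the recording chain is calibrated, the computed value therefore differs from the true sound pressure level by a constant offset.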

function spl = SPLCal(x)
    len = length(x);
    % Effective sound pressure calculation, namely RMS
    pa = sqrt(sum(x.^2)/len);
    %% Sound pressure level calculation
    % Sound pressure level value spl=20*log10(pa/p0), the unit is dB
    p0 = 2e-5;
    spl = 20*log10(pa/p0);
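The remark about digital signals can be illustrated with a small Python sketch. It assumes the samples are the true pressure multiplied by an unknown constant; the `gain` value and the random "pressure" signal are made up for the demonstration. Under that assumption the computed SPL is shifted by exactly 20*log10(gain) dB.

```python
import numpy as np

P0 = 2e-5  # reference sound pressure in Pa

def spl_cal(x):
    """Same computation as SPLCal: RMS of x, then 20*log10(rms / P0)."""
    rms = np.sqrt(np.mean(np.asarray(x, dtype=np.float64) ** 2))
    return 20.0 * np.log10(rms / P0)

rng = np.random.default_rng(0)
pressure = 0.02 * rng.standard_normal(1000)  # made-up "true" pressure in Pa
gain = 1000.0                                # hypothetical pressure-to-sample scale factor
samples = gain * pressure                    # what the program actually receives

offset_db = spl_cal(samples) - spl_cal(pressure)
print(offset_db, 20 * np.log10(gain))        # both are 60 dB up to rounding error
```

In practice the offset would be found by recording a calibrator tone of known level and subtracting the difference from later measurements.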

The complete code is as follows:

clear all;clc;close all;
%% read the voice file
% 'audio.wav' is an assumed file name (the same one the Python example uses)
[x, fs] = audioread('audio.wav');
x = x(:,1);% keep the first channel as a column vector
len = length(x);
%% voice framing
% The size of each frame is frameLen, when the voice length is not an integer multiple of the frame length:
% (1) If the remaining length is greater than or equal to one-half of the frame length, add zeros to the frame length
% (2) If the remaining length is less than one-half of the frame length, discard it

% Common voice frame length: 20ms, 50ms, 100ms, 200ms
framTime = 100;% unit: ms
% Signal points per frame
frameLen = floor(fs*framTime/1000);
% m is the remainder obtained after Length/frameLen
m = mod(len,frameLen);
if m >= frameLen/2% zero padding
    x = [x;zeros(frameLen-m,1)];
    len = length(x);
else% means m <frameLen/2, then discard the remaining samples
    nframe = floor(len/frameLen);
    x = x(1:nframe*frameLen);
    len = length(x);
end
% The total number of final voice frames
N = len/frameLen;
%% calculated sound pressure level
s = zeros(1,frameLen);
% The sound pressure level values of the N frames are stored in the spl vector
spl = zeros(1,N);
for k = 1:N
    s = x((k-1)*frameLen + 1:k*frameLen);
    spl(k) = SPLCal(s);
end
%% drawing
t = 1:len;
% Repeat each frame value frameLen times so it lines up with the samples
spl_rep = repmat(spl, frameLen, 1);
subplot(211);plot(t/fs,x);grid on;xlabel('time(s)');title('input voice waveform');
subplot(212);stairs(t/fs,spl_rep(:),'r');grid on;xlabel('time(s)');ylabel('sound pressure level(dB)');
title('Sound pressure level of voice signal (dB)');
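The framing rule described in the comments above (pad with zeros when the leftover is at least half a frame, otherwise discard it) can be checked in isolation. The following Python sketch is one way to write it. The helper name `frame_signal` and the use of `reshape` are choices made for this illustration, not part of the original programs.

```python
import numpy as np

def frame_signal(x, frame_len):
    """Split x into rows of frame_len samples using the pad-or-discard rule."""
    x = np.asarray(x, dtype=np.float64)
    remainder = len(x) % frame_len
    if remainder >= frame_len / 2:
        # Leftover is at least half a frame: zero-pad it to a full frame.
        x = np.concatenate([x, np.zeros(frame_len - remainder)])
    else:
        # Leftover is shorter than half a frame (or zero): drop it.
        x = x[:len(x) - remainder]
    return x.reshape(-1, frame_len)

padded = frame_signal(np.ones(7), 4)    # leftover 3 >= 2, so one zero is appended
trimmed = frame_signal(np.ones(5), 4)   # leftover 1 < 2, so one sample is dropped
print(padded.shape, trimmed.shape)      # prints: (2, 4) (1, 4)
```

With the signal in this two-dimensional form, the per-frame RMS can be computed along `axis=1` instead of in an explicit loop.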


Python code

  The Python code is as follows:

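import wave
import numpy as np
import matplotlib.pyplot as plt

def load_wav(wave_input_path):
    # Read a wav file (assumed to be 16-bit mono) and return samples and sampling rate
    wf = wave.open(wave_input_path,'rb')
    fs = wf.getframerate()
    nframes = wf.getnframes()
    str_data = wf.readframes(nframes)
    wf.close()
    # np.fromstring is deprecated for binary data, so np.frombuffer is used instead
    wave_data = np.frombuffer(str_data, dtype=np.short)
    return wave_data.astype(np.float64), fs

def SPLCal(x):
    Leng = len(x)
    # Effective sound pressure calculation, namely RMS
    pa = np.sqrt(np.sum(np.power(x, 2))/Leng)
    p0 = 2e-5
    # Sound pressure level in dB
    spl = 20 * np.log10(pa/p0)
    return spl

if __name__ =='__main__':
    x, fs = load_wav('audio.wav')
    Leng = len(x)
    frameTime = 100                   # frame length in ms
    frameLen = fs * frameTime//1000   # signal points per frame
    m = Leng % frameLen
    if m >= frameLen/2:
        # zero padding up to a full frame
        x = np.append(x, np.zeros(frameLen-m))
    else:
        # discard the remaining samples
        nframe = Leng//frameLen
        x = x[:nframe * frameLen]
    Leng = len(x)

    # The total number of final voice frames
    N = Leng//frameLen
    spl = np.zeros(N)
    for k in range(N):
        s = x[k*frameLen: (k+1)*frameLen]
        spl[k] = SPLCal(s)

    # drawing (mirrors the Matlab plotting section)
    spl_rep = np.repeat(spl, frameLen)
    t = np.arange(1, Leng + 1)/fs
    plt.subplot(211); plt.plot(t, x); plt.grid(True)
    plt.xlabel('time(s)'); plt.title('input voice waveform')
    plt.subplot(212); plt.plot(t, spl_rep, 'r'); plt.grid(True)
    plt.xlabel('time(s)'); plt.ylabel('sound pressure level(dB)')
    plt.title('Sound pressure level of voice signal (dB)')
    plt.show()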

The resulting figure shows the input voice waveform (top panel) and the per-frame sound pressure level in dB (bottom panel).

Loudness

  When speaking with the same intensity, why do women's voices often seem louder than men's? This comes down to loudness, which we discuss next. Loudness is the attribute of hearing by which we judge how strong a sound is, and it depends on subjective human perception. Based on it, sounds can be ordered on a scale from soft to loud.

  When sound vibrations reach the ear, a listener forms a subjective impression of how strong or weak the sound is. People habitually describe the intensity of a sound wave as "loud" or "not loud", but this description is not fully equivalent to the physical intensity. The ear's perception of loudness also depends on frequency: sounds with the same sound pressure level but different frequencies are heard as differently loud. For example, of two 60 dB sounds, one at 100 Hz and one at 1000 Hz, the 1000 Hz sound is heard as louder. For a 100 Hz sound to seem as loud as a 1000 Hz sound at 60 dB, its sound pressure level must reach about 67 dB (the exact figure depends on the equal-loudness data used).

Here are a few related concepts:

  • Loudness level: based on the perceptual characteristics of the human ear, the subjective strength of a sound is determined from its sound pressure and frequency. This quantity is called the loudness level, and its unit is the phon.
  • Phon: when a pure tone of some frequency sounds as loud as a 1000 Hz pure tone, the sound pressure level of that 1000 Hz tone is defined as the loudness level of the tone under test. Thus a 1 kHz signal with a sound pressure level of 60 dB SPL has a loudness level of 60 phon. Comparing sounds of various frequencies in this way yields curves of sound pressure level against frequency at a constant loudness level. These are the equal-loudness curves of the human ear.

(Figure: equal-loudness curves of the human ear)

From the equal-loudness curves, we find that the human ear is more sensitive to higher frequencies (most of all in the region of a few kilohertz) than to low frequencies: at the same sound pressure level, a higher-frequency sound has a higher loudness level than a low-frequency one. Generally speaking, women's voices contain more high-frequency components, while men's voices contain more low-frequency components. This is why women's voices sound louder when speaking with the same intensity (the same sound pressure level). Because the phon expresses the ear's response to loudness only in a limited way, a subjective unit of loudness, the sone, is introduced.

  • Sone: the unit of perceived loudness, describing how the sensation of loudness changes as the sound pressure level changes. The sone and the phon are related as follows: 1 sone corresponds to 40 phon (on the equal-loudness curves, 40 dB SPL at 1 kHz). Taking 1 sone as the reference, a sound of 2 sones is perceived as twice as loud, and a sound of 0.5 sone as half as loud.
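The relation between the two units is commonly written as N = 2^((L_N - 40)/10), where N is the loudness in sones and L_N the loudness level in phons, so every 10 phon doubles the loudness. The 10-phon step is not stated in the text above. It is the usual rule of thumb and is assumed here to hold only from about 40 phon upward. A minimal Python sketch with made-up function names:

```python
import math

def phon_to_sone(phon):
    """Loudness in sones for a loudness level in phons (assumed valid for >= 40 phon)."""
    return 2.0 ** ((phon - 40.0) / 10.0)

def sone_to_phon(sone):
    """Inverse mapping: loudness level in phons for a loudness in sones (>= 1 sone)."""
    return 40.0 + 10.0 * math.log2(sone)

print(phon_to_sone(40), phon_to_sone(50), sone_to_phon(2.0))  # prints: 1.0 2.0 50.0
```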

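  According to the equal-loudness contours defined in the ISO 226:2003 standard, the sound pressure level L_p at a given frequency for a loudness level L_N (in phon) is:

$$L_p = \frac{10}{\alpha_f}\,\log_{10}(A_f) - L_U + 94$$

where

$$A_f = 4.47\times10^{-3}\left(10^{0.025 L_N} - 1.15\right) + \left[0.4\times10^{\frac{T_f + L_U}{10} - 9}\right]^{\alpha_f}$$

Here T_f is the threshold of hearing, \alpha_f is the exponent of loudness perception, and L_U is the magnitude of the linear transfer function normalized at 1000 Hz. These three parameters are tabulated in ISO 226.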


The Matlab code below computes an equal-loudness contour, that is, the sound pressure level at each frequency for a given loudness level in phon:

function [spl, freq] = loudnessCal(phon)
f = [20 25 31.5 40 50 63 80 100 125 160 200 250 315 400 500 630 800 ...
     1000 1250 1600 2000 2500 3150 4000 5000 6300 8000 10000 12500];

af = [0.532 0.506 0.480 0.455 0.432 0.409 0.387 0.367 0.349 0.330 0.315 ...
      0.301 0.288 0.276 0.267 0.259 0.253 0.250 0.246 0.244 0.243 0.243 ...
      0.243 0.242 0.242 0.245 0.254 0.271 0.301];

Lu = [-31.6 -27.2 -23.0 -19.1 -15.9 -13.0 -10.3 -8.1 -6.2 -4.5 -3.1 ...
       -2.0 -1.1 -0.4 0.0 0.3 0.5 0.0 -2.7 -4.1 -1.0 1.7 ...
        2.5 1.2 -2.1 -7.1 -11.2 -10.7 -3.1];

Tf = [78.5 68.7 59.5 51.1 44.0 37.5 31.5 26.5 22.1 17.9 14.4 ...
       11.4 8.6 6.2 4.4 3.0 2.2 2.4 3.5 1.7 -1.3 -4.2 ...
       -6.0 -5.4 -1.5 6.0 12.6 13.9 12.3];  

    Ln = phon;

    %Calculate the sound pressure level from the loudness level 
    Af=4.47E-3 * (10.^(0.025*Ln)-1.15) + (0.4*10.^(((Tf+Lu)/10)-9 )).^af;
    Lp=((10./af).*log10(Af))-Lu + 94;

    spl = Lp;  
    freq = f;
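For readers following the Python examples, here is a sketch of the same computation in NumPy. The tables are copied from the Matlab function above. The name `equal_loudness_contour` is chosen for this illustration.

```python
import numpy as np

# Tables copied from the Matlab function above (29 frequency bands).
F = np.array([20, 25, 31.5, 40, 50, 63, 80, 100, 125, 160, 200, 250, 315, 400,
              500, 630, 800, 1000, 1250, 1600, 2000, 2500, 3150, 4000, 5000,
              6300, 8000, 10000, 12500])
AF = np.array([0.532, 0.506, 0.480, 0.455, 0.432, 0.409, 0.387, 0.367, 0.349,
               0.330, 0.315, 0.301, 0.288, 0.276, 0.267, 0.259, 0.253, 0.250,
               0.246, 0.244, 0.243, 0.243, 0.243, 0.242, 0.242, 0.245, 0.254,
               0.271, 0.301])
LU = np.array([-31.6, -27.2, -23.0, -19.1, -15.9, -13.0, -10.3, -8.1, -6.2,
               -4.5, -3.1, -2.0, -1.1, -0.4, 0.0, 0.3, 0.5, 0.0, -2.7, -4.1,
               -1.0, 1.7, 2.5, 1.2, -2.1, -7.1, -11.2, -10.7, -3.1])
TF = np.array([78.5, 68.7, 59.5, 51.1, 44.0, 37.5, 31.5, 26.5, 22.1, 17.9,
               14.4, 11.4, 8.6, 6.2, 4.4, 3.0, 2.2, 2.4, 3.5, 1.7, -1.3, -4.2,
               -6.0, -5.4, -1.5, 6.0, 12.6, 13.9, 12.3])

def equal_loudness_contour(phon):
    """Return (spl, freq): SPL in dB at each frequency for a loudness level in phon."""
    a_f = (4.47e-3 * (10.0 ** (0.025 * phon) - 1.15)
           + (0.4 * 10.0 ** ((TF + LU) / 10.0 - 9.0)) ** AF)
    spl = (10.0 / AF) * np.log10(a_f) - LU + 94.0
    return spl, F

spl40, freq = equal_loudness_contour(40)
spl60, _ = equal_loudness_contour(60)
i1k = int(np.where(freq == 1000)[0][0])    # index of the 1000 Hz band
i100 = int(np.where(freq == 100)[0][0])    # index of the 100 Hz band
print(round(spl40[i1k], 1), round(spl60[i1k], 1), round(spl60[i100], 1))
```

At 1000 Hz the contour returns a sound pressure level close to the loudness level itself (about 40 dB for 40 phon and 60 dB for 60 phon), as the definition of the phon requires. With these tables the 60-phon contour at 100 Hz comes out near 78.7 dB. Equal-loudness data sets differ, so figures quoted from older curves can be lower.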


Reference: https://cloud.tencent.com/developer/article/1794094 Voice signal processing tutorial (2) Sound pressure level and loudness-Cloud + Community-Tencent Cloud