The analysis window has a fixed size, and therefore a fixed resolution. This determines whether the analysis favours frequency resolution (the ability to separate frequency components that are close together) or time resolution (the ability to locate the moment at which frequencies change). A wide window gives good frequency resolution but poor time resolution. A narrow window gives good time resolution but poor frequency resolution. These are called narrowband and wideband transforms, respectively. Increasing the FFT size improves the frequency definition of the analysis.
Not all sounds have the same characteristics, and these characteristics may or may not change over time. Selecting an FFT size involves a compromise between time accuracy and frequency accuracy: the more accurate the analysis is in one domain, the less accurate it is in the other. The user most often makes a compromise...
Variations in a stable sound occur every 2000 to 4000 samples, that is, about 45 to 91 ms at a 44100 Hz sample rate.
Variations in a rhythmic sound occur every 500 to 1000 samples, that is, about 11.3 to 22.7 ms.
If we adapt the window size to the frequency of a 100 Hz sound (roughly G2) and take an analysis window of 2048 samples, about 46 ms, we can easily analyse a stable sound, but not a fluctuating sound.
If we adapt the window size to the frequency of a 440 Hz sound (A4), we get an analysis window of 512 samples, about 11.6 ms, which is more appropriate for a fluctuating sound.
Nevertheless, with a window size of 2048, the frequency resolution is 44100/2048, that is, about 21.5 Hz, which is quite precise. With a window size of 512, the frequency resolution is about 86 Hz, which is poor.
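The two sides of this trade-off can be checked with a short calculation. The sketch below is illustrative only: it assumes the 44100 Hz sample rate used above and an FFT size equal to the window size, and the helper names window_duration_ms and bin_width_hz are hypothetical, not taken from any particular tool.

```python
SAMPLE_RATE = 44100  # Hz, the sample rate assumed in the text above


def window_duration_ms(n_samples, sample_rate=SAMPLE_RATE):
    """Duration of an analysis window of n_samples, in milliseconds."""
    return 1000.0 * n_samples / sample_rate


def bin_width_hz(n_samples, sample_rate=SAMPLE_RATE):
    """Spacing between adjacent FFT bins in Hz (FFT size = window size)."""
    return sample_rate / n_samples


# Compare a few common power-of-two window sizes.
for n in (512, 2048, 8192):
    print(f"{n:5d} samples: {window_duration_ms(n):6.1f} ms, "
          f"{bin_width_hz(n):5.1f} Hz per bin")
```

Each doubling of the window size halves the bin width and doubles the window duration, which is the compromise described above.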
If we want to analyse a sound with a low and/or fluctuating pitch, we should use a large window size.
In the case of a low pitch, a C1 for instance, the frequency is about 32 Hz and the period about 31 ms. We would need a window size of 8192 samples.
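One way to relate a fundamental frequency to a window size is sketched below. It rests on an assumption that the text does not state: that the window should cover roughly five periods of the fundamental, rounded to the nearest power of two. The function name suggest_window_size and the periods parameter are hypothetical. Under that assumption, the calculation reproduces the window sizes quoted for 32 Hz, 100 Hz and 440 Hz.

```python
import math


def suggest_window_size(f0_hz, sample_rate=44100, periods=5):
    """Power-of-two window size closest (on a log scale) to `periods` cycles of f0_hz.

    `periods=5` is an assumed rule of thumb, not a value given in the text.
    """
    target_samples = periods * sample_rate / f0_hz
    return 2 ** round(math.log2(target_samples))


for f0 in (32, 100, 440):
    print(f0, "Hz ->", suggest_window_size(f0), "samples")
```

A different number of periods, or rounding up rather than to the nearest power of two, would give different sizes, so the result is a starting point rather than a rule.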
FFT bins are spaced linearly in frequency, but the response of the human ear to frequency is logarithmic.
For instance, with a 50 Hz frequency resolution, bins go from 0 to 50 Hz, 50 to 100 Hz, 100 to 150 Hz, etc. If we take the frequencies of the octaves from G2 to G6, we get approximately 100, 200, 400, 800, 1600 Hz...
In a low frequency range, 50 Hz is quite a wide interval: from a G2, it is a fifth. But from a G6, 50 Hz represents only about half a semitone.
The same FFT therefore has very fine pitch resolution at high frequencies, but very poor pitch resolution at low frequencies.
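The mismatch between linear bins and logarithmic pitch can be quantified by expressing one bin width as a musical interval. The sketch below is illustrative: it assumes equal temperament (100 cents per semitone, about 700 cents for a fifth) and the 50 Hz bin width from the example above. The function name bin_width_in_cents is hypothetical.

```python
import math


def bin_width_in_cents(freq_hz, bin_width_hz=50.0):
    """Interval in cents between freq_hz and freq_hz + bin_width_hz.

    100 cents correspond to one equal-tempered semitone.
    """
    return 1200.0 * math.log2((freq_hz + bin_width_hz) / freq_hz)


# Approximate octave frequencies used in the example above.
for f in (100, 200, 400, 800, 1600):
    print(f"{f:5d} Hz: one 50 Hz bin spans {bin_width_in_cents(f):6.1f} cents")
```

The interval covered by a single bin shrinks steadily as the frequency rises, which is why the same FFT size resolves pitch far better in the high register than in the low one.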