What the size of the window means. What is FFT page 3.
To get the best possible analysis result, it's important to evaluate what size of analysis window you should use. It depends on the sound: whether it is a noisy type of sound, whether it is harmonically rich, whether it has a lot of timbral changes (like a spoken sentence), or whether it has a fast transient attack, like a snare drum, things like that.
Look at Table 1. It gives you all the essential information about the limits and benefits of a specific analysis window size.
Values are rounded off.
- A low frequency has a long wavelength, and it's important that the whole wave, one single cycle, fits inside one analysis window. So if you need an accurate analysis of low frequencies, you have to go for long windows. My estimates for the lowest frequency are pretty optimistic; most others would double my estimates.
- To be able to analyze a rich spectrum, like a noisy sound, in detail, we have to have a long analysis window. A long window can analyze a sound's timbre in much more detail. A 128-sample window can only give an analysis with 64 bands (that is, 64 partials), but a window of 4096 samples will give you an analysis that may contain up to 2048 partials.
- Longer windows are much less accurate when it comes to analyzing fast changes in a sound. Everything that happens inside a window gets mixed together. The analysis only sees one timbre inside that time length/analysis window. A window of 8192 samples mixes all sound in that time frame together, blurring dynamic changes. A window of 128 samples has an analysis time of 0.006 seconds and can thereby analyze fast changes more accurately.
There is always this trade-off between a good analysis of the sound's timbre and a good analysis of what happens in time.
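The numbers behind this trade-off can be sketched in a few lines of Python. This is only a rough illustration, not how Table 1 was made: the function name `window_properties` is made up for this example, and the sample rate of 44100 Hz is an assumption, since the text does not say which rate the table uses. The number of bands is half the window size, the analysis time is the window size divided by the sample rate, and the lowest frequency is the optimistic estimate where one single cycle just fits inside the window.

```python
def window_properties(window_size, sample_rate=44100):
    """Return basic figures for one analysis window size.

    sample_rate is an assumption; the text does not state which rate it uses.
    """
    return {
        # A window of N samples gives N/2 frequency bands (partials).
        "bands": window_size // 2,
        # Time covered by one window, in seconds.
        "duration_s": window_size / sample_rate,
        # Optimistic lower limit: one full cycle fits exactly in the window.
        "lowest_hz": sample_rate / window_size,
    }


for size in (128, 1024, 4096, 8192):
    p = window_properties(size)
    print(f"{size:5d} samples: {p['bands']:5d} bands, "
          f"{p['duration_s'] * 1000:7.1f} ms, "
          f"lowest ~{p['lowest_hz']:6.1f} Hz")
```

The analysis time and the lowest frequency both scale with the sample rate, so with a different rate the printed values will not match the ones in the table. The number of bands only depends on the window size.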
It seems to be widely accepted that for most sounds a window of 1024 samples is a useful compromise between a detailed analysis of the timbre and a reasonable resolution in time. That is, if you want to keep as much of the sound's character as possible when manipulating the analysis, like in a time stretch.
What is the shortest sound we can hear?
It's somewhere around 0.003 seconds, so a window of 1024 samples is not short enough?
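To get a feeling for the gap between the two, here is a small sketch that compares a window's length in time with that 0.003 second figure. The helper names and the 44100 Hz sample rate are assumptions made for this example; with another sample rate the results change.

```python
def window_duration(window_size, sample_rate=44100):
    """Length of one analysis window in seconds (sample_rate is assumed)."""
    return window_size / sample_rate


def longest_power_of_two_window(max_seconds, sample_rate=44100):
    """Largest power-of-two window size that still fits inside max_seconds."""
    max_samples = int(max_seconds * sample_rate)
    if max_samples < 1:
        return 0  # not even one sample fits
    # bit_length gives the position of the highest set bit.
    return 1 << (max_samples.bit_length() - 1)


shortest_audible = 0.003  # seconds, the figure used in the text

ratio = window_duration(1024) / shortest_audible
print(f"A 1024 sample window is about {ratio:.1f} times longer than that")
print("Longest window that fits:",
      longest_power_of_two_window(shortest_audible), "samples")
```

So at the assumed rate a 1024 sample window covers several of these shortest audible events at once, which is the time blurring described in the list above.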
Let's listen to how this sounds when we manipulate a sound.
I have a small word, spoken by my then 8-year-old son, who says “tidelita” (nice word, isn’t it? It means nothing in Norwegian and probably nothing in English either).
I have not included a sample of the source, just listen for yourself. Which of the three examples in Example 12 sounds most... natural, and... unmanipulated?
Look at Table 1, compare the properties of the different window sizes, listen to the examples and hear if you can get why each one sounds the way it does.
- Can you clearly hear all the vowels and consonants, like the t's?
- Is the sound quality smooth and even?
- Does the 128 sample window have a lower timbral resolution?
- Does the 8192 sample analysis sound smoother but have a more blurred time resolution?
To my ears it's the second one that sounds most natural. As you can see from the label it has