
This text is best read after you have read about FastFourierTransformation.

 

Convolution.
This is the technique that much, if not all, recent software and hardware uses for room and hardware simulation. Programs like Waves IR, Altiverb, SIR and hardware like the Focusrite Liquid Channel all use convolution.

Convolution can, for example, be used for reverberation, equalization and modulation, and it can capture the character of hardware like microphone preamps or guitar amplifiers.

 

What convolution can do is capture the sound of a room, or what happens when you put a signal through a tube microphone amplifier. This capture is a snapshot of what happens in a specific situation.

That means we can get the characteristics of a Marshall 100 4x12" stack when the controls of the amplifier are in a specific position. It doesn't capture all of the possibilities in the Marshall stack, only what happens at a given moment; it's a sort of snapshot.

Having said that, there are several hardware units and software programs that get around this problem with clever programming. How this is done is way beyond this simple text.
 

Impulse response.
To understand what convolution does, let me describe how this works when making a reverb.
Let's say we want to use the reverb from a room that has a desirable reverb. It could be a concert hall, like the Oslo Concert Hall, or a small room like your bathroom.

In that room we make a very short and preferably loud sound, for example a starter pistol shot, or a large balloon that we stick a needle in. This is what we call an impulse.
We record that sound and the reverberation it makes in the room. We call this the room's impulse response.

It's important where we place our microphones, and whether we make a mono or stereo recording (impulse); we could even make a multi-mic recording. Reverberation today is almost always stereo (different processing of the left and right channels), so when it comes to reverb recording, the short sound is preferably recorded in stereo.

The place where we put our stereo microphone is where we record the sound of the reverberation. If you fire your starter pistol (your impulse) at the stage of the concert hall and put the microphones on the first row, you will get a different result (response) than if you put the microphones on the 30th row or in the balcony.

The same goes for whether you fire your starter pistol (your impulse) at the far left of the stage or right at the front of the stage. The recording (the impulse response) will capture the character of the reverberation under these particular circumstances. Usually, when you capture an impulse response of a concert hall, you initiate the impulse somewhere near the center of the stage and record the impulse response at the center of the hall, but as you know from Altiverb, there is a very good reason for recording the impulse at several different places in the room.
 

Many convolution reverbs specify the exact position of the recording; you often see something like "Sydney Concert Hall, row 26, seat 31" when an impulse response is named.

 

Direct convolution. Convolution in the time domain.
This is something we normally don't do, but how you do it mathematically is easy to understand and interesting to know about.

Let's say we have two equally long sounds, both 5 seconds long. They are digitally sampled with the same resolution, for example 16-bit 44.1 kHz. One sound is a spoken sentence and the other is a recording of a pistol shot in a concert hall, as described above: an impulse response from the recording space. We usually edit the start of the impulse response, as we don't want the start (the impulse) as part of the sound, just the reverberation it makes.

As both of these sounds have exactly the same length, 5 seconds, and are sampled at 44.1 kHz, they are both 220500 samples long (5 × 44100).
We then take the first sample of the vocal sound and multiply it with every sample of the reverb sound. Next, the second sample of the vocal sound is multiplied with every sample of the reverb sound. Every multiplication generates a new file that is 220500 samples long. We do this until we come to the end of the vocal sound, so we end up with 220500 new files. All of these new files are time-delayed and summed: the first starts at time 0, the second is delayed by 1/44100 of a second, the third by 2/44100 of a second, and so on. The final sound file will then be 10 seconds long: vocal sound + impulse response file/reverb.
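To make this concrete, here is a minimal sketch of that sample-by-sample process in Python with numpy (the function name and the two input arrays are my own, invented for illustration):

import numpy as np

def direct_convolution(vocal, ir):
    # Each vocal sample produces a scaled copy of the impulse response,
    # delayed by that sample's position; all the copies are summed.
    out = np.zeros(len(vocal) + len(ir) - 1)
    for n in range(len(vocal)):
        # scaled, delayed copy of the reverb/impulse response
        out[n:n + len(ir)] += vocal[n] * ir
    return out

For our two 5-second files this loop makes 220500 scaled, delayed copies of a 220500-sample array, which is exactly why the next section matters.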

 

This is a very processor-demanding operation: for our two 5-second files it means 220500 × 220500, roughly 49 billion, multiplications. Even if this is possible, it takes too much time to execute in real time.

 

Fast convolution.
Instead of the compute-intensive direct convolution, we do the convolution in the frequency domain using two FFT analyses.
To describe this in a simple way, let's say we have two sounds that have each been analyzed using 10 FFT windows: a vocal sound and an impulse response from a room, the reverb sound.

Look at the illustrations below.
We multiply the first vocal window with every window in the reverb sound.

 

We then multiply the second vocal window with every window in the reverb sound. The result is shifted to its correct position, one window to the right.

 

We do this multiplication with every window from the vocal sound. We then get 10 new files that are summed, and voilà: the vocal is now in the space of the impulse response, the room.
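As a sketch of the same idea in Python with numpy (again, the names are my own; for simplicity it uses one FFT over each whole file rather than the 10-window scheme above, but the principle, multiplying spectra in the frequency domain, is the same):

import numpy as np

def fast_convolution(vocal, ir):
    # Zero-pad both signals to the length of the full convolution,
    # transform, multiply the two spectra bin by bin, transform back.
    n = len(vocal) + len(ir) - 1
    V = np.fft.rfft(vocal, n)    # spectrum of the vocal
    R = np.fft.rfft(ir, n)       # spectrum of the reverb impulse response
    return np.fft.irfft(V * R, n)

For long signals this needs on the order of n·log(n) operations instead of the billions required by the direct method, which is what makes real-time convolution reverb possible.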

 

Dear reader, this is a brief explanation of what's going on. There are other "things", but as a general description for the non-mathematical person it may be useful.

 
