A Fourier transform (FT) is used to transform an audio signal from the time domain into the frequency domain. This can, for instance, be used to analyze and visualize the spectrum of the signal at a given moment. The Fourier transform and subsequent manipulations in the frequency domain open up a wide range of interesting sound transformations, such as time stretching, pitch shifting and much more.
The mathematician J.B. Fourier (1768-1830) developed a method to approximate unknown functions by means of trigonometric functions. The advantage of this was that the properties of the trigonometric functions (sin and cos) were well known and helped to describe the properties of the unknown function.
In music, a Fourier-transformed signal is decomposed into a sum of sinusoids. Put simply, the Fourier transform is the opposite of additive synthesis. Ideally, a sound can be split by a Fourier transform into its partial components and resynthesized by adding these components again.
Because sound is represented as discrete samples in the computer, the computer implementation calculates a discrete Fourier transform (DFT). As each transform needs a certain number of samples, one main decision in performing a DFT is the number of samples used. The more samples are used, the better the analysis of the frequency components. But as samples are a progression in time, a compromise must be found for each FT in music between better time resolution (fewer samples) and better frequency resolution (more samples). A typical value for FT in music is about 20-100 "snapshots" per second (which can be compared to the single frames in a film or video).
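For reference, these are the standard definitions of the DFT of a frame of N samples x[0] to x[N-1] and of its inverse. The inverse is literally a sum of sinusoids, which is why the FT can be seen as the opposite of additive synthesis. Bin k corresponds to the frequency k·sr/N, so bin spacing and frame duration are reciprocal; the numbers below assume sr = 44100.

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-i 2\pi k n / N}, \qquad x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{\,i 2\pi k n / N}$$

$$f_k = k\,\frac{\mathrm{sr}}{N}, \qquad \Delta f = \frac{\mathrm{sr}}{N}, \qquad T = \frac{N}{\mathrm{sr}}, \qquad \Delta f \cdot T = 1$$

$$N = 1024:\ \Delta f = \frac{44100}{1024} \approx 43.1\ \mathrm{Hz},\ T \approx 23.2\ \mathrm{ms}; \qquad N = 2048:\ \Delta f \approx 21.5\ \mathrm{Hz},\ T \approx 46.4\ \mathrm{ms}$$

Doubling N halves the bin spacing (finer frequency resolution) but doubles the duration of the frame (coarser time resolution).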
At a sample rate of 48000 samples per second, this amounts to about 500-2500 samples for one frame or window. The standard method for the DFT in computer music works with window sizes that are a power of two, for instance 512, 1024 or 2048 samples. The reason for this restriction is that the DFT of such power-of-two sized frames can be calculated with a much faster algorithm, the Fast Fourier Transform (FFT), which is the standard implementation of the Fourier transform in audio applications.
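The frame sizes can be checked quickly. The operation counts are the usual order-of-magnitude estimates for a direct DFT (about N² complex multiplications) and a radix-2 FFT (about N·log₂N), not exact figures:

$$\frac{48000}{100} = 480, \qquad \frac{48000}{20} = 2400 \quad\Rightarrow\quad 512 = 2^{9},\ 1024 = 2^{10},\ 2048 = 2^{11}$$

$$N = 1024:\quad N^2 = 1\,048\,576, \qquad N \log_2 N = 1024 \cdot 10 = 10\,240, \qquad \frac{N^2}{N \log_2 N} = \frac{N}{\log_2 N} = 102.4$$

So for a 1024-sample window the FFT needs roughly a hundredth of the work of a direct DFT, and the advantage grows with N.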
As usual, there is not just one way to work with the FFT and spectral processing in Csound. There are several families of opcodes, each of which can be very useful for a specific approach to working in the frequency domain; have a look at the Spectral Processing overview in the Csound Manual. This introduction will focus on the so-called "Phase Vocoder Streaming" opcodes (all of which begin with the characters "pvs"), which came into Csound through the work of Richard Dobson, Victor Lazzarini and others. They are designed to work in the frequency domain in real time; and indeed they are not only very fast but also easier to use than the FFT implementations in some other applications.
For dealing with signals in the frequency domain, the pvs opcodes implement a new signal type, the f-signal. In Csound, the first letter of a variable's name determines its type (after an optional g for global variables, as in gifftsiz): each audio signal starts with an a, each control signal with a k, and likewise each frequency-domain signal used by the pvs opcodes starts with an f.
There are several ways to create an f-signal. The most common way is to convert an audio signal into a frequency signal. The first example covers two typical situations: the audio signal comes from a sound file (instr 1), or it is the live input (instr 2).
(Be careful: the example can produce feedback three seconds after the start. Use headphones for best results.)
EXAMPLE 04I01.csd 1
```csound
<CsoundSynthesizer>
<CsOptions>
-i adc -o dac
</CsOptions>
<CsInstruments>
;Example by Joachim Heintz
;uses the file "fox.wav" (distributed with the Csound Manual)
sr = 44100
ksmps = 32
nchnls = 2
0dbfs = 1

;general values for fourier transform
gifftsiz  =         1024
gioverlap =         256
giwintyp  =         1 ;von hann window

instr 1 ;soundfile to fsig
asig      soundin   "fox.wav"
fsig      pvsanal   asig, gifftsiz, gioverlap, gifftsiz*2, giwintyp
aback     pvsynth   fsig
          outs      aback, aback
endin

instr 2 ;live input to fsig
          prints    "LIVE INPUT NOW!%n"
ain       inch      1 ;live input from channel 1
fsig      pvsanal   ain, gifftsiz, gioverlap, gifftsiz, giwintyp
alisten   pvsynth   fsig
          outs      alisten, alisten
endin

</CsInstruments>
<CsScore>
i 1 0 3
i 2 3 10
</CsScore>
</CsoundSynthesizer>
```
You should first hear the "fox.wav" sample and then the slightly delayed live input signal. The delay depends first on the general settings for real-time input (ksmps, -b and -B: see chapter 2D). Second, the FFT adds a delay of its own. The window size here is 1024 samples, so the additional delay is 1024/44100 = 0.023 seconds. If you change the window size gifftsiz to 2048 or 512 samples, you should get a longer or shorter delay. So for real-time applications, the choice of FFT size is not only a question of "better time resolution versus better frequency resolution", but also a question of tolerable latency.
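Applying the estimate from the paragraph above (window size divided by sample rate, on top of the buffer latency) to the three window sizes mentioned gives:

$$\frac{512}{44100} \approx 11.6\ \mathrm{ms}, \qquad \frac{1024}{44100} \approx 23.2\ \mathrm{ms}, \qquad \frac{2048}{44100} \approx 46.4\ \mathrm{ms}$$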
What happens in the example above? First, the audio signal (asig, ain) is analyzed and transformed into an f-signal. This is done by the opcode pvsanal. Then nothing happens except transforming the frequency-domain signal back into an audio signal. This is called the inverse Fourier transform (IFT or IFFT) and is done by the opcode pvsynth.2 In this case, it is just a test: to see whether everything works, to hear the results of different window sizes, and to check the latency. But potentially you can insert any other pvs opcode(s) between this entrance and exit.
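To illustrate the idea of several pvs opcodes between pvsanal and pvsynth, here is a minimal sketch (not one of the original examples) that chains two transformations. It assumes the file "fox.wav" as above; pvsblur, which averages the analysis frames over a short time, is used here only as an arbitrary second processing step, and the values 1.5, 0.1 and 0.2 are illustrative choices.

```csound
<CsoundSynthesizer>
<CsOptions>
-odac
</CsOptions>
<CsInstruments>
;sketch: more than one pvs opcode between analysis and resynthesis
sr = 44100
ksmps = 32
nchnls = 1
0dbfs = 1

gifftsiz  =         1024
gioverlap =         gifftsiz/4
giwintyp  =         1 ;von Hann window

instr 1
ain       soundin   "fox.wav"
fanal     pvsanal   ain, gifftsiz, gioverlap, gifftsiz, giwintyp ;entrance: audio -> f-signal
fscal     pvscale   fanal, 1.5      ;1st step: scale all frequencies by 3/2 (a fifth up)
fblur     pvsblur   fscal, 0.1, 0.2 ;2nd step: average frames over 0.1 s (max. 0.2 s)
aout      pvsynth   fblur           ;exit: f-signal -> audio
          out       aout
endin

</CsInstruments>
<CsScore>
i 1 0 3
</CsScore>
</CsoundSynthesizer>
```

Any number of pvs opcodes can be chained this way, as long as each one receives an f-signal and passes one on.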
Simple pitch shifting can be done with the opcode pvscale. All the frequency data in the f-signal are scaled by a certain factor: multiplying by 2 transposes an octave upwards, multiplying by 0.5 an octave downwards. To accept cent values instead of ratios as input, the cent opcode can be used.
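The cent opcode converts a value c in cents (1200 cents = one octave) into a frequency ratio. For the values used in Example 04I02 this works out as follows:

$$\mathrm{ratio} = 2^{c/1200}: \qquad 2^{1200/1200} = 2, \qquad 2^{-1200/1200} = 0.5, \qquad 2^{400/1200} \approx 1.2599, \qquad 2^{700/1200} \approx 1.4983$$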
EXAMPLE 04I02.csd
```csound
<CsoundSynthesizer>
<CsOptions>
-odac
</CsOptions>
<CsInstruments>
;example by joachim heintz
sr = 44100
ksmps = 32
nchnls = 1
0dbfs = 1

gifftsize  =         1024
gioverlap  =         gifftsize / 4
giwinsize  =         gifftsize
giwinshape =         1; von-Hann window

instr 1 ;scaling by a factor
ain       soundin   "fox.wav"
fftin     pvsanal   ain, gifftsize, gioverlap, giwinsize, giwinshape
fftscal   pvscale   fftin, p4
aout      pvsynth   fftscal
          out       aout
endin

instr 2 ;scaling by a cent value
ain       soundin   "fox.wav"
fftin     pvsanal   ain, gifftsize, gioverlap, giwinsize, giwinshape
fftscal   pvscale   fftin, cent(p4)
aout      pvsynth   fftscal
          out       aout/3 ;three notes sound at once in the score, so reduce the level
endin

</CsInstruments>
<CsScore>
i 1 0 3 1; original pitch
i 1 3 3 .5; octave lower
i 1 6 3 2 ;octave higher
i 2 9 3 0
i 2 9 3 400 ;major third
i 2 9 3 700 ;fifth
e
</CsScore>
</CsoundSynthesizer>
```
Pitch shifting via FFT resynthesis is very simple in general, but more or less complicated in detail. With speech, for instance, the formants are a problem: if you simply scale the frequencies, the formants are shifted too, and the sound gets the typical "Mickey Mouse" effect. Some parameters of the pvscale opcode, and some other pvs opcodes, can help to avoid this, but the result always depends on the individual sound and on your ideas.
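In current Csound versions, pvscale accepts an optional third argument, kkeepform, which selects a formant-preservation method (0, the default, does not preserve formants; 1 and 2 select two different methods; check the pvscale entry in the manual of your Csound version). The following sketch, an addition to the original examples, transposes the "fox" up a fifth three times so that the settings can be compared by ear.

```csound
<CsoundSynthesizer>
<CsOptions>
-odac
</CsOptions>
<CsInstruments>
;sketch: pitch shifting with and without formant preservation
sr = 44100
ksmps = 32
nchnls = 1
0dbfs = 1

gifftsize  =         1024
gioverlap  =         gifftsize / 4
giwinsize  =         gifftsize
giwinshape =         1 ;von Hann window

instr 1
ain       soundin   "fox.wav"
fftin     pvsanal   ain, gifftsize, gioverlap, giwinsize, giwinshape
fftscal   pvscale   fftin, p4, p5 ;p4 = ratio, p5 = kkeepform (0 = off)
aout      pvsynth   fftscal
          out       aout
endin

</CsInstruments>
<CsScore>
;         ratio  keepform
i 1 0 3   1.5    0 ;fifth up, formants shifted too
i 1 3 3   1.5    1 ;fifth up, formant preservation method 1
i 1 6 3   1.5    2 ;fifth up, formant preservation method 2
e
</CsScore>
</CsoundSynthesizer>
```

Which method sounds best depends on the material, so it is worth trying all three on your own sounds.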
As the Fourier transform separates the spectral information from the progression in time, both can be varied independently. Pitch shifting via the pvscale opcode, as in the previous example, is independent of the speed at which the audio data are read. The complement is changing the time without changing the pitch: time stretching or time compression.
The simplest way to alter the speed of a sampled sound is to use pvstanal (introduced in Csound 5.13). This opcode transforms a sound stored in a function table into an f-signal, and time manipulations are done simply by altering its ktimescal parameter (values below 1 stretch the sound, values above 1 compress it).
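Because ktimescal is a k-rate argument, it can also change during a note. A minimal sketch (not from the original text) lets the reading speed glide from 2 (twice as fast) to 0.25 (four times slower). pvstanal's optional arguments are left at their defaults here; according to the Csound manual this includes wrap-around (loop) reading of the table.

```csound
<CsoundSynthesizer>
<CsOptions>
-odac
</CsOptions>
<CsInstruments>
;sketch: time scaling that changes during the note
sr = 44100
ksmps = 32
nchnls = 1
0dbfs = 1

;store the sample "fox.wav" in a function table (buffer)
gifil     ftgen     0, 0, 0, 1, "fox.wav", 0, 0, 1

instr 1
ktimescal line      p4, p3, p5             ;reading speed glides from p4 to p5
fsig      pvstanal  ktimescal, 1, 1, gifil ;amplitude and pitch unchanged
aout      pvsynth   fsig
aenv      linen     aout, .01, p3, .1
          out       aenv
endin

</CsInstruments>
<CsScore>
i 1 0 8 2 .25
</CsScore>
</CsoundSynthesizer>
```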
EXAMPLE 04I03.csd
```csound
<CsoundSynthesizer>
<CsOptions>
-odac
</CsOptions>
<CsInstruments>
;example by joachim heintz
sr = 44100
ksmps = 32
nchnls = 1
0dbfs = 1

;store the sample "fox.wav" in a function table (buffer)
gifil     ftgen     0, 0, 0, 1, "fox.wav", 0, 0, 1

;general values for the pvstanal opcode
giamp     =         1 ;amplitude scaling
gipitch   =         1 ;pitch scaling
gidet     =         0 ;onset detection
giwrap    =         0 ;no loop reading
giskip    =         0 ;start at the beginning
gifftsiz  =         1024 ;fft size
giovlp    =         gifftsiz/8 ;overlap size
githresh  =         0 ;threshold

instr 1 ;simple time stretching / compressing
fsig      pvstanal  p4, giamp, gipitch, gifil, gidet, giwrap, giskip, gifftsiz, giovlp, githresh
aout      pvsynth   fsig
          out       aout
endin

instr 2 ;automatic scratching
kspeed    randi     2, 2, 2 ;speed randomly between -2 and 2
kpitch    randi     p4, 2, 2 ;with p4 = 2: pitch between 2 octaves lower and higher
fsig      pvstanal  kspeed, 1, octave(kpitch), gifil
aout      pvsynth   fsig
aenv      linen     aout, .003, p3, .1
          out       aenv ;output the enveloped signal
endin

</CsInstruments>
<CsScore>
;         speed
i 1 0 3   1
i . + 10  .33
i . + 2   3
s
i 2 0 10 0 ;random scratching without ...
i . 11 10 2 ;... and with pitch changes
</CsScore>
</CsoundSynthesizer>
```
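Working in the frequency domain makes it possible to combine, or "cross", the spectra of two sounds. As the Fourier transform of an analysis frame yields an amplitude and a frequency value for each frequency "bin", there are many ways of performing cross synthesis. The most common methods are:
1. Combine the amplitudes (the spectral envelope) of sound A with the frequencies of sound B. This is the classical phase vocoder approach, provided by pvsvoc.
2. Combine the frequencies of sound A with the amplitudes of sound B, optionally mixing in the amplitudes of A: pvscross.
3. Take the frequencies of sound A and multiply the amplitudes of A and B. This can be described as spectral filtering: pvsfilter.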
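For pvscross and pvsfilter the combination can be written per bin k. Let a_k and f_k be the amplitude and frequency of bin k in the first f-signal passed to the opcode, and b_k the amplitude in the second. This is a simplified model based on the opcodes' documented behaviour (pvsfilter shown for kdepth = 1 and default gain), not a transcription of Csound's source:

$$\text{pvscross}:\quad A^{\mathrm{out}}_k = k_{\mathrm{amp1}}\, a_k + k_{\mathrm{amp2}}\, b_k, \qquad F^{\mathrm{out}}_k = f_k$$

$$\text{pvsfilter}\ (k_{\mathrm{depth}} = 1):\quad A^{\mathrm{out}}_k = a_k\, b_k, \qquad F^{\mathrm{out}}_k = f_k$$

pvsvoc does not fit into such a one-line formula, as it first derives a smoothed spectral envelope from its first input.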
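The following example uses pvsvoc, which combines the spectral envelope (the amplitudes) of sound A with the frequencies of sound B; this is the classical phase vocoder approach. It works nicely to have speech as sound A and a rich sound, like classical music, as sound B. Here the "fox" sample is played at half speed and "sings" through the music of sound B, a classical guitar sample: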
EXAMPLE 04I04.csd
```csound
<CsoundSynthesizer>
<CsOptions>
-odac
</CsOptions>
<CsInstruments>
;example by joachim heintz
sr = 44100
ksmps = 32
nchnls = 1
0dbfs = 1

;store the samples in function tables (buffers)
gifilA    ftgen     0, 0, 0, 1, "fox.wav", 0, 0, 1
gifilB    ftgen     0, 0, 0, 1, "ClassGuit.wav", 0, 0, 1

;general values for the pvstanal opcode
giamp     =         1 ;amplitude scaling
gipitch   =         1 ;pitch scaling
gidet     =         0 ;onset detection
giwrap    =         1 ;loop reading
giskip    =         0 ;start at the beginning
gifftsiz  =         1024 ;fft size
giovlp    =         gifftsiz/8 ;overlap size
githresh  =         0 ;threshold

instr 1 ;read "fox.wav" in half speed and cross with classical guitar sample
fsigA     pvstanal  .5, giamp, gipitch, gifilA, gidet, giwrap, giskip, gifftsiz, giovlp, githresh
fsigB     pvstanal  1, giamp, gipitch, gifilB, gidet, giwrap, giskip, gifftsiz, giovlp, githresh
fvoc      pvsvoc    fsigA, fsigB, 1, 1
aout      pvsynth   fvoc
aenv      linen     aout, .1, p3, .5
          out       aenv ;output the enveloped signal
endin

</CsInstruments>
<CsScore>
i 1 0 11
</CsScore>
</CsoundSynthesizer>
```
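The third argument of pvsvoc, kdepth, sets how much of the result takes its frequencies from sound B: according to the Csound manual, 0 returns sound A unchanged and 1 gives the full effect, the value used in Example 04I04. The following sketch is a variation of that example (same files, same analysis settings) that fades from the plain "fox" into the fully vocoded sound.

```csound
<CsoundSynthesizer>
<CsOptions>
-odac
</CsOptions>
<CsInstruments>
;sketch: variation of example 04I04 with a time-varying kdepth
sr = 44100
ksmps = 32
nchnls = 1
0dbfs = 1

gifilA    ftgen     0, 0, 0, 1, "fox.wav", 0, 0, 1
gifilB    ftgen     0, 0, 0, 1, "ClassGuit.wav", 0, 0, 1
gifftsiz  =         1024
giovlp    =         gifftsiz/8

instr 1
;arguments: speed, amp, pitch, table, detect, wrap, skip, fft size, overlap, threshold
fsigA     pvstanal  .5, 1, 1, gifilA, 0, 1, 0, gifftsiz, giovlp, 0
fsigB     pvstanal  1, 1, 1, gifilB, 0, 1, 0, gifftsiz, giovlp, 0
kdepth    line      0, p3, 1 ;0 = plain "fox", 1 = full vocoder effect
fvoc      pvsvoc    fsigA, fsigB, kdepth, 1
aout      pvsynth   fvoc
aenv      linen     aout, .1, p3, .5
          out       aenv
endin

</CsInstruments>
<CsScore>
i 1 0 11
</CsScore>
</CsoundSynthesizer>
```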
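The next example introduces pvscross. It takes the frequencies of the first f-signal (here the viola) and combines the amplitudes of both f-signals, each weighted by one of the last two arguments. With 0 and 1, as used here, only the amplitudes of the second f-signal, the "fox" at half speed, are used: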
EXAMPLE 04I05.csd
```csound
<CsoundSynthesizer>
<CsOptions>
-odac
</CsOptions>
<CsInstruments>
;example by joachim heintz
sr = 44100
ksmps = 32
nchnls = 1
0dbfs = 1

;store the samples in function tables (buffers)
gifilA    ftgen     0, 0, 0, 1, "BratscheMono.wav", 0, 0, 1
gifilB    ftgen     0, 0, 0, 1, "fox.wav", 0, 0, 1

;general values for the pvstanal opcode
giamp     =         1 ;amplitude scaling
gipitch   =         1 ;pitch scaling
gidet     =         0 ;onset detection
giwrap    =         1 ;loop reading
giskip    =         0 ;start at the beginning
gifftsiz  =         1024 ;fft size
giovlp    =         gifftsiz/8 ;overlap size
githresh  =         0 ;threshold

instr 1 ;cross viola with "fox.wav" in half speed
fsigA     pvstanal  1, giamp, gipitch, gifilA, gidet, giwrap, giskip, gifftsiz, giovlp, githresh
fsigB     pvstanal  .5, giamp, gipitch, gifilB, gidet, giwrap, giskip, gifftsiz, giovlp, githresh
fcross    pvscross  fsigA, fsigB, 0, 1
aout      pvsynth   fcross
aenv      linen     aout, .1, p3, .5
          out       aenv ;output the enveloped signal
endin

</CsInstruments>
<CsScore>
i 1 0 11
</CsScore>
</CsoundSynthesizer>
```
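Because the two amplitude weights of pvscross are k-rate arguments, they can also change over time. This sketch, a variation of Example 04I05 rather than part of the original examples, crossfades from the viola's own amplitudes (1 and 0) to those of the "fox" (0 and 1), so that the viola gradually starts to "speak".

```csound
<CsoundSynthesizer>
<CsOptions>
-odac
</CsOptions>
<CsInstruments>
;sketch: variation of example 04I05 with crossfaded amplitude weights
sr = 44100
ksmps = 32
nchnls = 1
0dbfs = 1

gifilA    ftgen     0, 0, 0, 1, "BratscheMono.wav", 0, 0, 1
gifilB    ftgen     0, 0, 0, 1, "fox.wav", 0, 0, 1
gifftsiz  =         1024
giovlp    =         gifftsiz/8

instr 1
;arguments: speed, amp, pitch, table, detect, wrap, skip, fft size, overlap, threshold
fsigA     pvstanal  1, 1, 1, gifilA, 0, 1, 0, gifftsiz, giovlp, 0
fsigB     pvstanal  .5, 1, 1, gifilB, 0, 1, 0, gifftsiz, giovlp, 0
kamp1     line      1, p3, 0    ;weight of the viola's amplitudes: 1 -> 0
kamp2     =         1 - kamp1   ;weight of the "fox" amplitudes: 0 -> 1
fcross    pvscross  fsigA, fsigB, kamp1, kamp2
aout      pvsynth   fcross
aenv      linen     aout, .1, p3, .5
          out       aenv
endin

</CsInstruments>
<CsScore>
i 1 0 11
</CsScore>
</CsoundSynthesizer>
```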
The last example shows spectral filtering via pvsfilter. The well-known "fox" (sound A) is now filtered by the viola (sound B). The intensity of the result depends on the amplitudes of sound B, and if they are strong enough, you hear a resonating effect:
EXAMPLE 04I06.csd
```csound
<CsoundSynthesizer>
<CsOptions>
-odac
</CsOptions>
<CsInstruments>
;example by joachim heintz
sr = 44100
ksmps = 32
nchnls = 1
0dbfs = 1

;store the samples in function tables (buffers)
gifilA    ftgen     0, 0, 0, 1, "fox.wav", 0, 0, 1
gifilB    ftgen     0, 0, 0, 1, "BratscheMono.wav", 0, 0, 1

;general values for the pvstanal opcode
giamp     =         1 ;amplitude scaling
gipitch   =         1 ;pitch scaling
gidet     =         0 ;onset detection
giwrap    =         1 ;loop reading
giskip    =         0 ;start at the beginning
gifftsiz  =         1024 ;fft size
giovlp    =         gifftsiz/4 ;overlap size
githresh  =         0 ;threshold

instr 1 ;filters "fox.wav" (half speed) by the spectrum of the viola (double speed)
fsigA     pvstanal  .5, giamp, gipitch, gifilA, gidet, giwrap, giskip, gifftsiz, giovlp, githresh
fsigB     pvstanal  2, 5, gipitch, gifilB, gidet, giwrap, giskip, gifftsiz, giovlp, githresh
ffilt     pvsfilter fsigA, fsigB, 1
aout      pvsynth   ffilt
aenv      linen     aout, .1, p3, .5
          out       aenv ;output the enveloped signal
endin

</CsInstruments>
<CsScore>
i 1 0 11
</CsScore>
</CsoundSynthesizer>
```
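The third argument of pvsfilter, kdepth, sets the strength of the filtering: 0 should leave sound A unfiltered, and 1, the value used in Example 04I06, applies the full effect. A sketch of a variation (not part of the original examples) that moves from the unfiltered to the fully filtered "fox":

```csound
<CsoundSynthesizer>
<CsOptions>
-odac
</CsOptions>
<CsInstruments>
;sketch: variation of example 04I06 with a time-varying kdepth
sr = 44100
ksmps = 32
nchnls = 1
0dbfs = 1

gifilA    ftgen     0, 0, 0, 1, "fox.wav", 0, 0, 1
gifilB    ftgen     0, 0, 0, 1, "BratscheMono.wav", 0, 0, 1
gifftsiz  =         1024
giovlp    =         gifftsiz/4

instr 1
;arguments: speed, amp, pitch, table, detect, wrap, skip, fft size, overlap, threshold
fsigA     pvstanal  .5, 1, 1, gifilA, 0, 1, 0, gifftsiz, giovlp, 0
fsigB     pvstanal  2, 5, 1, gifilB, 0, 1, 0, gifftsiz, giovlp, 0
kdepth    line      0, p3, 1 ;0 = unfiltered "fox", 1 = full filtering
ffilt     pvsfilter fsigA, fsigB, kdepth
aout      pvsynth   ffilt
aenv      linen     aout, .1, p3, .5
          out       aenv
endin

</CsInstruments>
<CsScore>
i 1 0 11
</CsScore>
</CsoundSynthesizer>
```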
There are many more ways of working with the pvs opcodes. Have a look at the Signal Processing II section of the Opcodes Overview for some hints.