Transient detection is used in a number of situations, including transient preservation during time stretching and onset detection.
Attack transients are the initial part of an independent sound source and occur with note onsets. They show fast changes in the characteristics of the sound. A transient is a short burst of energy caused by a sudden change of state in the sound production system. It represents a non-harmonic attack phase: it contains a high proportion of non-periodic components and more high-frequency energy than the sustained phase of the signal. Transients may cause changes in sound quality when the signal is manipulated.
An onset refers to the beginning of a sound. All musical sounds have an onset, but they do not necessarily include an initial transient. An onset is marked by an increase in spectral energy from zero to an initial peak, or by a change in the spectral energy distribution or phase, in the detected pitch, or in the spectral pattern.
Two different algorithms are used to detect transients, depending on whether the sound is monophonic or polyphonic. Detecting transients in monophonic sounds is rather easy, and the user does not need to master many parameters. This section mostly describes the key notions and parameters related to the detection of transients in polyphonic sounds.
The Super VP kernel has a specific detection algorithm for polyphonic sounds. In addition to producing analysis markers, it is also used during processing, to preserve transients when stretching or transposing sounds.
In a polyphonic signal, attack transients may occur at the same time as stationary sounds. Moreover, spectral peaks can belong either to an attack transient or to noise.
Two strategies are used: the evaluation of the energy in the signal, using the Center Of Gravity of the Energy (COGE), and the monitoring of the number of peaks in the signal, using a statistical model. The AS analysis parameters relate to these two strategies. The algorithm operates in both time and frequency to detect transients and to discriminate them from stationary sounds or noise.
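The COGE idea can be illustrated with a minimal sketch. It assumes the time-reassignment formulation, in which the per-bin center of gravity of the energy is obtained by combining the ordinary windowed spectrum with a second spectrum computed using a time-weighted window. This is a standard technique, not necessarily the exact Super VP implementation; the function names, the Hann window and the naive DFT are illustrative choices.

```python
import cmath
import math


def dft(frame):
    """Naive DFT, adequate for short illustrative frames."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]


def coge_per_bin(frame, eps=1e-12):
    """For each bin, return the time offset (in samples, relative to the
    frame center) of the center of gravity of the energy.

    Assumed formulation (time reassignment):
        offset(k) = Re(X_tw(k) * conj(X_w(k))) / |X_w(k)|^2
    where X_w uses the window w(t) and X_tw uses (t - center) * w(t).
    """
    n = len(frame)
    w = hann(n)
    center = n // 2
    x_w = dft([frame[t] * w[t] for t in range(n)])
    x_tw = dft([frame[t] * (t - center) * w[t] for t in range(n)])
    return [
        (a * b.conjugate()).real / max(abs(b) ** 2, eps)
        for a, b in zip(x_tw, x_w)
    ]
```

For an impulse located 10 samples after the frame center, every bin reports an offset of about +10 samples: the energy is concentrated late in the window, which is the situation that reveals a transient shorter than the window.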
The smallest identifiable component in the spectral domain is the spectral peak. Spectral peaks can be grouped to form the more complex signal components that make up a transient.
The algorithm detects spectral peaks one by one and classifies them into transient and non-transient peaks, which requires a high frequency resolution.
The number of peaks in a given frequency range is also crucial in determining whether a transient is present. The frequency resolution affects the accuracy of the comparison with the statistical model.
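Peak detection and peak counting can be sketched as follows, assuming a peak is simply a strict local maximum of the magnitude spectrum. The statistical model itself is not described here, so the sketch only produces the count that such a model would be compared against; the function names are hypothetical.

```python
def spectral_peaks(magnitudes):
    """Indices of bins that are strict local maxima of the magnitude spectrum."""
    return [
        k
        for k in range(1, len(magnitudes) - 1)
        if magnitudes[k] > magnitudes[k - 1] and magnitudes[k] > magnitudes[k + 1]
    ]


def peaks_in_band(magnitudes, lo_bin, hi_bin):
    """Count the spectral peaks whose bin index lies in [lo_bin, hi_bin)."""
    return sum(1 for k in spectral_peaks(magnitudes) if lo_bin <= k < hi_bin)
```

With a coarse frequency resolution, two nearby partials can merge into a single local maximum, which lowers the count. This is one way the resolution affects the comparison with the model.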
An attack transient has a start time and an end time, and is detected when its energy reaches a given level.
The start of the transient corresponds to the absolute value of the transient before the detected end time.
The end of the transient is estimated from the maximum absolute amplitude of the transient.
The attack is finished when the energy of the bins in the current frame is smaller than half the spectral energy of the transient.
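The end-of-attack criterion can be sketched as below. The sketch assumes that "the spectral energy of the transient" means the largest summed energy of the transient bins observed since detection, and that the per-frame energies are already available as a list; both the input format and the function name are hypothetical.

```python
def attack_end_frame(bin_energies, start):
    """Return the index of the first frame after `start` whose transient-bin
    energy falls below half of the largest energy seen since `start`.

    bin_energies: per-frame summed energy of the transient bins
                  (hypothetical input format).
    Returns None if the criterion is never met.
    """
    peak = bin_energies[start]
    for i in range(start + 1, len(bin_energies)):
        if bin_energies[i] < 0.5 * peak:
            return i
        # Keep tracking the maximum while the attack is still building up.
        peak = max(peak, bin_energies[i])
    return None
```

For the energies `[0.1, 1.0, 4.0, 3.0, 1.9, 0.5]` with detection at frame 1, the maximum reaches 4.0 at frame 2 and the attack is considered finished at frame 4, the first frame below 2.0.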
Whether a signal change is fast or slow is evaluated relative to the duration of the analysis window.
If the change lasts longer than the analysis window, it can easily be processed by the standard phase vocoder.
If it is shorter, another detection strategy is required, based on the Center Of Gravity of the Energy (COGE) in the signal.
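The comparison between the duration of a change and the analysis window is simple arithmetic. In this sketch the window size is given in samples and the duration of the change in milliseconds; the helper names are hypothetical.

```python
def window_duration_ms(window_size, sample_rate):
    """Duration of an analysis window of `window_size` samples, in ms."""
    return 1000.0 * window_size / sample_rate


def needs_coge_strategy(change_duration_ms, window_size, sample_rate):
    """True if the signal change is shorter than the analysis window,
    i.e. too fast to be handled by the standard phase vocoder alone."""
    return change_duration_ms < window_duration_ms(window_size, sample_rate)
```

A 2048-sample window at 44100 Hz lasts about 46.4 ms, so a 5 ms attack falls under the COGE strategy, whereas a 100 ms swell does not.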
The bins of each spectral frame are classified as transient or non-transient.
Once an attack transient is detected, all the transient peaks are gathered into a single event to estimate the transient position. Until the end of the attack event is detected, all peaks with a relevant COGE are collected into a set of transient bins.
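Gathering the transient peaks of one event can be sketched as follows. The "relevant COGE" test is not specified, so the sketch assumes a hypothetical rule: a peak counts as transient when its COGE offset exceeds a threshold. The event position is then estimated as the energy-weighted mean of the peaks' COGE times. The input format and names are illustrative only.

```python
def gather_transient_event(frames, threshold):
    """Collect transient bins over the frames of one attack event.

    frames: list of (center, peaks), where `center` is the frame center in
            samples and `peaks` maps a bin index to (coge_offset, energy).
    A peak counts as transient when coge_offset > threshold
    (hypothetical relevance test).
    Returns (transient_bins, position), where position is the
    energy-weighted mean COGE time in samples, or None if no peak qualified.
    """
    transient_bins = set()
    weighted_time = 0.0
    total_energy = 0.0
    for center, peaks in frames:
        for k, (offset, energy) in peaks.items():
            if offset > threshold:
                transient_bins.add(k)
                weighted_time += energy * (center + offset)
                total_energy += energy
    position = weighted_time / total_energy if total_energy > 0 else None
    return transient_bins, position
```

Merging the peaks of several frames into one set gives a single position per attack instead of one per bin, which is what the processing needs in order to treat the transient as one event.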
The algorithm used for the detection of transients in monophonic sounds is based on the time variation of the spectral amplitudes.
The algorithm locates the time positions of maximum amplitude increase.
The amplitude evolution is averaged over five consecutive frames. The amplitude difference between consecutive frames is computed over the spectral bins whose amplitude increases. The sum of these amplitude differences is normalized by the sum of the amplitudes of the same bins in the later frame.
The algorithm yields the positions and values of the local maxima of the averaged spectral differences.
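The monophonic procedure can be sketched as below. Two details are assumptions: the five-frame averaging is applied to the amplitude spectra with a trailing window, and the normalization uses the later-frame amplitudes of the rising bins only. The function names are hypothetical.

```python
def average_spectra(frames, width=5):
    """Average each amplitude spectrum with the preceding frames, using a
    trailing window of `width` frames (truncated at the start)."""
    out = []
    for i in range(len(frames)):
        window = frames[max(0, i - width + 1): i + 1]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def spectral_difference(prev, curr):
    """Sum of amplitude increases over the rising bins, normalized by the
    sum of those bins' amplitudes in the later frame."""
    rising = [(p, c) for p, c in zip(prev, curr) if c > p]
    num = sum(c - p for p, c in rising)
    den = sum(c for _, c in rising)
    return num / den if den > 0 else 0.0


def detect_transients(frames, width=5):
    """frames: list of amplitude spectra of equal length.
    Returns (frame_index, value) pairs at the local maxima of the spectral
    difference of the averaged spectra."""
    avg = average_spectra(frames, width)
    diffs = [0.0] + [spectral_difference(a, b) for a, b in zip(avg, avg[1:])]
    return [
        (i, diffs[i])
        for i in range(1, len(diffs) - 1)
        if diffs[i] > diffs[i - 1] and diffs[i] >= diffs[i + 1]
    ]
```

For a two-bin signal whose amplitude jumps from 1 to 4 at frame 5, the averaged spectra rise gradually over five frames, and the normalization makes the first rise the largest, so a single transient is reported at frame 5 with a value of 0.375.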
The monophonic transient detection algorithm does not make use of any parameters.
It is applied within the frequency band specified by the user.
The size of the analysis window should be taken into account, since it determines the time span used for averaging the amplitude spectra.
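The effect of the frequency band and of the window size can be estimated with a little arithmetic. This sketch assumes an FFT size equal to the window size and introduces a hop size, which is not mentioned above; the span covered by five overlapping frames is then four hops plus one window. The helper names are hypothetical.

```python
import math


def band_to_bins(f_lo, f_hi, fft_size, sample_rate):
    """Convert a frequency band in Hz to a half-open range of FFT bin
    indices [lo, hi), clipped to the non-negative frequencies."""
    bin_width = sample_rate / fft_size
    lo = math.ceil(f_lo / bin_width)
    hi = math.floor(f_hi / bin_width) + 1
    return lo, min(hi, fft_size // 2 + 1)


def averaging_span_ms(window_size, hop_size, sample_rate, frames=5):
    """Approximate time span covered by `frames` consecutive analysis frames."""
    span_samples = (frames - 1) * hop_size + window_size
    return 1000.0 * span_samples / sample_rate
```

At 44100 Hz with a 1024-point FFT, the band 1000 to 4000 Hz maps to bins 24 to 92. With a 2048-sample window and a 512-sample hop, five frames span about 92.9 ms, so doubling the window noticeably lengthens the averaging period and blurs closely spaced attacks.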